In [26]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
import os

# Load and split PDF
loader = PyPDFLoader('rag_vs_fine_tuning.pdf')
data = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
docs = splitter.split_documents(data)

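# Embed and persist to a dedicated subdirectory rather than the working directory itself.
# "chroma_db" is an arbitrary folder name. Re-running this cell against the same
# directory adds the chunks again, which can surface as duplicate results at query time.
embedding_function = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
persist_dir = os.path.join(os.getcwd(), "chroma_db")
vectorstore = Chroma.from_documents(docs, embedding=embedding_function, persist_directory=persist_dir)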

# Define the retriever. MMR (maximal marginal relevance) trades some relevance
# for diversity among the results, unlike plain "similarity" search.
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 10}
)
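
For intuition about what search_type="mmr" changes, here is a minimal pure-Python sketch of maximal marginal relevance selection. It is not the code Chroma or LangChain runs; the names `cosine` and `mmr_select`, the `lambda_mult` default, and the toy vectors are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, cand_vecs, k=3, lambda_mult=0.5):
    """Greedily pick k candidate indices, balancing relevance against redundancy."""
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, cand_vecs[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: candidates 0 and 1 are exact duplicates, candidate 2 points elsewhere.
# All three are equally relevant to the query.
query_vec = [1.0, 1.0]
cand_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

diverse = mmr_select(query_vec, cand_vecs, k=2)                     # penalizes the duplicate
by_relevance = mmr_select(query_vec, cand_vecs, k=2, lambda_mult=1.0)  # ignores redundancy
print(diverse, by_relevance)
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to ranking by relevance alone, so the duplicate is picked; lower values push the selection toward chunks that differ from those already chosen.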

In [22]:
query = "What is the difference between retrieval-augmented generation and fine-tuning?"
# Use invoke() instead of the deprecated get_relevant_documents()
relevant_docs = retriever.invoke(query)

# Print the retrieved chunks
for i, doc in enumerate(relevant_docs):
    print(f"\n--- Retrieved Chunk {i} ---\n")
    print(doc.page_content)


--- Retrieved Chunk 0 ---

RAG vs. fine-tuning
Retrieval augmented generation (RAG) and fine-tuning are two methods enterprises can
use to get more value out of large language models (LLMs). Both work by tailoring the LLM
to the specific use cases, but the methodologies behind them differ significantly.

--- Retrieved Chunk 1 ---

RAG vs. fine-tuning
Retrieval augmented generation (RAG) and fine-tuning are two methods enterprises can
use to get more value out of large language models (LLMs). Both work by tailoring the LLM
to the specific use cases, but the methodologies behind them differ significantly.

--- Retrieved Chunk 2 ---

RAG vs. fine-tuning
Retrieval augmented generation (RAG) and fine-tuning are two methods enterprises can
use to get more value out of large language models (LLMs). Both work by tailoring the LLM
to the specific use cases, but the methodologies behind them differ significantly.

--- Retrieved Chunk 3 ---

Watch the latest podcast episodes  
What is retrieval augmen
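
The first three retrieved chunks above are identical. One possible post-processing step is to drop repeats by `page_content` before printing the chunks or passing them to an LLM. This is a sketch under assumptions: the helper name `dedupe_by_content` and the stand-in `Doc` class are hypothetical, and with the notebook's retriever you would pass `relevant_docs` instead of the sample list.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str

def dedupe_by_content(documents):
    """Drop documents whose text was already seen, keeping first occurrences in order."""
    seen = set()
    unique = []
    for doc in documents:
        # Collapse whitespace so trivial spacing differences do not defeat the check.
        key = " ".join(doc.page_content.split())
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

sample = [
    Doc("RAG vs. fine-tuning"),
    Doc("RAG vs.  fine-tuning\n"),   # same text, different whitespace
    Doc("Watch the latest podcast episodes"),
]
unique_docs = dedupe_by_content(sample)
print(len(sample), "->", len(unique_docs))
```

This only hides duplicates at query time; the repeated entries remain in the vector store.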

In [31]:
query = "How does the RAG model generate answers? Give stage 4 of the enumeration"
# Use invoke() instead of the deprecated get_relevant_documents()
relevant_docs = retriever.invoke(query)

# Print contents
for i, doc in enumerate(relevant_docs):
    print(f"\n--- Retrieved Chunk {i} ---\n")
    print(doc.page_content)


--- Retrieved Chunk 0 ---

relevant responses. 
RAG models generate answers via a four-stage process: 
1. Query: A user submits a query, which initializes the RAG system. 
2. Information retrieval: Complex algorithms comb the organization’s knowledge 
bases in search of relevant information. 
3. Integration: The retrieved data is combined with the user’s query and given to the 
RAG model to answer. Up to this point, the LLM has not processed the query.

--- Retrieved Chunk 1 ---

relevant responses. 
RAG models generate answers via a four-stage process: 
1. Query: A user submits a query, which initializes the RAG system. 
2. Information retrieval: Complex algorithms comb the organization’s knowledge 
bases in search of relevant information.

--- Retrieved Chunk 2 ---

relevant responses. 
RAG models generate answers via a four-stage process: 
1. Query: A user submits a query, which initializes the RAG system. 
2. Information retrieval: Complex algorithms comb the organization’s knowle

In [25]:
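# Stage 4 ("Response") is missing from the retrieved chunks above,
# so scan every chunk directly for the keyword.
for i, doc in enumerate(docs):
    if "Response" in doc.page_content:
        print(f"\n--- Chunk {i} ---\n")
        print(doc.page_content)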


--- Chunk 4 ---

4. Response: Blending the retrieved data with its own training and stored knowledge, 
the LLM generates a contextually accurate response. 
When searching through internal documents, RAG systems use semantic search. Vector 
databases organize data by similarity, thus enabling searches by meaning, rather than by 
keyword. Semantic search techniques enable RAG algorithms to reach past keywords to 
the intent of a query and return the most relevant data. 
RAG systems require extensive data architecture construction and maintenance. Data 
engineers must build the data pipelines needed to connect their organization’s data 
lakehouses with the LLM. 
To conceptualize RAG, imagine a gen AI model as an amateur home cook. They know the 
basics of cooking, but lack the expert knowledge—an organization’s proprietary 
database—of a chef trained in a particular cuisine. RAG is like giving the home cook a 
cookbook for that cuisine. By combining their general knowledge of cooking wit
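
Stage 4 sits in a different chunk (Chunk 4) from stages 1 to 3, which likely explains why the query above did not return it alongside them. The sketch below illustrates the general idea of fixed-size chunking with overlap using a plain character window. It is a simplification and not how `RecursiveCharacterTextSplitter` chooses split points (that class prefers separators such as paragraph and line breaks); the function name `window_chunks` and the sample text are assumptions.

```python
def window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

text = "1. Query 2. Information retrieval 3. Integration 4. Response"
chunks = window_chunks(text, chunk_size=30, chunk_overlap=10)

for i, chunk in enumerate(chunks):
    print(i, repr(chunk))

# The enumeration is split across windows: the first chunk holds the start of the
# list, while "4. Response" only appears in a later chunk.
print([i for i, chunk in enumerate(chunks) if "4. Response" in chunk])
```

A query that matches the introduction of a list can therefore rank the chunk holding the first items highly while the chunk holding the last item scores lower, since that chunk no longer mentions the list's subject.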

In [34]:
query = "Why are RAG and fine-tuning important?"
# Use invoke() instead of the deprecated get_relevant_documents()
relevant_docs = retriever.invoke(query)

# Print contents
for i, doc in enumerate(relevant_docs):
    print(f"\n--- Retrieved Chunk {i} ---\n")
    print(doc.page_content)


--- Retrieved Chunk 0 ---

workflows and stay ahead of competitors, they often struggle with getting their chatbots
and other models to reliably generate accurate answers.
What’s the difference between RAG and fine-tuning?
The difference between RAG and fine-tuning is that RAG augments a natural language

--- Retrieved Chunk 1 ---

workflows and stay ahead of competitors, they often struggle with getting their chatbots
and other models to reliably generate accurate answers.
What’s the difference between RAG and fine-tuning?
The difference between RAG and fine-tuning is that RAG augments a natural language

--- Retrieved Chunk 2 ---

workflows and stay ahead of competitors, they often struggle with getting their chatbots
and other models to reliably generate accurate answers.
What’s the difference between RAG and fine-tuning?
The difference between RAG and fine-tuning is that RAG augments a natural language

--- Retrieved Chunk 3 ---

to it. RAG models can return more accurate answers with t
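
Repeated chunks like these are often a sign that the same documents were indexed more than once into a persisted collection. One common mitigation is to derive a deterministic ID from each chunk's text, so that re-indexing targets the same IDs instead of creating new entries. The sketch below only builds the IDs. The helper name `content_id` is hypothetical, and passing the result as `ids=` to `Chroma.from_documents` is an assumption to check against the installed `langchain_chroma` version.

```python
import hashlib

def content_id(text):
    """Derive a stable ID from chunk text so identical chunks map to the same ID."""
    normalized = " ".join(text.split())  # ignore whitespace-only differences
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

chunk_texts = [
    "RAG vs. fine-tuning",
    "RAG vs. fine-tuning",   # the same chunk indexed a second time
    "4. Response",
]
ids = [content_id(t) for t in chunk_texts]
unique_ids = sorted(set(ids))
print(len(ids), "ids,", len(unique_ids), "unique")

# Assumed usage with the notebook's objects (not verified here):
#   ids = [content_id(d.page_content) for d in docs]
#   Chroma.from_documents(docs, embedding=embedding_function, ids=ids, ...)
# Duplicate IDs within a single batch may be rejected, so de-duplicate docs first.
```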