# **🔷🔷Retrieval-Augmented Generation (RAG) with LangChain🔷🔷**

## **⭐01: Introduction to RAG**

### LLM Limitation: Knowledge Constraints
Large Language Models (LLMs) know only what was in their training data. On their own, they cannot pull in real-time or external knowledge.

### What is Retrieval-Augmented Generation?
RAG overcomes this limitation by connecting LLMs to external data sources. It retrieves the documents most relevant to a user's query and passes them to the LLM as context for generating the response.

### Standard RAG Workflow
1. **User query input**
2. **Retriever fetches relevant documents** from the vector store
3. **Context + query are passed to the LLM**
4. **LLM generates an answer** using the retrieved context
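
The first three steps can be sketched in plain Python. This is a toy illustration, not LangChain code: the `DOCS` list, the word-overlap `retrieve` function, and `build_prompt` are all made up for the example, and step 4 (the LLM call) is left out.

```python
import re

# Hypothetical mini knowledge base standing in for a vector store.
DOCS = [
    "RAG retrieves relevant documents and passes them to an LLM as context.",
    "Vector stores index embeddings so similar chunks can be found quickly.",
    "Text splitters break long documents into smaller chunks.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, docs, k=1):
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Combine the retrieved context and the query into one prompt string."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


query = "How do text splitters handle long documents?"   # Step 1: user query
top_docs = retrieve(query, DOCS, k=1)                     # Step 2: retrieval
prompt_text = build_prompt(query, top_docs)               # Step 3: context + query
print(prompt_text)
```

A real retriever compares embedding vectors rather than counting shared words, but the shape of the flow is the same.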

### Preparing Data for Retrieval
To use RAG effectively, the documents must be ingested, split into manageable chunks, embedded, and stored in a vector database.
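
As a rough sketch of that preparation step, the snippet below splits a string into overlapping chunks, "embeds" each chunk, and keeps the results in a list. Everything here is an assumption made for illustration: `chunk_text` and `embed` are hypothetical helpers, the embedding is just a word count, and the in-memory list stands in for a vector database.

```python
from collections import Counter


def chunk_text(text, chunk_size, overlap):
    """Fixed-size character windows; neighbouring chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


def embed(chunk):
    """Toy embedding: lowercase word counts (a real pipeline calls an embedding model)."""
    return Counter(chunk.lower().split())


text = (
    "RAG pipelines load documents, split them into chunks, "
    "embed each chunk, and store the vectors for later retrieval."
)

# The "vector store" here is just a list of records kept in memory.
index = [
    {"text": chunk, "vector": embed(chunk)}
    for chunk in chunk_text(text, chunk_size=40, overlap=10)
]

for record in index:
    print(repr(record["text"]))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which is the reason splitters expose a `chunk_overlap` setting.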

![img_2](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0402.jpeg)

## **⭐02: Document Loaders**

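LangChain provides loaders for many file formats and data sources. Each loader returns a list of `Document` objects that carry the text in `page_content` and source details in `metadata`.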

```python
from langchain_community.document_loaders import (
    TextLoader,
    CSVLoader,
    JSONLoader,
    DirectoryLoader,
    PyPDFLoader,
    PDFPlumberLoader,
    PyMuPDFLoader,
    PDFMinerLoader,
    WebBaseLoader,
    UnstructuredURLLoader,
    RecursiveUrlLoader,
    SitemapLoader,
    S3DirectoryLoader,
    AzureBlobStorageContainerLoader,
    GoogleDriveLoader,
    ArxivLoader,
    YoutubeAudioLoader,
    NotionDirectoryLoader
)

```

```python
from langchain_community.document_loaders import CSVLoader

path_to_csv = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\fifa_countries_audience.csv"

# Load the CSV file using the CSVLoader
csv_loader = CSVLoader(file_path=path_to_csv)
documents = csv_loader.load()

print("Content: ", documents[0].page_content, "\n")
print("Metadata:", documents[0].metadata)
```

```
Content:  country: united states
confederation: concacaf
population_share: 4.5
tv_audience_share: 4.3
gdp_weighted_share: 11.3

Metadata: {'source': 'E:\\01_Github_Repo\\GenAI-with-Langchain-and-Huggingface\\_Developing_LLMs_Applications_with_LangChain\\_data\\fifa_countries_audience.csv', 'row': 0}
```
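
The output above suggests what the loader does with each row: one document per row, `column: value` lines as content, and the source plus row index as metadata. The sketch below imitates that shape with the standard library only. It is an approximation for illustration, not `CSVLoader`'s actual implementation; `rows_to_documents` and the `example.csv` source name are made up.

```python
import csv
import io

# One row copied from the output above; "example.csv" is a placeholder source name.
csv_text = """country,confederation,population_share,tv_audience_share,gdp_weighted_share
united states,concacaf,4.5,4.3,11.3
"""


def rows_to_documents(text, source):
    """Turn each CSV row into a dict with `page_content` and `metadata`."""
    documents = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text))):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": row_number}}
        )
    return documents


docs = rows_to_documents(csv_text, source="example.csv")
print(docs[0]["page_content"])
print(docs[0]["metadata"])
```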


```python
from langchain_community.document_loaders import PyPDFLoader

path_to_pdf = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\RAG.pdf"

# Load the PDF; each page becomes one Document
pdf_loader = PyPDFLoader(file_path=path_to_pdf)
documents = pdf_loader.load()

print("Content: ", documents[0].page_content, "\n")
print("Metadata:", documents[0].metadata)
```

```
Content:  Retrieval Argument Generation: Enhancing Language Model 
 Capabilities Through External Knowledge Integration 
 1. Introduction to Retrieval Argument Generation (RAG) 
 Retrieval-Augmented Generation (RAG) represents a paradigm shift in how large 
 language models (LLMs) operate, moving beyond the constraints of their pre-trained 
 knowledge by incorporating information from external, authoritative knowledge bases 
 during the response generation process.  1  This  technique fundamentally optimizes the 
 output of LLMs, ensuring that the generated content is not solely reliant on the 
 model's internal parameters but is also grounded in a broader, often more current and 
 specific, set of information.  1  In the realm of natural  language processing (NLP), RAG 
 serves as a powerful tool to enhance text generation by seamlessly integrating data 
 from diverse knowledge repositories, including databases, digital asset libraries, and 
 comprehensive document repositories.  3  T
```

```python
from langchain_community.document_loaders import UnstructuredHTMLLoader

path_to_html = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\white_house_executive_order_nov_2023.html"

# Load the HTML file and extract its text content
html_loader = UnstructuredHTMLLoader(file_path=path_to_html, encoding='utf-8')
documents = html_loader.load()

print("Content: ", documents[0].page_content, "\n")
print("Metadata:", documents[0].metadata)
```

## **⭐03: Text Splitting**


Split large documents into smaller chunks for effective embedding and retrieval.

```python
from langchain_text_splitters import (
    CharacterTextSplitter,
    TokenTextSplitter,
    RecursiveCharacterTextSplitter,
    SentenceTransformersTokenTextSplitter,
    SpacyTextSplitter,
    NLTKTextSplitter,
    MarkdownTextSplitter,
    HTMLHeaderTextSplitter,
    LatexTextSplitter,
    RecursiveJsonSplitter
)
```

![img_3](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0403.jpeg)

```python
from langchain_text_splitters import CharacterTextSplitter

text = """Machine learning is a fascinating field.
    It involves algorithms and models that can learn from data.
    These models can then make predictions or decisions without 
    being explicitly programmed to perform the task.
    This capability is increasingly valuable in 
    various industries, from finance to healthcare.

    There are many types of machine learning, 
    including supervised, unsupervised, and reinforcement learning.
    Each type has its own 
    strengths and applications."""

# Split only on paragraph breaks (double newlines)
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=100,
    chunk_overlap=10
)

chunks = text_splitter.split_text(text)
print(chunks)
print([len(chunk) for chunk in chunks])
```

```
Created a chunk of size 323, which is longer than the specified 100

['Machine learning is a fascinating field.\n    It involves algorithms and models that can learn from data.\n    These models can then make predictions or decisions without \n    being explicitly programmed to perform the task.\n    This capability is increasingly valuable in \n    various industries, from finance to healthcare.', 'There are many types of machine learning, \n    including supervised, unsupervised, and reinforcement learning.\n    Each type has its own \n    strengths and applications.']
[323, 169]
```


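`CharacterTextSplitter` splits only on the single separator it is given, so the first paragraph stays a 323-character chunk even though `chunk_size` is 100. `RecursiveCharacterTextSplitter` instead tries a list of separators in order, moving to the next one only when a piece is still too large:

- `"\n\n"` (double newline): split at paragraph breaks first, keeping paragraphs intact.
- `"\n"` (single newline): if a piece is still too large, split at line breaks.
- `" "` (space): if that is still not enough, split at word boundaries.
- `""` (empty string): as a last resort, split between individual characters.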
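
The fallback order can be sketched with a short recursive function. This is a simplified illustration written for this section, not LangChain's implementation: the hypothetical `recursive_split` only breaks pieces down, whereas the real splitter also merges small pieces back up toward `chunk_size` and applies `chunk_overlap`.

```python
def recursive_split(text, separators, chunk_size):
    """Split text so that every returned piece is at most chunk_size characters."""
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: fixed-width character slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    first, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(first):
        # Parts that are still too long fall through to the next separator.
        pieces.extend(recursive_split(part, rest, chunk_size))
    return pieces


sample = (
    "First paragraph is short.\n\n"
    "Second paragraph has two lines.\n"
    "The second line is here."
)

pieces = recursive_split(sample, ["\n\n", "\n", " ", ""], chunk_size=30)
for piece in pieces:
    print(len(piece), repr(piece))
```

Because this sketch never merges, a line that is slightly over the limit is broken all the way down to single words; the merging step in the real splitter is what keeps chunks close to the target size.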

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Try each separator in order until the pieces fit within chunk_size
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=100,
    chunk_overlap=10
)

chunks = splitter.split_text(text)
print(chunks)
print([len(chunk) for chunk in chunks])
```

```
['Machine learning is a fascinating field.', 'It involves algorithms and models that can learn from data.', 'These models can then make predictions or decisions without', 'being explicitly programmed to perform the task.', 'This capability is increasingly valuable in', 'various industries, from finance to healthcare.', 'There are many types of machine learning,', 'including supervised, unsupervised, and reinforcement learning.\n    Each type has its own', 'strengths and applications.']
[40, 59, 59, 48, 43, 47, 41, 89, 27]
```


```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

path_to_pdf = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\RAG.pdf"

loader = PyPDFLoader(file_path=path_to_pdf)
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=1000,
    chunk_overlap=200
)

# split_documents keeps each chunk's metadata (source, page) alongside its text
chunks = splitter.split_documents(documents)
print(chunks)
print([len(chunk.page_content) for chunk in chunks])
```

## **⭐04: Embedding and Storage**

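Embedding represents each chunk as a vector so that related text can be found by similarity search. LangChain integrates with many embedding providers (such as OpenAI and Google) and vector stores (such as Chroma); this section uses Google's embedding model with Chroma.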

![img_1](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0401.jpeg)

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
from dotenv import load_dotenv

load_dotenv()

# Initialize the embedding model (Google's embedding model)
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# `chunks` must be a list of LangChain Document objects
# (for plain strings, use Chroma.from_texts instead)
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model
)
```
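
To make "similarity search" concrete, here is a minimal sketch of what a vector store does at query time: score every stored vector against the query vector and keep the best `k`. The three-dimensional vectors and chunk labels are invented for the example, and real stores typically use approximate indexes rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vector, vector)) for text, vector in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
stored = [
    ("chunk about retrieval", [0.9, 0.1, 0.0]),
    ("chunk about cooking", [0.0, 0.2, 0.9]),
    ("chunk about vector search", [0.8, 0.3, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]

results = top_k(query_vector, stored, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```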


## **⭐05: Building LCEL Retrieval Chain**

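LangChain Expression Language (LCEL) lets you build a pipeline declaratively by chaining components with the `|` operator, so that each component's output becomes the next one's input.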

![img_4](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0404.jpeg)

```python
# Converts the vector store into a retriever
# Uses similarity search
# Returns the top 2 most relevant documents

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2}
)
```

- This call creates a **retriever object** from `vector_store`.
- Retrievers are the standard way to connect a **vector database** to **retrieval-based applications** such as RAG (Retrieval-Augmented Generation).

---

- 🔹 What is `vector_store.as_retriever()`?
  - `as_retriever()` is a method that wraps a vector store (e.g., `FAISS`, `Chroma`, `Pinecone`) in a **retriever object**.
  - A retriever **fetches relevant documents** for a query string: the query is embedded and compared with the stored vectors.

---

- 🔹 `search_type="similarity"`
  - This defines the **type of search** the retriever will perform.

  - **`search_type` values accepted by LangChain's vector store retriever:**
    | Search Type                    | Description                                                                                                  |
    | ------------------------------ | ------------------------------------------------------------------------------------------------------------ |
    | `"similarity"`                 | Returns the documents most similar to the query vector, using cosine similarity or another distance metric. |
    | `"mmr"`                        | Maximal Marginal Relevance: balances similarity to the query against diversity among the results.           |
    | `"similarity_score_threshold"` | Returns only results whose similarity score is above a given threshold.                                     |

    - Not every vector store (e.g., `FAISS`, `Pinecone`, `Chroma`, `Weaviate`) implements every search type, so check the integration you are using.

---

- 🔹 `search_kwargs={"k": 2}`
  - This is a dictionary of **additional parameters** passed to the search method. Here:
    -  `k` means **"return the top-k most relevant results"**
    -  So `k=2` means it will return the **2 most similar documents** based on the query.
    - Other possible `search_kwargs` (depending on the vector store):

    | Key               | Description                                                                                    |
    | ----------------- | ---------------------------------------------------------------------------------------------- |
    | `k`               | Number of results to return.                                                                   |
    | `score_threshold` | Minimum similarity score a document must reach (used with `"similarity_score_threshold"`).     |
    | `fetch_k`         | Number of candidate documents fetched before MMR re-ranks them (used with `"mmr"`).            |
    | `lambda_mult`     | Trade-off between similarity and diversity in MMR (1 = least diverse, 0 = most diverse).       |
    | `filter`          | Metadata filter (e.g., only retrieve documents with `topic="AI"`).                             |
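
The difference between `"similarity"` and `"mmr"` is easiest to see on a tiny example. The sketch below uses one common greedy formulation of MMR, written in plain Python for illustration. The two-dimensional vectors are invented, `similarity_search` and `mmr_search` are hypothetical helper names, and the `fetch_k` pre-filtering step is not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similarity_search(query, docs, k):
    """Indices of the k documents closest to the query."""
    order = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return order[:k]


def mmr_search(query, docs, k, lambda_mult=0.5):
    """Greedy MMR: reward closeness to the query, penalise closeness to earlier picks."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
# Documents 0 and 1 are near-duplicates; document 2 is less similar but different.
docs = [[1.0, 0.10], [1.0, 0.12], [0.7, -0.7]]

print(similarity_search(query, docs, k=2))
print(mmr_search(query, docs, k=2, lambda_mult=0.5))
```

Plain similarity search returns both near-duplicates, while MMR swaps the second one for the more distinct document. Setting `lambda_mult=1.0` removes the diversity penalty, so the two searches agree again.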



```python
from langchain_core.prompts import ChatPromptTemplate

# {context} and {question} are filled in when the chain runs
prompt = ChatPromptTemplate.from_template("""
    Use the following pieces of context to answer the question at the end.
    If you don't know the answer, say that you don't know.
    Context: {context}
    Question: {question}
""")
```

```python
from langchain_core.runnables import RunnablePassthrough   # Passes input through without modification
from langchain_core.output_parsers import StrOutputParser  # Parses the LLM output into a string
from langchain_google_genai import ChatGoogleGenerativeAI  # Imports Gemini chat model wrapper

# Initialize Gemini Flash (chat model) with specified parameters
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",         # Use the Gemini 1.5 Flash model
    max_output_tokens=50,             # Limit output to 50 tokens
    temperature=0.3                   # Low temperature for more deterministic output
)

# Create a chain: the retriever supplies the context, the question passes through unchanged,
# then the prompt is formatted, sent to the LLM, and parsed into a string
chain = (
    {"context": retriever, "question": RunnablePassthrough()}  # Retrieve context; keep the raw question
    | prompt                                                   # Format input using a prompt template
    | llm                                                      # Generate response using the Gemini model
    | StrOutputParser()                                        # Extract and return the final string output
)

# Invoke the chain with the question string only; the retriever fills in {context}.
# (Passing a dict here would hand the whole dict to the retriever, which expects a string.)
print(chain.invoke("What is retrieval-augmented generation (RAG)?"))
```
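
The chain above starts with a dict whose values all receive the same input, then pipes the resulting dict through the remaining steps. The toy below mimics that behaviour in plain Python to show why the chain takes a single question string as input. It is a simplified model, not LangChain's code: `Step`, `parallel`, and the `toy_*` objects are stand-ins invented for this example.

```python
class Step:
    """Toy stand-in for an LCEL runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b  ->  a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(mapping):
    """Run every step in `mapping` on the same input and collect the results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in mapping.items()})


# Hypothetical stand-ins for the retriever, prompt, model, and output parser.
toy_retriever = Step(lambda question: f"(chunks retrieved for: {question})")
toy_passthrough = Step(lambda value: value)
toy_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
toy_llm = Step(lambda prompt_text: {"content": f"Answer based on -> {prompt_text}"})
toy_parser = Step(lambda message: message["content"])

toy_chain = (
    parallel({"context": toy_retriever, "question": toy_passthrough})
    | toy_prompt
    | toy_llm
    | toy_parser
)

result = toy_chain.invoke("What is RAG?")
print(result)
```

The same string reaches both the retriever and the pass-through, which is why the question appears twice in the final prompt: once inside the retrieved context and once as the question itself.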

# 🧩 ***Full RAG Pipeline*** (`Google` + `Chroma` + `PDF`)

The code below is a complete Retrieval-Augmented Generation (RAG) pipeline that uses:

✅ PyPDFLoader to load a PDF

✅ RecursiveCharacterTextSplitter to split text into chunks

✅ GoogleGenerativeAIEmbeddings for embedding text

✅ Chroma vector store to store and retrieve chunks

✅ Gemini 1.5 Flash as the LLM

✅ A prompt + LangChain chain to handle queries


```python
# -------------------------------
# 📦 Import required libraries
# -------------------------------
from dotenv import load_dotenv  # Loads environment variables from .env file

# LangChain components
from langchain_community.document_loaders import PyPDFLoader         # For loading PDF files
from langchain_text_splitters import RecursiveCharacterTextSplitter  # To split large text into manageable chunks
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI  # Embeddings & LLM from Google Gemini
from langchain_chroma import Chroma                                  # Vector store for storing and retrieving embeddings
from langchain_core.prompts import ChatPromptTemplate                # To format prompts to LLMs
from langchain_core.runnables import RunnablePassthrough             # Utility for passing inputs unchanged in chain
from langchain_core.output_parsers import StrOutputParser            # Converts LLM output to plain string

# -------------------------------
# 🌐 Load environment variables
# -------------------------------
load_dotenv()

# -------------------------------
# 📄 Step 1: Load and split PDF
# -------------------------------
path_to_pdf = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\RAG.pdf"

# Load the PDF document
loader = PyPDFLoader(file_path=path_to_pdf)
documents = loader.load()

# Split the text into chunks (up to 1000 characters each, with 200 characters of overlap)
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],  # Define how to split text
    chunk_size=1000,
    chunk_overlap=200
)

chunks = splitter.split_documents(documents)

# Print number and size of chunks
print("Number of chunks:", len(chunks))
print([len(chunk.page_content) for chunk in chunks])

# -------------------------------
# 🧠 Step 2: Create embeddings & vector store
# -------------------------------
# Initialize Google embedding model
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Store document chunks in a Chroma vector store
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model
)

# -------------------------------
# 🔍 Step 3: Create retriever from vector store
# -------------------------------
retriever = vector_store.as_retriever(
    search_type="similarity",      # Use similarity search
    search_kwargs={"k": 2}         # Return top 2 similar chunks
)

# -------------------------------
# 📝 Step 4: Create a prompt template
# -------------------------------
prompt = ChatPromptTemplate.from_template("""
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. Don't try to make up an answer.
Context: {context}
Question: {question}
""")

# -------------------------------
# 🤖 Step 5: Initialize Gemini LLM
# -------------------------------
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    max_output_tokens=512,
    temperature=0.3  # Controls randomness in output (lower = more deterministic)
)

# -------------------------------
# 🔗 Step 6: Build the RAG chain
# -------------------------------
# Chain execution: Question → Retriever (context) + Pass-through (question) → Prompt → LLM → Output Parser
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# -------------------------------
# ❓ Step 7: Ask a question
# -------------------------------
question = "What is retrieval-augmented generation (RAG)?"

# Run the RAG chain with the input question
response = chain.invoke(question)

# -------------------------------
# 📤 Output the response
# -------------------------------
print("\n📄 Answer:")
print(response)
```

```
Number of chunks: 176
[945, 943, 935, 614, 978, 916, 981, 788, 940, 986, 949, 425, 993, 978, 991, 377, 939, 986, 936, 286, 919, 970, 978, 522, 976, 976, 939, 353, 971, 930, 973, 383, 976, 935, 963, 420, 954, 949, 921, 529, 959, 938, 921, 376, 999, 986, 976, 301, 963, 942, 952, 499, 958, 941, 956, 385, 942, 958, 945, 339, 986, 951, 924, 345, 988, 930, 951, 273, 923, 990, 993, 212, 946, 996, 949, 399, 968, 944, 960, 392, 928, 970, 937, 511, 933, 941, 986, 213, 973, 924, 963, 386, 977, 943, 935, 447, 992, 921, 978, 194, 931, 959, 975, 416, 977, 932, 966, 357, 945, 929, 992, 397, 949, 928, 987, 386, 956, 983, 963, 254, 959, 929, 937, 255, 999, 975, 951, 264, 987, 978, 947, 377, 966, 969, 976, 248, 987, 938, 964, 333, 917, 984, 972, 933, 981, 942, 539, 916, 957, 929, 409, 871, 981, 669, 984, 918, 974, 538, 947, 947, 920, 417, 994, 941, 961, 341, 997, 963, 996, 518, 969, 991, 993, 946, 929, 883]

📄 Answer:
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of 
```