# **Retrieval-Augmented Generation (RAG)**

## ⭐Introduction

RAG is a powerful technique used in Large Language Model (LLM) applications to **enhance responses** by **retrieving relevant context** from external sources like `document stores`, `knowledge bases`, or `databases`.

### 🧠 **RAG Workflow (in short)**

1. 🟡 **User Prompt**
   * The user provides a natural language question or prompt.

2. 🔤 **Embedding Model**
   * The prompt is passed into an **embedding model**, which converts it into a **vector** (numerical representation of semantic meaning).

3. 📦 **Vector Database**
   * The vector is sent to a **vector database** that contains pre-embedded document chunks.
   * It performs a **similarity search** to find the **most relevant document chunks**.

4. 📄 **Most Similar Documents**
   * The top-matching chunks (based on vector similarity) are retrieved.

5. 🧾 **Prompt Template Construction**
   * A **prompt template** is created that includes:

     * 🔹 Instructions (optional)
     * 🔹 The original **user prompt**
     * 🔹 The **retrieved document chunks** (context)

6. 🤖 **LLM (Large Language Model)**
   * The complete prompt is passed to the **LLM**.
   * The LLM uses both the prompt and the retrieved context to generate a **context-aware response**.

7. ✅ **Output**
   * The final, enriched answer is returned to the user.
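
The seven steps above can be sketched end to end in plain Python. This is a toy illustration, not LangChain's implementation: `embed` and `similarity` are hypothetical word-overlap stand-ins for a real embedding model and vector similarity, `CHUNKS` is made-up data, and the LLM call is left out so only the assembled prompt is shown.

```python
# Hypothetical pre-chunked documents (illustrative data only).
CHUNKS = [
    "RAG retrieves relevant context before generation.",
    "Vector databases store embeddings of document chunks.",
    "Bananas are rich in potassium.",
]

def embed(text):
    """Toy stand-in for an embedding model: the set of lowercase words."""
    return set(text.lower().strip(".?").split())

def similarity(a, b):
    """Toy stand-in for vector similarity: Jaccard overlap of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(query, k=2):
    """Steps 2-4: embed the query and return the k most similar chunks."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Step 5: combine instructions, retrieved context, and the user prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context.\n\nContext:\n{context}\n\nQuestion: {query}"

user_prompt = "What do vector databases store?"       # Step 1
top_chunks = retrieve(user_prompt)                    # Steps 2-4
final_prompt = build_prompt(user_prompt, top_chunks)  # Step 5
print(final_prompt)  # Step 6 would send this string to an LLM
```

In a real pipeline only `embed`, the similarity search, and the final model call change; the overall shape stays the same.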

---

### 🔁 **Feedback Loop (Optional)**

* The output can also be fed back to the system for:

  * Further refinement
  * Re-ranking
  * Memory-based updates



> This workflow helps RAG **bridge the gap between static LLM knowledge and dynamic, up-to-date external information**, improving factual accuracy and relevance.

![RAG Workflow](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0301.jpeg)


### 🧩 **Key Components of a RAG Workflow**

---

#### 📌 1. Embeddings for Semantic Retrieval

- 🔸 Converts text into **dense vector representations** using an embedding model.
- 🔸 These vectors capture **semantic meaning**, not just keywords.
- 🔸 Used to compare and retrieve similar content from a document store.
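
To make "comparing vectors" concrete, here is a minimal sketch of cosine similarity on dense vectors. The three-dimensional vectors are invented for illustration; real embedding models return far longer vectors, and the words chosen here are only an assumption to show the idea.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up "embeddings": related words point in similar directions.
vectors = {
    "kitten": [0.9, 0.8, 0.1],
    "cat": [1.0, 0.7, 0.0],
    "invoice": [0.0, 0.1, 1.0],
}

query = vectors["kitten"]
scores = {
    word: cosine_similarity(query, vec)
    for word, vec in vectors.items()
    if word != "kitten"
}
best_match = max(scores, key=scores.get)
print(best_match, round(scores[best_match], 3))
```

Because cosine similarity depends only on direction, scaling a vector does not change its score against the query.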

##### ✅ Common Embedding Models:
- `OpenAI` → `text-embedding-3-small`
- `Gemini` → `models/embedding-001`
- `SentenceTransformers` → `all-MiniLM-L6-v2`
- `Hugging Face` → `intfloat/e5-large`, `BAAI/bge-large-en`

##### 🧪 Code Example:
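```python
from langchain_openai import OpenAIEmbeddings  # pip install langchain-openai

# OpenAI embeddings (expects OPENAI_API_KEY in the environment)
embedding = OpenAIEmbeddings(model="text-embedding-3-small")
query_vector = embedding.embed_query("User prompt")

from langchain_google_genai import GoogleGenerativeAIEmbeddings  # pip install langchain-google-genai

# Gemini embeddings (expects GOOGLE_API_KEY in the environment)
embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
query_vector = embedding.embed_query("User prompt")
```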

---

#### 🗃️ 2. Vector Database for Storage & Retrieval

* 🔸 Stores document embeddings.
* 🔸 Supports **similarity search** to fetch the most relevant chunks.

##### ✅ Popular Vector Stores:

* `Chroma` (LangChain-native)
* `FAISS` (open-source, local)
* `Pinecone` (cloud, scalable)
* `Weaviate`, `Qdrant`, `Milvus` (advanced features)

##### 🧪 Code Example:

```python
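from langchain_community.vectorstores import FAISS  # pip install faiss-cpu

# `documents` is a list of Document objects; `embedding` is an embeddings model
docsearch = FAISS.from_documents(documents, embedding)
results = docsearch.similarity_search("User prompt", k=5)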
```
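
Conceptually, a similarity search scores the query vector against every stored vector and keeps the best `k`. The sketch below does this by exhaustive scan; it is only an illustration under that assumption, since libraries such as FAISS use optimized index structures. The `index` contents and the `top_k` helper are hypothetical.

```python
import heapq

def dot(u, v):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(a * b for a, b in zip(u, v))

def top_k(query_vec, index, k):
    """Return the k (score, text) pairs with the highest scores, best first."""
    scored = ((dot(query_vec, vec), text) for text, vec in index)
    return heapq.nlargest(k, scored)

# Hypothetical index of (chunk text, unit-length vector) pairs.
index = [
    ("chunk about pricing", [1.0, 0.0]),
    ("chunk about refunds", [0.6, 0.8]),
    ("chunk about shipping", [0.0, 1.0]),
]

results = top_k([0.8, 0.6], index, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```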

---

#### 🧾 3. Prompt Engineering & Chaining

* 🔸 Retrieved documents are merged with the user's question.
* 🔸 A **prompt template** is used to structure this input to the LLM.

##### 🧱 Prompt Template Example:

```python
from langchain_core.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["context", "question"],
    template="Answer based on context:\n{context}\n\nQuestion: {question}"
)
```

##### 🔗 Chain With LLM:

```python
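from langchain_openai import OpenAI  # pip install langchain-openai

# Pipe the prompt template into the LLM (LCEL) instead of the deprecated LLMChain
chain = template | OpenAI()
# `retrieved_docs` holds the retrieved context text; `user_prompt` is the user's question
response = chain.invoke({"context": retrieved_docs, "question": user_prompt})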
```
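
Before the template is filled, the retrieved chunks usually have to be joined into a single context string. The sketch below shows one way to do that with plain string formatting; the `max_chars` budget and the `format_rag_prompt` helper are assumptions for illustration, not part of LangChain.

```python
TEMPLATE = "Answer based on context:\n{context}\n\nQuestion: {question}"

def format_rag_prompt(chunks, question, max_chars=200):
    """Join retrieved chunks into one context string, stopping at a character budget."""
    selected, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # keep the prompt within the budget
        selected.append(chunk)
        used += len(chunk)
    return TEMPLATE.format(context="\n\n".join(selected), question=question)

# Made-up retrieved chunks, ordered from most to least similar.
chunks = [
    "Refunds are issued within 14 days.",
    "Shipping takes 3 to 5 business days.",
]
prompt = format_rag_prompt(chunks, "How long do refunds take?")
print(prompt)
```

Keeping the most similar chunks first means that a tight budget drops the least relevant context.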

---

#### 📂 4. Document Loaders

* 🔸 Load documents of various formats into LangChain.
* 🔸 Extract text for further processing and embedding.

##### ✅ Supported Formats:

`.pdf`, `.csv`, `.docx`, `.html`, `.md`, `.pptx`, `.eml`, etc.

##### 🧪 Code Example:

```python
from langchain_community.document_loaders import PyPDFLoader  # pip install pypdf
loader = PyPDFLoader("document.pdf")
documents = loader.load()
```

---

#### ✂️ 5. Document Splitting (Chunking)

* 🔸 LLMs have limited context windows, so long documents must be split.
* 🔸 Splitters create **overlapping chunks** to preserve context.

##### ✅ Common Text Splitters:

* `CharacterTextSplitter`
* `RecursiveCharacterTextSplitter` (preferred for preserving semantics)
* `TokenTextSplitter` (based on token count)

##### 🧪 Code Example:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
```
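
To see what overlap means in the simplest case, here is a sketch of fixed-width windowing. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers separator boundaries); `chunk_with_overlap` is a hypothetical helper that cuts purely by character position.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is how overlap carries context across a boundary.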

**Key Topics Covered:**
- Document loaders
- Text splitting strategies
- Vector databases
- Prompt chaining
- End-to-end RAG example

## ⭐1. Integrating Document Loaders

**Concept Overview:**
- LangChain provides document loader classes to load data from different formats.
- Supported formats include `.pdf`, `.csv`, `.html`, and third-party sources.
- This step is essential for feeding content into your LLM applications.

**Supported Loaders:**
- `PyPDFLoader` for PDF files
- `CSVLoader` for CSV files
- `UnstructuredHTMLLoader` for HTML files
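
Every loader produces the same kind of result: a list of documents, each holding text plus metadata. The sketch below imitates that shape for CSV input using only the standard library. `Doc` and `load_csv` are hypothetical stand-ins, and the sample rows are invented; the "one document per row" layout is an assumption about roughly what `CSVLoader` returns.

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical minimal stand-in for a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_csv(file_obj, source):
    """Turn each CSV row into one Doc made of 'column: value' lines."""
    docs = []
    for row_index, row in enumerate(csv.DictReader(file_obj)):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append(Doc(content, {"source": source, "row": row_index}))
    return docs

# Invented sample data, held in memory instead of a file on disk.
sample = io.StringIO("country,population_share\nUnited States,4.5\nJapan,1.9\n")
docs = load_csv(sample, source="sample.csv")
print(docs[0].page_content)
print(docs[0].metadata)
```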

```python
# PDF Loader Example
from langchain_community.document_loaders import PyPDFLoader

# Use r"..." for Windows paths to handle backslashes, or use forward slashes /.
path = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\attention_is_all_you_need.pdf"

# Load the PDF document
loader = PyPDFLoader(path)

data = loader.load()
print(data[0])
```

```python
# CSV Loader Example
from langchain_community.document_loaders.csv_loader import CSVLoader

path = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\fifa_countries_audience.csv"

loader = CSVLoader(path)
data = loader.load()
print(data[0])
```

```python
# HTML Loader Example
# Requires: pip install unstructured
from langchain_community.document_loaders import UnstructuredHTMLLoader

path = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\white_house_executive_order_nov_2023.html"

loader = UnstructuredHTMLLoader(path)
data = loader.load()
print(data[0])
print(data[0].metadata)
```

## ⭐2. Splitting External Data for Retrieval



![img_2](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0302.jpeg)

**Concept Overview:**
- Documents must be split into chunks to fit the LLM's context window.
- Use `CharacterTextSplitter` or `RecursiveCharacterTextSplitter`.
- `chunk_size` and `chunk_overlap` control chunk boundaries.

### 🔹 Why we use `chunk_size`, `chunk_overlap`, and `separator`


#### ✅ `chunk_size`
- Sets the **maximum number of characters** in each chunk.
- Ensures the text is split into **small, manageable pieces** for tasks like embedding or retrieval.

---

#### ✅ `chunk_overlap`
- Ensures that each chunk **shares a few characters** with the next one (3 in this example).
- Helps **preserve context** between chunks, which improves semantic understanding.
  
![Chunk Overlap Example](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0303.jpeg)

---

#### ✅ `separator='.'`
- Tells the splitter to try breaking the text at **sentence boundaries** (periods).
- Helps chunks **end naturally**, ideally at the end of a sentence. A sentence longer than `chunk_size` is kept whole, so a chunk can exceed the limit.
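
The interaction between `separator` and `chunk_size` can be sketched in plain Python. This is a simplified model, not the library's code: `split_on_separator` is a hypothetical helper, it ignores `chunk_overlap`, and it rejoins pieces with the separator plus a space. It shows why a chunk can come out longer than `chunk_size` when a single sentence does not fit.

```python
def split_on_separator(text, separator, chunk_size):
    """Split at the separator, then merge neighbouring pieces while they fit."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}{separator} {piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # kept whole even if it is longer than chunk_size
    if current:
        chunks.append(current)
    return chunks

quote = (
    "One machine can do the work of fifty ordinary humans.\n"
    "No machine can do the work of one extraordinary human."
)
small = split_on_separator(quote, ".", chunk_size=24)
print([len(chunk) for chunk in small])  # [52, 53]
large = split_on_separator(quote, ".", chunk_size=200)
print(len(large))  # 1
```

With a 24-character limit neither sentence fits, so each becomes its own oversized chunk. With a 200-character limit both sentences merge into one chunk.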

---

### 🔸 Text Splitter: RecursiveCharacterTextSplitter

This splitter tries to **preserve meaning** by recursively splitting the text at increasingly granular levels:

1. **Paragraph**: `"\n\n"`
2. **Line**: `"\n"`
3. **Word**: `" "`
4. **Character**: `""` *(fallback if nothing else works)*

🔁 If the text cannot be split cleanly at one level, it **falls back to the next**.

✅ **Goal**: Create chunks that are both **semantically meaningful** and within the size limit.
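
The fallback behaviour can be sketched as a short recursive function. This is a simplified illustration rather than LangChain's implementation: `recursive_split` is a hypothetical helper, it omits `chunk_overlap`, and it drops empty pieces.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in (p for p in text.split(sep) if p):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            current = ""
            # This part is still too long: retry it with the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

text = "First paragraph here.\n\nSecond line one\nSecond line two is longer"
chunks = recursive_split(text, chunk_size=25)
print(chunks)
```

Here the first paragraph fits as a whole, while the second is too long and is split again at the line break.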


```python
from langchain_text_splitters import CharacterTextSplitter

quote = '''One machine can do the work of fifty ordinary humans.\nNo machine can do the work of one extraordinary human.'''
chunk_size = 24
chunk_overlap = 3

ct_splitter = CharacterTextSplitter(separator='.',
                                    chunk_size=chunk_size,
                                    chunk_overlap=chunk_overlap)

docs = ct_splitter.split_text(quote)

print(docs)
print([len(doc) for doc in docs])
```

Created a chunk of size 52, which is longer than the specified 24


['One machine can do the work of fifty ordinary humans', 'No machine can do the work of one extraordinary human']
[52, 53]


```python
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the HTML document into memory
path = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\white_house_executive_order_nov_2023.html"
loader = UnstructuredHTMLLoader(path)
data = loader.load()  # without this call, `data` would still hold a previously loaded document

# Define variables
chunk_size = 300
chunk_overlap = 100

# Split the HTML
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separators=["."]  # Splitting on periods
)

docs = splitter.split_documents(data)
print(docs)
```

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

path = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\attention_is_all_you_need.pdf"
loader = PyPDFLoader(path)

data = loader.load()

chunk_size = 24
chunk_overlap = 10

rc_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separators=["\n\n", "\n", " ", ""]
)

docs = rc_splitter.split_documents(data)

print(docs[0])
print(docs[0].metadata)
print(docs[0].page_content[:100])  # Print the first 100 characters of the first chunk
print([len(doc.page_content) for doc in docs])  # Print the length of each chunk's content
```

page_content='Provided proper' metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'E:\\01_Github_Repo\\GenAI-with-Langchain-and-Huggingface\\_Developing_LLMs_Applications_with_LangChain\\_data\\attention_is_all_you_need.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}
{'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'E:\\01_Github_Repo\\GenAI-with-Langchain-and-Huggingface\\_Developing_

## ⭐3. RAG Storage and Retrieval Using Vector Databases

![img_4](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0304.jpeg)


**Concept Overview:**
- Vector databases store and retrieve document embeddings.
- Common choices include Chroma, Pinecone, FAISS, etc.
- Use retrievers to query documents most similar to a user input.
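
The split between a vector store (which embeds and stores documents) and a retriever (which answers queries with a fixed `k`) can be sketched with a tiny in-memory class. Everything here is hypothetical: `TinyVectorStore` only imitates the shape of the real API, and word counts stand in for learned embeddings.

```python
from collections import Counter

def embed(text):
    """Toy embedding (assumption): word counts instead of a learned dense vector."""
    return Counter(text.lower().split())

def score(a, b):
    """Unnormalized overlap score between two count vectors."""
    return sum(a[word] * b[word] for word in a)

class TinyVectorStore:
    """Hypothetical in-memory store: embeds documents once, searches on demand."""

    def __init__(self, texts):
        self._entries = [(text, embed(text)) for text in texts]

    def similarity_search(self, query, k):
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: score(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def as_retriever(self, k):
        """Return a callable that fixes k, mirroring the store/retriever split."""
        return lambda query: self.similarity_search(query, k)

store = TinyVectorStore([
    "TechStack should be written with the T and S capitalized",
    "Our users should be referred to as techies",
])
retriever = store.as_retriever(k=1)
hits = retriever("how should we refer to users")
print(hits)
```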

```python
# Using Chroma Vector Store with Google Gemini Embeddings

# Import necessary modules for LLM, embeddings, vector store, document handling, and environment variable loading
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document
from dotenv import load_dotenv

# Load environment variables from a .env file
load_dotenv()

# Initialize the Google Gemini 1.5 Flash language model
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    # temperature=0.2,  # (optional) control creativity
    # max_tokens=50     # (optional) limit output length
)

# Initialize the embedding function using Google's embedding model
embedding_function = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

# Create example documents with metadata
docs = [
    Document(
        page_content="In all marketing copy, TechStack should always be written with the T and S capitalized.",
        metadata={"guideline": "brand-capitalization"}  # Tagging the guideline type
    ),
    Document(
        page_content="Our users should be referred to as techies in both internal and external communications.",
        metadata={"guideline": "referring-to-users"}  # Tagging the guideline type
    )
]

# Create a Chroma vector store from the documents, using the embedding function
# Persist the data to a local directory for future retrieval
vectorstore = Chroma.from_documents(
    docs,
    embedding=embedding_function,
    persist_directory=" "  # Placeholder: replace with a real directory path to save the vector DB locally
)

# Create a retriever from the vector store to fetch the top 2 most similar documents based on embeddings
retriever = vectorstore.as_retriever(
    search_type="similarity",         # Use similarity search
    search_kwargs={"k": 2}            # Retrieve top 2 similar docs
)
```

## ⭐4. Chaining it All Together with Prompt Templates

**Concept Overview:**
- Use LangChain's `ChatPromptTemplate` to build reusable prompt structures.
- Chain retriever output with LLM using `RunnablePassthrough`.
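
The `|` composition used in this section can be demystified with a small sketch. It is not LangChain's implementation: `Step`, `parallel`, `fake_retriever`, and `fake_llm` are hypothetical stand-ins that show how one input fans out to a retriever and a passthrough, and how the resulting dict then flows through a prompt and a model.

```python
class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Run this step first, then feed its result to the next step.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)

def parallel(**branches):
    """Feed the same input to every branch and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})

passthrough = Step(lambda value: value)  # forwards the input unchanged
fake_retriever = Step(lambda query: ["Write TechStack with a capital T and S."])
prompt = Step(lambda d: f"Guidelines: {d['guidelines']}\nCopy: {d['copy']}")
fake_llm = Step(lambda text: text.upper())  # placeholder for a real model call

chain = parallel(guidelines=fake_retriever, copy=passthrough) | prompt | fake_llm
result = chain.invoke("we love techstack")
print(result)
```

The passthrough branch is what lets the original user input reach the prompt alongside the retrieved guidelines.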

```python
from langchain_core.prompts import ChatPromptTemplate

message = '''
        Review and fix the following TechStack marketing copy with the following guidelines in consideration:

        Guidelines:
        {guidelines}

        Copy:
        {copy}

        Fixed Copy:
        '''

prompt_template = ChatPromptTemplate.from_messages([("human", message)])
```

```python
from langchain_core.runnables import RunnablePassthrough

rag_chain = ({"guidelines": retriever, "copy": RunnablePassthrough()}
             | prompt_template
             | llm)

response = rag_chain.invoke("Here at techstack, our users are the best in the world!")
print(response.content)
```

Here at TechStack, our users are the best in the world!


---

## ⭐Full Code

```python
# This script:
# 1. Loads a PDF and splits it into chunks
# 2. Embeds the chunks using Google Gemini embeddings
# 3. Stores them in a Chroma vector store
# 4. Sets up a retriever for similarity search
# 5. Creates a prompt template and links it to a RAG chain using Gemini LLM

import os
from dotenv import load_dotenv

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Load environment variables from .env file (e.g., API keys)
load_dotenv()

# --- Step 1: Load the PDF file ---
pdf_path = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\attention_is_all_you_need.pdf"
loader = PyPDFLoader(pdf_path)
raw_documents = loader.load()

# --- Step 2: Split the document into smaller overlapping text chunks ---
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
documents = splitter.split_documents(raw_documents)

# --- Step 3: Generate vector embeddings using Gemini ---
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# --- Step 4: Store vectors in a Chroma vector database ---
vector_store = Chroma.from_documents(
    documents,
    embedding=embedding_model,
    persist_directory=os.getcwd()  # Save in current working directory
)

# --- Step 5: Create a retriever for similarity-based search ---
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most relevant chunks
)

# --- Step 6: Define prompt template for the RAG chain ---
prompt_template = ChatPromptTemplate.from_messages([
    ("human", """
    Answer the following question using only the provided context.

    Context:
    {context}

    Question:
    {question}

    Answer:
    """)
])

# --- Step 7: Initialize the Gemini LLM for response generation ---
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# --- Step 8: Compose a Retrieval-Augmented Generation (RAG) chain ---
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
)

# --- Step 9: Ask a relevant question about the paper ---
question = "What is the main innovation introduced by the 'Attention is All You Need' paper?"

# --- Step 10: Get and print the response from the RAG pipeline ---
response = rag_chain.invoke(question)
print(response.content)
```

The provided text does not explicitly state the main innovation of the "Attention is All You Need" paper.  While it mentions self-attention and multi-head attention, it doesn't identify either as the *main* innovation.
