## **Learning Objectives**

By completing these exercises, you will:

- Understand Retrieval-Augmented Generation (RAG) and its components.
- Load, preprocess, and handle PDF documents effectively.
- Convert textual data into embeddings for efficient retrieval.
- Implement and test document retrieval systems using LangChain and FAISS.
- Integrate retrieval systems with free large language models (LLMs) available through ChatGroq.
- Build an interactive chat-based Q&A system.

---

## **Exercise 1: Setup and Warm-up**

In this exercise, you'll set up your environment and select a suitable language model.

**Steps:**

1. **Load Environment Variables:** Ensure your environment variables (e.g., API keys, tokens) are securely stored and loaded.
2. **Choose LLM:** Select a free LLM from ChatGroq.
3. **Instantiate the Model:** Create an instance of your chosen model.


In [4]:
# ================================
# Exercise 1 – Setup (Clean Version)
# ================================

import os
from pathlib import Path
from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load .env from repo root
env_path = Path("..") / ".env"
load_dotenv(dotenv_path=env_path)

# Verify key
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
if not GROQ_API_KEY:
    raise RuntimeError(f"GROQ_API_KEY not found in {env_path.resolve()}")

print("GROQ key loaded:", True)

# Use supported Groq model
llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # updated model
    temperature=0.0,
    api_key=GROQ_API_KEY,
)

# Test call
response = llm.invoke("Reply with exactly: ok")
print("Model response:", response.content)

GROQ key loaded: True
Model response: ok


---

## **Exercise 2: Data Ingestion**

In this exercise, you'll learn to load PDF data into a Python environment.

**Steps:**

1. **Import PDF Loader:** Use LangChain's `PyPDFLoader`.
2. **Load PDF File:** Create a function to read the PDF file.
3. **Display PDF Content:** Print the number of pages and first page content.

In [6]:
# Import PyPDFLoader
from langchain_community.document_loaders import PyPDFLoader

# Example function to load PDF

def load_pdf(pdf_path):
    pass  # Implement PDF loading here

In [9]:
from pathlib import Path
from typing import List

from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document

def load_pdf(pdf_path: str | Path) -> List[Document]:
    pdf_path = Path(pdf_path)
    if not pdf_path.exists():
        raise FileNotFoundError(f"PDF not found: {pdf_path.resolve()}")
    return PyPDFLoader(str(pdf_path)).load()

# Correct folder
documents_dir = Path("..") / "documents"
pdfs = sorted(documents_dir.glob("*.pdf"))

print("PDFs found:", [p.name for p in pdfs])

# Pick one to test
pdf_path = pdfs[0]
docs = load_pdf(pdf_path)

print("\nLoaded:", pdf_path.name)
print("Pages:", len(docs))
print("\nFirst page preview:\n", docs[0].page_content[:500])
print("\nMetadata sample:\n", docs[0].metadata)

PDFs found: ['paracetamol.pdf', 'react_paper.pdf']

Loaded: paracetamol.pdf
Pages: 3

First page preview:
 202211
178 mm
422 mm
178 mm
422 mm
Front Side Back Side
 Paracetamol 500mg Tablets
178 x 422mm
178 x 30mm
358
202211
NA
Printed Leaﬂet for  Paracetamol 500mg Tablets, Open size: 178 x 422mm, Folding Size : 178x30mm 
Speciﬁcation: 40GSM Bible Paper - Fairmed/Apohilft-Germany 
P4S Complete Solutions
01
Black
Fairmed/Apohilft-Germany 
30mm
Gebrauchsinformation: Information für den Anwender
Paracetamol 500 mg Die Apotheke hilft 
Schmerztabletten
Zur Anwendung bei Kindern ab 4 Jahren, Jugendlichen un

Metadata sample:
 {'producer': 'Adobe PDF Library 16.0', 'creator': 'Adobe InDesign 16.4 (Windows)', 'creationdate': '2021-10-12T16:15:55+02:00', 'moddate': '2021-10-12T16:15:57+02:00', 'trapped': '/False', 'source': '../documents/paracetamol.pdf', 'total_pages': 3, 'page': 0, 'page_label': '1'}


# Load your PDF and print out content here

## Ex 2: PDF ingestion

- Located PDFs in `../documents/` (not `/data/`).
- Loaded with `PyPDFLoader`.
- Confirmed page count + previewed first page.
- Important: notebook runs from `/notebooks`, so repo root = `..`.

At this stage:
Raw PDF → LangChain `Document` objects.
No chunking yet, just structured pages.

---

## **Exercise 3: Document Chunking**

This exercise introduces splitting large documents into manageable text chunks.

**Steps:**

1. **Import Text Splitter:** Use `RecursiveCharacterTextSplitter`.
2. **Chunk Document:** Write a function that splits loaded documents into chunks.
3. **Test Function:** Verify by displaying the resulting chunks.


In [11]:
# Import RecursiveCharacterTextSplitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Example chunking function
def chunk_documents(documents, chunk_size=200, chunk_overlap=50):
    pass  # Implement your chunking logic

In [13]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_documents(documents, chunk_size=900, chunk_overlap=150):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        add_start_index=True,
    )
    return splitter.split_documents(documents)

# Chunk previously loaded docs
chunks = chunk_documents(docs, chunk_size=900, chunk_overlap=150)

print("Total chunks:", len(chunks))
print("\nChunk 0 preview:\n", chunks[0].page_content[:400])
print("\nChunk 0 metadata:\n", chunks[0].metadata)

Total chunks: 29

Chunk 0 preview:
 202211
178 mm
422 mm
178 mm
422 mm
Front Side Back Side
 Paracetamol 500mg Tablets
178 x 422mm
178 x 30mm
358
202211
NA
Printed Leaﬂet for  Paracetamol 500mg Tablets, Open size: 178 x 422mm, Folding Size : 178x30mm 
Speciﬁcation: 40GSM Bible Paper - Fairmed/Apohilft-Germany 
P4S Complete Solutions
01
Black
Fairmed/Apohilft-Germany 
30mm
Gebrauchsinformation: Information für den Anwender
Paracetamo

Chunk 0 metadata:
 {'producer': 'Adobe PDF Library 16.0', 'creator': 'Adobe InDesign 16.4 (Windows)', 'creationdate': '2021-10-12T16:15:55+02:00', 'moddate': '2021-10-12T16:15:57+02:00', 'trapped': '/False', 'source': '../documents/paracetamol.pdf', 'total_pages': 3, 'page': 0, 'page_label': '1', 'start_index': 0}


# Execute your chunking function and display results here


## Ex 3: Chunking

- Split pages into overlapping chunks (900 characters, 150 overlap).
- Overlap reduces the chance that an answer is cut in half at a chunk boundary.
- Chunk size is a retrieval trade-off:
  - Too small → lose context.
  - Too large → retrieval gets noisy.

Output:
Pages → multiple overlapping chunks ready for embedding.
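To make the overlap mechanics concrete, here is a minimal sketch of fixed-size character windows with overlap. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (that splitter also tries to break on separators such as paragraphs and newlines before falling back to characters). The function name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 900, chunk_overlap: int = 150) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs for fixed-size windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for start, piece in pieces:
    print(start, piece)
```

Each window repeats the last 4 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.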


---

## **Exercise 4: Embedding and Storage**

In this exercise, you will create embeddings from text chunks and store them efficiently.

**Steps:**

1. **Choose Embedding Model:** Use `sentence-transformers/all-mpnet-base-v2` from Hugging Face.
2. **Generate Embeddings:** Transform document chunks into embeddings.
3. **Store Embeddings:** Save these embeddings using FAISS locally.


In [14]:
# Import libraries
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Example function for embeddings and storage
def embed_and_store(chunks):
    pass  # Implement your embedding creation and storage logic

In [16]:
# Generate embeddings and save them locally
from pathlib import Path
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

FAISS_DIR = Path("..") / "faiss_index"

# Embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Build FAISS index from chunks
vectorstore = FAISS.from_documents(chunks, embeddings)

# Persist locally
vectorstore.save_local(str(FAISS_DIR))

print("Saved FAISS index to:", FAISS_DIR.resolve())
print("Chunks embedded:", len(chunks))

Saved FAISS index to: /Users/keith/GitHub/ds-rag-pipeline-16-02-2026/faiss_index
Chunks embedded: 29


## Ex 4: Embeddings + FAISS

- Embedded chunks using `all-mpnet-base-v2`.
- Built a FAISS vector index from those embeddings.
- Saved the index locally to `../faiss_index`.

Key point:
This is the "knowledge base" step. After this, retrieval is fast and repeatable.

---

## **Exercise 5: Retrieval from FAISS**

Here, you will learn how to retrieve documents from a vector database using embeddings.

**Steps:**

1. **Load Embeddings:** Load stored embeddings from the FAISS database.
2. **Implement Retrieval:** Create logic to retrieve relevant chunks based on queries.
3. **Test Retriever:** Execute retrieval using sample queries.

In [17]:
from pathlib import Path
from langchain_community.vectorstores import FAISS

FAISS_DIR = Path("..") / "faiss_index"

# Reload index (proves persistence works)
vs = FAISS.load_local(
    str(FAISS_DIR),
    embeddings,
    allow_dangerous_deserialization=True,  # index metadata is pickled; only load files you created yourself
)

# Create retriever
retriever = vs.as_retriever(search_kwargs={"k": 4})

# Test retrieval
query = "What is this document mainly about?"
hits = retriever.invoke(query)  # get_relevant_documents() is deprecated in recent LangChain releases

print("Query:", query)
print("Hits:", len(hits))

for i, d in enumerate(hits, start=1):
    src = d.metadata.get("source", "unknown")
    page = d.metadata.get("page", "n/a")
    print(f"\n--- Hit {i} | source={Path(src).name} | page={page} ---")
    print(d.page_content[:400])


Query: What is this document mainly about?
Hits: 4

--- Hit 1 | source=paracetamol.pdf | page=0 ---
Nehmen Sie dieses Arzneimittel immer genau wie in dieser Packungsbeilage beschrieben bzw. genau nach Anweisung Ihres Arztes 
oder Apothekers ein.
• Heben Sie die Packungsbeilage auf. Vielleicht möchten Sie diese später nochmals lesen.
• Fragen Sie Ihren Apotheker, wenn Sie weitere Informationen oder einen Rat benötigen.
• Wenn Sie Nebenwirkungen bemerken, wenden Sie sich an Ihren Arzt oder Apothek

--- Hit 2 | source=paracetamol.pdf | page=0 ---
202211
178 mm
422 mm
178 mm
422 mm
Front Side Back Side
 Paracetamol 500mg Tablets
178 x 422mm
178 x 30mm
358
202211
NA
Printed Leaﬂet for  Paracetamol 500mg Tablets, Open size: 178 x 422mm, Folding Size : 178x30mm 
Speciﬁcation: 40GSM Bible Paper - Fairmed/Apohilft-Germany 
P4S Complete Solutions
01
Black
Fairmed/Apohilft-Germany 
30mm
Gebrauchsinformation: Information für den Anwender
Paracetamo

--- Hit 3 | source=paracetamol.pdf

## Ex 5: Retrieval

- Loaded FAISS index back from disk (no re-embedding).
- Embedded the query and retrieved top-k similar chunks.
- Printed snippets + metadata to sanity-check relevance.

Key point:
If retrieval misses the right chunk, the model can't answer. This is the real bottleneck.
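As a rough illustration of what "retrieve top-k similar chunks" means, the sketch below does a brute-force nearest-neighbour search over a few hand-made vectors. It is not the FAISS API, and the three-dimensional vectors, chunk ids, and function names are made up for the example. Real embeddings from `all-mpnet-base-v2` have far more dimensions.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, index, k=4):
    """index is a list of (chunk_id, vector); return the k nearest, closest first."""
    return heapq.nsmallest(k, index, key=lambda item: squared_l2(query_vec, item[1]))


# Hypothetical toy index: one vector per chunk.
toy_index = [
    ("dosage", [0.9, 0.1, 0.0]),
    ("side_effects", [0.1, 0.9, 0.0]),
    ("print_layout", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]

toy_hits = top_k(toy_query, toy_index, k=2)
print([chunk_id for chunk_id, _ in toy_hits])
```

If the chunk that holds the answer is not among the `k` nearest vectors, it never reaches the prompt, which is why `k`, chunking, and embedding quality matter so much.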

---

## **Exercise 6: Connecting Retrieval with LLM**

You'll now connect document retrieval with the Language Model.

**Steps:**

1. **Create Retrieval Chain:** Link your retrieval system to your instantiated LLM.
2. **Test the Chain:** Confirm it works by generating answers from retrieved documents.

In [18]:
# Write a function to create retrieval and document processing chains
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

# Prompt: force context-only answers + clear fallback
prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         "Answer using ONLY the provided context. "
         "If the answer is not in the context, say: Not found in the provided documents. "
         "Keep it short."),
        ("human", "Context:\n{context}\n\nQuestion:\n{input}")
    ]
)

# "Stuff" = paste retrieved docs into one prompt
doc_chain = create_stuff_documents_chain(llm, prompt)

# Retrieval chain = retriever -> doc_chain -> answer
rag_chain = create_retrieval_chain(retriever, doc_chain)

# Test
test_q = "Give me a 2 sentence summary of the document."
out = rag_chain.invoke({"input": test_q})

print("Q:", test_q)
print("A:", out["answer"])

Q: Give me a 2 sentence summary of the document.
A: The document is a package insert for Paracetamol 500 mg tablets, providing instructions and information for the user on how to take the medication and what to expect. It advises users to read the entire insert carefully, follow the instructions, and consult their doctor or pharmacist if they have any questions or experience side effects.


In [None]:
# Invoke your chain with a sample question

## Ex 6: Retrieval + LLM

- Built a retrieval chain: retrieve top-k chunks → inject into prompt → generate answer.
- Prompt forces "context-only" answers and a hard fallback when the context is missing.
- This is the full RAG loop in one place.

If answers look wrong, it's usually retrieval/chunking, not the model.
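To see what "stuffing" amounts to, here is a plain-Python sketch that pastes retrieved chunks into one prompt string. It only mimics the idea behind `create_stuff_documents_chain`. The helper name `build_stuffed_prompt` is hypothetical, and joining chunks with a blank line is an assumption about a sensible separator rather than a statement about the library's internals.

```python
SYSTEM_RULE = (
    "Answer using ONLY the provided context. "
    "If the answer is not in the context, say: Not found in the provided documents."
)


def build_stuffed_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Join non-empty chunks into one context block and append the question."""
    context = separator.join(c.strip() for c in chunks if c.strip())
    return f"{SYSTEM_RULE}\n\nContext:\n{context}\n\nQuestion:\n{question}"


demo_prompt = build_stuffed_prompt(
    "What is the tablet strength?",
    ["Paracetamol 500 mg tablets.", "   ", "Read the leaflet carefully."],
)
print(demo_prompt)
```

Printing the assembled prompt like this is a quick way to check whether a wrong answer comes from bad context (retrieval/chunking) or from the model ignoring good context.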

---

## **Exercise 7: Interactive Chat System**

In the final exercise, build an interactive chat-based query system.

**Steps:**

1. **Create Chat Interface:** Develop a simple function for interactive querying.
2. **Run the Chat:** Allow users to ask questions and receive immediate responses.


In [None]:
def rag_chat(chain):
    print("RAG chat ready. Type 'exit' to quit.")
    while True:
        q = input("\nYou> ").strip()
        if not q:
            continue
        if q.lower() in {"exit", "quit", "q"}:
            break

        res = chain.invoke({"input": q})
        print("\nAssistant>", res.get("answer", "").strip())

rag_chat(rag_chain)

RAG chat ready. Type 'exit' to quit.


## Ex 7: Interactive chat

- Wrapped the RAG chain in a simple input loop.
- Each question runs: embed → retrieve → inject context → generate.
- Basic, but proves the whole pipeline works end-to-end.
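One small extension is to print which pages an answer drew on. The sketch below formats a source list from any objects that carry a `metadata` dict with `source` and `page` keys, as the chunks in this notebook do. The helper name `format_sources` is hypothetical, and the stand-in documents are built with `SimpleNamespace` so the example runs without LangChain.

```python
from pathlib import Path
from types import SimpleNamespace


def format_sources(docs) -> str:
    """Return a de-duplicated 'file p.N' list, in first-seen order."""
    seen = []
    for d in docs:
        meta = getattr(d, "metadata", None) or {}
        name = Path(str(meta.get("source", "unknown"))).name
        label = f"{name} p.{meta.get('page', 'n/a')}"  # page is 0-based in the loader metadata
        if label not in seen:
            seen.append(label)
    return ", ".join(seen)


# Stand-ins for retrieved chunks (hypothetical data shaped like the metadata above).
fake_docs = [
    SimpleNamespace(metadata={"source": "../documents/paracetamol.pdf", "page": 0}),
    SimpleNamespace(metadata={"source": "../documents/paracetamol.pdf", "page": 0}),
    SimpleNamespace(metadata={"source": "../documents/paracetamol.pdf", "page": 2}),
]
print(format_sources(fake_docs))
```

Inside the loop this could be called as `format_sources(res.get("context", []))`, assuming the chain's result dict exposes the retrieved documents under `"context"`. Check that against the installed LangChain version before relying on it.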

---

## **Conclusion & Reflection**

After completing these exercises:

- Summarize key concepts learned.
- Reflect on the effectiveness and limitations of the free LLM and RAG system you've built.
- Consider how you might improve or extend your system in practical applications.

---

## Conclusion & Reflection

### Key Concepts Learned

RAG is not magic; it's architecture.

The core shift is separating **knowledge from reasoning**.  
Instead of asking the model to remember everything, we:

- Load documents
- Chunk them properly
- Embed them into vectors
- Store them in FAISS
- Retrieve relevant chunks per query
- Inject those into a constrained prompt

The model doesn't "know" the documents. The retriever searches them and hands the model the relevant passages.

Chunking quality, embedding quality, and retrieval strategy directly determine answer quality.  
The LLM is just the final reasoning layer.

---

### Effectiveness of the Free LLM + RAG Setup

It works surprisingly well.

When retrieval is strong, answers are:
- More grounded
- Less hallucinated
- More specific
- More trustworthy

But limits are obvious:

- Free models are weaker at nuanced reasoning.
- If retrieval pulls mediocre chunks, the output degrades fast.
- If the answer isn't in the top-k chunks, it won't be found.
- Long PDFs are messy and parsing isn't always clean.

This is not a silver bullet. It's a structured system with trade-offs.
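On the messy-parsing point: PDF extraction often emits typographic ligatures (a single code point standing for "fi" or "fl") and stray spacing, which can hurt keyword matching and readability. A minimal cleanup sketch using only the standard library follows. The function name `clean_extracted_text` is hypothetical, and NFKC normalization is one option among several. It also folds other compatibility characters (for example non-breaking spaces and superscripts), which may or may not be wanted.

```python
import re
import unicodedata


def clean_extracted_text(text: str) -> str:
    """Fold ligatures via NFKC, collapse runs of spaces, trim spaces around newlines."""
    text = unicodedata.normalize("NFKC", text)  # e.g. U+FB02 'fl' ligature -> "fl"
    text = re.sub(r"[ \t]+", " ", text)         # collapse repeated spaces/tabs
    text = re.sub(r" ?\n ?", "\n", text)        # drop spaces hugging a line break
    return text.strip()


raw = "Printed Lea\ufb02et for  Paracetamol 500mg Tablets \nSpeci\ufb01cation: 40GSM Bible Paper"
print(clean_extracted_text(raw))
```

A step like this would run on each page's `page_content` before chunking, so the cleaned text is what gets embedded.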

---

### How I Would Improve / Extend This

If this were production:

1. Improve retrieval quality  
   - Tune chunk size  
   - Hybrid search (semantic + BM25)  
   - Add re-ranking  

2. Add structure  
   - Enforce strict output schema  
   - Add fallback when confidence is low  

3. Add evaluation  
   - Measure retrieval accuracy  
   - Track hallucination rate  
   - Monitor top-k coverage  

4. Improve grounding  
   - Metadata filtering  
   - Better document preprocessing  
   - Structured document indexing  

5. Upgrade LLM layer  
   - Stronger reasoning model  
   - Keep temperature low (already 0.0 here)  
   - Deterministic prompting  
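For the hybrid-search idea, one common way to merge a semantic ranking with a keyword (BM25) ranking is reciprocal rank fusion (RRF). The sketch below is a generic version of that technique, not something this notebook implements. The chunk ids and both input rankings are invented, and `k = 60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; each appearance scores 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic_ranking = ["c3", "c1", "c7"]  # hypothetical output of the vector search
keyword_ranking = ["c1", "c9", "c3"]   # hypothetical output of a BM25 search
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

RRF only needs ranks, not comparable scores, which is convenient because vector distances and BM25 scores live on different scales.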

The key lesson:

RAG is only as strong as its weakest layer: chunking, embeddings, retrieval, or generation.

The LLM is not the system.  
The pipeline is the system.

# RAG Notebook 1 – End-to-End RAG Pipeline (LangChain + FAISS + ChatGroq)

---

## 🎯 Objective

Build a complete Retrieval-Augmented Generation (RAG) system:

PDF → Chunk → Embed → Store (FAISS)  
Query → Retrieve → Augment → Generate → Answer

This notebook implements the full ingestion and inference pipeline.

---

# 1️⃣ Learning Objectives

By completing this notebook, I can:

- Explain RAG architecture and why it's needed
- Load and preprocess PDF documents
- Split documents into context-aware chunks
- Generate embeddings using HuggingFace models
- Store and query embeddings in FAISS
- Connect retrieval to a free LLM (ChatGroq)
- Build an interactive document Q&A system

---

# 2️⃣ Architecture Overview

## Two Main Stages

### Ingestion (One-Time Setup)

1. Load documents
2. Chunk documents
3. Create embeddings
4. Store in vector database (FAISS)

### Inference (Per Query)

1. Embed user query
2. Retrieve top-k relevant chunks
3. Inject retrieved context into prompt
4. Generate grounded answer

---

# 3️⃣ Exercise Breakdown

---

## Exercise 1 – Setup & Warm-Up

### What Happens Here

- Load `.env`
- Securely load `GROQ_API_KEY`
- Select free ChatGroq LLM
- Instantiate the model

### Why It Matters

This creates the LLM layer that will later be connected to retrieval.

---

## Exercise 2 – Data Ingestion

### Tool Used
`PyPDFLoader`

### Steps

- Load PDF
- Inspect number of pages
- Examine raw text

### Concept

Documents are converted into structured `Document` objects that LangChain can process.

---

## Exercise 3 – Document Chunking

### Tool Used
`RecursiveCharacterTextSplitter`

### Key Parameters

- `chunk_size`
- `chunk_overlap`

### Why Chunking Is Critical

- LLMs have context window limits
- Smaller chunks improve retrieval precision
- Overlap preserves semantic continuity

Output: List of text chunks ready for embedding.

---

## Exercise 4 – Embedding & Storage

### Embedding Model

`sentence-transformers/all-mpnet-base-v2`

### What Happens

- Convert chunks → numerical vectors
- Store vectors in FAISS
- Persist locally for reuse

### Why Embeddings Matter

Embeddings capture semantic meaning.
FAISS enables fast similarity search.

This builds the searchable knowledge base.

---

## Exercise 5 – Retrieval from FAISS

### Retrieval Logic

- Embed user query
- Perform similarity search
- Retrieve top-k relevant chunks

### Core Concept

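Semantic similarity between embedding vectors. LangChain's FAISS wrapper ranks by Euclidean (L2) distance by default, which orders unit-length vectors the same way cosine similarity does.

The model does not "know" the documents. The retriever searches them.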
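Vector stores may rank by cosine similarity or by Euclidean (L2) distance. For unit-length vectors the two give the same ordering, because ‖a − b‖² = 2 − 2·cos(a, b). The check below verifies that identity numerically on two made-up vectors. It is a sanity check of the math, not a statement about how any particular index is configured.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

lhs = squared_l2(a, b)
rhs = 2 - 2 * cosine(a, b)
print(round(lhs, 6), round(rhs, 6))
```

So a smaller L2 distance always means a higher cosine similarity when the embeddings are normalized. With unnormalized vectors the two rankings can differ.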

---

## Exercise 6 – Retrieval + LLM Integration

### What Happens

- Create retrieval chain
- Inject retrieved documents into the prompt as context
- Instruct model to answer only from context

### Why This Is RAG

The LLM no longer relies purely on training data.
It reasons over retrieved evidence.

---

## Exercise 7 – Interactive Chat System

### Implementation

- Wrap retrieval chain in loop
- Accept user input
- Generate grounded responses

### What This Simulates

The question-and-answer loop of a document-grounded knowledge assistant (a prototype, not a production deployment).

---

# 4️⃣ Conceptual Flow

PDF
→ Load
→ Chunk
→ Embed
→ Store (FAISS)

User Query
→ Embed
→ Retrieve
→ Inject Context
→ LLM Generate
→ Response

---

# 5️⃣ Key Takeaways

- RAG separates knowledge storage from reasoning.
- Retrieval quality determines output quality.
- Chunk size affects precision.
- Free LLMs may limit reasoning depth.
- Hallucinations are reduced but not eliminated.

---

# 6️⃣ Limitations

- Poor chunking = poor retrieval
- Weak embeddings = weak search
- Free models may hallucinate despite grounding
- Parsing PDFs can be messy

---

# 7️⃣ Possible Improvements

- Hybrid search (BM25 + semantic)
- Re-ranking retrieved chunks
- Metadata filtering
- Structured output enforcement
- Evaluation metrics for retrieval quality
- Caching layer
- Better prompt engineering
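For the evaluation item, a simple starting metric is hit rate at k: the share of test questions for which at least one known-relevant chunk appears in the top k results. The sketch below assumes a small hand-labelled set exists. The questions, chunk ids, and labels here are invented purely to show the calculation.

```python
def hit_rate_at_k(retrieved: dict[str, list[str]], relevant: dict[str, set[str]], k: int = 4) -> float:
    """Fraction of questions with at least one relevant chunk id in the top k."""
    if not relevant:
        raise ValueError("need at least one labelled question")
    hits = sum(
        1
        for question, gold in relevant.items()
        if gold & set(retrieved.get(question, [])[:k])
    )
    return hits / len(relevant)


# Hypothetical retriever output (ranked chunk ids) and hand-made labels.
retrieved_ids = {
    "max daily dose?": ["c4", "c2", "c9"],
    "side effects?": ["c1", "c8", "c3"],
}
gold_ids = {
    "max daily dose?": {"c2"},
    "side effects?": {"c5"},
}

print(hit_rate_at_k(retrieved_ids, gold_ids, k=2))
```

Tracking this number while changing chunk size, `k`, or the embedding model gives a direct read on retrieval quality, separate from how well the LLM phrases its answers.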

---

# 8️⃣ Mental Model

RAG = Retrieval + Grounded Prompt + Controlled Generation

Instead of:
Model → Guess

We now have:
Search → Inject Evidence → Generate

This notebook implements a complete working RAG system from ingestion to interactive inference.