![RAG Architecture](RAG_Architecture.png)

### RAG Components
1. Document Loading
2. Document Splitting
3. Vectorstores and Embeddings
4. Retrieval
5. Question Answering Chain
6. Conversational Retrieval Chain

## 1) Loaders
In LangChain, a document loader is a utility that helps you load data from different sources into a standardized document format so that it can be processed further (cleaned, split, embedded, retrieved, etc.).

#### Examples:
1. Text Loader
2. CSV Loader
3. PDF Loader
4. YouTube Loader
5. WebBase Loader
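
To make the "standardized document format" concrete, here is a minimal sketch in plain Python of what a loader does conceptually. `SimpleDocument` and `load_csv_rows` are hypothetical names used only for illustration and are not LangChain APIs; the real `Document` objects similarly pair `page_content` with a `metadata` dict, as the PDF example in this notebook shows.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loader's output: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(csv_text: str, source: str) -> list[SimpleDocument]:
    """Turn each CSV row into one document and record where it came from."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row_number, row in enumerate(reader):
        # One "column: value" line per field keeps the row readable as text.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append(SimpleDocument(content, {"source": source, "row": row_number}))
    return docs


docs = load_csv_rows("name,topic\nAda,regression\nBob,clustering\n", "students.csv")
print(docs[0].page_content)
print(docs[0].metadata)
```

Whatever the source is (text, CSV, PDF, web page), downstream steps only need to understand this one shape.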

In [25]:
# Install dependencies
!pip install -qU langchain
!pip install -qU langchain-community
!pip install -qU pypdf



#### PDF Loader

In [26]:
from langchain_community.document_loaders import PyPDFLoader

In [27]:
loader = PyPDFLoader("MachineLearning-Lecture01.pdf")
pages = loader.load()

In [28]:
print(type(pages))

<class 'list'>


In [29]:
print(type(pages[1]))

<class 'langchain_core.documents.base.Document'>


In [30]:
print(pages[1])

page_content='many biologers are there here? Wow, just a few, not many. I'm surprised. Anyone from 
statistics? Okay, a few. So where are the rest of you from?  
Student : iCME.  
Instructor (Andrew Ng) : Say again?  
Student : iCME.  
Instructor (Andrew Ng) : iCME. Cool.  
Student : [Inaudible].  
Instructor (Andrew Ng) : Civi and what else?  
Student : [Inaudible]  
Instructor (Andrew Ng) : Synthesis, [inaudible] systems. Yeah, cool.  
Student : Chemi.  
Instructor (Andrew Ng) : Chemi. Cool.  
Student : [Inaudible].  
Instructor (Andrew Ng) : Aero/astro. Yes, right. Yeah, okay, cool. Anyone else?  
Student : [Inaudible].  
Instructor (Andrew Ng) : Pardon? MSNE. All right. Cool. Yeah.  
Student : [Inaudible].  
Instructor (Andrew Ng) : Pardon?  
Student : [Inaudible].  
Instructor (Andrew Ng) : Endo —  
Student : [Inaudible].  
Instructor (Andrew Ng) : Oh, I see, industry. Okay. Cool. Great, great. So as you can 
tell from a cross-section of this class, I think we're a very diverse au

In [31]:
print(pages[1].metadata)

{'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'PScript5.dll Version 5.2.2', 'creationdate': '2008-07-11T11:25:23-07:00', 'author': '', 'moddate': '2008-07-11T11:25:23-07:00', 'title': '', 'source': 'MachineLearning-Lecture01.pdf', 'total_pages': 22, 'page': 1, 'page_label': '2'}


#### YouTube Loader

In [32]:
!pip install -qU langchain-yt-dlp



In [33]:
from langchain_yt_dlp.youtube_loader import YoutubeLoaderDL

# Basic transcript loading
loader = YoutubeLoaderDL.from_youtube_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ", add_video_info=True
)

In [34]:
# documents = loader.load()
# Disabled: this call recently started raising an error, apparently after a breaking change in a recent LangChain release.

In [35]:
#print(documents)

#### WebBase Loader


In [36]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://github.com/sanketana/GenAI-Foudations/blob/main/Week06_RAG_1/notes.md")
docs = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [37]:
#print(docs[0])

## 2) Splitters
A document splitter is a utility that takes a large document (or multiple documents) and breaks it into smaller, more manageable chunks of text.

#### Types of Splitters in LangChain
1. CharacterTextSplitter
2. RecursiveCharacterTextSplitter
3. TokenTextSplitter
4. Markdown / Code Splitters

![Example Splitter](Example_Splitter.png)

| Feature                    | CharacterTextSplitter                        | RecursiveCharacterTextSplitter                          |
|----------------------------|-----------------------------------------------|---------------------------------------------------------|
| Splitting method           | Splits on one separator, then merges pieces up to `chunk_size` | Tries hierarchical separators (paragraph → line → word → character) |
| Preserves semantic meaning | ❌ No fallback: boundaries depend entirely on the single separator | ✅ Prefers natural text boundaries, using finer cuts only when needed |
| Default separators         | `"\n\n"` (paragraph break only)               | `["\n\n", "\n", " ", ""]` (paragraph, line, space, char) |
| Chunk size handling        | Not strict: a piece without the separator can exceed `chunk_size` (a warning is logged) | Recurses to finer separators until pieces fit within `chunk_size` |
| Chunk overlap              | ✅ Supported                                  | ✅ Supported                                            |
| Output consistency         | Depends on where the separator occurs; chunks can be far larger than `chunk_size` | Chunk sizes vary, but normally stay within `chunk_size` |
| Performance                | Faster, simpler                               | Slightly slower due to recursive splitting logic        |
| Readability of chunks      | Whole paragraphs with the default separator, but possibly oversized | Complete sentences/paragraphs where possible, word-level cuts otherwise |
| Best use case              | Text with one clear, regular separator; testing | Long docs, PDFs, transcripts, RAG pipelines           |
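
The "hierarchical separators" idea can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: `recursive_split` is a hypothetical helper, it ignores `chunk_overlap`, and it drops empty pieces for brevity.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and only falls back to finer ones
    for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in (p for p in text.split(sep) if p):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=7))
```

The first paragraph is too long for one chunk, so it is re-split on spaces, while the second paragraph fits and stays whole.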

In [38]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

In [39]:
# Example text
text = """Artificial Intelligence is changing the world. It is being used in healthcare, education, and entertainment. 

However, AI also raises ethical concerns. Bias, privacy, and misuse are important issues."""

In [40]:
chunk_size = 50
chunk_overlap = 10

In [41]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

In [42]:
char_chunks = c_splitter.split_text(text)
print(char_chunks)

Created a chunk of size 109, which is longer than the specified 50


['Artificial Intelligence is changing the world. It is being used in healthcare, education, and entertainment.', 'However, AI also raises ethical concerns. Bias, privacy, and misuse are important issues.']


In [43]:
for chunk in char_chunks:
    print(chunk)

Artificial Intelligence is changing the world. It is being used in healthcare, education, and entertainment.
However, AI also raises ethical concerns. Bias, privacy, and misuse are important issues.


In [44]:
rec_chunks = r_splitter.split_text(text)
print(rec_chunks)

['Artificial Intelligence is changing the world. It', 'world. It is being used in healthcare, education,', 'and entertainment.', 'However, AI also raises ethical concerns. Bias,', 'Bias, privacy, and misuse are important issues.']


In [45]:
for chunk in rec_chunks:
    print(chunk)

Artificial Intelligence is changing the world. It
world. It is being used in healthcare, education,
and entertainment.
However, AI also raises ethical concerns. Bias,
Bias, privacy, and misuse are important issues.


## 3) Vectorstores and Embeddings

#### Combining loading and splitting

In [None]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("MachineLearning-Lecture01.pdf")
docs = loader.load()
print(docs)

In [47]:
import pprint
pprint.pp(docs[0].metadata)

{'producer': 'Acrobat Distiller 8.1.0 (Windows)',
 'creator': 'PScript5.dll Version 5.2.2',
 'creationdate': '2008-07-11T11:25:23-07:00',
 'author': '',
 'moddate': '2008-07-11T11:25:23-07:00',
 'title': '',
 'source': 'MachineLearning-Lecture01.pdf',
 'total_pages': 22,
 'page': 0,
 'page_label': '1'}


In [48]:
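# Bulk-load several PDFs
# Note: Lecture01 is listed twice, which produces the duplicate chunks
# seen in the similarity-search failure case.
loaders = [
    PyPDFLoader("MachineLearning-Lecture01.pdf"),
    PyPDFLoader("MachineLearning-Lecture01.pdf"),
    PyPDFLoader("MachineLearning-Lecture02.pdf"),
    PyPDFLoader("MachineLearning-Lecture03.pdf"),
]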
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [49]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

In [50]:
splits = text_splitter.split_documents(docs)

In [51]:
print(len(splits))

208


### Embeddings

In [52]:
!pip install -qU langchain-openai
!pip install -qU python-dotenv



In [53]:
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
import os

load_dotenv()

# Access your API key
api_key = os.getenv("OPENAI_API_KEY")
print("API Key:", api_key[:5] + "*****")  # just to verify it’s loaded

API Key: sk-pr*****


In [54]:
embedding = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
# Can also explicitly pass key as openai_api_key=api_key

In [55]:
coffee1 = "I enjoy drinking coffee in the morning."
coffee2 = "I love having a cup of filter coffee when I wake up"
market = "The stock market had a big crash yesterday."
mug = "I crashed the stock of my coffee mug yesterday."

In [56]:
coffee1_embedding = embedding.embed_query(coffee1)
coffee2_embedding = embedding.embed_query(coffee2)
market_embedding = embedding.embed_query(market)
mug_embedding = embedding.embed_query(mug)

In [57]:
import numpy as np

In [58]:
np.dot(np.array(coffee1_embedding), np.array(coffee2_embedding))

np.float64(0.627351559087536)

In [59]:
np.dot(np.array(coffee1_embedding), np.array(market_embedding))

np.float64(0.07100796107510074)
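
The dot products above work as similarity scores because the vectors have (approximately) unit length, as OpenAI's documentation describes for its embedding models. For vectors that are not normalized, cosine similarity is the safer measure. The sketch below uses toy 2-D vectors and plain Python; `cosine_similarity` is an illustrative helper, not a LangChain or OpenAI function.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score 1 regardless of length, and unrelated (orthogonal) vectors score 0, which matches the pattern in the coffee versus stock-market comparison.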

### Vectorstores

In [60]:
!pip install -qU chromadb



In [61]:
from langchain_community.vectorstores import Chroma

In [62]:
persist_directory = 'docs/chroma'
!rm -rf ./docs/chroma

In [63]:
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory
)

In [64]:
print(vectordb._collection.count())

208


### Similarity Search

In [65]:
question = "is there an email i can ask for help"
docs = vectordb.similarity_search(question, k=3)

In [66]:
len(docs)

3

In [67]:
docs[0].page_content

"cs229-qa@cs.stanford.edu. This goes to an account that's read by all the TAs and me. So \nrather than sending us email individually, if you send email to this account, it will \nactually let us get back to you maximally quickly with answers to your questions.  \nIf you're asking questions about homework problems, please say in the subject line which \nassignment and which question the email refers to, since that will also help us to route \nyour question to the appropriate TA or to me appropriately and get the response back to \nyou quickly.  \nLet's see. Skipping ahead — let's see — for homework, one midterm, one open and term \nproject. Notice on the honor code. So one thing that I think will help you to succeed and \ndo well in this class and even help you to enjoy this class more is if you form a study \ngroup.  \nSo start looking around where you're sitting now or at the end of class today, mingle a \nlittle bit and get to know your classmates. I strongly encourage you to form st

### Failure Cases - Duplicate Chunks in Search Results

In [68]:
question = "what did they say about the matlab"
docs = vectordb.similarity_search(question, k=5)
docs[0]

Document(metadata={'page_label': '9', 'page': 8, 'creator': 'PScript5.dll Version 5.2.2', 'source': 'MachineLearning-Lecture01.pdf', 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'author': '', 'moddate': '2008-07-11T11:25:23-07:00', 'total_pages': 22, 'creationdate': '2008-07-11T11:25:23-07:00', 'title': ''}, page_content='those homeworks will be done in either MATLAB or in Octave, which is sort of — I \nknow some people call it a free version of MATLAB, which it sort of is, sort of isn\'t.  \nSo I guess for those of you that haven\'t seen MATLAB before, and I know most of you \nhave, MATLAB is I guess part of the programming language that makes it very easy to \nwrite codes using matrices, to write code for numerical routines, to move data around, to \nplot data. And it\'s sort of an extremely easy to learn tool to use for implementing a lot of \nlearning algorithms.  \nAnd in case some of you want to work on your own home computer or something if you \ndon\'t have a MATLAB license

In [69]:
docs[1]

Document(metadata={'author': '', 'total_pages': 22, 'source': 'MachineLearning-Lecture01.pdf', 'page_label': '9', 'moddate': '2008-07-11T11:25:23-07:00', 'page': 8, 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creationdate': '2008-07-11T11:25:23-07:00', 'title': '', 'creator': 'PScript5.dll Version 5.2.2'}, page_content='those homeworks will be done in either MATLAB or in Octave, which is sort of — I \nknow some people call it a free version of MATLAB, which it sort of is, sort of isn\'t.  \nSo I guess for those of you that haven\'t seen MATLAB before, and I know most of you \nhave, MATLAB is I guess part of the programming language that makes it very easy to \nwrite codes using matrices, to write code for numerical routines, to move data around, to \nplot data. And it\'s sort of an extremely easy to learn tool to use for implementing a lot of \nlearning algorithms.  \nAnd in case some of you want to work on your own home computer or something if you \ndon\'t have a MATLAB license

### Failure Cases - Semantic Lookup ignoring Metadata

In [70]:
question = "what did they say about regression in the third lecture"
docs = vectordb.similarity_search(question, k=5)
for doc in docs:
    print(doc.metadata)

{'creator': 'PScript5.dll Version 5.2.2', 'moddate': '2008-07-11T11:25:03-07:00', 'page_label': '1', 'page': 0, 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creationdate': '2008-07-11T11:25:03-07:00', 'total_pages': 16, 'title': '', 'author': '', 'source': 'MachineLearning-Lecture03.pdf'}
{'moddate': '2008-07-11T11:25:03-07:00', 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'total_pages': 16, 'creationdate': '2008-07-11T11:25:03-07:00', 'author': '', 'page_label': '15', 'source': 'MachineLearning-Lecture03.pdf', 'title': '', 'page': 14, 'creator': 'PScript5.dll Version 5.2.2'}
{'author': '', 'page': 6, 'creationdate': '2008-07-11T11:25:03-07:00', 'page_label': '7', 'creator': 'PScript5.dll Version 5.2.2', 'title': '', 'source': 'MachineLearning-Lecture03.pdf', 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'moddate': '2008-07-11T11:25:03-07:00', 'total_pages': 16}
{'source': 'MachineLearning-Lecture03.pdf', 'creationdate': '2008-07-11T11:25:03-07:00', 'producer': 'Acrobat Dis

## 4) Retrieval
Retrieval is the step that fetches the most relevant pieces of external information (document chunks, knowledge-base entries, etc.) to give the LLM extra context before it generates an answer.

#### Types of Retrieval


| Attribute            | Vector Similarity                          | BM25 / Keyword                  | Hybrid Search                     | Re-ranking                          | Structured Retrieval                  |
|----------------------|-------------------------------------------|---------------------------------|----------------------------------|------------------------------------|--------------------------------------|
| How It Works          | Embed query & docs, find nearest vectors | TF-IDF based keyword match       | Combines vector + keyword search  | Retrieve many, rerank with model   | Query structured DB or API            |
| Strengths             | Captures semantic meaning                 | Exact keyword matching           | Balances semantic & lexical      | High precision ranking             | Accurate for structured facts         |
| Weaknesses            | Misses exact keywords                     | Fails on semantic similarity     | More complex infra               | Expensive at scale                  | Needs schema alignment                |
| When to Use           | General semantic search                   | Legal, technical, IDs           | Production-grade RAG             | Customer-facing apps, high accuracy | Enterprise + DB + knowledge graph    |

#### Maximum Marginal Relevance (MMR)
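- The top-k most similar chunks are not always the best choice.
- They can overlap heavily and give a very narrow view of the topic.
- MMR trades off relevance to the query against diversity among the chunks already selected.
- Examples: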
1. **Wikipedia / Photosynthesis**  
   - **Without MMR**: Top-k retrieves 3 paragraphs all about the light reaction.  
   - **With MMR**: You get **light reaction + Calvin cycle + chloroplast structure** — comprehensive and non-repetitive.  

2. **News Articles / Election Results**  
   - **Without MMR**: Top-k might select 3 snippets all repeating **who won**.  
   - **With MMR**: You get **winner + voter turnout + reactions from parties and citizens** — broader context.  

3. **Product Reviews / Smartphone Pros**  
   - **Without MMR**: Top-k selects 3 reviews all saying **camera is good**.  
   - **With MMR**: You get **camera + battery + display quality** — highlights diverse advantages. 
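
A minimal sketch of the greedy MMR selection, in plain Python with toy 3-D vectors. `mmr_select` and `cosine` are hypothetical helpers written for illustration; only the `lambda_mult` name is borrowed from the parameter of the same name on `max_marginal_relevance_search` (1.0 means pure relevance, lower values favor diversity). The vectorstore's own implementation may differ in details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return the indices of k documents chosen greedily by MMR."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalty: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query = [1.0, 0.8, 0.0]
docs = [
    [1.0, 0.0, 0.0],  # most relevant chunk
    [1.0, 0.0, 0.0],  # exact duplicate of the first chunk
    [0.0, 1.0, 0.0],  # less relevant, but covers a different aspect
]
diverse = mmr_select(query, docs, k=2, lambda_mult=0.5)
plain = mmr_select(query, docs, k=2, lambda_mult=1.0)
print(diverse, plain)
```

With `lambda_mult=1.0` the duplicate is picked second, mirroring plain similarity search; with `lambda_mult=0.5` the duplicate is penalized and the chunk covering a different aspect is picked instead.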

In [71]:
question = "what did they say about matlab?"
docs_ss = vectordb.similarity_search(question,k=3)
docs_ss[0].page_content[:100]

'those homeworks will be done in either MATLAB or in Octave, which is sort of — I \nknow some people c'

In [72]:
docs_ss[1].page_content[:100]

'those homeworks will be done in either MATLAB or in Octave, which is sort of — I \nknow some people c'

In [73]:
docs_mmr = vectordb.max_marginal_relevance_search(question,k=3)
docs_mmr[0].page_content[:100]

'those homeworks will be done in either MATLAB or in Octave, which is sort of — I \nknow some people c'

In [74]:
docs_mmr[1].page_content[:100]

'least squares regression being a bad idea for classification problems and then I did a \nbunch of mat'

#### Metadata Based Search
Augment similarity search with an exact metadata filter wherever possible.

Example: to search for regression in the third lecture transcript, filter on the metadata field `source` with the value `MachineLearning-Lecture03.pdf`.

In [75]:
question = "what did they say about regression in the third lecture?"

docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"MachineLearning-Lecture03.pdf"}
)

In [76]:
for d in docs:
    print(d.metadata)

{'creationdate': '2008-07-11T11:25:03-07:00', 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'author': '', 'total_pages': 16, 'moddate': '2008-07-11T11:25:03-07:00', 'page_label': '1', 'page': 0, 'title': '', 'source': 'MachineLearning-Lecture03.pdf', 'creator': 'PScript5.dll Version 5.2.2'}
{'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'author': '', 'creationdate': '2008-07-11T11:25:03-07:00', 'title': '', 'page': 14, 'total_pages': 16, 'creator': 'PScript5.dll Version 5.2.2', 'source': 'MachineLearning-Lecture03.pdf', 'page_label': '15', 'moddate': '2008-07-11T11:25:03-07:00'}
{'page': 6, 'moddate': '2008-07-11T11:25:03-07:00', 'source': 'MachineLearning-Lecture03.pdf', 'total_pages': 16, 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'PScript5.dll Version 5.2.2', 'author': '', 'creationdate': '2008-07-11T11:25:03-07:00', 'title': '', 'page_label': '7'}


#### Metadata Self Query Retriever

SelfQueryRetriever uses an LLM to dynamically translate your natural language question into both semantic + metadata queries so that your vector DB returns the most relevant and context-aware chunks.

##### What it does internally:
- Determines semantic intent
- Determines metadata filters if applicable
- Sends the query to vector DB with filters

##### Why this is powerful
- You don’t need to manually write filters or craft queries.
- The LLM automatically “understands” the schema and selects relevant docs.
- Works well for RAG pipelines, especially with metadata-rich corpora (like CS229 transcripts with topics, lecture numbers, etc.).
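
The three internal steps can be illustrated with a toy translator. In the real retriever an LLM produces the structured query; here a regular expression stands in for the LLM so the example runs offline. `StructuredQuery`, `toy_self_query`, and `LECTURE_WORDS` are hypothetical names, and the file-name pattern is the one used in this notebook.

```python
import re
from dataclasses import dataclass
from typing import Optional

LECTURE_WORDS = {"first": 1, "second": 2, "third": 3}


@dataclass
class StructuredQuery:
    """Semantic part of the question plus an optional metadata filter."""
    query: str
    filter: Optional[dict]


def toy_self_query(question: str) -> StructuredQuery:
    """Rule-based stand-in for the LLM that splits a question into query + filter."""
    match = re.search(r"\bin the (first|second|third) lecture\b", question, re.IGNORECASE)
    if match is None:
        return StructuredQuery(query=question, filter=None)
    number = LECTURE_WORDS[match.group(1).lower()]
    source = f"MachineLearning-Lecture{number:02d}.pdf"
    # Remove the metadata phrase so only the semantic intent is embedded.
    semantic_part = (question[:match.start()].rstrip() + question[match.end():]).strip()
    return StructuredQuery(query=semantic_part, filter={"source": source})


result = toy_self_query("what did they say about regression in the third lecture?")
print(result)
```

The semantic part would go to the embedding search, and the filter would be passed alongside it, as in the manual `filter=` example earlier in this section.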

In [77]:
from langchain_openai import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

In [78]:
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `MachineLearning-Lecture01.pdf`, `MachineLearning-Lecture02.pdf`, or `MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]

In [79]:
document_content_description = "Lecture notes"
llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)

In [80]:
question = "what did they say about regression in the third lecture?"

In [81]:
docs = retriever.invoke(question)

In [82]:
for d in docs:
    print(d.metadata)

{'title': '', 'page_label': '3', 'author': '', 'source': 'MachineLearning-Lecture03.pdf', 'page': 2, 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'moddate': '2008-07-11T11:25:03-07:00', 'creator': 'PScript5.dll Version 5.2.2', 'creationdate': '2008-07-11T11:25:03-07:00', 'total_pages': 16}
{'title': '', 'total_pages': 16, 'page': 10, 'moddate': '2008-07-11T11:25:03-07:00', 'creator': 'PScript5.dll Version 5.2.2', 'page_label': '11', 'source': 'MachineLearning-Lecture03.pdf', 'author': '', 'creationdate': '2008-07-11T11:25:03-07:00', 'producer': 'Acrobat Distiller 8.1.0 (Windows)'}
{'title': '', 'source': 'MachineLearning-Lecture03.pdf', 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'page': 11, 'moddate': '2008-07-11T11:25:03-07:00', 'creationdate': '2008-07-11T11:25:03-07:00', 'total_pages': 16, 'page_label': '12', 'creator': 'PScript5.dll Version 5.2.2', 'author': ''}
{'source': 'MachineLearning-Lecture03.pdf', 'creationdate': '2008-07-11T11:25:03-07:00', 'page': 6, 'author': '

## 5) Question Answering

![RAG Question Answering](RAG_Question_Answering.png)

![RAG_RetrievalQA_chain](RAG_RetrievalQA_chain.png)

#### RetrievalQA Methods
1. Stuff
2. Map Reduce
3. Refine
4. Map Rerank

![RAG_RetrievalQA_methods](RAG_RetrievalQA_methods.png)

#### RetrievalQA Chain Types Cheatsheet

| Strategy      | How it Works | Pros | Cons | Best When | Example |
|---------------|--------------|------|------|-----------|---------|
| **Stuff**     | Put all retrieved docs directly into the prompt | Simple, fast, keeps full context | Limited by LLM context window, irrelevant info may confuse | Few and short docs | FAQ bot: “What’s your refund policy?” |
| **Map Reduce**| LLM processes each doc separately (map), then combines summaries (reduce) | Handles many docs, avoids overflow | Slower (many LLM calls), may lose detail | Summarizing large sets of long docs | Research assistant summarizing 100 news articles |
| **Refine**    | Start with one doc → draft answer, then iteratively refine with others | Produces coherent, detailed answers; balances context | Sequential (slower), depends heavily on first doc | Each doc adds incremental clarification | Contract review: “What are the late payment penalties?” |
| **Map Rerank**| LLM processes each doc separately, scores relevance, selects best one(s) | Precision-focused, filters noise | May miss synthesis across docs | Many docs but only one/few truly relevant | Fact Q&A: “Who won the Nobel Prize in Physics in 2024?” |
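
To make the cost difference between **Stuff** and **Map Reduce** concrete, here is a small sketch with a fake model that only counts how often it is called. `stuff`, `map_reduce`, and `CountingLLM` are hypothetical helpers, and the prompts are simplified placeholders rather than the prompts LangChain uses.

```python
class CountingLLM:
    """Fake model: counts calls and returns a short canned summary."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer based on {len(prompt)} prompt characters"


def stuff(llm, question, docs):
    """Put every retrieved chunk into one prompt: a single LLM call."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce(llm, question, docs):
    """Answer per chunk (map), then combine the partial answers (reduce)."""
    partial_answers = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n".join(partial_answers)
    return llm(f"Combine these partial answers:\n{combined}\n\nQuestion: {question}")


docs = ["chunk one", "chunk two", "chunk three"]

stuff_llm = CountingLLM()
stuff(stuff_llm, "What is the content about", docs)

mr_llm = CountingLLM()
map_reduce(mr_llm, "What is the content about", docs)

print(stuff_llm.calls, mr_llm.calls)
```

Stuff needs one call but must fit all chunks in a single prompt; map-reduce needs one call per chunk plus a final combining call, which is why it is slower but scales to more text.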

In [83]:
import os
from langchain_openai import OpenAI
import sys
from dotenv import load_dotenv

load_dotenv()

True

In [84]:
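from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
persist_directory = 'docs/chroma'
# Use the same embedding model the store was built with; the default model differs,
# and query vectors from another model are not comparable to the stored ones.
embedding = OpenAIEmbeddings(model="text-embedding-3-small")
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)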



In [85]:
print(vectordb._collection.count())

208


In [86]:
!pip install -qU langchain-openai

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",  # if you prefer to pass the API key directly instead of using env vars
    # base_url="...",
    # organization="...",
    # other params...
)



In [87]:
from langchain.chains import RetrievalQA

In [88]:
# Chain Type = Stuff (default)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

question = "What is the content about"
result = qa_chain.invoke({"query": question})
print(result)



{'query': 'What is the content about', 'result': 'The content provides information about the online resources and communication methods for a class, likely a computer science course (CS229) at Stanford University. It mentions the course homepage (http://cs229.stanford.edu) where homework assignments, solutions, and detailed lecture notes are posted. It also discusses a class newsgroup (su.class.cs229) for student discussions and forming study groups, which is not monitored by the teaching staff. For contacting the teaching staff, students are advised to use the email address cs229-qa@cs.stanford.edu. The content also encourages forming study groups to tackle difficult problem sets and enhance the learning experience.'}


In [89]:
# Chain Type = Map Reduce --> suitable for a large corpus where the retrieved documents do not fit in the context window together
mr_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

question = "What is the content about"
result = mr_chain.invoke({"query": question})
print(result)

{'query': 'What is the content about', 'result': 'The content is about the online resources available for a class. It includes information on the class homepage where homework assignments and solutions are posted, detailed lecture notes, a newsgroup for class discussions, and a shared email account for contacting the teaching staff. It also advises students on how to contact the teaching staff effectively and encourages the formation of study groups to tackle difficult problem sets.'}


### Prompt Templates

A prompt template is a reusable, structured instruction with placeholders that guides an LLM on how to format and answer a question using provided inputs.

#### Benefits
- **Control the Answer Style**  
  Ensure answers have a consistent tone, length, or format.

- **Guide the LLM to Use Only Retrieved Context**  
  Prevents hallucination by instructing the model to rely solely on the retrieved documents.

- **Support Special Tasks**  
  Enables tasks like summarization, comparison, or structured output.

- **Make It Domain-Specific**  
  Tailor instructions for specific industries (legal, medical, educational) for more accurate responses.

- **Improve Consistency for Evaluation**  
  Forces outputs into predictable formats (JSON, markdown, bullet points), making grading or analysis easier.

In [92]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)


In [94]:
# Run Chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type_kwargs={"prompt":QA_CHAIN_PROMPT}
)

In [95]:
question = "Is probability a class topic?"
result = qa_chain.invoke({"query": question})
result["result"]

'Yes, probability is a class topic, as it is mentioned that discussion sections will cover prerequisites like probability and statistics. Thanks for asking!'

In [None]:
print(result)

In [96]:
question = "Why are those prerequisites needed?"
result = qa_chain.invoke({"query": question})
result["result"]

'The prerequisites are needed to ensure that students have the foundational knowledge required to tackle the challenging problem sets and concepts in the course. This background helps students engage more effectively with the material, participate in study groups, and apply advanced algorithms efficiently. Thanks for asking!'

#### RetrievalQA to LCEL Upgrade in LangChain
- LCEL stands for LangChain Expression Language.
- It is a declarative layer (DSL-style syntax) for orchestrating LangChain "chains", i.e. compositions of prompts, models, transformations, etc.
- Instead of manually wiring chain steps with imperative code, you declare what you want with composable "runnables" and use operators (like the pipe `|`) to define the flow.
- Under the hood, LCEL components are built on a `Runnable` interface: every step is a Runnable (or composed of Runnables) that supports `invoke`, `batch`, `stream`, etc.
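
The pipe-composition idea can be shown with a toy class in plain Python. This is not LangChain's `Runnable`; `Step` is a hypothetical stand-in that only demonstrates how `|` can chain steps that share an `invoke`/`batch` interface, with a fake "model" that upper-cases its input.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        return [self.invoke(value) for value in values]

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


build_prompt = Step(lambda question: f"Question: {question}\nHelpful Answer:")
fake_model = Step(lambda prompt: prompt.upper())  # stands in for an LLM call
parse_output = Step(lambda text: text.splitlines()[0])

chain = build_prompt | fake_model | parse_output
print(chain.invoke("is probability a class topic?"))
```

Because the composed chain is itself a `Step`, it gets `invoke` and `batch` for free, which is the property that makes the declarative style composable.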

https://python.langchain.com/docs/versions/migrating_chains/retrieval_qa/

https://lilianweng.github.io/posts/2023-06-23-agent/

## 6) Conversational Retrieval

`ConversationalRetrievalChain` is a LangChain chain designed for conversational RAG use cases.

Unlike `RetrievalQA`, which answers each query in isolation, `ConversationalRetrievalChain` is chat-aware: it takes the conversation history into account, along with the retrieved documents, when answering. In this notebook the history comes from a memory object attached to the chain.

For example, if a user asks "Who founded it?", the chain can work out that "it" refers to the company mentioned in the previous turn.
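
The chat-aware behavior rests on a "condense question" step: before retrieving, the follow-up is rewritten into a standalone question using the chat history. The sketch below shows that flow with a fake model. `condense_question` and `RecordingLLM` are hypothetical helpers, "Acme Corp" is made-up example data, and the prompt wording is a placeholder, not the prompt LangChain uses.

```python
class RecordingLLM:
    """Fake model: remembers the last prompt and returns a canned rewrite."""

    def __init__(self, canned_reply: str):
        self.canned_reply = canned_reply
        self.last_prompt = None

    def __call__(self, prompt: str) -> str:
        self.last_prompt = prompt
        return self.canned_reply


def condense_question(llm, chat_history, follow_up):
    """Rewrite a follow-up into a standalone question using prior turns."""
    if not chat_history:
        return follow_up  # nothing to resolve against, use the question as-is
    history_text = "\n".join(
        f"Human: {question}\nAssistant: {answer}" for question, answer in chat_history
    )
    prompt = (
        "Given the conversation below, rephrase the follow-up "
        "as a standalone question.\n\n"
        f"{history_text}\nFollow-up: {follow_up}\nStandalone question:"
    )
    return llm(prompt)


llm = RecordingLLM("Who founded Acme Corp?")
history = [("What does Acme Corp sell?", "Acme Corp sells anvils.")]

first_turn = condense_question(llm, [], "Who founded it?")
later_turn = condense_question(llm, history, "Who founded it?")
print(first_turn, "|", later_turn)
```

The standalone question, not the raw follow-up, is what gets embedded and sent to the retriever, so the search sees the company name instead of the pronoun.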

#### Setup Embeddings

In [None]:
!pip install -qU langchain-openai
!pip install -qU python-dotenv

In [99]:
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv

In [102]:
embedding = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

#### Load Vectorstore

In [None]:
!pip install -qU chromadb

In [100]:
from langchain_community.vectorstores import Chroma

In [101]:
persist_directory = 'docs/chroma'

In [103]:
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

#### Run basic similarity search

In [104]:
question = "Who is the main instructor for the course?"
docs = vectordb.similarity_search(question, k=3)

In [109]:
print(docs[0])

page_content='MachineLearning-Lecture01  
Instructor (Andrew Ng): Okay. Good morning. Welcome to CS229, the machine 
learning class. So what I wanna do today is just spend a little time going over the logistics 
of the class, and then we'll start to talk a bit about machine learning.  
By way of introduction, my name's Andrew Ng and I'll be instructor for this class. And so 
I personally work in machine learning, and I've worked on it for about 15 years now, and 
I actually think that machine learning is the most exciting field of all the computer 
sciences. So I'm actually always excited about teaching this class. Sometimes I actually 
think that machine learning is not only the most exciting thing in computer science, but 
the most exciting thing in all of human endeavor, so maybe a little bias there.  
I also want to introduce the TAs, who are all graduate students doing research in or 
related to the machine learning and all aspects of machine learning. Paul Baumstarck 
works in ma
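Conceptually, a similarity search embeds the query, scores every stored chunk against it, and returns the top `k`. The sketch below is a toy illustration with hand-made 3-dimensional vectors and cosine similarity. The vectors, chunk labels, and the `top_k` helper are assumptions for illustration only, and the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[float, str]]:
    # Score every stored chunk against the query and keep the k best.
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Pretend embeddings; real ones have hundreds or thousands of dimensions.
store = [
    ("Instructor introduction", [0.9, 0.1, 0.0]),
    ("Homework policy", [0.1, 0.9, 0.0]),
    ("MATLAB tutorial", [0.0, 0.2, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question

results = top_k(query_vec, store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```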

#### Setup Generator

In [110]:
!pip install -qU langchain-openai

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",  # if you prefer to pass the API key in directly instead of using env vars
    # base_url="...",
    # organization="...",
    # other params...
)


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


#### Adding Memory

`ConversationBufferMemory` stores the entire conversation history in memory, giving the chatbot a "short-term memory" of the conversation.

In [122]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

#### Create a Conversational Retrieval Chain

In [123]:
from langchain.chains import ConversationalRetrievalChain

retriever = vectordb.as_retriever()

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)

In [115]:
question = "Is probability a class topic?"
result = qa.invoke({"question": question})

In [116]:
result["answer"]

'Yes, familiarity with basic probability and statistics is assumed as a prerequisite for the class. The class expects students to know concepts such as random variables, expectation, and variance. However, some discussion sections will serve as a refresher for these topics if needed.'

In [118]:
question = "Why are those prerequisites needed?"
result = qa.invoke({"question": question})
result["answer"]

'Familiarity with basic probability and statistics is required as a prerequisite for the class because the course likely involves concepts and techniques that rely on understanding random variables, expectation, variance, and other statistical principles. These are foundational elements in many areas of study, including machine learning, where statistical methods are used to analyze data and make predictions. The course assumes that students have this background knowledge to effectively engage with the material and apply it in programming tasks, primarily in MATLAB or Octave.'

In [None]:
# Chat loop
print("Chatbot ready! Type 'exit' to stop.\n")
while True:
    question = input("You: ")
    if question.lower() in ["exit", "quit", "bye"]:
        print("Chatbot: Goodbye 👋")
        break
    result = qa.invoke({"question": question})
    print("Chatbot:", result["answer"])

Chatbot ready! Type 'exit' to stop.

You: Hi
Chatbot: Hello! How can I assist you today?

You: who is the instructor of this course
Chatbot: The instructor of the course is Andrew Ng.

You: tell me something about him
Chatbot: Andrew Ng is an instructor for the CS229 machine learning class. He has worked in the field of machine learning for about 15 years and is very passionate about it, considering it the most exciting field in computer science and possibly in all of human endeavor.

You: tell me about some of the companies he founded
Chatbot: I don't know.

You: what topics in machine learning has he covered
Chatbot: The context provided does not specify the exact topics in machine learning that Andrew Ng has covered in his lecture. It mainly discusses the logistics of the class, introduces the teaching assistants, and mentions some student projects related to machine learning. If you are looking for specific topics, you might want to refer to the course syllabus or lecture notes from CS229.


# 🔹 LangChain Memory Types Comparison

| Memory Type | How It Works | ✅ Pros | ❌ Cons | 🔥 Best Use Case |
|-------------|--------------|---------|---------|------------------|
| **ConversationBufferMemory** | Stores the full conversation transcript (all turns). | Simple, easy to use, preserves entire chat history. | Grows too large → may hit token limits on long chats. | Small chatbots, short Q&A sessions. |
| **ConversationBufferWindowMemory** | Stores only the **last N turns**. | Controls token usage, fast. | Loses older context (may forget important details). | Customer support chat, where only recent turns matter. |
| **ConversationSummaryMemory** | Summarizes conversation into shorter text. | Efficient, compresses history, avoids token overflow. | Summaries may lose details / nuance. | Long conversations (e.g., tutoring, therapy bots). |
| **ConversationSummaryBufferMemory** | Keeps recent turns verbatim (up to a token limit) + a summary of older turns. | Balance between detail + efficiency. | Slightly more complex to configure. | Multi-session chat assistants. |
| **VectorStoreRetrieverMemory** | Embeds past interactions in a vector DB and retrieves relevant history by similarity. | Semantic recall (remembers even if phrasing differs). | Needs a vector DB + embeddings, retrieval can be slower. | Knowledge-grounded agents, FAQ bots, personal assistants. |
| **ConversationKGMemory** | Extracts entities + relations into a knowledge graph. | Structured recall of facts, good for reasoning. | More complex setup, can miss nuance. | Assistants that need to track facts (e.g., “Alice works at Microsoft”). |
| **ConversationEntityMemory** | Tracks facts about specific entities (people, orgs, places). | Great for personalization, fact persistence. | Limited to entity-centric information. | Personalized assistants, role-playing bots. |
| **CombinedMemory** | Mixes multiple memory strategies. | Very flexible, covers multiple needs. | More setup, risk of redundancy. | Advanced agents needing hybrid strategies. |
| **Custom Memory** | User-defined state storage (dicts, databases, workflows). | Fully flexible, app-specific. | You have to build & maintain it. | Non-conversational agents, task workflows, profile management. |
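The difference between the first two rows of the table can be sketched in a few lines of plain Python. This is a toy model, not LangChain's implementation: `BufferMemory`, `WindowMemory`, and their `save`/`load` methods are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

Turn = Tuple[str, str]  # (human message, AI message)


class BufferMemory:
    """Keeps every turn, like the full-transcript strategy."""

    def __init__(self) -> None:
        self.turns: List[Turn] = []

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load(self) -> List[Turn]:
        return list(self.turns)


class WindowMemory:
    """Keeps only the last `k` turns; older ones are dropped automatically."""

    def __init__(self, k: int) -> None:
        self.turns: Deque[Turn] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load(self) -> List[Turn]:
        return list(self.turns)


buffer, window = BufferMemory(), WindowMemory(k=2)
for i in range(1, 5):
    for mem in (buffer, window):
        mem.save(f"question {i}", f"answer {i}")

print(len(buffer.load()))  # 4: the buffer keeps growing with the chat
print(window.load())       # only turns 3 and 4 remain
```

The buffer's prompt cost grows with every turn, while the window's cost is bounded by `k` at the price of forgetting earlier turns.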

---

# 🔹 Quick Rule of Thumb

- ✅ **Short conversations** → `ConversationBufferMemory`  
- ✅ **Medium conversations** → `ConversationBufferWindowMemory`  
- ✅ **Long conversations** → `ConversationSummaryMemory` or `ConversationSummaryBufferMemory`  
- ✅ **Semantic recall** → `VectorStoreRetrieverMemory`  
- ✅ **Fact-tracking** → `ConversationEntityMemory` or `ConversationKGMemory`  
- ✅ **Advanced assistants** → `CombinedMemory`  

---

# 🔹 LangChain Chain Types Comparison

| Chain Type | How It Works | ✅ Pros | ❌ Cons | 🔥 Best Use Case |
|------------|--------------|---------|---------|------------------|
| **LLMChain** | Prompt template → LLM → Output | Simple, flexible, building block for everything else | Limited (single-step only) | Text generation, summarization, classification, extraction |
| **SimpleSequentialChain** | Passes output of one chain as input to the next | Easy to set up multi-step flow | Only single input/output per step | Step-by-step pipelines (e.g., extract → summarize) |
| **SequentialChain** | More flexible sequential flow with multiple named inputs/outputs | Can handle complex workflows | Setup more verbose than SimpleSequentialChain | Multi-step tasks that pass several variables between steps |
| **RouterChain** | Routes inputs to different chains based on conditions | Dynamic, modular | Requires good routing logic (classifier/prompt) | Multi-domain assistants (e.g., coding vs FAQ vs math) |
| **RetrievalQA** | Retrieves docs → stuffs into prompt → LLM answers | Straightforward, works well for Q&A | No chat history, single query only | One-off document Q&A, FAQ bots |
| **ConversationalRetrievalChain** | Retrieves docs + adds chat history via memory → LLM answers | Maintains context, supports multi-turn chat | More complex, higher token usage | Chatbot-style assistants over custom knowledge |
| **Summarization Chains** (`stuff`, `map_reduce`, `refine`) | Summarize across multiple docs/chunks | Handles long documents, different strategies available | May be slower/expensive (map_reduce/refine) | Summarizing books, PDFs, transcripts |
| **HyDE (Hypothetical Document Embedding) Chain** | Generates hypothetical answer → embeds → retrieves relevant docs | Improves retrieval quality in some cases | Extra LLM call (slower, costlier) | When retriever struggles with recall |
| **Transform Chains** | Apply a plain Python function to the inputs (no LLM call), typically before passing them to an LLM | Useful for data cleaning and reformatting | Usually part of larger pipeline, not standalone | Text cleaning, truncation, input normalization |
| **Custom Chains** | Build your own chain by subclassing `Chain` | Fully flexible, app-specific | You need to code it yourself | Special workflows not covered by built-ins |
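As one illustration of the table, the routing idea behind `RouterChain` can be sketched in plain Python. The handler functions, the `ROUTES` mapping, and the keyword matching are all hypothetical stand-ins. In practice the destination is usually picked by an LLM prompt or a classifier, and each handler would be a full chain.

```python
from typing import Callable, Dict


def math_chain(query: str) -> str:
    # Stand-in for a chain specialised in math questions.
    return f"[math] {query}"


def coding_chain(query: str) -> str:
    # Stand-in for a chain specialised in coding questions.
    return f"[coding] {query}"


def default_chain(query: str) -> str:
    # Fallback when no route matches.
    return f"[general] {query}"


# Hypothetical routing table: keyword -> destination chain.
ROUTES: Dict[str, Callable[[str], str]] = {
    "integral": math_chain,
    "python": coding_chain,
}


def route(query: str) -> str:
    # Stub router: first keyword match wins, otherwise use the default chain.
    lowered = query.lower()
    for keyword, handler in ROUTES.items():
        if keyword in lowered:
            return handler(query)
    return default_chain(query)


print(route("How do I sort a list in Python?"))
print(route("What is the integral of x^2?"))
print(route("Hello there"))
```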

---

# 🔹 Quick Rule of Thumb

- ✅ **Basic prompt → output** → `LLMChain`  
- ✅ **Multi-step pipelines** → `SequentialChain` or `SimpleSequentialChain`  
- ✅ **Document Q&A (single query)** → `RetrievalQA`  
- ✅ **Chatbot over docs** → `ConversationalRetrievalChain`  
- ✅ **Summarizing large docs** → Summarization Chains (`map_reduce`, `refine`)  
- ✅ **Specialized routing** → `RouterChain`  
- ✅ **When recall is poor** → `HyDE`  
- ✅ **Custom workflows** → `Custom Chain`  

---