# RAG Demo
Using RAG (Retrieval-Augmented Generation) to power up an LLM. We will use LangChain for our example. The LangChain framework makes building LLM apps super easy.

![./flow.png](./flow.png)

## ✅ Updated for LangChain 0.3.x

This notebook has been migrated to use the latest LangChain APIs:

- **Document Loaders**: Now using `langchain-community`
- **Text Splitters**: Updated to `langchain-text-splitters` 
- **LLMs**: Migrated to `langchain-openai`
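- **Retrievers**: Updated to `.invoke()` in place of the deprecated `get_relevant_documents()`
- **Chains**: Added modern LCEL approach alongside legacy chains
- **Method calls**: Chains are now called with `.invoke()` instead of directly (`chain(...)`), for consistency with the Runnable interface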

The legacy approach is maintained for educational purposes, with modern LCEL examples added for production use.

## Install Packages

In [1]:
%pip install -q chromadb pypdf tqdm tiktoken \
        langchain langchain-openai langchain-chroma langchain-community \
        langchain-huggingface langchain-text-splitters langchain-core \
        sentence_transformers

Note: you may need to restart the kernel to use updated packages.


## Steps
### Step 1
Load a document and extract its contents. For our example, I added a sample PDF of my article to the `docs` folder.

In [2]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/GenAI-Part1.pdf")
pages = loader.load()

# Look into the doc
second_page = pages[1]
print("Total pages: ", len(pages))
print("First 50 chars of 2nd page: ", second_page.page_content[:50])
print("Metadata: ", second_page.metadata)


Total pages:  4
First 50 chars of 2nd page:  dataset. Each model fits for few use cases.
Autoen
Metadata:  {'producer': 'macOS Version 14.2 (Build 23C5055b) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20231201164323Z00'00'", 'title': 'Generative AI project — Part 1', 'author': 'Muthu Arumugam', 'subject': 'Generative AI project', 'moddate': "D:20231201164323Z00'00'", 'keywords': 'AI, LLM', 'aapl:keywords': "['AI', 'LLM']", 'source': 'docs/GenAI-Part1.pdf', 'total_pages': 4, 'page': 1, 'page_label': '2'}
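Each element of `pages` is a Document holding the page text (`page_content`) and a `metadata` dict. Note that `page` is 0-based (the second page has `'page': 1`) while `page_label` is the human-readable `'2'`, so the `Page #` numbers printed in Step 4 are 0-based. The sketch below mimics that shape with a plain dataclass; `PageDoc` and `pages_to_docs` are hypothetical names, and it assumes the PDF uses plain 1-based page labels (a PDF can define custom labels such as roman numerals).

```python
from dataclasses import dataclass, field


@dataclass
class PageDoc:
    """Hypothetical stand-in for a loaded page: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_to_docs(page_texts, source):
    """Build one PageDoc per page. 'page' is the 0-based index; 'page_label'
    assumes plain 1-based numbering (an assumption, see above)."""
    return [
        PageDoc(
            page_content=text,
            metadata={
                "source": source,
                "total_pages": len(page_texts),
                "page": i,
                "page_label": str(i + 1),
            },
        )
        for i, text in enumerate(page_texts)
    ]


sample_docs = pages_to_docs(["Generative AI project", "dataset. Each model..."], "docs/sample.pdf")
print(sample_docs[1].metadata["page"], sample_docs[1].metadata["page_label"])  # prints: 1 2
```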


### Step 2
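Now split the document contents into smaller chunks, so that each piece we embed and retrieve covers a focused passage. `chunk_size` caps the size of each chunk and `chunk_overlap` is how much text neighbouring chunks share.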
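A minimal way to picture chunking with overlap is a sliding window: take `chunk_size` characters, then move forward by `chunk_size - chunk_overlap`, so the tail of one chunk reappears at the head of the next. This is a simplified sketch, not LangChain's code: `CharacterTextSplitter` first splits on its `separator` (`'\n'` below) and merges the pieces up to `chunk_size`, so its chunks end on line boundaries rather than at arbitrary characters. `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of at most chunk_size characters; each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


demo_chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk shares its last character with the first character of the next one, which is the overlap of 1.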

In [3]:
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    CharacterTextSplitter,
    TokenTextSplitter,
)

# Character Splitter: splits on the separator, then merges pieces up to chunk_size
c_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separator="\n",
    length_function=len
)

c_docs = c_splitter.split_documents(pages)
print("Total documents on Character Splitter: ", len(c_docs))

# Recursive Character Splitter: tries the separators in order
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", " ", ""]
)

r_docs = r_splitter.split_documents(pages)
print("Total documents on Recursive Character Splitter: ", len(r_docs))

# Token Splitter: chunk_size and chunk_overlap are counted in tokens, not characters
t_splitter = TokenTextSplitter(
    chunk_size=150,
    chunk_overlap=10
)

t_docs = t_splitter.split_documents(pages)
print("Total documents on Token Splitter: ", len(t_docs))

Total documents on Character Splitter:  6
Total documents on Recursive Character Splitter:  6
Total documents on Token Splitter:  8
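`RecursiveCharacterTextSplitter` works through its `separators` in order: paragraph breaks (`"\n\n"`), then line breaks, then spaces, then single characters. That way a chunk breaks at the most natural boundary that still fits `chunk_size`. The sketch below shows that idea in plain Python. It is a simplification, not LangChain's implementation: it ignores `chunk_overlap` and whitespace stripping, and `recursive_split` is a hypothetical name.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive splitting (no overlap, no whitespace stripping)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the coarsest separator that occurs in the text ("" always "occurs").
    usable = [s for s in separators if s == "" or s in text]
    if not usable:
        # No separator left: fall back to hard cuts every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep = usable[0]
    finer = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    # Pieces that are still too long are split again with the finer separators.
    small = []
    for piece in pieces:
        if len(piece) <= chunk_size:
            small.append(piece)
        else:
            small.extend(recursive_split(piece, chunk_size, finer))

    # Greedily merge neighbouring pieces (re-joined with sep) up to chunk_size.
    chunks, current = [], ""
    for piece in small:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "Para one is here.\n\nPara two is a bit longer than the first."
print(recursive_split(sample_text, chunk_size=25))
```

The first paragraph fits and stays whole. The second is too long, so it falls back to splitting on spaces and yields `'Para two is a bit longer'` and `'than the first.'`.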


### Step 3
Next, let's embed our documents and store them in a vector store. We will use [ChromaDB](https://python.langchain.com/docs/integrations/vectorstores/chroma), which keeps the data in memory unless you give it a `persist_directory`.

In [4]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Alternative embedding model (needs OPENAI_API_KEY); the cells below use an open-source model instead
embedding = OpenAIEmbeddings()
# Only needed if you want Chroma to persist the DB to disk
persist_directory = 'docs/chroma/'

In [5]:
!rm -rf ./docs/chroma  # remove old database files if any
!mkdir -p ./docs/chroma  # create a directory to store the database files

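Let's store the pages of our PDF in the vector store, embedding them with the open-source `all-MiniLM-L6-v2` Sentence Transformers model (the `OpenAIEmbeddings` object above is an alternative). For simplicity we index whole pages. Passing one of the chunk lists from Step 2 (for example `r_docs`) would index the smaller chunks instead.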

In [6]:
from langchain_huggingface import HuggingFaceEmbeddings

# Create the open-source embedding function (runs locally via sentence-transformers)
embedding_function = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

vectordb = Chroma.from_documents(pages, embedding_function)

print("Total documents on Vector Store: ", vectordb._collection.count())


Total documents on Vector Store:  4
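A vector store embeds each document once, embeds the query at search time, and ranks documents by how close their vectors are to the query vector. The toy store below shows that loop with a bag-of-words "embedding" and cosine similarity. It is an assumption-laden sketch: `all-MiniLM-L6-v2` produces dense 384-dimensional vectors rather than word counts, and Chroma's `similarity_search_with_score` reports a distance by default (lower means closer) rather than the similarity returned here. `TinyVectorStore`, `embed` and `sparse_cosine` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def sparse_cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: embeds texts once, ranks them per query."""

    def __init__(self, texts):
        self.items = [(text, embed(text)) for text in texts]

    def similarity_search_with_score(self, query, k=4):
        """Return up to k (text, cosine similarity) pairs, most similar first."""
        q = embed(query)
        scored = [(text, sparse_cosine(q, vec)) for text, vec in self.items]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = TinyVectorStore([
    "DDP splits training across multiple GPUs",
    "The Chinchilla paper covers optimal training data size",
    "BERT handles sentiment analysis",
])
for text, score in store.similarity_search_with_score("What is DDP?", k=2):
    print(f"{score:.3f}  {text}")
```

Only the first text shares a word ("ddp") with the question, so it ranks first. A real embedding model would also match on meaning, not just shared words.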


Next, we run a few searches to see what content the vector DB returns as relevant to a question.

### Step 4
Let's try two retrieval methods: plain similarity search and Maximal Marginal Relevance (MMR).

In [7]:
question = "What is DDP?"

# Using Similarity Search
print("\nSimilarity Search")
docs = vectordb.similarity_search_with_score(question)
for doc, _score in docs:  # each result is a (Document, score) pair
    print(f"Page #{doc.metadata['page']}  : {doc.page_content[:100]}")

# Using MMR to diversify the results
print("\nMMR Search")
docs = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")


Similarity Search
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”
= # floating point operat
Page #0  : Generative AI project — Part 1Muthu Arumugam
This article will help you understand how you can get i
Page #3  : You can elect to choose a smaller model and then can train it for a specific
field of yours. For exa
Page #1  : dataset. Each model fits for few use cases.
Autoencoding models — BERT/ROBERTA — Sentiment analysis,

MMR Search
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”
= # floating point operat
Page #0  : Generative AI project — Part 1Muthu Arumugam
This article will help you understand how you can get i


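The retrieved pages still contain a lot of text unrelated to the question. Contextual compression addresses this: an LLM (via `LLMChainExtractor`) extracts only the parts of each retrieved document that are relevant to the question.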

In [8]:
from langchain_openai import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

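# Use an LLM to extract only the question-relevant parts of each retrieved document
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

# Wrap our vector store's MMR retriever with the compressor
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)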

compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)

Document 1:

There is a paper called “Chinchilla paper” that goes in deep to train LLMs
optimally. Also, you have to consider the size of the training data model
which is the ideal size of ~20x. If the model uses 70B parameters, you need
to feed ~1.4T tokens of a dataset.
----------------------------------------------------------------------------------------------------
Document 2:

Generative AI project — Part 1Muthu Arumugam
This article will help you understand how you can get involved and create
products using Generative AI models. To get a quick intro to Generative AI,
look at my previous articles — Quickies.
For an AI project, these are the following steps equivalent to SDLC.
AI project lifecycle
Use case discovery
You have the option to choose from a variety of tasks from LLMs. You can
choose 1 or many for your project from below:
Essay Writing
Summarization
Translation from language to language
Translation from language to code
Information retrieval
Call external APIs
Model id
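Conceptually, the compressor takes each retrieved document, keeps only the question-relevant parts, and drops documents where nothing is relevant. The sketch below swaps the LLM for a simple keyword heuristic so it runs offline: it keeps sentences that share a non-stopword term with the question. This heuristic is a stand-in chosen for illustration, not what `LLMChainExtractor` does, and `compress`, `terms` and the tiny stopword list are hypothetical.

```python
import re

# Tiny, illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"what", "is", "the", "a", "an", "of", "to", "and", "in", "how", "does"}


def terms(text):
    """Lower-cased word tokens minus the stopwords above."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def compress(docs, question):
    """Keyword stand-in for LLM extraction: keep sentences sharing a term with
    the question and drop documents where no sentence survives."""
    q_terms = terms(question)
    compressed = []
    for doc in docs:
        sentences = re.split(r"(?<=[.!?])\s+", doc)
        kept = [s for s in sentences if terms(s) & q_terms]
        if kept:
            compressed.append(" ".join(kept))
    return compressed


retrieved = [
    "DDP stands for Distributed Data Parallel. It copies the model to each GPU. The weather was nice.",
    "This article covers the AI project lifecycle.",
]
print(compress(retrieved, "What is DDP?"))
```

The heuristic also drops "It copies the model to each GPU.", which is relevant but never repeats the word "DDP". Catching that kind of sentence is exactly why the notebook delegates extraction to an LLM.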

### Step 5
Time to call ChatGPT (`gpt-3.5-turbo`) for an answer grounded in our retrieved documents. We will use a question-answering chain that passes the retrieved context and the question to the LLM.

#### Using Retrieval QA Chain

In [9]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: What is DDP?
Answer: DDP stands for Distributed Data Parallel. It is a strategy used in training deep learning models that involves distributing the data across multiple GPUs to parallelize the training process. This helps optimize the training process and reduce the time required to train large models.
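`RetrievalQA.from_chain_type` uses the `"stuff"` chain type by default: every retrieved document is concatenated into a single prompt, which is sent to the LLM in one call. This keeps things simple, but the retrieved text must fit in the model's context window. Other chain types such as `"map_reduce"` and `"refine"` spread the documents over several calls. The sketch below shows the stuffing step in plain Python. The prompt wording and the `max_chars` budget are assumptions for illustration, not LangChain's internals.

```python
def stuff_prompt(docs, question, max_chars=2000):
    """'Stuff' retrieved documents into one prompt string.

    max_chars is a hypothetical budget standing in for the model's context
    window: documents are added in retrieval order until the next one
    (plus its separator) would exceed it.
    """
    parts, used = [], 0
    for doc in docs:
        extra = len(doc) + (2 if parts else 0)  # 2 = len("\n\n") separator
        if used + extra > max_chars:
            break
        parts.append(doc)
        used += extra
    context = "\n\n".join(parts)
    return f"Context:\n{context}\n\nQuestion: {question}\nHelpful Answer:"


print(stuff_prompt(
    ["DDP stands for Distributed Data Parallel.", "Chinchilla: ~20 tokens per parameter."],
    "What is DDP?",
))
```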


#### Using Prompt

In [10]:
from langchain_core.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: What is DDP?
Answer: DDP stands for Distributed Data Parallel, a strategy for splitting tasks into multiple GPUs to optimize training models. Thanks for asking!
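`PromptTemplate.from_template` infers the template's input variables from its `{curly-brace}` placeholders. With the `"stuff"` chain, `RetrievalQA` fills `{context}` with the retrieved documents and `{question}` with the query, which is why the custom template uses exactly those two names. The sketch below reproduces the inference and filling steps with Python's `string.Formatter`. `input_variables` and `render` are hypothetical helpers, and the template is a shortened copy of the one above.

```python
from string import Formatter

# Shortened copy of the template above (same placeholders).
QA_TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def input_variables(template):
    """Names of the {placeholder} fields in an f-string-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


def render(template, **values):
    """Fill the template, raising if any placeholder has no value."""
    missing = set(input_variables(template)) - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template.format(**values)


print(input_variables(QA_TEMPLATE))  # ['context', 'question']
print(render(QA_TEMPLATE, context="DDP stands for Distributed Data Parallel.",
             question="What is DDP?"))
```

Failing loudly on a missing variable catches a common mistake, such as naming the placeholder `{query}` when the chain supplies `question`.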


#### Modern LCEL Approach (Recommended)
This version uses the LangChain Expression Language (LCEL), which composes the retriever, prompt, model, and output parser with the `|` operator and supports streaming out of the box.

In [11]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Join the retrieved documents into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template("""Answer the question based only on the following context:
{context}

Question: {question}

Answer in a concise manner. Always say "thanks for asking!" at the end of the answer.""")

# Create the chain using LCEL
rag_chain = (
    {"context": vectordb.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Stream the response
print(f"Question: {question}\nAnswer: ", end="")
for chunk in rag_chain.stream(question):
    print(chunk, end="")
print()  # New line at the end

Question: What is DDP?
Answer: DDP stands for Distributed Data Parallel. Thanks for asking!
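The `|` operator in LCEL feeds the output of one step into the next. A dict of steps runs each entry on the same input and collects the results, which is how `{"context": ..., "question": RunnablePassthrough()}` builds the prompt variables. The toy below re-implements just that composition idea so you can see the data flow without an API key. `Step`, `coerce` and the canned retriever are hypothetical. They are not LangChain's Runnable classes, which also provide `.stream`, `.batch`, and async variants.

```python
class Step:
    """Toy runnable: wraps a function and composes with `|` left to right."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        nxt = coerce(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Lets a plain dict (or function) sit on the left of `|`.
        return coerce(other) | self


def coerce(obj):
    """Turn a dict of steps into one step that runs each on the same input;
    wrap plain functions; pass Step objects through unchanged."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        parts = {key: coerce(val) for key, val in obj.items()}
        return Step(lambda value: {key: step.invoke(value) for key, step in parts.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot compose {obj!r}")


def fake_retrieve(question):
    # Hypothetical retriever: returns canned passages instead of querying Chroma.
    return ["DDP stands for Distributed Data Parallel.", "It splits work across GPUs."]


def format_passages(passages):
    return "\n\n".join(passages)


passthrough = Step(lambda x: x)
toy_chain = (
    {"context": Step(fake_retrieve) | format_passages, "question": passthrough}
    | Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
    | Step(str.strip)  # stands in for the output-parser stage
)
print(toy_chain.invoke("What is DDP?"))
```

In the real chain, the prompt-building step is `prompt`, a model (`llm`) sits before the parser, and `StrOutputParser()` turns the model's message into plain text.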
