# RAG Demo
Using Retrieval-Augmented Generation (RAG) to power up an LLM. We will use LangChain for our example. The LangChain framework makes building LLM apps much easier.

![./flow.png](./flow.png)

## ✅ Updated for LangChain 0.3.x (FAISS Version)

This notebook demonstrates RAG with FAISS vector store using the latest LangChain APIs:

- **Document Loaders**: `langchain_community.document_loaders` (from the `langchain-community` package)
- **Text Splitters**: `langchain-text-splitters` package
- **LLMs**: `langchain-openai` for OpenAI models
- **Vector Store**: FAISS with updated method calls
- **Chains**: Modern LCEL approach for better streaming and composability

Both legacy and modern approaches are shown for comparison.

## Install Packages

In [None]:
! pip3 install -q langchain pypdf openai faiss-cpu \
            langchain-openai chromadb langchain-community \
            langchain-text-splitters langchain-core

## Steps
### Step 1: Extract
Load a document and extract its contents. For this example, I added a sample PDF of one of my articles to the `docs` folder.

In [None]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/GenAI-Part1.pdf")
pages = loader.load()

# Look into the doc
second_page = pages[1]
print("Total pages: ", len(pages))
print("First 100 chars of 2nd page: ", second_page.page_content[:100])
print("Metadata: ", second_page.metadata)

### Step 2: Split
Now split the document contents into smaller chunks.

In [None]:
from langchain_text_splitters import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    TokenTextSplitter,
)

# Character Splitter
c_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separator = '\n',
    length_function = len
)

c_docs = c_splitter.split_documents(pages)
print("Total documents on Character Splitter: ", len(c_docs))

# Recursive Character Splitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150, 
    separators=["\n\n", "\n", " ", ""]
)

r_docs = r_splitter.split_documents(pages)
print("Total documents on Recursive Character Splitter: ", len(r_docs))

# Token Splitter
t_splitter = TokenTextSplitter(
    chunk_size=150,
    chunk_overlap=10
)

t_docs = t_splitter.split_documents(pages)
print("Total documents on Token Splitter: ", len(t_docs))
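
To see how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch in plain Python. It is only an illustration: `split_with_overlap` is a hypothetical helper, and LangChain's splitters first split on separators and then merge the pieces, so their chunk boundaries will differ.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a fixed-size window over text, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.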

### Step 3: Vector Store
Next, embed the chunks and store them in a vector store. We will use [FAISS](https://python.langchain.com/docs/integrations/vectorstores/faiss), a similarity-search library whose index is held in memory.

In [None]:
import faiss
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

Let's embed the chunks from the character splitter with OpenAI Embeddings and store them in the vector store.

In [None]:
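# Reuse the embedding object created above instead of building a second one
vectordb = FAISS.from_documents(c_docs, embedding)
# index.ntotal is the number of vectors in the FAISS index (avoids the private docstore._dict)
print("Total documents on Vector Store: ", vectordb.index.ntotal)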
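
To make the embed-store-search idea concrete, here is a minimal pure-Python sketch. It is not how FAISS or OpenAI Embeddings work internally: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, the sample sentences are made up, and cosine similarity is used for simplicity, which may differ from the metric your FAISS index is configured with.

```python
import math

def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up vocabulary and chunks, for illustration only
vocab = ["data", "parallel", "gpu", "prompt", "token"]
chunks = [
    "data parallel training copies the model to every gpu",
    "a prompt is split into token ids before inference",
    "each gpu processes a different slice of data",
]

# "Vector store": each chunk is kept next to its embedding
index = [(toy_embed(c, vocab), c) for c in chunks]

def search(query: str, k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    q = toy_embed(query, vocab)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

results = search("data parallel gpu", k=2)
print(results)
```

A real vector store replaces the linear scan with an optimized index, but the contract is the same: text in, nearest chunks out.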

### Step 4: Retrieve
Let's retrieve relevant chunks using two different search methods: plain similarity search and Maximal Marginal Relevance (MMR).

In [None]:
question = "What is DDP?"

# Using Similarity Search
print("\nSimilarity Search")
docs = vectordb.similarity_search(question, k=2)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")

# Using MMR to diversify the results
print("\nMMR Search")
docs = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")
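
The difference between the two searches is easier to see on toy vectors. The sketch below is a simplified MMR in plain Python, not LangChain's implementation; the 2-D vectors are made up, and `lambda_mult` only mirrors the parameter name LangChain exposes.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k docs, trading relevance to the query against similarity to docs already picked."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Made-up vectors: docs 0 and 1 are near-duplicates, doc 2 is less relevant but different
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.9, 0.2], [0.6, -0.8]]

by_similarity = sorted(
    range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True
)[:2]
by_mmr = mmr(query_vec, doc_vecs, k=2)
print("Similarity:", by_similarity)  # [0, 1]
print("MMR:", by_mmr)                # [0, 2]
```

Plain similarity returns both near-duplicates, while MMR swaps the second one for the more distinct document.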

Retrieved chunks often contain text unrelated to the question. Contextual compression trims each chunk down to the passages that are relevant to the query.

In [None]:
from langchain_openai import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# Use an LLM to extract only the passages relevant to the query
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

# Wrap our vector store retriever with the compressor
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)

compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)
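
As a rough intuition for what compression does, the sketch below keeps only the sentences of a chunk that share a keyword with the query. This is a deliberately simple keyword filter swapped in for illustration; `LLMChainExtractor` asks an LLM to do the extraction instead. The helper name `extract_relevant`, the stopword list, and the sample text are all made up.

```python
def extract_relevant(text: str, query: str) -> str:
    """Keep only sentences that share at least one non-stopword with the query."""
    query_terms = {w.strip("?.,!").lower() for w in query.split()}
    query_terms -= {"what", "is", "the", "a", "an", "of"}  # tiny illustrative stopword list
    kept = []
    for sentence in text.split(". "):
        words = {w.strip("?.,!").lower() for w in sentence.split()}
        if words & query_terms:
            kept.append(sentence.strip())
    return ". ".join(kept)

# Made-up chunk: only the middle sentence mentions the query term
chunk = "The weather was nice. DDP is covered in section two. Lunch was served at noon."
compressed = extract_relevant(chunk, "What is DDP?")
print(compressed)
# DDP is covered in section two
```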

### Step 5: Generate
Time to call an OpenAI chat model for a response grounded in the retrieved chunks. We will use a question-answering chain to call the LLM.

#### Using Retrieval QA Chain

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

#### Using Prompt

In [None]:
from langchain_core.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")
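
Conceptually, the chain above concatenates the retrieved chunks into the `{context}` slot and fills in the question before calling the LLM. The sketch below shows that step with plain string formatting. It assumes chunks are joined with a blank line, uses a shortened copy of the template, and uses placeholder chunk text; `build_prompt` is a hypothetical helper, not a LangChain function.

```python
# Shortened copy of the prompt template, for illustration
template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)

retrieved = ["Chunk one text.", "Chunk two text."]  # placeholder chunks, not real retrieval output
prompt_text = build_prompt(retrieved, "What is DDP?")
print(prompt_text)
```

Because every retrieved chunk lands in a single prompt, the number and size of chunks must fit within the model's context window.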

#### Modern LCEL Approach (Recommended)
Using the new LangChain Expression Language (LCEL) for better composability and streaming support.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Modern LCEL approach with streaming support
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template("""Answer the question based only on the following context:
{context}

Question: {question}

Answer in a concise manner. Always say "thanks for asking!" at the end of the answer.""")

# Create the chain using LCEL
rag_chain = (
    {"context": vectordb.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Stream the response
print(f"Question: {question}\nAnswer: ", end="")
for chunk in rag_chain.stream(question):
    print(chunk, end="")
print()  # New line at the end
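
If the `|` syntax looks unfamiliar, the sketch below shows the underlying idea with a tiny hypothetical `Step` class that overloads `|` to feed one step's output into the next. It is not LangChain's `Runnable` implementation, and the retriever and LLM here are fakes that return placeholder text.

```python
class Step:
    """Hypothetical mini-runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b builds a new Step that runs a first, then passes its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

# Fake retriever: returns placeholder chunks alongside the original question
retrieve = Step(lambda question: {"context": ["chunk A", "chunk B"], "question": question})
# Prompt step: turns the dict into a single prompt string
format_prompt = Step(
    lambda d: "Context: " + " | ".join(d["context"]) + "\nQuestion: " + d["question"]
)
# Fake LLM: echoes the last line of the prompt instead of calling a model
fake_llm = Step(lambda prompt: f"(echo) {prompt.splitlines()[-1]}")

chain = retrieve | format_prompt | fake_llm
answer = chain.invoke("What is DDP?")
print(answer)
# (echo) Question: What is DDP?
```

Each stage only needs to agree on its input and output shape, which is what makes it easy to swap in a different retriever, prompt, or model.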