# RAG Demo
Using RAG (retrieval-augmented generation) to power up an LLM. We will use LangChain for our example. The LangChain framework makes building LLM apps much easier.

![./flow.png](./flow.png)

## ✅ Updated for LangChain 0.3.x (Pinecone Version)

This notebook demonstrates RAG with Pinecone, a hosted vector store, using the latest APIs:

- **Document Loaders**: `langchain-community` package
- **Text Splitters**: `langchain-text-splitters` for better modularity
- **LLMs**: `langchain-openai` for all OpenAI integrations
- **Vector Store**: Pinecone with `langchain-pinecone` integration
- **Chains**: LCEL approach for production-ready streaming
- **Method Calls**: Updated to `.invoke()` pattern

The notebook includes both the traditional chain classes and the modern LCEL pattern.

## Install Packages

In [None]:
! pip3 install -q langchain chromadb pypdf openai faiss-cpu \
        langchain-openai langchain_pinecone \
        pinecone-client ipywidgets langchain-community \
        langchain-text-splitters langchain-core

## Steps
### Step 1: Extract
Load the documents and extract their contents. For our example, I added two sample PDFs of my articles to the `docs` folder.

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    PyPDFLoader("docs/GenAI-Part1.pdf"),
    PyPDFLoader("docs/GenAI-Part2.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

# Look into the doc
print("Total docs: ", len(docs))
print("First doc: ", docs[0].page_content[0:100])
print("Last doc: ", docs[-1].page_content[0:100])

Ignoring wrong pointing object 7 0 (offset 0)


Total docs:  19
First doc:  Generative AI project — Part 1Muthu Arumugam
This article will help you understand how you can get i
Last doc:  The model couldn’t come up with 6-word summary for the same sample.
The sentence prematurely ended. 


### Step 2: Split
Now split the document contents into smaller chunks.

In [None]:
# All three splitters live in the `langchain-text-splitters` package
from langchain_text_splitters import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    TokenTextSplitter,
)

# Character Splitter
c_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separator = '\n',
    length_function = len
)

c_docs = c_splitter.split_documents(docs)
print("Total documents on Character Splitter: ", len(c_docs))

# Recursive Character Splitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", " ", ""]
)

r_docs = r_splitter.split_documents(docs)
print("Total documents on Recursive Character Splitter: ", len(r_docs))

# Token Splitter
t_splitter = TokenTextSplitter(
    chunk_size=150,
    chunk_overlap=10
)

t_docs = t_splitter.split_documents(docs)
print("Total documents on Token Splitter: ", len(t_docs))

Total documents on Character Splitter:  25
Total documents on Recursive Character Splitter:  25
Total documents on Token Splitter:  44
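
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification and not how the LangChain splitters are implemented: the real splitters also respect separators, so their chunks rarely land on exact character offsets. The helper name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for c in chunks:
    print(c)
```

Each chunk starts with the last three characters of the previous one, which is what keeps a sentence that straddles a boundary visible in both chunks.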


### Step 3: Vector Store
Let's embed our splits and store them in a vector store. We will use [Pinecone](https://python.langchain.com/docs/integrations/vectorstores/pinecone), a hosted vector database.

In [18]:
import os
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

# initialize pinecone
pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))

index_name = "demo1"
# First, check if our index already exists. If it doesn't, we create it
if index_name not in pc.list_indexes().names():
    # we create a new index
    pc.create_index(
        name=index_name, 
        metric="cosine", 
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        dimension=1536)
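
The index above is created with `metric="cosine"`. As a rough illustration of what that metric measures, here is a pure-Python sketch of cosine similarity. The three-dimensional vectors are toy values chosen for readability, not real embeddings, and the function is an assumption for illustration rather than Pinecone's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]   # longer, but points the same way
orthogonal = [0.0, 3.0, 0.0]
opposite = [-1.0, 0.0, 0.0]

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
print(cosine_similarity(query, opposite))
```

Because the metric ignores vector length, only the direction of an embedding matters. The dimension check mirrors why the index `dimension` has to match the length of the vectors the embedding model produces.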


Let's embed the chunks of our PDFs with OpenAI embeddings and store them in the vector store.

In [24]:
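# The OpenAI embedding model `text-embedding-ada-002` uses 1536 dimensions,
# which matches the `dimension` of the index.
index = pc.Index(index_name)
# Reuse the `embedding` object created earlier instead of building a second one
vector_store = PineconeVectorStore(index=index, embedding=embedding)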
vector_store.add_documents(documents=c_docs)

# print("Total documents on Vector Store: ", len(vectordb.documents))

['3046ba17-5bfd-4504-971e-4f56e1829d46',
 'fcd10af2-a1fe-4130-841b-e37ccfbec1a6',
 'b9ce9fdb-b1f6-4129-949b-736686a55978',
 'ecca860f-2720-4368-88bc-92b8a93efeda',
 'adb03900-b106-4fda-b9ac-f6529624df9a',
 'dbf71ab2-1880-41e3-ac75-76a9643a3db4',
 'bd1f30a7-7f0b-4339-8950-20fc35b3d95c',
 '6709a07e-cd13-473c-85a7-92cc6a4767c0',
 '2e653b3a-8d43-451c-ae59-ad111cbfd39e',
 '3bcc35e0-d664-4df7-aea9-ef53164dbb74',
 '8bd6136c-b754-4d33-b3b1-b8ad56f61983',
 'f4f992e3-5611-4bac-a3ed-b411cd191a0f',
 '7eeabacf-882b-400d-aff8-8382b122e607',
 '412a50c9-d4a9-48d2-a716-2b9e37bc3a5c',
 'edc96b70-744d-4685-ba56-d796227b341c',
 '39c6ffcd-9332-4c20-970e-0e2350e56870',
 'e8f627de-8c31-4254-924f-3a39a3c24a30',
 'e56008cf-b027-4e7e-8f27-54fcb28026e9',
 'e4f6e457-1056-4e80-a79f-cf93abcdaf30',
 '81a6ad5e-1b32-4a4c-85a2-25253a09c544',
 '35a66862-a013-41ee-bdbc-ea812789e444',
 'f78b9cdb-6840-4a99-a226-d598bea1f71d',
 '62e248c1-cfd6-4388-9e76-24be0a31a4d3',
 'c289207b-0a08-4cf3-b18e-08e701cc8f54',
 'be2e49e6-21fc-

### Step 4: Retrieve
Let's retrieve relevant chunks with two different search methods: plain similarity search and maximal marginal relevance (MMR).

In [27]:
question = "Who is the author?"

# Using Similarity Search
print("\nSimilarity Search")
docs = vector_store.similarity_search(question)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")

# Using MMR to diversify the results
print("\nMMR Search")
docs = vector_store.max_marginal_relevance_search(question,k=2, fetch_k=3)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")


Similarity Search
Page #3.0  : Disclaimer: This is not generated by an AI bot. Also, a lot of these were
learned through the DeepLe
Page #14.0  : The model couldn’t come up with 6-word summary for the same sample.
The sentence prematurely ended. 
Page #0.0  : Generative AI project — Part 1Muthu Arumugam
This article will help you understand how you can get i
Page #0.0  : Generative AI Project —  Part 2Muthu Arumugam
To understand how AI projects work, see Part 1.
This p

MMR Search
Page #3.0  : Disclaimer: This is not generated by an AI bot. Also, a lot of these were
learned through the DeepLe
Page #0.0  : Generative AI project — Part 1Muthu Arumugam
This article will help you understand how you can get i


The retrieved chunks carry a lot of text that is unrelated to the question. Contextual compression uses an LLM to extract only the relevant passages from each chunk.

In [None]:
from langchain_openai import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# Build an LLM-based compressor, then wrap our vector store retriever with it
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_store.as_retriever(search_type="mmr")
)

compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)


Document 1:

There is a paper called “Chinchilla paper” that goes in deep to train LLMs optimally.
----------------------------------------------------------------------------------------------------
Document 2:

Muthu Arumugam


### Step 5: Generate
Time to call an OpenAI chat model (`gpt-3.5-turbo`) for a response based on our retrieval. We will use a question-answering chain to call the LLM.

#### Using Retrieval QA Chain

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_store.as_retriever()
)

result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")


Question: Who is the author?
Answer: The author of the articles on Generative AI projects is Muthu Arumugam.


#### Using Prompt

In [None]:
from langchain_core.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: Who is the author?
Answer: The author of the Generative AI project articles is Muthu Arumugam. Thanks for asking!
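
To see what the model actually receives, here is a minimal sketch of how a template like the one above gets filled. It assumes the retrieved chunks are simply joined with blank lines and substituted into `{context}`. The helper `build_prompt`, the shortened template, and the placeholder chunk strings are hypothetical and stand in for real retrieved text.

```python
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(template: str, chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and substitute them into the prompt template."""
    context = "\n\n".join(chunks)  # assumption: chunks are separated by a blank line
    return template.format(context=context, question=question)


# Placeholders for whatever the retriever returned
chunks = ["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"]

prompt_text = build_prompt(template, chunks, "Who is the author?")
print(prompt_text)
```

Printing the filled prompt like this is a handy way to debug an answer that looks wrong: usually the context either lacks the needed passage or contains too much unrelated text.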


#### Modern LCEL Approach (Recommended)
Using the new LangChain Expression Language (LCEL) for better composability and streaming support with Pinecone.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Modern LCEL approach with streaming support
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template("""Answer the question based only on the following context:
{context}

Question: {question}

Answer in a concise manner. Always say "thanks for asking!" at the end of the answer.""")

# Create the chain using LCEL with Pinecone vector store
rag_chain = (
    {"context": vector_store.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Stream the response
print(f"Question: {question}\nAnswer: ", end="")
for chunk in rag_chain.stream(question):
    print(chunk, end="")
print()  # New line at the end
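
If the `|` syntax looks like magic, the toy sketch below shows the general idea of composing steps with the pipe operator. It is not LangChain's `Runnable` implementation: the `Step` class, `fan_out`, and the fake retriever are hypothetical stand-ins written only to show the data flow of a chain like the one above, without calling Pinecone or OpenAI.

```python
class Step:
    """A tiny composable unit: wraps a function and supports `a | b`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Accept either another Step or a plain function on the right-hand side
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))


def fan_out(branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


def format_docs(docs):
    return "\n\n".join(docs)


def fill_prompt(inputs):
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


# Stand-in for the vector store retriever: always returns the same two chunks
fake_retriever = Step(lambda question: ["chunk A", "chunk B"])
passthrough = Step(lambda value: value)

chain = fan_out({"context": fake_retriever | format_docs, "question": passthrough}) | fill_prompt

result = chain.invoke("Who is the author?")
print(result)
```

The question enters once, is routed to both the retriever branch and the passthrough branch, and the resulting dict feeds the prompt step. In the real chain, the LLM and the output parser are simply two more steps appended with `|`.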