# RAG Demo
Using RAG to power up an LLM. We will use Langchain for our example. Langchain framework makes build LLM apps super easy.

![./flow.png](./flow.png)

## Install Packages

In [30]:
! pip install langchain pypdf
! pip install pinecone-client

850.70s - pydevd: Sending message related to process being replaced timed-out after 5 seconds
857.34s - pydevd: Sending message related to process being replaced timed-out after 5 seconds


## Steps
### Step 1: Extract
Load a document and extract the contents. For our example, I added a sample PDF from my article in docs folder.

In [36]:
from langchain.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    PyPDFLoader("docs/GenAI-Part1.pdf"),
    PyPDFLoader("docs/GenAI-Part2.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

# Look into the doc
print("Total docs: ", len(docs))
print("First doc: ", docs[0].page_content[0:100])
print("Last doc: ", docs[-1].page_content[0:100])

Total docs:  19
First doc:  Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in
Last doc:  The model couldn’t come up with 6-word summary for the same sample.The sentence prematurely ended. S


### Step 2: Split
Now split the document contents into smaller chunks.

In [38]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter, TokenTextSplitter

# Character Splitter
c_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separator = '\n',
    length_function = len
)

c_docs = c_splitter.split_documents(pages)
print("Total documents on Charter Splitter: ", len(c_docs))

# Recursive Character Splitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150, 
    separators=["\n\n", "\n", " ", ""]
)

r_docs = r_splitter.split_documents(pages)
print("Total documents on Recursive Charter Splitter: ", len(r_docs))

# Token Splitter
t_splitter = TokenTextSplitter(
    chunk_size=150,
    chunk_overlap=10
)

t_docs = t_splitter.split_documents(pages)
print("Total documents on Token Splitter: ", len(t_docs))


Total documents on Charter Splitter:  4
Total documents on Recursive Charter Splitter:  6
Total documents on Token Splitter:  8


### Step 3: Vector Store
Let's take our splits and embed them and then store them into a vector store. We will use [Pinecone](https://python.langchain.com/docs/integrations/vectorstores/pinecone) which is hosted.

In [39]:
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone
import os

embedding = OpenAIEmbeddings()

# initialize pinecone
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),  # find at app.pinecone.io
    environment=os.getenv("PINECONE_ENV"),  # next to api key in console
)

index_name = "demo1"
# First, check if our index already exists. If it doesn't, we create it
if index_name not in pinecone.list_indexes():
    # we create a new index
    pinecone.create_index(name=index_name, metric="cosine", dimension=1536)


Let's store the pages of our PDF into Vector Store with Embeddings using OpenAI Embeddings.

In [40]:
# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`
vectordb = Pinecone.from_documents(c_docs, embedding, index_name=index_name)

# print("Total documents on Vector Store: ", len(vectordb.documents))

### Step 4: Retrieve
Let's retrieve with different methods

In [49]:
question = "Who is the author?"

# Using Similarity Search
print("\nSimilarity Search")
docs = vectordb.similarity_search(question)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")

# Using MMR to diversify the results
print("\nMMR Search")
docs = vectordb.max_marginal_relevance_search(question,k=2, fetch_k=3)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")


Similarity Search
Page #0.0  : Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in
Page #0.0  : Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in
Page #0.0  : Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in
Page #0.0  : Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in

MMR Search
Page #0.0  : Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in
Page #0.0  : Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get in


We need to do some compression to avoid unnecessary text around the content we are looking for.

In [50]:
from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# Wrap our vectorstore
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)

compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)






### Step 5: Generate
Time to call ChatGPT for a response based on our retrieval. We will use Question & Answer to call LLM.

#### Using Retrieval QA Chain

In [51]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

result = qa_chain({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: Who is the author?
Answer: The author of the article is Muthu Arumugam.


#### Using Prompt

In [52]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

result = qa_chain({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: Who is the author?
Answer: The author is Muthu Arumugam. Thanks for asking!
