# RAG Demo
This demo uses retrieval-augmented generation (RAG) to ground an LLM's answers in our own documents. We will use LangChain for the example; the framework makes building LLM apps much easier.

![./flow.png](./flow.png)

## Install Packages

In [12]:
! pip3 install -q langchain chromadb pypdf openai tiktoken

## Steps
### Step 1
Load a document and extract its contents. For this example, I added a sample PDF of one of my articles to the `docs` folder.

In [13]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/GenAI-Part1.pdf")
pages = loader.load()

# Look into the doc
second_page = pages[1]
print("Total pages: ", len(pages))
print("First 50 chars of 2nd page: ", second_page.page_content[:50])
print("Metadata: ", second_page.metadata)

Total pages:  4
First 50 chars of 2nd page:  dataset. Each model fits for few use cases.Autoenc
Metadata:  {'source': 'docs/GenAI-Part1.pdf', 'page': 1}


### Step 2
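Now split the document contents into smaller chunks. We compare three splitters: a character splitter, a recursive character splitter, and a token splitter. Note that `chunk_size` counts characters for the first two and tokens for the last one.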

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter, TokenTextSplitter

# Character Splitter
c_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separator = '\n',
    length_function = len
)

c_docs = c_splitter.split_documents(pages)
print("Total documents on Character Splitter: ", len(c_docs))

# Recursive Character Splitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", " ", ""]
)

r_docs = r_splitter.split_documents(pages)
print("Total documents on Recursive Character Splitter: ", len(r_docs))

# Token Splitter
t_splitter = TokenTextSplitter(
    chunk_size=150,
    chunk_overlap=10
)

t_docs = t_splitter.split_documents(pages)
print("Total documents on Token Splitter: ", len(t_docs))


Total documents on Character Splitter:  4
Total documents on Recursive Character Splitter:  6
Total documents on Token Splitter:  8
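
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is not how the LangChain splitters work internally (they split on separators first and then merge the pieces); the function name `split_with_overlap` and the sample string are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample text, chosen so the overlap is easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for c in chunks:
    print(c)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.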


### Step 3
Next, we embed the content and store the embeddings in a vector store. We will use [ChromaDB](https://python.langchain.com/docs/integrations/vectorstores/chroma), an open-source vector database that can run in memory or persist to disk; here we persist it to `docs/chroma/`.

In [15]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()
persist_directory = 'docs/chroma/'

In [16]:
!rm -rf ./docs/chroma  # remove old database files if any

Let's embed the pages of our PDF with OpenAI embeddings and store them in the vector store. Note that this cell indexes the four whole pages, not the chunks produced in Step 2.

In [17]:
vectordb = Chroma.from_documents(
    documents=pages,
    embedding=embedding,
    persist_directory=persist_directory
)

print("Total documents on Vector Store: ", vectordb._collection.count())

Total documents on Vector Store:  4


Before searching the vector DB for relevant content, we save it to disk.

In [18]:
# Save the DB
vectordb.persist()
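
As a rough mental model of what a vector store does, the sketch below embeds texts, keeps the vectors in a list, and ranks them by cosine similarity to the query. Everything here is an assumption for illustration: `embed` is a word-count stand-in for a real embedding model, `ToyVectorStore` is a hypothetical class, and the three texts are invented. Chroma itself uses dense embeddings and an index rather than a linear scan.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in memory and searches them by brute force."""

    def __init__(self):
        self._items = []

    def add(self, texts):
        for t in texts:
            self._items.append((t, embed(t)))

    def count(self):
        return len(self._items)

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add([
    "ddp means distributed data parallel training",
    "autoencoding models suit sentiment analysis",
    "a petaflop day measures training compute",
])
top = store.search("what is ddp", k=1)
print(store.count(), top)
```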

### Step 4
Let's retrieve relevant pages with two different methods: plain similarity search and maximal marginal relevance (MMR).

In [19]:
question = "What is DDP?"

# Using Similarity Search
print("\nSimilarity Search")
docs = vectordb.similarity_search(question, k=2)  # return the 2 most similar pages
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")

# Using MMR to diversify the results
print("\nMMR Search")
docs = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")


Similarity Search
Page #1  : dataset. Each model fits for few use cases.Autoencoding models — BERT/ROBERTA — Sentiment analysis, 
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati

MMR Search
Page #1  : dataset. Each model fits for few use cases.Autoencoding models — BERT/ROBERTA — Sentiment analysis, 
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati
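
The idea behind MMR can be sketched in a few lines: pick documents one at a time, rewarding similarity to the query and penalizing similarity to documents already picked. The code below is a simplified illustration under stated assumptions, not LangChain's implementation; the 2-D vectors are invented, and the `lambda_mult` name is borrowed for the relevance/diversity trade-off weight.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k documents, balancing relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already selected
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical embeddings: docs 0 and 1 are near-duplicates, doc 2 is different
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

by_similarity = sorted(
    range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True
)[:2]
picked = mmr(query, docs, k=2, lambda_mult=0.5)
print("similarity:", by_similarity, "mmr:", picked)
```

Plain similarity returns the two near-duplicates, while MMR keeps the best match and then prefers the document that adds something new.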


The retrieved pages contain a lot of text unrelated to the question. Contextual compression uses an LLM to extract only the passages relevant to the query from each retrieved document.

In [20]:
from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# Wrap our vectorstore
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)

compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

Number of requested results 20 is greater than number of elements in index 4, updating n_results = 4


Document 1:

DDP — Distributed Data Parallel
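
To show the shape of this step without calling an LLM, the sketch below swaps the LLM extractor for a simple keyword filter: it keeps only the sentences that share a non-stopword with the question. This is a deliberately crude stand-in, not what `LLMChainExtractor` does; the function name, the stopword list, and the sample page text are all assumptions made for illustration.

```python
import re


def extract_relevant(text, question):
    """Keep only sentences sharing a keyword with the question (keyword stand-in for an LLM)."""
    stopwords = {"what", "is", "the", "a", "an", "of", "how", "why"}
    keywords = {w for w in re.findall(r"\w+", question.lower()) if w not in stopwords}
    # Split on whitespace that follows sentence-ending punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if keywords & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)


# Hypothetical page content, not taken from the PDF
page = (
    "Training large models needs many GPUs. "
    "DDP stands for Distributed Data Parallel. "
    "A petaflop/s-day measures compute."
)
compressed = extract_relevant(page, "What is DDP?")
print(compressed)
```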


### Step 5
Time to call ChatGPT for a response based on our retrieval. We will use a question-answering chain, which passes the retrieved documents to the LLM as context along with the question.

#### Using Retrieval QA Chain

In [21]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

result = qa_chain({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: What is DDP?
Answer: DDP stands for Distributed Data Parallel. It is a strategy used to train deep learning models in a distributed manner across multiple GPUs or machines. DDP divides the training data into smaller batches and distributes them across the available resources. Each resource (GPU or machine) then independently computes the gradients for its batch and communicates them with the other resources. This allows for parallel processing and faster training of the model. DDP is commonly used in large-scale deep learning projects to optimize the use of computing resources.


#### Using Prompt

In [22]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

result = qa_chain({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: What is DDP?
Answer: DDP stands for Distributed Data Parallel. It is a strategy for training deep learning models that involves distributing the data across multiple GPUs and parallelizing the computations to speed up the training process. Thanks for asking!
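
Conceptually, the chain fills the `{context}` and `{question}` placeholders before sending the prompt to the model. The sketch below shows that substitution with plain string formatting, assuming the retrieved documents are simply joined with blank lines. The shortened template, the `build_prompt` helper, and the two context strings are hypothetical.

```python
# Shortened, hypothetical template with the same two placeholders
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(template, docs, question):
    """Join the retrieved texts into one context block and fill the template."""
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)


# Hypothetical retrieved texts standing in for the documents a retriever returns
retrieved = [
    "DDP stands for Distributed Data Parallel.",
    "Each model fits a few use cases.",
]
prompt = build_prompt(template, retrieved, "What is DDP?")
print(prompt)
```

Printing the filled prompt like this is a handy way to check what the model actually sees when an answer looks off.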
