# RAG Demo
This notebook uses retrieval-augmented generation (RAG) to ground an LLM's answers in our own documents. We use the LangChain framework for the example, since it makes building LLM apps much easier.

![Flow diagram](./flow.png)

## Install Packages

In [1]:
%pip install -q langchain chromadb pypdf langchain-openai \
        tiktoken langchain-chroma \
        sentence_transformers langchain-community \
        langchain-huggingface

Note: you may need to restart the kernel to use updated packages.
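
After restarting the kernel, it can help to confirm that the packages are actually visible to it. The snippet below is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the package list is simply copied from the install line above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the %pip install line above.
required = [
    "langchain", "langchain-community", "langchain-openai", "langchain-chroma",
    "langchain-huggingface", "chromadb", "pypdf", "tiktoken", "sentence-transformers",
]
for name, ver in installed_versions(required).items():
    print(f"{name:24} {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` points to an install that went into a different environment than the one the kernel is running in.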


## Steps
### Step 1
Load a document and extract its contents. For this example, I added a sample PDF of one of my articles to the docs folder.

In [3]:
# Document loaders live in the langchain-community package
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/GenAI-Part1.pdf")
pages = loader.load()

# Look into the doc
second_page = pages[1]
print("Total pages: ", len(pages))
print("First 50 chars of 2nd page: ", second_page.page_content[:50])
print("Metadata: ", second_page.metadata)

Total pages:  4
First 50 chars of 2nd page:  dataset. Each model fits for few use cases.Autoenc
Metadata:  {'source': 'docs/GenAI-Part1.pdf', 'page': 1}


### Step 2
Now split the document contents into smaller chunks. We compare three splitters: character-based, recursive character-based, and token-based.

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter, CharacterTextSplitter, TokenTextSplitter

# Character Splitter
c_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separator = '\n',
    length_function = len
)

c_docs = c_splitter.split_documents(pages)
print("Total documents on Character Splitter: ", len(c_docs))

# Recursive Character Splitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", " ", ""]
)

r_docs = r_splitter.split_documents(pages)
print("Total documents on Recursive Character Splitter: ", len(r_docs))

# Token Splitter
t_splitter = TokenTextSplitter(
    chunk_size=150,
    chunk_overlap=10
)

t_docs = t_splitter.split_documents(pages)
print("Total documents on Token Splitter: ", len(t_docs))


Total documents on Character Splitter:  4
Total documents on Recursive Character Splitter:  6
Total documents on Token Splitter:  7
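
To get a feel for how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch in plain Python. It is not how the LangChain splitters are implemented (they split on separators first and then merge the pieces); `split_with_overlap` is a hypothetical name used only for illustration.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=150):
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
print(split_with_overlap(sample, chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the same idea as the 150-character overlap used above: context at a chunk boundary is not lost to either neighbor.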


### Step 3
Let's take our splits, embed them, and store them in a vector store. We will use [ChromaDB](https://python.langchain.com/docs/integrations/vectorstores/chroma), an embedded vector database that can run in memory or persist to disk.

In [5]:
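from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# OpenAI embeddings (expects the OPENAI_API_KEY environment variable).
# Note: the cells below use an open-source embedding model instead.
embedding = OpenAIEmbeddings()
# Directory for the database files; only used if passed to Chroma as persist_directory.
persist_directory = 'docs/chroma/'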

In [6]:
!rm -rf ./docs/chroma  # remove old database files if any
!mkdir -p ./docs/chroma  # create a directory to store the database files

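Let's store the pages of our PDF in the vector store. Note that this cell embeds `pages` (one document per page) rather than the smaller splits from Step 2, and it uses an open-source SentenceTransformer model rather than OpenAI embeddings.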

In [15]:
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(
    model_name="all-MiniLM-L6-v2")

vectordb = Chroma.from_documents(pages, embedding_function)

print("Total documents on Vector Store: ", vectordb._collection.count())

Total documents on Vector Store:  28


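Next, we run a few searches against this vector store to look for relevant content. The count of 28 for a 4-page PDF suggests the cell was run several times against the same in-memory collection, so each page was likely stored more than once; that would also explain the repeated results below.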
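
To make the idea of "embed, store, search" concrete, here is a toy in-memory vector store in plain Python. It is only a sketch: the word-count `embed` function stands in for a real embedding model, `ToyVectorStore` is a hypothetical class (not the Chroma API), and the sample texts are made up. Scores here are cosine similarities, so higher means more similar.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps texts and their vectors in two parallel lists."""

    def __init__(self):
        self._texts = []
        self._vectors = []

    def add_texts(self, texts):
        for text in texts:
            self._texts.append(text)
            self._vectors.append(embed(text))

    def count(self):
        return len(self._texts)

    def similarity_search(self, query, k=4):
        """Return the k most similar (text, score) pairs, best first."""
        query_vec = embed(query)
        scored = [(cosine_similarity(query_vec, vec), text)
                  for vec, text in zip(self._vectors, self._texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(text, score) for score, text in scored[:k]]


store = ToyVectorStore()
texts = ["distributed data parallel training", "essay writing and summarization"]
store.add_texts(texts)
store.add_texts(texts)  # adding the same texts again duplicates every entry
print(store.count())    # 4
for text, score in store.similarity_search("data parallel", k=2):
    print(f"{score:.3f}  {text}")
```

Both of the top results are the same text, because the store holds two copies of it. Re-adding documents to an existing collection has the same effect in a real vector store.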

### Step 4
Let's retrieve documents with two different methods: similarity search and maximal marginal relevance (MMR).

In [16]:
question = "What is DDP?"

# Using Similarity Search
print("\nSimilarity Search")
docs = vectordb.similarity_search_with_score(question)
# Each result is a (document, score) tuple
for doc, _score in docs:
    print(f"Page #{doc.metadata['page']}  : {doc.page_content[:100]}")

# Using MMR to diversify the results
print("\nMMR Search")
docs = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)
for d in docs:
    print(f"Page #{d.metadata['page']}  : {d.page_content[:100]}")


Similarity Search
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati

MMR Search
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati
Page #2  : The measurement used to understand the training time is 1 “petaflop/s-day”= # floating point operati
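
The MMR idea can be sketched in a few lines of plain Python: first keep the `fetch_k` candidates most similar to the query, then repeatedly pick the one that best balances relevance against similarity to what was already picked. This is a minimal sketch under assumptions: `mmr_select` is a hypothetical function, the vectors are made-up 2-D examples, and the scoring follows the usual `lambda_mult * relevance - (1 - lambda_mult) * redundancy` form rather than any specific library's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    relevance = [cosine(query_vec, vec) for vec in doc_vecs]
    # Step 1: keep only the fetch_k candidates most similar to the query.
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: relevance[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/redundancy trade-off.
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query = [1, 0]
docs = [[1, 1], [1, 1], [1, -2]]  # documents 0 and 1 are exact duplicates
print(mmr_select(query, docs, k=2, fetch_k=3))  # [0, 2]
print(mmr_select(query, docs, k=2, fetch_k=2))  # [0, 1]
```

With `fetch_k=3` the duplicate is skipped in favor of a different document. With `fetch_k=2` the candidate pool contains only the two duplicates, so MMR has nothing else to choose from. That matches the situation above, where the pool of 3 fetched candidates appears to hold copies of the same page.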


The retrieved chunks carry a lot of text unrelated to the question. Contextual compression uses an LLM to extract only the relevant passages from each retrieved document.

In [18]:
from langchain_openai import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# Wrap our vectorstore
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)

# invoke() replaces the deprecated get_relevant_documents()
compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)

Document 1:

Generative AI project — Part 1Muthu ArumugamThis article will help you understand how you can get involved and createproducts using Generative AI models. To get a quick intro to Generative AI,look at my previous articles — Quickies.For an AI project, these are the following steps equivalent to SDLC.
AI project lifecycleUse case discoveryYou have the option to choose from a variety of tasks from LLMs. You canchoose 1 or many for your project from below:Essay WritingSummarizationTranslation from language to languageTranslation from language to codeInformation retrievalCall external APIsModel identificationThe existing model may be sufficient or you have to pre-train with your
----------------------------------------------------------------------------------------------------
Document 2:

- Bloomberg published a model which is trained for finance-related LLM.
- Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for fina
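
Conceptually, the compressor looks at each retrieved document and keeps only the parts that relate to the question. The sketch below imitates that step with a simple keyword filter instead of an LLM, so it is only a rough stand-in for `LLMChainExtractor`; `extract_relevant_sentences`, the stopword list, and the sample text are all made up for illustration.

```python
import re

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"what", "is", "the", "a", "an", "of", "to", "in", "and",
             "for", "how", "does", "do", "are"}


def extract_relevant_sentences(text, question, min_overlap=1):
    """Keep only sentences that share at least min_overlap keywords with the question."""
    keywords = {w for w in re.findall(r"[a-z0-9]+", question.lower())
                if w not in STOPWORDS}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if len(keywords & words) >= min_overlap:
            kept.append(sentence)
    return " ".join(kept)


sample = ("DDP is mentioned in this sentence. The weather is nice today. "
          "Another sentence talks about DDP again.")
print(extract_relevant_sentences(sample, "What is DDP?"))
# DDP is mentioned in this sentence. Another sentence talks about DDP again.
```

An LLM-based extractor can match on meaning rather than exact words, which is why the notebook uses one, but the input and output have the same shape: a long document goes in and a shorter, question-focused one comes out.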

### Step 5
Time to call the chat model for a response based on our retrieval. We will use LangChain's question-answering chain, which passes the retrieved documents to the LLM as context.

#### Using Retrieval QA Chain

In [19]:
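# ChatOpenAI now lives in langchain-openai; the langchain.chat_models import is deprecated
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": question})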
print(f"Question: {question}\nAnswer: {result['result']}")


Question: What is DDP?
Answer: I don't have enough information to accurately answer your question about DDP.


#### Using a Custom Prompt

In [20]:
from langchain_core.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": question})
print(f"Question: {question}\nAnswer: {result['result']}")

Question: What is DDP?
Answer: I'm sorry, I don't have enough information to answer that question. Thanks for asking!
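
It helps to see what the LLM actually receives. With the default "stuff" chain type, the retrieved chunks are concatenated into one context string and substituted into the template. The sketch below reproduces that step in plain Python; `build_stuff_prompt` is a hypothetical helper, the blank-line separator is an assumption, and the chunks are made-up sample text.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_stuff_prompt(template, chunks, question, separator="\n\n"):
    """Join the retrieved chunks into one context block and fill in the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


retrieved_chunks = [
    "First retrieved chunk of text.",
    "Second retrieved chunk of text.",
]
print(build_stuff_prompt(TEMPLATE, retrieved_chunks, "What is DDP?"))
```

If none of the retrieved chunks mention the term in the question, the model has nothing to ground an answer on. Given the prompt's instruction not to make things up, an "I don't know" style reply is then the expected outcome, so printing the prompt (or the source documents) is a quick way to debug the retrieval.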
