# A Gentle Introduction to RAG Applications

This notebook creates a simple RAG (Retrieval-Augmented Generation) system to answer questions from a PDF document using an open-source model.

In [1]:
PDF_FILE = "paul.pdf"

# Choose one Ollama model by leaving exactly one MODEL line uncommented.

# Llama 3.1 8B:
# MODEL = "llama3.1"

# Gemma 2B:
# MODEL = "gemma:2b"

# TinyLlama (the model that produced the outputs recorded in this notebook):
MODEL = "tinyllama"

## Loading the PDF document

Let's start by loading the PDF document and breaking it down into separate pages.

<img src='images/documents.png' width="1000">

In [2]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(PDF_FILE)
pages = loader.load()

print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[1].page_content)}")
print("Content of a page:", pages[1].page_content[:50])

Number of pages: 9
Length of a page: 3272
Content of a page: 10% a week. And while 110 may not seem much better


## Splitting the pages into chunks

A full page is too long to use as a single unit of context, so let's split each page into smaller, overlapping chunks.

<img src='images/splitter.png' width="1000">


In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

chunks = splitter.split_documents(pages)
print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[1].page_content)}")
print("Content of a chunk:", chunks[1].page_content[:50])


Number of chunks: 17
Length of a chunk: 840
Content of a chunk: There are two reasons founders resist going out an
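
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping splitting. It is a deliberate simplification and not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on natural separators (paragraph breaks, newlines, spaces), which is why a chunk can be much shorter than `chunk_size`. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each piece starts with the last three characters of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk.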


## Storing the chunks in a vector store

We can now generate embeddings for every chunk and store them in a vector store.

<img src='images/vectorstore.png' width="1000">


In [4]:
from langchain_community.vectorstores import FAISS
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

In [5]:
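embeddings = OllamaEmbeddings(model=MODEL)
# vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore = Chroma(
    collection_name="test_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally; remove if not needed
)

# Creating the store does not index anything: embed and add the chunks explicitly.
# With persist_directory set, re-running this cell adds the same chunks again.
ids = vectorstore.add_documents(chunks)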
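
As a rough illustration of what "generating an embedding" means, the sketch below turns each text into a fixed-length vector of word counts. This is only a toy stand-in: the embeddings produced by the Ollama model are dense learned vectors, not word counts. The helper names `build_vocabulary` and `embed` and the sample texts are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts: list[str]) -> list[str]:
    """Collect every distinct lowercase word, sorted so positions are stable."""
    return sorted({word for text in texts for word in text.lower().split()})


def embed(text: str, vocabulary: list[str]) -> list[int]:
    """Map a text to a vector of word counts, one position per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical sample texts, standing in for document chunks.
texts = ["do things that don't scale", "recruit users manually", "users love things"]
vocabulary = build_vocabulary(texts)

# A toy "vector store": each text is kept next to its vector.
vectors = {text: embed(text, vocabulary) for text in texts}
for text, vector in vectors.items():
    print(vector, text)
```

Every text ends up as a vector of the same length, which is what makes it possible to compare any two texts numerically.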

## Setting up a retriever

We can use a retriever to find chunks in the vector store that are similar to a supplied question.

<img src='images/retriever.png' width="1000">
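
Conceptually, a similarity retriever embeds the question, scores it against every stored vector, and returns the best matches. The sketch below assumes cosine similarity and uses hand-written three-dimensional vectors. The vectors, chunk labels, and function names are all hypothetical, and the actual scoring used by the vector store may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored entries most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical store: (chunk label, made-up embedding) pairs.
store = [
    ("chunk about recruiting users", [0.9, 0.1, 0.0]),
    ("chunk about fundraising", [0.1, 0.9, 0.1]),
    ("chunk about delighting users", [0.8, 0.0, 0.3]),
]
query_vector = [1.0, 0.0, 0.1]

top_chunks = retrieve(query_vector, store, k=2)
print(top_chunks)

# An empty store has nothing to rank, so the result is an empty list.
print(retrieve(query_vector, [], k=2))
```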



In [6]:
retriever = vectorstore.as_retriever()
retriever.invoke("What can you get away with when you only have a small number of users?")

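[]

The empty list recorded above came from a run in which the vector store contained no documents: constructing `Chroma` only opens a collection and does not index the chunks. Once the chunks have been added to the store (for example with `vectorstore.add_documents(chunks)`), the retriever returns the most similar chunks instead of `[]`. The answers recorded at the end of this notebook were generated with this empty context.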

## Configuring the model

We'll be using Ollama to load the local model in memory. After creating the model, we can invoke it with a question to get the response back.

<img src='images/model.png' width="1000">

In [7]:
from langchain_ollama import ChatOllama

model = ChatOllama(model=MODEL, temperature=0)
model.invoke("Who is the president of the United States?")

AIMessage(content='The current President of the United States is Joe Biden, who was sworn in on January 20, 2021.', additional_kwargs={}, response_metadata={'model': 'tinyllama', 'created_at': '2024-12-26T08:52:30.322527447Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1233600386, 'load_duration': 20820230, 'prompt_eval_count': 43, 'prompt_eval_duration': 389000000, 'eval_count': 30, 'eval_duration': 815000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-119ae966-9c90-4ee7-b985-acd171138826-0', usage_metadata={'input_tokens': 43, 'output_tokens': 30, 'total_tokens': 73})

## Parsing the model's response

The response from the model is an `AIMessage` instance containing the answer. We can extract the text answer by using the appropriate output parser. We can connect the model and the parser using a chain.

<img src='images/parser.png' width="1000">


In [8]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
print(chain.invoke("Who is the president of the United States?"))

The current President of the United States is Joe Biden, who was sworn in on January 20, 2021.
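
To show the idea behind connecting steps with `|`, here is a minimal sketch of pipe-style composition using Python's `__or__` method. It is not LangChain's implementation. The `Step` class, `fake_model`, and `fake_parser` are hypothetical stand-ins, and the fake model simply echoes its input.

```python
class Step:
    """Wrap a function so that steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Run this step first, then feed its result into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-in for a model: returns a message-like dict with text plus metadata.
fake_model = Step(lambda question: {"content": f"Echo: {question}", "metadata": {"tokens": 3}})

# Stand-in for an output parser: keeps only the text of the message.
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
result = toy_chain.invoke("What is RAG?")
print(result)
```

The composed object exposes the same `invoke` method as its parts, so a chain can itself be used as a step in a longer chain.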


## Setting up a prompt

In addition to the question we want to ask, we also want to provide the model with the context from the PDF file. We can use a prompt template to define and reuse the prompt we'll use with the model.


<img src='images/prompt.png' width="1000">

In [9]:
from langchain_core.prompts import PromptTemplate

template = """
You are an assistant that provides answers to questions based on
a given context. 

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))


You are an assistant that provides answers to questions based on
a given context. 

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: Here is some context

Question: Here is a question



## Adding the prompt to the chain

We can now chain the prompt with the model and the parser.

<img src='images/chain1.png' width="1000">

In [10]:
chain = prompt | model | parser

chain.invoke({
    "context": "Anna's sister is Susan", 
    "question": "Who is Susan's sister?"
})


"Answer based on context: Yes, I can provide you with information about Susan, Anna's sister.\n\nReply: I don't know who Susan is. Please provide me with more details or context to help me answer your question."

## Adding the retriever to the chain

Next, we can connect the retriever to the chain so that the context comes from the vector store instead of being typed in by hand.

<img src='images/chain2.png' width="1000">

In [11]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
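
The first step of this chain takes the input dictionary and builds the two values the prompt needs. The sketch below traces that data flow in plain Python, under stated assumptions: `Doc`, `fake_retriever`, `format_docs`, and `build_prompt_inputs` are hypothetical stand-ins, and the document texts are invented. It also joins the retrieved documents into plain text before they reach the prompt. The chain above does not do this, so there the list of documents is converted to a string as-is.

```python
from dataclasses import dataclass
from operator import itemgetter


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def fake_retriever(question: str) -> list[Doc]:
    # Placeholder: a real retriever would search the vector store.
    return [
        Doc("Founders recruit users manually."),
        Doc("Small scale allows extra care."),
    ]


def format_docs(docs: list[Doc]) -> str:
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt_inputs(inputs: dict) -> dict:
    """Mimic the dict step: derive context and question from the same input."""
    question = itemgetter("question")(inputs)  # same as inputs["question"]
    return {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }


prompt_inputs = build_prompt_inputs({"question": "How do founders get users?"})
print("Context: {context}\n\nQuestion: {question}".format(**prompt_inputs))
```

Passing clean text rather than a raw list of document objects keeps metadata out of the prompt, which may help smaller models stay focused on the content.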

## Using the chain to answer questions

Finally, we can use the chain to ask questions that will be answered using the PDF document.

In [12]:
questions = [
    "What can you get away with when you only have a small number of users?",
    "What's the most common unscalable thing founders have to do at the start?",
    "What's one of the biggest things inexperienced founders and investors get wrong about startups?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print("*************************\n")

Question: What can you get away with when you only have a small number of users?
Answer: Answer: When you only have a small number of users, it's okay to offer limited features or services as long as they are relevant and useful to your target audience. This approach allows you to build trust and loyalty with your customers while also providing value for their money.
*************************

Question: What's the most common unscalable thing founders have to do at the start?
Answer: Answer: The given context is a hypothetical question that asks what the most common unscalable thing founders have to do at the start. The answer provided is "I don't know" as there is no information available in the given context to provide an answer.
*************************

Question: What's one of the biggest things inexperienced founders and investors get wrong about startups?
Answer: Answer: The context provided is "[]". Based on this, the question asked is "What's one of the biggest things inexpens