# Let's Learn RAG (Retrieval-Augmented Generation) 🚀

## Introduction
We know that **LLMs (Large Language Models)** are trained on vast amounts of internet data and can perform most **NLP (Natural Language Processing)** tasks out of the box. However, in many cases we have our **own data**, and we need the LLM to use this data to answer queries accurately.

## How Can We Provide Our Data to an LLM?
There are two primary ways to send our data to an LLM:

1. **As part of the prompt message** 📝
2. **Using tool calls** 🛠️

These methods work, but they have limitations. In dynamic applications, we often deal with **large amounts of data** that exceed an LLM's context window. Additionally, we might not always know **which specific context** to provide from a large dataset.

## The Solution: Retrieval-Augmented Generation (RAG) 🔍
To handle this challenge, we use a combination of tools to efficiently manage and retrieve relevant data:

### 1. Text Splitter ✂️
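- Splits the source data into **meaningful chunks**.
- The right **chunk size** depends on the **semantic meaning** of the content (ideally each chunk holds one coherent idea) and on the input limits of the models involved: the embedding model's maximum input length and the LLM's **context window**.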

### 2. Embedding Models 🧠
- Converts text chunks into **numeric vectors** (embeddings) that capture their meaning.
- Texts with similar meaning get vectors that are close to each other, which is what makes similarity search and retrieval possible.
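To make "text in, vector out" concrete, here is a toy stand-in for an embedding model. `toy_embed` is a hypothetical helper, not a library function: it hashes each word into one of `dim` buckets and L2-normalizes the counts. It only captures shared words, not meaning, so treat it as an illustration of the interface (every text maps to a vector of the same length), not of how real embedding models work.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized.

    Captures shared words only, not meaning; real embedding models are trained networks.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        if not word:
            continue
        # md5 gives a stable bucket across runs (Python's built-in hash() is salted per process)
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


vector = toy_embed("Mobile phones emit radiation")
print(len(vector))  # → 16: every text maps to a vector of the same length
```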

### 3. Vector Store 🏦
- Stores the **vectors** generated by the embedding models.
- When a query is made, the query is embedded with the same model and the store searches for the most **similar vectors** using a **similarity metric** (for example, cosine similarity).
- This ensures that only **relevant chunks** are retrieved, reducing unnecessary data sent to the LLM.
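The search step can be sketched in plain Python. This assumes cosine similarity as the metric (a common choice; some stores use dot product or Euclidean distance instead) and a brute-force scan over every stored vector. The 3-dimensional "embeddings" are made-up values, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force search: score every stored vector, return the keys of the k most similar."""
    ranked = sorted(store, key=lambda key: cosine_similarity(query_vec, store[key]), reverse=True)
    return ranked[:k]


# Hand-made 3-d "embeddings" (made-up values, for illustration only)
store = {
    "precautions": [0.9, 0.1, 0.0],
    "signal frequency": [0.1, 0.9, 0.2],
    "battery": [0.2, 0.3, 0.9],
}
print(top_k([0.8, 0.2, 0.1], store, k=2))  # → ['precautions', 'battery']
```

Only the `k` best-matching chunks are returned, which is what keeps the data sent to the LLM small.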

## High-Level Overview 🏗️
At a high level, RAG enhances LLM capabilities by:
1. **Splitting** data into chunks.
2. **Embedding** those chunks into vector representations.
3. **Storing** them in a vector database.
4. **Retrieving** relevant chunks dynamically when querying the LLM.

This way, we **do not need to send the entire dataset** to the LLM, making the process more efficient and scalable.
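Step 4 ends with the retrieved chunks being placed into the prompt next to the question; that is the "augmented" part of RAG. A minimal sketch, assuming a plain-text prompt of our own design (the template wording and the `build_rag_prompt` helper are hypothetical, not from any library):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Stuff the retrieved chunks into a prompt so the LLM answers from our data."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    "Do not use mobile phones more than 10 minutes continuously.",
    "Controlled use for communication purpose is always safe.",
]
prompt = build_rag_prompt("What precautions should I follow?", chunks)
print(prompt)
```

Only the retrieved chunks go into `prompt`, not the whole dataset; the resulting string is what would be sent to the LLM.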

---
## Let's Start Coding! 💻
Now that we understand the basics, let's dive into some hands-on coding to implement RAG effectively! 🚀



```python
# First, let's look at document loaders: we need data, and here we load it from a PDF
# (PyPDFLoader needs the pypdf package installed)
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("how-mobile-phone-works.pdf")  # the file we want to read data from
pages = loader.load()  # loads all pages at once, one Document per page
```

```python
type(pages[0])  # each page is called a document
```

```
langchain_core.documents.base.Document
```

```python
pages[0].metadata  # each document has metadata and content (page_content)
```

```
{'producer': 'PDFium',
 'creator': 'PDFium',
 'creationdate': 'D:20250213113910',
 'source': 'how-mobile-phone-works.pdf',
 'total_pages': 4,
 'page': 0,
 'page_label': '1'}
```

```python
pages[0].page_content
```

```
'Mobile Phone. How it works?  \nA mobile phone is an electronic device used for mobile telecommunications over a cellular network of specialized base \nstations known as cell sites. A cell phone offers full Duplex Communication and transfer the link when the user moves \nfrom one cell to another. As the phone user moves from one cell area to another, the system automatically commands \nthe mobile phone and a cell site with a stronger signal, to switch on to a new frequency in order to keep the link. \nMobile phone is primarily designed for Voice communication. In addition to the standard voice function, new \ngeneration mobile phones support many additional services, and accessories, such as SMS for text messaging, email, \npacket switching for access to the Internet, gaming, Bluetooth, camera with video recorder and MMS for sending and \nreceiving photos and video, MP3 player, radio and GPS. \nSignal Frequency in Cell Phone  \nThe cellular system is the division of an area into small 
```

```python
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_nomic.embeddings import NomicEmbeddings

embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
# Here the entire page content is sent to the vector store without chunking.
# This is not recommended; we will fix it below.
vector_search = InMemoryVectorStore.from_documents(pages, embeddings)
```

```python
vector_search.similarity_search("give me some precautions that i need to follow while using mobile phones?", k=2)
# Each result is a whole page, so we get far more context than we need. This is why we need chunking.
```

```
[Document(id='2307f63d-6f6c-4543-a82f-a32c6e6a4f22', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250213113910', 'source': 'how-mobile-phone-works.pdf', 'total_pages': 4, 'page': 3, 'page_label': '4'}, page_content='Precautions \nMobile phone is an excellent communication device . Mobile radiation defects occur only if it is used for  prolonged \ntime . Controlled use for communication purpose is always safe. Mobile phones emitting radiation below 2 watts  is \ncompletely safe. Still, precautionary measures are always good, even though there are fewer case studies in this \nmatter. Try to consider mobile phone as a communication device  and not an entertainment device . Even if you \nare not talking, mobile phone is emitting strong signals to keep link with the base station having strongest signal. \nConsider some of the precautionary measures :  \n1. Do not use mobile phones more than 10 minutes continuously. During conversation, mobile phone will release 
```

Now that we have data, let's focus on **text splitting**. Text can be split using different strategies:

1. split text by **characters**, such as a newline (`\n`) or a full stop (`.`)
2. split text by **tokens**
3. split text by **semantic meaning**
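Before using LangChain's splitters, it helps to see what `chunk_size` and `chunk_overlap` mean. The sketch below is the simplest possible character-based chunker: a fixed-size window that slides forward by `chunk_size - chunk_overlap` characters. It ignores separators entirely, which is exactly what the smarter splitters improve on. `sliding_window_chunks` is a hypothetical helper, not a LangChain function.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one; that repeated tail is the "continuation context" the overlap provides.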

```python
# Let's split by characters. Metadata is still preserved after chunking.
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

splitter = CharacterTextSplitter(
    separator=".",  # splits on a single separator
    is_separator_regex=False,  # set to True to use a regex as the separator
    chunk_size=500,  # maximum chunk size
    chunk_overlap=100,  # overlapping content between consecutive chunks, for continuity of context
    length_function=len,  # size is measured in characters
)
# The text is first split on the separator, then the pieces are merged back into chunks
# of up to chunk_size characters, rejoined with the separator.
# That is why full stops (.) are still present inside the chunks.
chunks = splitter.split_documents(pages)  # returns a list of Documents
# chunks = splitter.split_text(pages[0].page_content)  # returns a list of strings
```
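A character splitter splits first and merges second. A simplified sketch of that idea: split on the separator, greedily pack the pieces into chunks of at most `chunk_size` characters, rejoin them with the separator, and carry the trailing pieces of each chunk into the next one as overlap. `split_and_merge` is a hypothetical helper written for illustration; LangChain's own merging logic handles more edge cases.

```python
def split_and_merge(text: str, separator: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split on separator, then greedily merge pieces into chunks of <= chunk_size characters.

    Trailing pieces of each chunk (up to chunk_overlap characters) are carried into the
    next chunk as overlap. A single piece longer than chunk_size becomes its own chunk.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []

    def joined_len(parts: list[str]) -> int:
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until the rest fits the overlap budget
            # and still leaves room for the new piece
            while current and (joined_len(current) > chunk_overlap
                               or joined_len(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "A cat sat. A dog ran. A bird flew. A fish swam"
print(split_and_merge(text, ". ", chunk_size=25, chunk_overlap=12))
# → ['A cat sat. A dog ran', 'A dog ran. A bird flew', 'A bird flew. A fish swam']
```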

```python
# Recursive splitting: separators are tried in order; a piece that is still larger
# than chunk_size is split again with the next separator in the list
splitter = RecursiveCharacterTextSplitter(
    separators=[".", "\n\n", "\n"],  # multiple separators are allowed
    chunk_size=500,  # maximum chunk size
    chunk_overlap=100,  # overlap between consecutive chunks for continuity of context
    is_separator_regex=False,
    length_function=len,  # size is counted in characters: chunk_size=500 means 500 characters
)
chunks = splitter.split_documents(pages)
```
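The recursion itself can be sketched in a few lines: split on the first separator, and split any piece that is still longer than `chunk_size` again with the next separator, falling back to hard cuts when no separators are left. Separators are usually ordered from coarsest to finest (LangChain's default list is `["\n\n", "\n", " ", ""]`); the cell above puts `"."` first, so it splits on sentences before paragraphs. `recursive_split` is a hypothetical, simplified helper: it drops separators and does no merging or overlap.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text so every piece is <= chunk_size, trying separators in the given order.

    Simplified: separators are dropped, small neighbouring pieces are not merged,
    and there is no overlap.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Last resort: cut into fixed-size character slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    first, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(first):
        # Only pieces that are still too long go down to the next separator
        pieces.extend(recursive_split(part, rest, chunk_size))
    return pieces


text = "Para one is here.\n\nPara two is a bit longer. It has two sentences."
print(recursive_split(text, ["\n\n", ". "], chunk_size=30))
# → ['Para one is here.', 'Para two is a bit longer', 'It has two sentences.']
```

The first paragraph already fits, so it is never split by sentence; only the longer second paragraph falls through to `". "`.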

```python
len(chunks)
```

```
21
```

```python
# Now let's measure chunk size in tokens (via a tiktoken encoder) instead of characters;
# this requires the tiktoken package
token_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    separators=[".", "\n\n", "\n"],  # same separators as before, tried in order
    chunk_size=500,  # now 500 tokens, not 500 characters
    chunk_overlap=100,  # 100 tokens of overlap between consecutive chunks
    is_separator_regex=False,
)

chunks = token_splitter.split_documents(pages)
```

```python
len(chunks)  # fewer chunks: chunk_size=500 now means 500 tokens, not 500 characters, so each chunk holds more text
```

```
6
```
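Why does measuring in tokens give fewer chunks? The same greedy packing gives different results depending on the length function. This sketch compares `len` (characters) with a crude whitespace word count, used here only as a stand-in for a tokenizer (an assumption: real tokenizers such as tiktoken produce subword tokens, usually several characters long, not whole words). `pack` and `word_count` are hypothetical helpers.

```python
from typing import Callable


def pack(pieces: list[str], chunk_size: int, length_function: Callable[[str], int]) -> list[str]:
    """Greedily pack pieces into chunks whose measured length stays <= chunk_size (no overlap)."""
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if current and length_function(candidate) > chunk_size:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def word_count(text: str) -> int:
    # Crude token proxy for illustration; a real tokenizer counts subword tokens
    return len(text.split())


sentences = ["Mobile phones use radio waves."] * 20  # 20 copies of a 30-character, 5-word sentence
by_chars = pack(sentences, chunk_size=100, length_function=len)
by_words = pack(sentences, chunk_size=100, length_function=word_count)
print(len(by_chars), len(by_words))  # → 7 1
```

With `len`, only 3 sentences fit in 100 characters; with the word count, all 20 sentences (100 words) fit in one chunk.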

```python
from langchain_experimental.text_splitter import SemanticChunker
text_splitter = SemanticChunker(NomicEmbeddings(model="nomic-embed-text-v1.5"))
```

```python
chunks = text_splitter.split_documents(pages)
# How the semantic splitter works: it splits the text into sentences, embeds them, and measures
# the distance between the embeddings of neighbouring sentences. Wherever the distance exceeds
# a threshold, it starts a new chunk.
# The threshold can be chosen by percentile, standard deviation, interquartile range or gradient
# (breakpoint_threshold_type); test them and find out which one works best for your data.
```
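The percentile method can be sketched in pure Python: compute the cosine distance between each pair of neighbouring sentence embeddings and start a new chunk wherever the distance is above the chosen percentile of all distances. The 2-d vectors below are hand-made stand-ins for a real embedding model, and `semantic_chunks` is a hypothetical helper; `SemanticChunker` differs in details (for example, it can embed each sentence together with its neighbouring sentences). `pct=95` is assumed here to match its default for the percentile method.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms if norms else 1.0


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def semantic_chunks(sentences: list[str], vectors: list[list[float]], pct: float = 95) -> list[str]:
    """Start a new chunk wherever the distance between neighbours exceeds the pct-th percentile."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for i, dist in enumerate(distances):
        if dist > threshold:  # big jump in meaning: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "A cell phone offers full Duplex Communication.",
    "The phone switches to a cell site with a stronger signal.",
    "Do not use mobile phones more than 10 minutes continuously.",
    "Controlled use for communication purpose is always safe.",
]
# Hand-made 2-d vectors: the first two sentences point one way, the last two another
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
for chunk in semantic_chunks(sentences, vectors):
    print(chunk)
```

Only the jump between the second and third sentences exceeds the threshold, so the result is two chunks: one about how phones connect, one about precautions.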

```python
embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")
vector_store = InMemoryVectorStore.from_documents(chunks, embeddings)
```

```python
vector_store.similarity_search("give me some precautions that i need to follow while using mobile phones?", k=2)  # the top result is now right on point
```

```
[Document(id='20ac1703-174d-476e-9e12-5235473cb22b', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250213113910', 'source': 'how-mobile-phone-works.pdf', 'total_pages': 4, 'page': 3, 'page_label': '4'}, page_content='Precautions \nMobile phone is an excellent communication device . Mobile radiation defects occur only if it is used for  prolonged \ntime . Controlled use for communication purpose is always safe. Mobile phones emitting radiation below 2 watts  is \ncompletely safe. Still, precautionary measures are always good, even though there are fewer case studies in this \nmatter. Try to consider mobile phone as a communication device  and not an entertainment device . Even if you \nare not talking, mobile phone is emitting strong signals to keep link with the base station having strongest signal. Consider some of the precautionary measures :  \n1. Do not use mobile phones more than 10 minutes continuously. During conversation, mobile phone will release \n
```

```python
# Embed the query ourselves, using the same embedding model as the documents
embed_query = embeddings.embed_query("give me some precautions that i need to follow while using mobile phones?")
```

```python
vector_store.similarity_search_by_vector(embed_query, k=2)  # we can also search by embedding vector
```

```
[Document(id='20ac1703-174d-476e-9e12-5235473cb22b', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250213113910', 'source': 'how-mobile-phone-works.pdf', 'total_pages': 4, 'page': 3, 'page_label': '4'}, page_content='Precautions \nMobile phone is an excellent communication device . Mobile radiation defects occur only if it is used for  prolonged \ntime . Controlled use for communication purpose is always safe. Mobile phones emitting radiation below 2 watts  is \ncompletely safe. Still, precautionary measures are always good, even though there are fewer case studies in this \nmatter. Try to consider mobile phone as a communication device  and not an entertainment device . Even if you \nare not talking, mobile phone is emitting strong signals to keep link with the base station having strongest signal. Consider some of the precautionary measures :  \n1. Do not use mobile phones more than 10 minutes continuously. During conversation, mobile phone will release \n
```
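Conceptually, `similarity_search(query)` is `embed_query(query)` followed by `similarity_search_by_vector(vector)`, which is why both calls return the same document. A toy in-memory store makes this explicit. `ToyVectorStore` and `keyword_embed` are hypothetical stand-ins written for this sketch: the method names only mirror the LangChain calls above, and this is not LangChain's implementation.

```python
import math
from typing import Callable


class ToyVectorStore:
    """Brute-force in-memory vector store using cosine similarity (illustration only)."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.texts.append(text)
            self.vectors.append(self.embed(text))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norms if norms else 0.0

    def similarity_search_by_vector(self, vector: list[float], k: int = 2) -> list[str]:
        scores = [self._cosine(vector, v) for v in self.vectors]
        order = sorted(range(len(self.texts)), key=lambda i: scores[i], reverse=True)
        return [self.texts[i] for i in order[:k]]

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query with the same function used for the documents, then search by vector
        return self.similarity_search_by_vector(self.embed(query), k)


def keyword_embed(text: str) -> list[float]:
    # Hypothetical 3-keyword "embedding": how often each keyword appears in the text
    words = text.lower().split()
    return [float(words.count(w)) for w in ("precautions", "signal", "battery")]


store = ToyVectorStore(keyword_embed)
store.add_texts([
    "Precautions while using a phone",
    "Signal frequency in cell phone",
    "Battery care tips",
])
query = "precautions for mobile phones"
assert store.similarity_search(query, k=1) == store.similarity_search_by_vector(keyword_embed(query), k=1)
print(store.similarity_search(query, k=1))  # → ['Precautions while using a phone']
```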

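We want to use **LCEL**, right? A **vector_store** is not a Runnable, so to use it in an LCEL chain we turn it into a **retriever**. A retriever is a Runnable, so it can be invoked directly and composed with other Runnables.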

```python
retriever = vector_store.as_retriever(
    search_type="similarity",  # other options: "mmr" and "similarity_score_threshold"
    search_kwargs={"k": 2},  # return the top 2 documents
)
```

```python
retriever.invoke("give me some precautions that i need to follow while using mobile phones?")
```

```
[Document(id='20ac1703-174d-476e-9e12-5235473cb22b', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250213113910', 'source': 'how-mobile-phone-works.pdf', 'total_pages': 4, 'page': 3, 'page_label': '4'}, page_content='Precautions \nMobile phone is an excellent communication device . Mobile radiation defects occur only if it is used for  prolonged \ntime . Controlled use for communication purpose is always safe. Mobile phones emitting radiation below 2 watts  is \ncompletely safe. Still, precautionary measures are always good, even though there are fewer case studies in this \nmatter. Try to consider mobile phone as a communication device  and not an entertainment device . Even if you \nare not talking, mobile phone is emitting strong signals to keep link with the base station having strongest signal. Consider some of the precautionary measures :  \n1. Do not use mobile phones more than 10 minutes continuously. During conversation, mobile phone will release \n
```
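The `"mmr"` search type (maximal marginal relevance) balances relevance against diversity. After the first (most relevant) pick, each next document maximizes `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max(sim(doc, already_picked))`. In LangChain this is tuned through `search_kwargs` such as `k`, `fetch_k` and `lambda_mult`. Below is a pure-Python sketch of the selection step over hand-made 2-d vectors; `mmr_select` is hypothetical and not LangChain's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def mmr_select(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of k candidates chosen by maximal marginal relevance.

    lambda_mult = 1.0 -> pure relevance (same order as plain similarity search);
    lower values penalize candidates that resemble already-selected ones.
    """
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.3],    # relevant
    [1.0, 0.32],   # near-duplicate of the first
    [1.0, -0.35],  # about as relevant, but points elsewhere
]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # → [0, 1]
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # → [0, 2]
```

With `lambda_mult=1.0` the near-duplicate comes second; with `0.5` it is skipped in favour of a document that adds different information.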