# 🚀 Unlocking the Power of RAG: A Step-by-Step PDF Processing Pipeline

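Retrieval-Augmented Generation (RAG) is a game-changer for LLM applications. Instead of relying only on what the model learned during training, it retrieves relevant passages from external documents at query time and gives them to the model as context.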

This is just the beginning: the sky is the limit with RAG, and this introduction sets the foundation for even more advanced implementations!

# 🌟 RAG Pipeline for PDF Processing

1️⃣ Load the PDF 📜
Use PyPDFLoader to extract the text of a PDF file, one Document per page.
2️⃣ Chunk the Pages
Split the text into smaller, overlapping chunks so that retrieval returns focused passages instead of whole pages.
3️⃣ Store Vectors in FAISS 🏗️
Convert the text chunks into embeddings and index them with FAISS for fast, scalable similarity search.
4️⃣ Retrieve Relevant Chunks 🔍
Embed the query and search the FAISS index for the chunks whose embeddings are most similar to it.
5️⃣ Use a Groq Model (LLM) 🤖
Pass the retrieved chunks, together with the question, to a Large Language Model (LLM) for context-aware generation.
6️⃣ Parse Output with Runnable & Output Parser 🛠️
Use RunnablePassthrough to feed the user's question into the chain and StrOutputParser to turn the model's reply into a plain string.
7️⃣ Build the RAG Chain 🔗
Combine the retrieval and generation steps into a single RAG chain.
8️⃣ Ask Anything from the PDF! 🎯
The LLM can now answer questions grounded in the loaded PDF, making complex document analysis much easier.
🚀 RAG is reshaping AI workflows, enabling smarter, more context-aware systems.

What are your thoughts on RAG? 


In [48]:
# Import the necessary libraries
import os 
from dotenv import load_dotenv
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore

In [None]:
# Load the Hugging Face and Groq API keys from the .env file into the environment
load_dotenv()
os.environ["HF_TOKEN"] = os.getenv("HF_TOKEN")
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")
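
If a key is missing from `.env`, `os.getenv` returns `None` and assigning it to `os.environ` fails with a fairly cryptic `TypeError`. A minimal sketch of a fail-fast check, assuming the key names `HF_TOKEN` and `GROQ_API_KEY` used above; `require_env` is a hypothetical helper, not part of python-dotenv:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or raise if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage, after load_dotenv() has populated os.environ:
# keys = require_env("HF_TOKEN", "GROQ_API_KEY")
```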

In [50]:
# Import the Hugging Face integration and initialize the embedding model
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Data Ingestion 

In [51]:
# Import the PDF loader
from langchain_community.document_loaders import PyPDFLoader

In [52]:
# Path to the PDF you want to work with
FILE_PATH=r"C:\Users\rohit\Desktop\Rohit\2025\Learning\KrishNaik\Agentic AI 2.0\Code\Rohit Code\Data\llama.pdf"

In [53]:
loader=PyPDFLoader(FILE_PATH)

In [54]:
len(loader.load()) # This will give you the number of pages in the pdf

77

In [55]:
loader.load() # This will load the pdf and return a list of documents

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'C:\\Users\\rohit\\Desktop\\Rohit\\2025\\Learning\\KrishNaik\\Agentic AI 2.0\\Code\\Rohit Code\\Data\\llama.pdf', 'total_pages': 77, 'page': 0, 'page_label': '1'}, page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas 

In [56]:
pages=loader.load() # store the loaded pages in a variable

# Split the pages into chunks

In [57]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Split the loaded pages into smaller chunks

In [58]:
splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

In [59]:
split_docs=splitter.split_documents(pages)
# Split the pages into chunks of at most 500 characters, with up to 50 characters of overlap between consecutive chunks
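
RecursiveCharacterTextSplitter tries to split on paragraph breaks first, then line breaks, then spaces, and only then between single characters, so chunks stay under `chunk_size` while following natural text boundaries; consecutive chunks share up to `chunk_overlap` characters. The sketch below illustrates only the size/overlap arithmetic with a plain sliding window over characters. `sliding_window_chunks` is a hypothetical helper and does not choose boundaries the way the splitter does:

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size character windows; each window starts
    chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is what the overlap buys: a sentence cut at a chunk border still appears intact in at least one neighbour when the overlap is large enough.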

In [60]:
len(split_docs) # This will give you the number of chunks created

615

In [61]:
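# Create an exact (flat) inner-product FAISS index. Its dimension must match the
# embedding size (384 for all-MiniLM-L6-v2), so compute it from a sample embedding.
from langchain_community.vectorstores.utils import DistanceStrategy

dim = len(embeddings.embed_query("hello world"))
index = faiss.IndexFlatIP(dim)
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
    distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT,  # match the inner-product index
)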
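
`IndexFlatIP` compares the query with every stored vector and ranks them by inner product. Inner product equals cosine similarity only when the vectors have unit length. As far as I know, the sentence-transformers release of all-MiniLM-L6-v2 already normalizes its outputs, but if you swap in another embedding model it is worth checking (or normalizing yourself, or using an L2 index instead). A small pure-Python sketch of the relationship, with toy 3-dimensional vectors standing in for the 384-dimensional embeddings:

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Inner product, the score IndexFlatIP ranks by."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]  # length 5
b = [1.0, 2.0, 2.0]  # length 3

print(dot(a, b))  # 11.0: the raw inner product grows with vector length
print(dot(l2_normalize(a), l2_normalize(b)))  # after normalization ...
print(cosine(a, b))  # ... it matches cosine similarity (11 / 15)
```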

In [62]:
vector_store.add_documents(documents=split_docs) # Add the split documents to the vector store


['1216e926-7f66-410a-a794-716f512f891d',
 '32535acc-6d0d-44bd-8470-d6510b8fe0c7',
 '74566b68-f015-4179-92a3-e3ae44fed78a',
 '3c2bcb2b-e078-43d9-8e1d-d1ee31ae119b',
 '0c95fd2c-56b6-454b-94ac-5928133aa364',
 'b23b3e17-fe25-4b2b-b14c-37b0cae66352',
 'cd2d5f66-f9b4-4173-a770-6cb642b7bd5f',
 '25083255-e1df-4f9a-aa87-f9923f4ca3c8',
 '5a66de49-d01d-42b2-82bb-4015baa8b8e4',
 'bdea2f88-1ca1-44a3-8777-7afb1dfb23db',
 '3ff59d9c-3ac7-4c12-8881-1a7ed118a463',
 'f6e2afa4-cf9b-45aa-be72-4e4b6e001b73',
 '82b1d5c9-16a7-4f69-ab12-7fe3d29599c8',
 'bec834d0-bc66-47f2-bb55-cc4f4034b494',
 '18b308e3-e301-4b0d-9ea2-57cc9429f1d1',
 'bc52fe95-d616-451b-b46b-db8f328aaf43',
 'f0c0f748-abf9-4a40-90bf-b745108e03ef',
 'c819e643-d7bb-4425-bd9e-945a1076d8bb',
 '80679427-0717-4beb-9764-9277584a0541',
 '4ca312a3-e0f7-499d-a9e3-76cf6ee2b9da',
 '2ea6c31a-396d-48ca-bb71-ce44a5f62b7c',
 '9f13ce06-1c99-49f7-9557-08040127b4ba',
 '8515c097-dfd9-4dc9-b564-0be3b905835d',
 'd45bb6a0-1d2a-444b-8b63-f9135414c79b',
 'e82b827a-6903-

In [63]:
retriever=vector_store.as_retriever(search_kwargs={"k": 3}) # Create a retriever to get the top 3 similar chunks
# Now you can use the retriever to get the top 3 similar chunks for a given query
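
Conceptually, the retriever embeds the query, scores it against the stored chunk vectors (a flat index scores all of them), and returns the `k` best-scoring chunks. A toy sketch of that top-k step; the chunk labels and 2-dimensional vectors are made up for illustration, whereas the real pipeline gets them from the embedding model:

```python
import heapq


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k chunk labels whose vectors have the highest inner product with the query."""
    scored = ((sum(q * v for q, v in zip(query_vec, vec)), label) for label, vec in store.items())
    return [label for _, label in heapq.nlargest(k, scored)]


# Hypothetical chunk labels and vectors, for illustration only.
store = {
    "chunk A: model overview": [0.9, 0.1],
    "chunk B: chat fine-tuning": [0.8, 0.3],
    "chunk C: safety evaluation": [0.2, 0.9],
    "chunk D: table of contents": [0.1, 0.1],
}
print(top_k([1.0, 0.0], store, k=3))  # ['chunk A: model overview', 'chunk B: chat fine-tuning', 'chunk C: safety evaluation']
```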

In [64]:
retriever.invoke("What is Llama model and how it works?") # This will return the top 3 similar chunks for the given query

[Document(id='e82b827a-6903-4d55-b710-fad12c54089b', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'C:\\Users\\rohit\\Desktop\\Rohit\\2025\\Learning\\KrishNaik\\Agentic AI 2.0\\Code\\Rohit Code\\Data\\llama.pdf', 'total_pages': 77, 'page': 3, 'page_label': '4'}, page_content='work (Section 6), and conclusions (Section 7).\n‡https://ai.meta.com/resources/models-and-libraries/llama/\n§We are delaying the release of the 34B model due to a lack of time to sufficiently red team.\n¶https://ai.meta.com/llama\n‖https://github.com/facebookresearch/llama\n4'),
 Document(id='449353a4-9d63-4161-9b05-1cc492436d41', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref',

In [65]:
# To build an end-to-end RAG pipeline, we need an LLM to generate the response from the retrieved chunks.
from langchain_groq import ChatGroq
model=ChatGroq(model="qwen-qwq-32b")

In [66]:
# To create a chain we need a prompt template, a model, and an output parser.
# Rather than writing the prompt template by hand, we can pull a predefined RAG prompt from the LangChain Hub.
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
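
The pulled prompt has two input variables, `context` and `question` (its template is printed a few cells below). Filling it in is plain string substitution. A sketch with the template text copied from that output; `TEMPLATE`, `context`, and `question` are illustrative values here, whereas in the notebook the chain supplies them:

```python
# Template text copied from the printed rlm/rag-prompt messages.
TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces of "
    "retrieved context to answer the question. If you don't know the answer, just say "
    "that you don't know. Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)

# Illustrative values; the RAG chain fills these from the retriever and the user input.
context = "Chunk one text.\n\nChunk two text."
question = "What is Llama model and how it works?"

prompt_text = TEMPLATE.format(context=context, question=question)
print(prompt_text)
```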

In [67]:
import pprint

In [68]:
pprint.pprint(prompt.messages)

[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})]


### Prompt Template 

In [69]:
# Import the output parser and RunnablePassthrough (used to pass the question through at runtime)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [70]:
# The chain will look like this:
# {context: retriever | format_docs, question: passthrough} | prompt | model | parser

In [71]:

def format_docs(docs):
    """Join the page_content of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

In [72]:
rag_chain=(
    {"context":retriever |format_docs, "question":RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
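
In LCEL, a dict placed inside a chain is turned into a `RunnableParallel`: every value receives the same input (here the question string), so `context` becomes the formatted retrieved chunks while `RunnablePassthrough()` forwards the question unchanged. The pure-Python sketch below mimics that data flow with hypothetical stand-ins (`fake_retriever`, `echo_model`); it is not LangChain's implementation:

```python
from typing import Callable


def format_docs(docs: list[str]) -> str:
    # Same joining rule as the notebook's format_docs, applied to plain strings.
    return "\n\n".join(docs)


def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for retriever.invoke(): always returns two chunk texts.
    return ["Chunk about Llama 2.", "Chunk about fine-tuning."]


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for the LLM: returns its prompt so the data flow is visible.
    return prompt


def run_chain(
    question: str,
    retriever: Callable[[str], list[str]],
    template: str,
    model: Callable[[str], str],
) -> str:
    # 1. Dict step: both branches receive the same input (the question).
    inputs = {
        "context": format_docs(retriever(question)),  # retriever | format_docs
        "question": question,  # RunnablePassthrough(): forwarded unchanged
    }
    # 2. Prompt step: fill the template.
    prompt = template.format(**inputs)
    # 3. Model step. 4. StrOutputParser has nothing to do here: the stand-in already returns a str.
    return model(prompt)


template = "Question: {question}\nContext: {context}\nAnswer:"
print(run_chain("What is Llama?", fake_retriever, template, echo_model))
```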

In [74]:
results = rag_chain.invoke("What is Llama model and how it works?") # This will return the response based on the retrieved chunks

In [75]:
print(results) # Print the response


<think>
Okay, I need to answer the question "What is Llama model and how it works?" using the provided context. Let me look through the context given.

First, the context mentions "Llama 2" and says it's intended for commercial and research use in English. It also differentiates between tuned models for chat and pretrained ones for various tasks. There's a reference to Meta's website and a GitHub repo. The user wants a concise answer in three sentences max.

Wait, the user is asking about the Llama model in general. The context here includes parts about Llama 2, but maybe the original Llama is from Meta. The GitHub link points to facebookresearch/llama, which aligns with Meta's work. The model is a large language model developed by Meta. The answer should mention that it's a series of large language models created by Meta, designed for various NLP tasks. The "how it works" part could reference it being a transformer-based architecture, trained on large amounts of text data to predict sequences
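
`qwen-qwq-32b` is a reasoning model, and the printed result starts with its `<think>` section rather than the final answer. If you only want the answer, one option is to strip that section from the string returned by `StrOutputParser`. A minimal regex sketch, assuming the model closes its reasoning with a matching `</think>` tag; `strip_think` is a hypothetical helper:

```python
import re

# Non-greedy match across newlines for each <think>...</think> section.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> sections and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nLet me look through the context.\n</think>\n\nFinal answer text."
print(strip_think(raw))  # Final answer text.
```

If the closing tag is missing (for example, when the output is cut off), the pattern does not match and the text is returned unchanged apart from surrounding whitespace.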