Following Datacamp article: https://www.datacamp.com/tutorial/llama-3-1-rag

Set up the environment

In [None]:
%pip install langchain langchain_community scikit-learn langchain-ollama sentence-transformers tiktoken pypdf

Main LangChain components that I have used in my book assistant bot:
1. PyPDFLoader > Loads the PDF file.
2. SKLearnVectorStore > A vector store that uses scikit-learn to store text embeddings.
3. RecursiveCharacterTextSplitter > Splits documents into chunks of text.
4. HuggingFaceEmbeddings > Generates embeddings for the text.
5. ChatOllama > Connects LangChain to Ollama (locally running models).
6. PromptTemplate > Defines the prompt, with placeholders for the question and the retrieved context.
7. Chains > RetrievalQA combines the retriever and the LLM so I can ask questions about documents (in my case, a PDF file).

In [29]:
from langchain_community.vectorstores import SKLearnVectorStore
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate

Load and prepare documents
Documents can be anything; I could load a PDF or use web pages as the source.

What it does:
You give the file path of a PDF (sample-book.pdf).
PyPDFLoader reads and extracts the content from the PDF (the actual loading happens in the "Load and split documents" cell further down).
docs_list then holds one Document per page, with the page text in page_content.

In [None]:
# List of PDF file paths to load documents from
pdf_paths = [
    "/home/ai-ml-practice/rag-using-llm/sample-book.pdf"
]

Split documents into chunks

What it does:
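Big documents are broken into smaller pieces (up to 250 tokens each; from_tiktoken_encoder measures length in tokens, not characters).
This helps the LLM read smaller bits and answer more accurately.
chunk_overlap=2 means neighbouring chunks can share up to 2 tokens, which preserves a little context across the boundary.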
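
To make the size/overlap idea concrete, here is a minimal sketch of a sliding window over a token list. It is not how RecursiveCharacterTextSplitter works internally (that splitter also looks for separators such as paragraph breaks); `split_with_overlap` is a hypothetical helper, and whitespace-separated words stand in for real tiktoken tokens.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of at most chunk_size items.

    Each window starts chunk_size - chunk_overlap items after the previous one,
    so the last chunk_overlap items of one window reappear in the next.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


# Assumption for illustration: words play the role of tokens.
words = "protein helps the body grow and repair tissue every day".split()
for chunk in split_with_overlap(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With chunk_size=4 and chunk_overlap=1, the last word of each chunk is repeated as the first word of the next one. The notebook's settings (250 and 2) behave the same way, just with much larger windows and a very small overlap.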

In [31]:
# Load and split documents
docs = [PyPDFLoader(pdf_path).load() for pdf_path in pdf_paths]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=2
)
doc_splits = text_splitter.split_documents(docs_list)

Initialize embeddings

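What it does:
HuggingFaceEmbeddings loads the all-MiniLM-L6-v2 model, which turns each piece of text into a vector.
In the next step, you extract just the text from each chunk.
Then store them as vectors in a small database (SKLearnVectorStore).
retriever will then fetch the top 4 most similar chunks for a given query.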
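
The "most similar chunks" step can be sketched without any library. The example below ranks a few stored texts by cosine similarity to a query vector and keeps the top k. The 3-dimensional vectors are made up for illustration (real embeddings from all-MiniLM-L6-v2 are much longer), `top_k` is a hypothetical helper, and cosine similarity is assumed here as a common choice rather than taken from the SKLearnVectorStore source.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k texts whose vectors are most similar to query_vec.

    store is a list of (text, vector) pairs.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up vectors: the first axis loosely means "about protein".
toy_store = [
    ("Protein repairs tissue.", [0.9, 0.1, 0.0]),
    ("Carbohydrates give quick energy.", [0.1, 0.9, 0.0]),
    ("Eggs and beans contain protein.", [0.8, 0.2, 0.1]),
    ("Sleep supports recovery.", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]
print(top_k(query, toy_store, k=2))
```

Only the two protein-related texts come back, because their vectors point in nearly the same direction as the query.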

In [32]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Create a vector store

In [33]:
# Create vector store
texts = [doc.page_content for doc in doc_splits]
vectorstore = SKLearnVectorStore.from_texts(texts, embedding=embeddings)
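# Pass k through search_kwargs, the documented way to set how many chunks are retrieved
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})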

Initialize LLM

In [34]:
llm = ChatOllama(model="llama3.1:8b")

Define prompt template

In [35]:
prompt_template = """You are an assistant for question-answering tasks.
Use the following documents to answer the question.
If you don't know the answer, just say that you don't know.
Please answer in simple and easy to understand language.
Use two sentences maximum and keep the answer concise:
Question: {question}
Documents: {context}
Answer:"""
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

Create retrieval chain

What it does:
This creates the full RAG pipeline:
It takes a user query
Uses the retriever to grab relevant document chunks
Passes both query + chunks to the LLM using your prompt
return_source_documents=True helps if you want to show where the answer came from.
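
The three steps above can be sketched in plain Python to show what the chain does conceptually. Everything here is a stand-in: `fake_retriever` and `fake_llm` are hypothetical replacements for the real retriever and ChatOllama, `answer_question` is not a LangChain function, and joining the chunks with blank lines is my assumption of what "stuff" amounts to.

```python
# Shortened version of the notebook's prompt, with the same two placeholders.
PROMPT = "Question: {question}\nDocuments: {context}\nAnswer:"


def fake_retriever(question):
    # Hypothetical stand-in for the retriever: always returns the same chunks.
    return ["Protein helps repair tissue.", "Beans and nuts contain protein."]


def fake_llm(prompt_text):
    # Hypothetical stand-in for ChatOllama: reports how much text it received.
    return f"(model reply to a prompt of {len(prompt_text)} characters)"


def answer_question(question, retrieve, llm):
    chunks = retrieve(question)        # 1. grab the relevant chunks
    context = "\n\n".join(chunks)      # 2. "stuff" them all into one string
    prompt_text = PROMPT.format(question=question, context=context)  # 3. fill the prompt
    return {
        "result": llm(prompt_text),    # 4. ask the model
        "source_documents": chunks,    # kept so the answer can be traced back
        "prompt": prompt_text,
    }


out = answer_question("What should I know about protein?", fake_retriever, fake_llm)
print(out["prompt"])
print(out["result"])
```

Printing the filled prompt is a handy way to see exactly what text the model receives before it answers.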

In [36]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True
)

Define RAG application class 

What it does:
A simple class to wrap your chatbot logic
It runs the RAG chain when you call run()
 

In [37]:
class RAGApplication:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain

    def run(self, question):
        result = self.qa_chain.invoke({"query": question})
        return result["result"]
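
Since the chain is built with return_source_documents=True, the dictionary returned by invoke() should also contain a "source_documents" list, which run() above discards. Below is a small sketch of how that list could be turned into readable source lines. `summarize_result` is a hypothetical helper, and the fake result uses SimpleNamespace objects in place of real LangChain Documents so that it runs without a model. Note that the vector store in this notebook is built from plain text, so the real chunks may carry no page metadata.

```python
from types import SimpleNamespace


def summarize_result(result, max_chars=80):
    """Return the answer plus one short line per source chunk."""
    answer = result["result"]
    sources = []
    for doc in result.get("source_documents", []):
        # Fall back to "unknown" when the chunk carries no page metadata.
        page = doc.metadata.get("page", "unknown")
        snippet = doc.page_content[:max_chars]
        sources.append(f"page {page}: {snippet}")
    return answer, sources


# Fake chain output for illustration only; not produced by a real model.
fake_result = {
    "result": "Protein supports tissue repair.",
    "source_documents": [
        SimpleNamespace(page_content="Protein is needed to repair tissue.", metadata={"page": 12}),
        SimpleNamespace(page_content="Beans and nuts are plant sources.", metadata={}),
    ],
}

answer, sources = summarize_result(fake_result)
print(answer)
for line in sources:
    print(" -", line)
```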

Initialize and run

What it does:
Initializes the bot with the RAG chain
Sends the question to the chain
Prints out the LLM’s answer based on the retrieved chunks

In [38]:
rag_application = RAGApplication(qa_chain)

question = "What should I know about protein?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: What should I know about protein?
Answer: Here's the answer in two sentences:

Protein is important for growth, tissue repair, immune function, energy production, and preserving lean muscle mass. You can get protein from animal products like meat, eggs, and fish, or plant-based sources like beans, nuts, and grains, as long as you combine them to get all essential amino acids.
