Following Datacamp article: https://www.datacamp.com/tutorial/llama-3-1-rag

Set up the environment

In [1]:
%pip install langchain langchain_community scikit-learn langchain-ollama sentence-transformers tiktoken pypdf

Note: you may need to restart the kernel to use updated packages.


Main functions that I have used from LangChain in my book assistant bot:
1. PyPDFLoader > Loads the PDF file.
2. SKLearnVectorStore > A vector store that uses scikit-learn to store text embeddings.
3. RecursiveCharacterTextSplitter > Splits documents into chunks of text.
4. HuggingFaceEmbeddings > Generates embeddings for the text.
5. ChatOllama > Connects LangChain to Ollama (locally running models).
6. PromptTemplate > Defines the prompt, with placeholders that are filled in before it is sent to the LLM.
7. Chains > Combine Retriever + LLM > RetrievalQA > Ask questions about documents or PDF files (in my case, a PDF file).

In [2]:
from langchain_community.vectorstores import SKLearnVectorStore
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate

# New imports for the chat memory implementation
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

Load and prepare documents
Documents can be anything: I can load a PDF or use web pages as the source.

What it does:
You give the file path of a PDF (sample-book.pdf).
PyPDFLoader reads and extracts the content from the PDF.
Once loaded (in the next step), docs_list holds one Document per page of the PDF.

In [None]:
# List of PDF file paths to load documents from
pdf_paths = [
    "/home/ai-ml-practice/rag-using-llm/sample-book.pdf"
]

Split documents into chunks

What it does:
Big documents are broken into smaller pieces (up to 1000 tokens each, counted with tiktoken).
This helps the LLM read smaller bits and answer more accurately.
chunk_overlap=200 means neighbouring chunks share up to 200 tokens to preserve context.

In [4]:
# Load and split documents
docs = [PyPDFLoader(pdf_path).load() for pdf_path in pdf_paths]
docs_list = [item for sublist in docs for item in sublist]

# A chunk size of 250 was too small: the model could not understand the
# context (basic questions like "who is the author" did not work).
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""],
    is_separator_regex=True,  # Without this the lookbehind pattern is matched as literal text
    add_start_index=True  # Helps track document position
)
doc_splits = text_splitter.split_documents(docs_list)
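
To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of a sliding-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that splitter also respects the separators and, with from_tiktoken_encoder, counts tiktoken tokens). In this sketch whitespace-separated words stand in for tokens, and split_with_overlap is a hypothetical helper written only for illustration.

```python
def split_with_overlap(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of chunk_size that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Words stand in for tokens here (an assumption made to keep the sketch dependency-free)
words = "one two three four five six seven eight nine ten".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

Each printed chunk starts with the last two words of the previous one, which is what keeps a sentence that straddles a boundary readable in at least one chunk.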



Initialize embeddings

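What it does:
Loads the all-MiniLM-L6-v2 sentence-transformer model, which turns each text chunk into a 384-dimensional vector.
In the next step the text of each chunk is extracted and stored as vectors in a small database (SKLearnVectorStore).
The retriever then fetches the top 4 most similar chunks for a given query.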

In [5]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
print(embeddings)



client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
) model_name='all-MiniLM-L6-v2' cache_folder=None model_kwargs={} encode_kwargs={} multi_process=False show_progress=False


Create a vector store

In [6]:
# Create vector store
texts = [doc.page_content for doc in doc_splits]
vectorstore = SKLearnVectorStore.from_texts(texts, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
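
Retrieval comes down to comparing the query's embedding with every stored chunk embedding and keeping the closest ones. The sketch below shows that idea with cosine similarity in plain Python. The three-dimensional vectors and the top_k helper are made up for illustration; the real store holds 384-dimensional vectors from all-MiniLM-L6-v2 and leaves the search to scikit-learn.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, not produced by any model)
stored = [
    ("about the author", [0.9, 0.1, 0.0]),
    ("nutrition basics", [0.1, 0.9, 0.1]),
    ("sleep and recovery", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.2, 0.0]
results = top_k(query, stored, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```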

Initialize LLM

In [7]:
# llm = ChatOllama(model="llama3.1:8b")
llm = ChatOllama(model="deepseek-r1:14b")

NEW TASK: ADD MEMORY TO THE BOOK ASSISTANT

For memory I have found two building blocks > ConversationChain and Memory

Memory has three types:
ConversationBufferMemory > Remembers the whole conversation

ConversationSummaryMemory > Remembers a summary of the conversation

VectorStoreRetrieverMemory > Stores the conversation in a vector store


In [8]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    # ConversationBufferMemory keeps the full history; trimming by token count
    # needs ConversationTokenBufferMemory (which takes llm and max_token_limit).
    input_key="question",
    output_key="answer"
)
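
A buffer that remembers everything eventually outgrows the model's context window, which is why token-limited memories exist. The sketch below shows the trimming idea only: it is a hypothetical TrimmedBuffer class, not a LangChain API, and it counts whitespace-separated words instead of real tokens.

```python
class TrimmedBuffer:
    """Keeps (role, text) messages and drops the oldest once a word budget is exceeded."""

    def __init__(self, max_words):
        self.max_words = max_words
        self.messages = []

    def add(self, role, text):
        self.messages.append((role, text))
        # Drop from the front until the budget fits, but always keep the newest message
        while self._total_words() > self.max_words and len(self.messages) > 1:
            self.messages.pop(0)

    def _total_words(self):
        return sum(len(text.split()) for _, text in self.messages)


buffer = TrimmedBuffer(max_words=8)
buffer.add("human", "What is this book about?")  # 5 words
buffer.add("ai", "Fitness and mindset.")         # 3 words, total 8: still fits
buffer.add("human", "Who is the author?")        # 4 words, total 12: oldest message is dropped
print(buffer.messages)
```

The trade-off is visible here: once the first question has been trimmed, a follow-up such as "What was my first question?" can no longer be answered from memory.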

Define prompt template

NEW ADDITION IN THE PROMPT > Chat History: {chat_history}

In [9]:
# prompt_template = """You are an assistant for question-answering tasks.
# Use the following document which is a book to answer the question.
# Given the following conversation and context, answer the user's question.
# ONLY output the following format, nothing else:

# Question: <repeat the latest user question here>
# Answer: <your answer here>
# If you don't know the answer, just say that you don't know.
# Please answer in simple and easy to understand language.
# Use two sentences maximum and keep the answer concise:
# Question: {question}
# Documents: {context}
# Chat history: {chat_history}
# Answer:"""
# prompt = PromptTemplate(
#     template=prompt_template,
#     input_variables=["context", "question", "chat_history"]
# )

Stricter prompt:

In [10]:
# prompt_template = """You are a helpful assistant for question-answering.
# Given the following context and conversation history, answer the user's question.

# IMPORTANT:
# - Only output the answer to the question below.
# - Do NOT repeat the question, context, or chat history.
# - Do NOT include any explanations, formatting, or extra text.
# - If you do not know the answer, say: "I don't know."

# Context:
# {context}

# Chat history:
# {chat_history}

# Question:
# {question}

# Answer:
# """
# prompt = PromptTemplate(
#     template=prompt_template,
#     input_variables=["context", "question", "chat_history"]
# )

In [11]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

system_prompt = """You are a book expert assistant. Use these rules:
1. Answer ONLY from the provided context
2. For memory questions, use the exact chat history below
3. If unsure, say "I don't know"

Context: {context}
Chat History: {chat_history}"""

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    ("human", "{question}"),
])

Create retrieval chain

What it does:
This creates the full RAG pipeline:
It takes a user query and, for follow-ups, rephrases it into a standalone question using the chat history.
It uses the retriever to grab relevant document chunks.
It passes the query, the chunks and the chat history to the LLM using your prompt.
return_source_documents=True helps if you want to show where the answer came from.

In [12]:
# Create the conversational chain: ConversationalRetrievalChain replaces RetrievalQA to add conversational memory
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    chain_type="stuff",
    combine_docs_chain_kwargs={"prompt": prompt},    # same prompt from above
    verbose=True,  # debugging here
    rephrase_question=True,  # Helps with follow-up questions
    return_source_documents=True,  # For debugging
    get_chat_history=lambda h: "\n".join([f"{msg.type}: {msg.content}" for msg in h])
)
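
The flow inside a conversational retrieval chain can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: answer_with_memory and the three stand-in callables (rephrase, retrieve, generate) are hypothetical, and a real chain would call the LLM and the retriever where the stubs are.

```python
def format_history(history):
    """Same shape as the get_chat_history lambda: one 'role: text' line per message."""
    return "\n".join(f"{role}: {text}" for role, text in history)


def answer_with_memory(question, history, rephrase, retrieve, generate):
    # 1. If there are earlier turns, rewrite the follow-up as a standalone question
    standalone = rephrase(question, format_history(history)) if history else question
    # 2. Fetch the chunks relevant to the standalone question
    chunks = retrieve(standalone)
    # 3. "Stuff" all chunks plus the history into a single prompt
    prompt = (
        f"Context: {' '.join(chunks)}\n"
        f"Chat History: {format_history(history)}\n"
        f"Question: {standalone}"
    )
    answer = generate(prompt)
    # 4. Record the turn so the next call can see it
    history.append(("human", question))
    history.append(("ai", answer))
    return answer


seen_prompts = []  # collected only so the prompts can be inspected afterwards


def rephrase(question, history_text):
    return f"{question} [rephrased using {len(history_text.splitlines())} history lines]"


def retrieve(question):
    return ["chunk A", "chunk B"]


def generate(prompt):
    seen_prompts.append(prompt)
    return "stub answer"


history = []
answer_with_memory("What is this book about?", history, rephrase, retrieve, generate)
answer_with_memory("Who wrote it?", history, rephrase, retrieve, generate)
print(seen_prompts[1])
```

The second prompt contains the first question and its answer, which is the behaviour the memory test question in this notebook relies on.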

Define RAG application class 

What it does:
A simple class to wrap your chatbot logic
It runs the RAG chain when you call run()
 

In [13]:
class RAGApplication:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain

    def run(self, question):
        result = self.qa_chain.invoke({"question": question})
        return result["answer"]
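
With a reasoning model such as deepseek-r1, the returned answer can start with a <think> block, as the output further down shows. One possible way to hide it is a small post-processing step. The sketch below is a hypothetical helper (strip_reasoning is not part of LangChain or Ollama) and assumes the tags are exactly <think> and </think>.

```python
import re

# Matches a <think> block up to its closing tag, or to the end of the text if it is never closed
THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|$)", flags=re.DOTALL)


def strip_reasoning(text):
    """Remove <think>...</think> blocks and surrounding whitespace from a model answer."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nLet me read the context first.\n</think>\n\nThe book covers fitness and mindset."
clean = strip_reasoning(raw)
print(clean)
```

It could be applied to result["answer"] before returning from run(). Note that this only cleans what is shown to the user; the memory would still store the raw answer unless it is cleaned there as well.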

Initialize and run

What it does:
Initializes the bot with the RAG chain
Sends the question to the chain
Prints out the LLM’s answer based on the retrieved chunks

In [14]:
rag_application = RAGApplication(qa_chain)

# conversation
questions = [
    "What is this book about?",
    "Who is Brian Keane?",
    "What was my first question?",  # testing the memory
    "What is chapter 10 in the book about?"
]

for question in questions:
    answer = rag_application.run(question)
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
    # memory.clear()

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
System: You are a book expert assistant. Use these rules:
1. Answer ONLY from the provided context
2. For memory questions, use the exact chat history below
3. If unsure, say "I don't know"

Context: ACKNOWLEDGEMENTS
My gratitude goes out to all those who have passed through my life. All of
you hav e contrib uted in some way to my journey and the substance of this
book.
First, I want to thank my publisher Lucy McCarraher and Re think Press.
W ithout them, this book would not be possible.
My fath er Gerry , who has suppo rted me through my entire life and nurtured
a work ethic that still supports me to this day .
My sister Karen, who has always been my second biggest fan in everything
that I do. She has been one of my best friends and is one of the smartest
people I’ve had the pleasure of knowing.
My team of Pa ul Dermody , Daniel Lupton and Emma Finneg an – while
writing this book, Paul,
Daniel and Emma allowed me to bounce ideas of f them on a dai

> Finished chain.

> Finished chain.
Question: What is this book about?
Answer: <think>
Okay, so I need to figure out what the book is about based on the provided context. Let me start by reading through the context carefully.

The context begins with acknowledgments from the author, thanking various people like his publisher, family members, and a team. He mentions that the book is dedicated to his mum and daughter because without them, he wouldn't be where he is. This gives me a hint that the book might have a personal and inspirational tone.

Then there's a part about Harley-Davidson and a quote from him regarding writing one's own life story. The author describes a moment in East London where he had an epiphany, leading him to leave his job, move back home, and pursue his passion. This suggests that the book is about personal transformation or pursuing one's goals despite challenges.

He mentions quotes by other authors like Mae West and refers to concepts like "be

> Finished chain.
Prompt after formatting:
System: You are a book expert assistant. Use these rules:
1. Answer ONLY from the provided context
2. For memory questions, use the exact chat history below
3. If unsure, say "I don't know"

Context: THE AUTHOR
B R I A N  K E A N E
Brian is an online fitness trainer; former professional fitness model and the
owner of Brian Keane Fitness. He and his team help serve thousands of
people each year through their online fitness programs.
Brian’ s fitness journey started when his parents bought him a gym
membership for his 16th birthday . He has been training ever since.
He also worked as a primary sch ool teacher for four years whilst setting up
a personal training business on the side. In 2014, he left the teaching
profession to pursue a full-time career in fitness and in the sa me year he
won his  Fitness Model Pro Car d at the Miami Pro Event. He s pent nearly

ACKNOWLEDGEMENTS
My gratitude goes out to all those who have pas

> Finished chain.

> Finished chain.
Question: Who is Brian Keane?
Answer: <think>

Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
human: What is this book about?
ai: <think>
Okay, so I need to figure out what the book is about based on the provided context. Let me start by reading through the context carefully.

The context begins with acknowledgments from the author, thanking various people like his publisher, family members, and a team. He mentions that the book is dedicated to his mum and daughter because without them, he wouldn't be where he is. This gives me a hint that the book might have a personal and inspirational tone.

Then there's a part about Harley-Davidson and a quote from him regarding writing one's own life story. The author describes a moment in East London where he had an epiphany, leading him to lea

> Finished chain.
Prompt after formatting:
System: You are a book expert assistant. Use these rules:
1. Answer ONLY from the provided context
2. For memory questions, use the exact chat history below
3. If unsure, say "I don't know"

Context: THE AUTHOR
B R I A N  K E A N E
Brian is an online fitness trainer; former professional fitness model and the
owner of Brian Keane Fitness. He and his team help serve thousands of
people each year through their online fitness programs.
Brian’ s fitness journey started when his parents bought him a gym
membership for his 16th birthday . He has been training ever since.
He also worked as a primary sch ool teacher for four years whilst setting up
a personal training business on the side. In 2014, he left the teaching
profession to pursue a full-time career in fitness and in the sa me year he
won his  Fitness Model Pro Car d at the Miami Pro Event. He s pent nearly

C H A P T E R  7
FIND YOUR WHY
The how means nothing until you f

> Finished chain.

> Finished chain.
Question: What was my first question?
Answer: <think>

Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
human: What is this book about?
ai: <think>
Okay, so I need to figure out what the book is about based on the provided context. Let me start by reading through the context carefully.

The context begins with acknowledgments from the author, thanking various people like his publisher, family members, and a team. He mentions that the book is dedicated to his mum and daughter because without them, he wouldn't be where he is. This gives me a hint that the book might have a personal and inspirational tone.

Then there's a part about Harley-Davidson and a quote from him regarding writing one's own life story. The author describes a moment in East London where he had an epiphany, leading hi

> Finished chain.
Prompt after formatting:
System: You are a book expert assistant. Use these rules:
1. Answer ONLY from the provided context
2. For memory questions, use the exact chat history below
3. If unsure, say "I don't know"

Context: The first half contains everyt hing about fitness and health: nutrition,
training, supplements, sleep, alc ohol and hydration. These are the essentials
you nee d to get into incredible shape, increase your ener gy and reap the
results of understanding how your body works. I nearly quit fitness several
times during my early twenties f rom sheer information overload . There was
so much  conflic ting material out there that I didn’ t know what to believe,
what to try or what worked. I’ve used my personal 17 years of training and
nutrition knowle dge to try and find a shortcut for your journey . I ef fectively
wrote all this for the 22-year -old me who nearly quit a hundred times.
The seco nd half  is focused on cultivating the m