Following Datacamp article: https://www.datacamp.com/tutorial/llama-3-1-rag

Set up the environment

In [97]:
%pip install langchain langchain_community scikit-learn langchain-ollama sentence-transformers tiktoken

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.


Main functions that I have used from LangChain in my book assistant bot:
1. PyPDFLoader > To load the PDF file.
2. SKLearnVectorStore > A vector store that uses scikit-learn to store text embeddings.
3. RecursiveCharacterTextSplitter > Splits documents into chunks of text.
4. HuggingFaceEmbeddings > To generate embeddings for the text.
5. ChatOllama > To connect to Ollama (locally running models) with LangChain.
6. PromptTemplate > To define the prompt that is filled in and sent to the LLM to get a response.
7. Chains > Combine Retriever + LLM > RetrievalQA > Ask questions on documents or PDF files (in my case a PDF file).

In [98]:
from langchain_community.vectorstores import SKLearnVectorStore
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate

# New imports for the chat memory implementation
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

Load and prepare documents
Documents can be anything; I can load a PDF or use web pages as the source.

What it does:
You give the file path of a PDF (e.g. sample-book.pdf).
PyPDFLoader reads and extracts the content from the PDF (this happens in the load-and-split cell further down).
docs_list is then a list of all the pages, one Document per page.

In [None]:
# List of PDF file paths to load documents from
pdf_paths = [
    # "/home/ai-ml-practice/rag-using-llm/sample-book.pdf"
    "/home/ai-ml-practice/rag-using-llm/AgileProdMgt_sample.pdf"
    # "/home/ai-ml-practice/rag-using-llm/wifi-sample.pdf"
]

Split documents into chunks

What it does:
Big documents are broken into smaller pieces (up to 1000 tokens each, counted with tiktoken).
This helps the LLM read smaller bits and answer more accurately. My first attempt used 250, which turned out to be too small.
chunk_overlap=200 means neighbouring chunks share up to 200 tokens to preserve context.
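To make the effect of the overlap concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification: it counts whitespace-separated words instead of tiktoken tokens and ignores separators, so it is not how RecursiveCharacterTextSplitter works internally. The function name `chunk_words` and the small sizes are made up for illustration.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into chunks of at most chunk_size words, where
    consecutive chunks share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "the product owner manages the product backlog and plans releases"
for chunk in chunk_words(sample, chunk_size=5, chunk_overlap=2):
    print(chunk)
```

Each printed chunk starts with the last two words of the previous one, which is what keeps a sentence that straddles a chunk boundary readable in at least one chunk.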

In [100]:
# Load and split documents
docs = [PyPDFLoader(pdf_path).load() for pdf_path in pdf_paths]
docs_list = [item for sublist in docs for item in sublist]

# A chunk size of 250 was too small: the model could not pick up enough
# context (basic questions like "who is the author" didn't work).
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000,
    chunk_overlap=200,
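    # The lookbehind pattern only works as a regex, hence the raw string and is_separator_regex.
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""],
    is_separator_regex=True,
    add_start_index=True  # Helps track document position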
)
doc_splits = text_splitter.split_documents(docs_list)

Initialize embeddings

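What it does:
Loads the all-MiniLM-L6-v2 sentence-transformers model, which turns each piece of text into a 384-dimensional vector.
In the next step you extract just the text from each chunk and store the vectors in a small database (SKLearnVectorStore).
The retriever then fetches the most similar chunks for a given query (4 by default).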
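As a rough illustration of what "most similar" means, the sketch below ranks a few toy vectors against a query vector and returns the best matches. Cosine similarity is assumed as the measure, the 3-dimensional vectors stand in for real embeddings, and the helper names `cosine_similarity` and `top_k` are hypothetical; the vector store does this work for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for chunk embeddings (assumption for illustration).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, chunk_vecs, k=2))
```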

In [101]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
print(embeddings)

client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
) model_name='all-MiniLM-L6-v2' cache_folder=None model_kwargs={} encode_kwargs={} multi_process=False show_progress=False


Create a vector store

In [102]:
# Create vector store
texts = [doc.page_content for doc in doc_splits]
vectorstore = SKLearnVectorStore.from_texts(texts, embedding=embeddings)
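# The number of results goes through search_kwargs. As far as I can tell, a bare
# k=2 keyword is not picked up by the retriever, so the default of 4 was in effect.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})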

Initialize LLM

In [103]:
# llm = ChatOllama(model="llama3.1:8b")
llm = ChatOllama(model="deepseek-r1:14b")

NEW TASK: ADD MEMORY TO THE BOOK ASSISTANT

For memory I have found two building blocks > ConversationChain and Memory

Memory has three types:
ConversationBufferMemory > Remembers the whole conversation

ConversationSummaryMemory > Remembers a summary of the conversation

VectorStoreRetrieverMemory > Stores the conversation in a vector store and retrieves the relevant parts
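The idea behind the buffer type can be sketched in a few lines of plain Python: every turn is appended to a list, and the whole list is rendered as text whenever the prompt needs the chat history. This is only an illustration of the concept, not LangChain's implementation; the class `SimpleBufferMemory` and its methods are hypothetical.

```python
class SimpleBufferMemory:
    """Keeps every turn of the conversation, in order."""

    def __init__(self):
        self.turns = []  # list of (role, text) tuples

    def save(self, question, answer):
        """Record one question/answer exchange."""
        self.turns.append(("human", question))
        self.turns.append(("ai", answer))

    def as_text(self):
        """Render the full history as one string for the prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


memory_sketch = SimpleBufferMemory()
memory_sketch.save("Who wrote the book?", "Roman Pichler.")
memory_sketch.save("What is it about?", "Agile product management with Scrum.")
print(memory_sketch.as_text())
```

Because nothing is ever dropped, the rendered history keeps growing with the conversation, which is the trade-off the summary and vector store variants try to address.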


In [104]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    # max_token_limit=2000,  # Trims oldest messages if exceeded (NEW)
    input_key="question",
    output_key="answer"
)

Define prompt template

NEW ADDITION IN THE PROMPT > Chat History: {chat_history}

In [105]:
# prompt_template = """You are an assistant for question-answering tasks.
# Use the following document which is a book to answer the question.
# Given the following conversation and context, answer the user's question.
# ONLY output the following format, nothing else:

# Question: <repeat the latest user question here>
# Answer: <your answer here>
# If you don't know the answer, just say that you don't know.
# Please answer in simple and easy to understand language.
# Use two sentences maximum and keep the answer concise:
# Question: {question}
# Documents: {context}
# Chat history: {chat_history}
# Answer:"""
# prompt = PromptTemplate(
#     template=prompt_template,
#     input_variables=["context", "question", "chat_history"]
# )

In [106]:
# prompt_template = """You are a helpful assistant for question-answering.
# Given the following context and conversation history, answer the user's question.

# IMPORTANT:
# - Only output the answer to the question below.
# - Do NOT repeat the question, context, or chat history.
# - Do NOT include any explanations, formatting, or extra text.
# - If you do not know the answer, say: "I don't know."

# Context:
# {context}

# Chat history:
# {chat_history}

# Question:
# {question}

# Answer:
# """
# prompt = PromptTemplate(
#     template=prompt_template,
#     input_variables=["context", "question", "chat_history"]
# )

In [107]:
# from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

# system_prompt = """You are a book expert assistant. Use these rules:
# 1. Answer ONLY from the provided context
# 2. For memory questions, use the exact chat history below
# 3. If unsure, say "I don't know"

# Context: {context}
# Chat History: {chat_history}"""

# prompt = ChatPromptTemplate.from_messages([
#     SystemMessagePromptTemplate.from_template(system_prompt),
#     ("human", "{question}"),
# ])

NEW PROMPT for the Agile Scrum PDF

In [108]:
# from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

# system_prompt = """You are a book expert assistant. Use these rules:
# 1. Answer ONLY from the provided context
# 2. For memory questions, use the exact chat history below
# 3. If unsure, say "I don't know"

# Context: {context}
# Chat History: {chat_history}"""

# prompt = ChatPromptTemplate.from_messages([
#     SystemMessagePromptTemplate.from_template(system_prompt),

#     # Few-shot examples
#     ("human", "What is the main responsibility of a product owner in Scrum?"),
#     ("ai", "The product owner is responsible for maximizing product value. This includes managing the product backlog, communicating the vision, collaborating with the Scrum Team, and ensuring alignment with customer needs."),

#     ("human", "How is the product owner role different from a traditional product manager?"),
#     ("ai", "In Scrum, the product owner is a single, empowered individual who combines responsibilities that are often spread across multiple roles in traditional setups. They work closely with the development team and are accountable for the product's success."),

#     # Placeholder for dynamic user input
#     ("human", "{question}"),
# ])


In [109]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate

system_prompt = """You are an expert assistant trained on the book *"Agile Product Management with Scrum"* by Roman Pichler.
Answer user questions accurately and concisely using the knowledge from this book.
If a question is outside the book’s scope, politely respond that it's not covered in the source material.
Use these rules:
1. Answer ONLY from the provided context
2. For memory questions, use the exact chat history below
3. If unsure, say "I don't know"

Here are some example questions and answers:

Q: What is the main responsibility of a product owner in Scrum?
A: The product owner is responsible for maximizing product value. This includes managing the product backlog, communicating the vision, collaborating with the Scrum Team, and ensuring alignment with customer needs.

Q: How is the product owner role different from a traditional product manager?
A: In Scrum, the product owner is a single, empowered individual who combines responsibilities that are often spread across multiple roles in traditional setups. They work closely with the development team and are accountable for the product's success.

Context: {context}
Chat History: {chat_history}"""

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    ("human", "{question}"),
])

Create retrieval chain

What it does:
This creates the full RAG pipeline:
It takes a user query
Uses the retriever to grab relevant document chunks
Passes both query + chunks to the LLM using your prompt
return_source_documents=True helps if you want to show where the answer came from.
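The step that passes query and chunks to the LLM can be pictured as plain string formatting: with the "stuff" approach, all retrieved chunks are joined into one context block and dropped into the prompt. The sketch below shows only that idea, under the assumption of a simple three-slot template; `build_prompt` is a hypothetical helper, not a LangChain function.

```python
def build_prompt(template, question, chunks, chat_history):
    """Fill the template by 'stuffing' all retrieved chunks into one context block."""
    context = "\n\n".join(chunks)
    return template.format(
        context=context, question=question, chat_history=chat_history
    )


# Simplified template with the same three placeholders (assumption for illustration).
template = (
    "Context: {context}\n"
    "Chat History: {chat_history}\n"
    "Question: {question}"
)
chunks = [
    "The product owner manages the product backlog.",
    "The product owner plans releases.",
]
print(build_prompt(template, "What does the product owner do?", chunks, ""))
```

Since every chunk ends up in the same prompt, the number and size of the retrieved chunks directly determine how much of the model's context window is used.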

In [110]:
# Create the conversational chain. It supports conversational memory, so RetrievalQA is replaced with ConversationalRetrievalChain.
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    chain_type="stuff",
    combine_docs_chain_kwargs={"prompt": prompt},    # same prompt from above
    # verbose=True,  # debugging here
    rephrase_question=True,  # Helps with follow-up questions
    # return_source_documents=True,  # For debugging
    # get_chat_history=lambda h: "\n".join([f"{msg.type}: {msg.content}" for msg in h])
)

Define RAG application class 

What it does:
A simple class to wrap your chatbot logic
It runs the RAG chain when you call run()
 

In [111]:
# chat_history = []
class RAGApplication:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain

    def run(self, question):
        result = self.qa_chain.invoke({"question": question})  # invoke() replaces the deprecated direct call
        return result["answer"]

Initialize and run

What it does:
Initializes the bot with the RAG chain
Sends the question to the chain
Prints out the LLM’s answer based on the retrieved chunks
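With deepseek-r1 the printed answer starts with the model's reasoning inside a <think> block, as the output further down shows. If only the final answer is wanted, one option is to strip complete <think>...</think> blocks before printing. This is a minimal sketch assuming the tags appear literally in the answer text; `strip_reasoning` is a hypothetical helper, and an unclosed <think> (for example in truncated output) is left untouched.

```python
import re

# Matches one complete reasoning block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(answer):
    """Remove complete <think>...</think> blocks and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", answer).strip()


raw = "<think>\nLet me check the context.\n</think>\n\nThe product owner manages the backlog."
print(strip_reasoning(raw))
```

It could be applied to the string returned by run() before it is printed.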

In [112]:
rag_application = RAGApplication(qa_chain)

# # conversation
# questions = [
#     # "What is the document about?",
#     # "What is the name of the author?",
#     # "Can you summarise the document?"
#     "What is the product owner responsible for in Scrum?",
#     "How is the product owner different from a traditional product manager?",
#     "So does that mean the product owner also handles stakeholder communication directly?",
#     "What is the role of the product owner in Scrum?",
# ]

# for question in questions:
#     answer = rag_application.run(question)
#     print(f"Question: {question}")
#     print(f"Answer: {answer}\n")
#     # memory.clear()

bot = RAGApplication(qa_chain)

print(bot.run("How is the product owner different from a traditional product manager?"))
print(bot.run("So does that mean they also handle stakeholder communication directly?"))


<think>
Okay, so I'm trying to figure out how the product owner role in Scrum differs from a traditional product manager. I remember reading that Roman Pichler's book "Agile Product Management with Scrum" covers this topic, so I'll need to refer back to that.

First, from what I recall, the product owner is responsible for maximizing product value. They manage the product backlog and work closely with the Scrum team. The key difference seems to be that in traditional setups, responsibilities are often split between multiple roles, whereas in Scrum, the product owner combines these into one role. 

Looking at the context provided, it mentions that the product owner is a single, empowered individual. They handle tasks like creating the product vision, grooming the backlog, planning releases, involving stakeholders, managing budgets, and preparing launches. This seems more comprehensive than a traditional product manager's role, which might be more focused on either strategic or tactical 

Good results:

Accuracy is good.

Retrieval from the book is good.

The follow-up asked in question 2 was answered correctly, which confirms that chat memory works.

Answers:

How is the product owner different from a traditional product manager?

The product owner role in Scrum differs from a traditional product manager by being a single, empowered individual who combines responsibilities typically split across multiple roles. While traditional setups may separate strategic and tactical aspects between product marketers (outward-facing) and technical product managers (inward-facing), the Scrum product owner unites these roles. They handle everything from market understanding to detailed features, manage budgets, and plan releases, avoiding handoffs and delays. Additionally, in Scrum, project management tasks are distributed among the team, reducing reliance on a separate project manager. This integration ensures end-to-end authority and accountability, enhancing efficiency and communication within the team.

So does that mean they also handle stakeholder communication directly?

Yes, in Scrum, the product owner is responsible for handling stakeholder communication directly. They ensure that stakeholders' interests are represented and their feedback is integrated throughout the process, distinguishing this role from traditional setups where responsibilities might be divided among multiple roles.

Answers from a run with the chat memory max_token_limit set to 2000:

What is the product owner responsible for in Scrum?
The Product Owner (PO) in Scrum is responsible for several key areas:

1. **Product Backlog Management**: The PO manages the Product Backlog to ensure it aligns with business goals and stakeholder needs.

2. **Value Maximization**: They ensure that the work done by the team delivers maximum value, focusing on benefits realization.

3. **Product Vision and Strategy**: The PO defines and communicates the product vision and strategy, guiding the development direction.

4. **Stakeholder Engagement**: They involve customers, users, and other stakeholders to gather requirements and validate solutions.

5. **Release Planning**: The PO collaborates with the Scrum Team to plan releases, ensuring they meet business objectives.

6. **Collaboration with the Scrum Team**: While the ScrumMaster supports in grooming the backlog, the PO works closely with the team to prioritize and clarify items.

7. **Multifaceted Role**: Unlike traditional roles, the PO combines authority and responsibility for product success across its lifecycle.


How is the product owner different from a traditional product manager?
The product owner in Scrum differs significantly from a traditional product manager by combining various roles into one person. They are responsible for creating the product vision, managing the Product Backlog, collaborating closely with the Scrum team, and ensuring value realization through iterative delivery. Unlike traditional PMs who might delegate tasks to others, the PO works directly within the Agile framework to align product development efforts with business goals.
The product owner role is multifaceted, encompassing responsibilities traditionally held by separate roles such as product management or project management. This integration allows for a more streamlined approach where the PO leads the effort to create a winning product, ensuring alignment and collaboration across all stakeholders.

So does that mean the product owner also handles stakeholder communication directly?
No, the Product Owner is not responsible for assigning tasks to team members. That responsibility lies with the self-organizing development team itself.

What is the role of the product owner in Scrum?
The Product Owner (PO) in Scrum differs from a traditional product manager by uniting several roles under one individual. The PO is responsible for managing the Product Backlog, creating the product vision, planning releases, involving stakeholders, collaborating with the Scrum Team, and ensuring alignment with business goals.In contrast to traditional setups where responsibilities might be spread across multiple roles, the PO's multifaceted role includes leading the development effort, ensuring value realization through iterative delivery, and maintaining close collaboration with the team. This integration within the Agile framework allows for a more streamlined approach, emphasizing continuous collaboration and alignment with business objectives.


Final Verdict:
Accuracy and retrieval are good.
Hallucinated when testing chat memory: it was asked about stakeholder communication but answered about assigning tasks to teammates.