Following the DataCamp article: https://www.datacamp.com/tutorial/llama-3-1-rag

Set up the environment

In [None]:
%pip install langchain langchain_community scikit-learn langchain-ollama sentence-transformers tiktoken pypdf

Main LangChain components that I have used in my book assistant bot:
1. PyPDFLoader > Loads the PDF file.
2. SKLearnVectorStore > A vector store that uses scikit-learn to store text embeddings.
3. RecursiveCharacterTextSplitter > Splits documents into chunks of text.
4. HuggingFaceEmbeddings > Generates embeddings for the text.
5. ChatOllama > Connects LangChain to Ollama (locally running models).
6. PromptTemplate > Defines the prompt with placeholders that are filled in before it is sent to the LLM.
7. Chains > Combine Retriever + LLM > RetrievalQA > Ask questions about documents or PDF files (a PDF file in my case).

In [40]:
from langchain_community.vectorstores import SKLearnVectorStore
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate

Load and prepare documents
Documents can be anything; I can load a PDF or use web pages as the source.

What it does:
You give the file path of a PDF (sample-book.pdf).
PyPDFLoader reads and extracts the content from the PDF, one document per page.
docs_list (built in the next code cell) then holds all the pages as text.

In [41]:
# List of PDF file paths to load documents from
pdf_paths = [
    "/home/ai-ml-practice/rag-using-llm/sample-book.pdf"
]

Split documents into chunks

What it does:
Big documents are broken into smaller pieces (up to 550 tokens each, counted with tiktoken).
This helps the LLM read smaller bits and answer more accurately.
chunk_overlap=100 means consecutive chunks share up to 100 tokens to preserve context.
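
To make the overlap idea concrete, here is a minimal sketch of a sliding window with overlap. It is not how RecursiveCharacterTextSplitter works internally (that splitter also looks for separators such as paragraph breaks); the function name split_with_overlap is my own, and whitespace-separated words stand in for tiktoken tokens.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the previous
    one, so it repeats the last chunk_overlap tokens of its predecessor.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: words stand in for tokens; the sentence is only toy data.
words = "the quick brown fox jumps over the lazy dog again".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With a larger overlap, a sentence cut at a chunk boundary is more likely to appear whole in at least one chunk, at the cost of storing more repeated text.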

In [None]:
# Load and split documents
docs = [PyPDFLoader(pdf_path).load() for pdf_path in pdf_paths]
docs_list = [item for sublist in docs for item in sublist]

# A chunk size that was too small (250) didn't work: the model could not
# understand the context (basic questions like "who is the author" failed)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=550, chunk_overlap=100
)
doc_splits = text_splitter.split_documents(docs_list)

Initialize embeddings

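What it does:
HuggingFaceEmbeddings loads the all-MiniLM-L6-v2 model, which turns each piece of text into a 384-dimensional vector.
In the next step, the text is extracted from each chunk and stored as vectors in a small database (SKLearnVectorStore).
The retriever will then fetch the top 4 most similar chunks for a given query.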

In [74]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
print(embeddings)

client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
) model_name='all-MiniLM-L6-v2' cache_folder=None model_kwargs={} encode_kwargs={} multi_process=False show_progress=False


Create a vector store

In [75]:
# Create vector store
texts = [doc.page_content for doc in doc_splits]
vectorstore = SKLearnVectorStore.from_texts(texts, embedding=embeddings)
# The number of chunks to return is passed through search_kwargs
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
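
What "top k most similar chunks" means can be sketched in plain Python. This is a toy illustration, assuming cosine similarity as the metric; the three-dimensional vectors and the texts are made up for the example (the real embeddings have 384 dimensions), and cosine_similarity and top_k are my own helper names, not LangChain functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs standing in for embedded chunks.
toy_store = [
    ("chunk A", [0.9, 0.1, 0.0]),
    ("chunk B", [0.0, 0.2, 0.9]),
    ("chunk C", [0.7, 0.6, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # hypothetical embedding of the question
best = top_k(query_vector, toy_store, k=2)
print(best)
```

Chunk B points in a very different direction from the query vector, so it is the one left out when k=2.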

Initialize LLM

In [76]:
llm = ChatOllama(model="llama3.1:8b")

Define prompt template

NEW ADDITION IN THE PROMPT > Chat History: {chat_history}

In [79]:
prompt_template = """You are an assistant for question-answering tasks.
Use the following documents to answer the question.
If you don't know the answer, just say that you don't know.
Please answer in simple and easy to understand language.
Use two sentences maximum and keep the answer concise:
Question: {question}
Documents: {context}
Chat History: {chat_history}
Answer:"""
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question", "chat_history"]
)
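
The placeholders in the template are filled in with plain string substitution before the text reaches the model. A rough sketch of that step using Python's built-in str.format, with a shortened template; the question, chunk texts and history turn are made-up values, and format_history is a hypothetical helper of mine.

```python
# Shortened version of the prompt, keeping the same three placeholders.
template = """Question: {question}
Documents: {context}
Chat History: {chat_history}
Answer:"""


def format_history(turns):
    """Render (question, answer) pairs as alternating Human/Assistant lines."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


# All values below are placeholders for illustration only.
filled = template.format(
    question="Who is the author?",
    context="\n\n".join(["first retrieved chunk", "second retrieved chunk"]),
    chat_history=format_history([("An earlier question?", "An earlier answer.")]),
)
print(filled)
```

Every name listed in input_variables needs a value at this point; a missing one is an error rather than an empty string.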

Create retrieval chain

What it does:
This creates the full RAG pipeline:
It takes a user query
Uses the retriever to grab relevant document chunks
Passes both query + chunks to the LLM using your prompt
return_source_documents=True helps if you want to show where the answer came from.

In [80]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True
)
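
Conceptually, the "stuff" chain type puts all retrieved chunks into a single prompt. The flow can be sketched as below. This is a simplified stand-in, not LangChain's implementation: answer_with_rag, fake_retrieve and fake_llm are hypothetical names, and the fake model only echoes its prompt so the flow can be checked without Ollama running.

```python
def answer_with_rag(question, retrieve, llm, template):
    """Retrieve chunks, stuff them all into one prompt, and ask the model."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)  # "stuff": every chunk goes into one prompt
    prompt_text = template.format(question=question, context=context)
    return {"result": llm(prompt_text), "source_documents": chunks}


def fake_retrieve(question):
    # Stand-in for the retriever; returns fixed placeholder chunks.
    return ["first retrieved chunk", "second retrieved chunk"]


def fake_llm(prompt_text):
    # Stand-in for the model; echoes the prompt instead of generating text.
    return "ECHO: " + prompt_text


output = answer_with_rag(
    "What is chapter 5 about?",
    fake_retrieve,
    fake_llm,
    "Question: {question}\nDocuments: {context}\nAnswer:",
)
print(output["result"])
print(output["source_documents"])
```

The returned dictionary mirrors the two keys read from the real chain: "result" for the answer and "source_documents" for the chunks it was based on.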

Define RAG application class 

What it does:
A simple class to wrap your chatbot logic
It runs the RAG chain when you call run()
 

In [81]:
class RAGApplication:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain

    def run(self, question):
        result = self.qa_chain.invoke({"query": question})
        return result["result"]
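
The run() method above only passes the question, so nothing fills the {chat_history} placeholder yet. One possible way to keep track of previous turns is sketched below. It assumes a hypothetical callable ask(question, chat_history) in place of the real chain; HistoryTrackingApp and fake_ask are names I made up. In LangChain, ConversationalRetrievalChain is one chain that accepts a chat_history input, but the wiring here is only an illustration.

```python
class HistoryTrackingApp:
    """Wrapper that remembers earlier (question, answer) turns."""

    def __init__(self, ask):
        self.ask = ask          # hypothetical callable: (question, chat_history) -> answer
        self.chat_history = []  # list of (question, answer) tuples

    def run(self, question):
        # Pass a copy so the callable cannot change the stored history.
        answer = self.ask(question, list(self.chat_history))
        self.chat_history.append((question, answer))
        return answer


def fake_ask(question, chat_history):
    # Stand-in for the real chain; reports how many turns it has seen so far.
    return f"answer #{len(chat_history) + 1} to: {question}"


app = HistoryTrackingApp(fake_ask)
first = app.run("What is this book about?")
second = app.run("Who is the author?")
print(first)
print(second)
```

The history grows with every call, so for long conversations it may need trimming before it is placed in the prompt.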

Initialize and run

What it does:
Initializes the bot with the RAG chain
Sends the question to the chain
Prints out the LLM’s answer based on the retrieved chunks

In [83]:
rag_application = RAGApplication(qa_chain)

question = "What is chapter 5 about?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: What is this book about?
Assistant: I don't know what this question refers to, as there is no "Chat History" section in the provided documents.
Human: What is the name of the author?
Assistant: I don't have enough information to answer your question. There's no clear reference to a book in the chat history you provided.
Human: What was my first question?
Assistant: I don't know who the author is, as their name is not mentioned in the provided documents. 

The documents appear to be excerpts from a self-help or motivational book, but I couldn't find any information about the author's identity.
Follow Up Input: Who is the author of the book?
Standalone question:


Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")

> Finished chain.
Prompt after formatting:
You are an assistant for question-answering tasks.
Use the following documents to answer the question.
If you don't know the answer, just say that you don't know.
Please answer in simple and easy to understand language.
Use two sentences maximum and keep the answer concise:
Question: Here is the rephrased follow-up question as a standalone question:

¿Quién es el autor del libro? (Who is the author of the book?)
Documents: engrossed in the book, and I felt free, if only for a short period of time. This
was when I disc overed my first trick to dealing with worry , which I call
‘taking control of your free time’, which ultimately is your thinking time.
T ake control of your thinking time
The biggest dif ference in my life  over the last two years, alongside the birth
of my daughter , is my ability to take control of my wandering an d unhelpful

ACKNOWLEDGEMENTS
My gratitude goes out to all those who have passed through my l