# [ Let's Talk with Someone: RAG Using TinyLlama-1.1B ]

st125214 - Maung Maung Kyi Tha

A practical application of Retrieval-Augmented Generation (RAG) in the LangChain framework: a chatbot that answers questions about a specific person, in this case Mr. Bill Gates.

In [3]:
# Environment Setup
import re
import os
import torch

# Import LangChain and its components
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Alternate Imports
# from langchain.chains.retrieval import create_retrieval_chain
# from langchain.chains.combine_documents import create_stuff_documents_chain
# from langchain.prompts import ChatPromptTemplate


In [4]:
# Use the CUDA GPU if available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using {device}")
print("Available GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

# Set the random seed
SEED = 75
torch.manual_seed(SEED)

# Ask cuDNN for deterministic algorithms (improves, but does not fully guarantee, reproducible runs)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Disable user warnings for neater output
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Using cuda
Available GPUs: 1
GPU 0: NVIDIA GeForce RTX 4050 Laptop GPU


### Source Discovery

For this task I chose to base the chatbot on one person: Mr. Bill Gates, co-founder of Microsoft.

The following literature and information were collected and used as sources about Bill Gates.

1. Bill Gates, with Nathan Myhrvold and Peter Rinearson - The Road Ahead, Penguin Publishing, 1995
2. Bill Gates Resume.pdf (curated and prepared from the Wikipedia website)
3. Wikipedia - https://en.wikipedia.org/wiki/Bill_Gates (saved as `Bill Gates - Wikipedia.pdf`)

In [None]:
# Step 1: Load documents (PDFs, and optionally web pages)

def load_documents(pdf_paths, web_links=None):
    """Load every PDF page (and, optionally, web pages) as LangChain Documents."""
    documents = []

    # Load PDF documents (one Document per page)
    for pdf in pdf_paths:
        loader = PyPDFLoader(pdf)
        documents.extend(loader.load())

    # Load web data (disabled in this run; pass a list of URLs to enable it)
    if web_links:
        web_loader = WebBaseLoader(web_links)
        documents.extend(web_loader.load())

    return documents

pdf_files = ['Documents/Bill Gates Resume.pdf', 'Documents/Bill Gates - Wikipedia.pdf']

# Combine all documents
documents = load_documents(pdf_files)

# Document Transformation
# Split Text into Chunks for better embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_documents = text_splitter.split_documents(documents)
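To see what `chunk_size=1000` and `chunk_overlap=100` mean in practice, here is a simplified, library-free sketch. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on paragraphs, newlines and spaces, whereas the hypothetical `sliding_window_chunks` helper below cuts purely by character count. The overlap idea is the same in both: consecutive chunks share up to 100 characters, so text near a boundary keeps some of its surrounding context.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical, simplified stand-in for RecursiveCharacterTextSplitter:
    it ignores separators and cuts purely by character count.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Usage: 2,500 characters split with the notebook's settings
sample = "x" * 2500
chunks = sliding_window_chunks(sample, chunk_size=1000, chunk_overlap=100)
print([len(c) for c in chunks])  # [1000, 1000, 700]
```

Each window starts 900 characters after the previous one, so the last chunk is shorter and the final 100 characters of one chunk are the first 100 of the next.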

### Create vector store for retrieval

In [6]:
# Generate embeddings and store for retrieval

model_name = 'hkunlp/instructor-base'

# model_name = 'sentence-transformers/all-MiniLM-L6-v2' # experiment with different models

# Use a local Hugging Face embedding model instead of the OpenAI API
embedding_model = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": str(device)})

In [7]:
# Build the FAISS index from the chunk embeddings and save it locally
vector_store = FAISS.from_documents(
    documents = split_documents,
    embedding = embedding_model
)
vector_path = 'vector-store'
db_file_name = 'nlp_vector_store'

vector_store.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp'  # custom index name (FAISS default is 'index')
)

### Prepare retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
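The retriever contract is simply "query string in, list of documents out". As a toy illustration of a retriever that uses no vector store and no embeddings, here is a library-free sketch; `Doc` and `keyword_retriever` are hypothetical names standing in for LangChain's `Document` and retriever classes, and ranking by shared words is a deliberately crude choice.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def keyword_retriever(docs: list[Doc], k: int = 3):
    """Return a function mapping a query to the k docs sharing the most words with it.

    No embeddings, no index: it only *returns* documents, which is all a retriever must do.
    """
    def retrieve(query: str) -> list[Doc]:
        query_words = set(query.lower().split())

        def overlap(doc: Doc) -> int:
            # Number of distinct words the query and the document have in common
            return len(query_words & set(doc.page_content.lower().split()))

        ranked = sorted(docs, key=overlap, reverse=True)  # stable: ties keep input order
        return [d for d in ranked[:k] if overlap(d) > 0]   # drop docs with no shared words

    return retrieve


docs = [
    Doc("Gates was born in Seattle", {"source": "resume"}),
    Doc("Microsoft was founded in 1975", {"source": "wiki"}),
    Doc("The Gates Foundation funds global health", {"source": "wiki"}),
]
retrieve = keyword_retriever(docs, k=2)
print([d.metadata["source"] for d in retrieve("where was gates born")])  # ['resume', 'wiki']
```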

In [8]:
# Reloading vector from local
vector_path = 'vector-store'
db_file_name = 'nlp_vector_store'

vector_store = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp',  # must match the name used in save_local (FAISS default is 'index')
    allow_dangerous_deserialization=True  # required to load from pickle
) 

In [9]:
# Define retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
#retriever = vector_store.as_retriever()
print("Retriever is ready for query processing.")

Retriever is ready for query processing.
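`search_kwargs={"k": 3}` asks the vector store for the 3 chunks whose embeddings lie closest to the query embedding. As far as I know, LangChain's `FAISS.from_documents` builds an exact flat index with Euclidean (L2) distance by default. The sketch below reproduces that ranking by brute force with NumPy; the 2-D vectors are made up for illustration (the real embeddings from `hkunlp/instructor-base` have hundreds of dimensions). FAISS reports squared L2 distances, but squaring does not change the order.

```python
import numpy as np


def top_k_l2(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Indices of the k rows of doc_vecs closest to query_vec by Euclidean distance.

    Brute-force equivalent of an exact (flat) L2 index: compute every distance, sort, keep k.
    """
    dists = np.linalg.norm(doc_vecs - query_vec, axis=1)  # one distance per chunk
    return np.argsort(dists)[:k].tolist()


# Toy 2-D "embeddings" for five chunks
doc_vecs = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2], [-3.0, 2.0]])
query_vec = np.array([1.0, 1.0])
print(top_k_l2(query_vec, doc_vecs, k=3))  # [1, 3, 0]
```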


In [10]:
# Testing the retriever
retriever.invoke("What is your name")  # replaces the deprecated get_relevant_documents()



[Document(id='b42ed06b-d72c-4e19-a5b7-4960185284db', metadata={'producer': 'Microsoft: Print To PDF', 'creator': 'PyPDF', 'creationdate': '2025-03-13T22:15:23+07:00', 'author': 'Kyi Tha', 'moddate': '2025-03-13T22:15:23+07:00', 'title': 'Microsoft Word - Bill Gates Resume.docx', 'source': 'Bill Gates Resume.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='Bill Gates \n \nEmail: bill.gates@gatesfoundation.org \nWebsite: www.gatesfoundation.org \nLinkedIn: linkedin.com/in/billgates \nTwitter: @BillGates \n \nFull Name  : William Henry Gates III \nDate of Birth  : October 28, 1955 (age 69) \nPlace of Birth  : Seattle, Washington, U.S. \nEducation  : Harvard University (dropped out) \nSpouse  : Melinda French \n    (m. 1994; div. 2021) \nChildren  : 3 \nParents  :  Bill Gates Sr., Mary Maxwell \n \nProfessional Summary \nVisionary entrepreneur, technologist, and philanthropist with a proven track record of \nrevolutionizing the technology industry and addressing global 

### Prompt for Personal Questions

In [11]:
# Simple QA prompt creation
prompt_template = """Answer the following question based solely on the provided context. 
If the answer is not present in the context, say 'I don't know.'

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

### Build RAG System
This step loads the language model that turns the retrieved context into answers: TinyLlama-1.1B-Chat, run locally through a Hugging Face `text-generation` pipeline (GPT-2 is kept as a commented-out alternative).

In [12]:
# Load the generator LLM through a Hugging Face text-generation pipeline.
# Candidate models to explore: keep exactly one uncommented.

llm = HuggingFacePipeline.from_model_id("TinyLlama/TinyLlama-1.1B-Chat-v1.0", task="text-generation", device=0 if torch.cuda.is_available() else -1)

#llm = HuggingFacePipeline.from_model_id("gpt2", task="text-generation", device=0)

Device set to use cuda:0


### Single Question Answering with RetrievalQA

In [14]:
# Setup RetrievalQA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type='stuff',            # put all retrieved chunks into a single prompt
    return_source_documents=True,  # also return the retrieved chunks so sources can be cited
    chain_type_kwargs={"prompt": prompt}
)
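With `chain_type='stuff'`, the chain "stuffs" all `k` retrieved chunks into the `{context}` slot of the prompt and sends one request to the LLM. A minimal sketch of that rendering step in plain Python, assuming chunks are joined with a blank line (which I believe is LangChain's default document separator); `stuff_prompt` is a hypothetical helper, and the two chunks are abbreviated from the resume.

```python
def stuff_prompt(template: str, docs: list[str], question: str, separator: str = "\n\n") -> str:
    """Render a prompt the way a 'stuff' chain does: join every retrieved
    chunk into one context string and substitute it into the template."""
    context = separator.join(docs)
    return template.format(context=context, question=question)


template = """Answer the following question based solely on the provided context.
If the answer is not present in the context, say 'I don't know.'

Context: {context}

Question: {question}

Answer:"""

chunks = ["Full Name : William Henry Gates III", "Place of Birth : Seattle, Washington, U.S."]
rendered = stuff_prompt(template, chunks, "What is your name?")
print(rendered)
```

Because the Hugging Face `text-generation` pipeline returns the prompt together with the continuation by default (`return_full_text=True`), this rendered prompt is exactly what appears at the start of `result["result"]`, which is why the answer has to be extracted afterwards.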

In [21]:
# Run single Query on the QA Chain
query = "What is your name?"
result = qa_chain.invoke({"query": query}) 

# Check Results
print("Answer:", result["result"])


Answer: Answer the following question based solely on the provided context. 
If the answer is not present in the context, say 'I don't know.'

Context: "Trey" (i.e., three) because his father had the "II" suffix.[6][7] The family lived in the Sand Point
area of Seattle in a home that was damaged by a rare tornado when Gates was 7.[8]
When Gates was young his parents wanted him to pursue a career in law.[9] During his childhood,
his family regularly attended a church of the Congregational Christian Churches, a Protestant
Reformed denomination.[10][11][12]
Gates was small for his age and was bullied as a child.[7] The family encouraged competition; one
visitor reported that "it didn't matter whether it was hearts or pickleball or swimming to the dock;
there was always a reward for winning and there was always a penalty for losing".[13]
At age 13, he enrolled in the private Lakeside prep school.[14][15] When he was in the eighth grade,
the Mothers' Club at the school used proceeds from La

In [22]:
# Keep only the first line the model generated after the template's 'Answer:'
match = re.search(r'Answer:\s*(.+)', result["result"])
first_answer = match.group(1).strip() if match else "No answer found."
# Extract source documents
source_docs = result.get("source_documents", [])
# Extract and print document names
source_names = [doc.metadata.get("source", "Unknown source") for doc in source_docs]

print("Question:", query)
print("First Answer:", first_answer)
print("Source Documents:", source_names)    

Question: What is your name?
First Answer: My name is Bill Gates.
Source Documents: ['Bill Gates - Wikipedia.pdf', 'Bill Gates Resume.pdf', 'Bill Gates - Wikipedia.pdf']
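The same extraction is repeated inside the loop below, so it can be factored into a small helper. This is a sketch with a hypothetical name, `extract_first_answer`, using the same regex: `\s*` may skip blank lines after the marker, while `.+` stops at the next newline, so only the first line of the model's answer is kept. The sample string is made up. One caveat: if a retrieved chunk itself contained "Answer:", the first match would fall inside the context instead.

```python
import re


def extract_first_answer(generated: str, marker: str = "Answer:") -> str:
    # \s* can skip whitespace and blank lines after the marker;
    # .+ stops at the next newline, so only the first answer line is kept.
    match = re.search(re.escape(marker) + r"\s*(.+)", generated)
    return match.group(1).strip() if match else "No answer found."


# The pipeline echoes the prompt, so the template's own "Answer:" is the first match.
raw = "Context: ...\n\nQuestion: What is your name?\n\nAnswer: My name is Bill Gates.\nQuestion: How old are you?"
print(extract_first_answer(raw))               # My name is Bill Gates.
print(extract_first_answer("no marker here"))  # No answer found.
```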


### Generating a list of questions and building a Q&A collection

In [16]:
questions = [
    'What is your name?',   
    'How old are you?',
    'What is your highest level of education?',
    'What major or field of study did you pursue during your education?',
    'How many years of work experience do you have?',
    'What type of work or industry have you been involved in?',
    'Can you describe your current role or job responsibilities?',
    'What are your core beliefs regarding the role of technology in shaping society?',
    'How do you think cultural values should influence technological advancements?',
    'As a master’s student, what is the most challenging aspect of your studies so far?',
    'What specific research interests or academic goals do you hope to achieve during your time as a master’s student?'
    ]
answers = []
for query in questions:
    result = qa_chain.invoke({"query": query}) 
    match = re.search(r'Answer:\s*(.+)', result["result"])
    first_answer = match.group(1).strip() if match else "No answer found."
    answers.append({'question': query, 'answer': first_answer})

# Printing answers in a more readable format
for entry in answers:
    print(f"Question: {entry['question']}\nAnswer: {entry['answer']}\n")


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


[{'question': 'What is your name?', 'answer': 'My name is Bill Gates.'}, {'question': 'How old are you?', 'answer': 'I am 69 years old.'}, {'question': 'What is your highest level of education?', 'answer': 'I have a Bachelor of Science degree in mathematics and computer science from Harvard University.'}, {'question': 'What major or field of study did you pursue during your education?', 'answer': 'I attended Harvard University, where I studied mathematics and computer science before leaving to pursue Microsoft.'}, {'question': 'How many years of work experience do you have?', 'answer': 'I have over 25 years of experience in the technology industry, including 19 years at Microsoft.'}, {'question': 'What type of work or industry have you been involved in?', 'answer': 'I have been involved in various work and industries, including technology, global'}, {'question': 'Can you describe your current role or job responsibilities?', 'answer': 'As a philanthropic advisor, I work with individuals

In [17]:
# save to json file
import json
with open('answers_tinyLLM.json', 'w', encoding='utf-8') as f:
    json.dump(answers, f, ensure_ascii=False, indent=4)

print("Answers saved to 'answers_tinyLLM.json'")

Answers saved to 'answers_tinyLLM.json'
