## **A6: Let’s Talk with Yourself!**

In this assignment, apply RAG (Retrieval-Augmented Generation) techniques using the LangChain framework to build a chatbot that specializes in answering questions about yourself, drawing on your documents, resume, and any other relevant information.

### **Load Libraries and define device**

In [1]:
import os
import torch

# Set GPU device for CUDA computation
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# os.environ['http_proxy']  = 'http://192.41.170.23:3128'
# os.environ['https_proxy'] = 'http://192.41.170.23:3128'
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

os.environ["TOKENIZERS_PARALLELISM"] = "True"

# On macOS, use the MPS (Metal Performance Shaders) backend for PyTorch if available; otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
device

device(type='mps')

#### **Task 1. Source Discovery - Based on code-along/01-rag-langchain.ipynb, modify as follows:**

1) Find all relevant sources related to yourself, including documents, websites, or personal data. Please list down the reference documents (1 point)
   
   **Answer:** For this task I used my CV and my Profile as reference documents. Together they hold the answers to the 10 questions about myself that the chatbot is expected to answer. The documents can be downloaded from the link: [Download Document](https://drive.google.com/drive/folders/1WsGkfhqnxTYJMzgKL1IwmOW_xUIcGIws?usp=sharing)

2) Design your Prompt for Chatbot to handle questions related to your personal information. Develop a model that can provide gentle and informative answers based on the designed template. (0.5 point)

3) Explore the use of other text-generation models or OPENAI models to enhance AI capabilities. (0.5 point)

**Note:** Groq also offers the llama3-70b model (generator model) with limited request capacity. For further exploration, refer to the following link (https://python.langchain.com/docs/integrations/chat/groq/)


##### **1. Prompt**
A set of instructions or input provided by a user to guide the model's response, helping it understand the context and generate relevant and coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation.

In [2]:
from langchain import PromptTemplate

prompt_template = """
    Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. 
    I ensure responses are accurate and cite relevant sources.
    I'll be sure to answer you to the best of my ability!!
    Context: {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
#using str.format 
#The placeholder is defined using curly brackets: {} {}

PromptTemplate(input_variables=['context', 'question'], template="Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. \n    I ensure responses are accurate and cite relevant sources.\n    I'll be sure to answer you to the best of my ability!!\n    Context: {context}\n    Question: {question}\n    Answer:")
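The comment in the cell above notes that the placeholders are filled in `str.format` style. As a minimal sketch of that substitution in plain Python (no LangChain involved; the shortened `template` string here is an illustrative stand-in, not the notebook's prompt):

```python
# Plain-Python sketch of placeholder substitution; no LangChain required.
template = "Context: {context}\nQuestion: {question}\nAnswer:"

filled = template.format(
    context="A RAGBot answers questions from uploaded documents.",
    question="What is a RAGBot?",
)
print(filled)

# Leaving out a placeholder raises KeyError, which is why the names in the
# template must match the keys that are passed in.
try:
    template.format(context="only context")
except KeyError as missing:
    print("missing variable:", missing)
```

The same idea explains why the prompt reports `context` and `question` as its input variables: they are the names found inside the curly brackets.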

In [3]:
PROMPT.format(
    context = "A RAGBot is a type of artificial intelligence that answers questions by retrieving information from uploaded documents. It uses NLP techniques to process and understand the content of these documents, allowing it to provide accurate and contextually relevant responses to user queries. By analyzing text and context, the RAGBot can extract the most pertinent details and generate appropriate answers without the need for explicit instructions.",
    question = "What is a RAGBot?"
)

"Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. \n    I ensure responses are accurate and cite relevant sources.\n    I'll be sure to answer you to the best of my ability!!\n    Context: A RAGBot is a type of artificial intelligence that answers questions by retrieving information from uploaded documents. It uses NLP techniques to process and understand the content of these documents, allowing it to provide accurate and contextually relevant responses to user queries. By analyzing text and context, the RAGBot can extract the most pertinent details and generate appropriate answers without the need for explicit instructions.\n    Question: What is a RAGBot?\n    Answer:"

##### **2. Retrieval**

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code). 
   
2. `Document transformers` : One of the essential steps in document retrieval is breaking down a large document into smaller, relevant chunks to enhance the retrieval process.
   
3. `Text embedding models` : Embeddings capture the semantic meaning of the text, allowing you to quickly and efficiently find other pieces of text that are similar.
   
4. `Vector stores`: Embeddings need a database that supports efficient storage of, and similarity search over, these vectors.
   
5. `Retrievers` : Once the data is in the database, you still need to retrieve it.

##### **2.1 Document Loaders**
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text and its associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

In [4]:
from langchain.document_loaders import PyMuPDFLoader

# nlp_docs = 'documents/CV _SachinMalego.pdf'
# loader = PyMuPDFLoader(nlp_docs)
# documents = loader.load()

# List of document paths
nlp_docs = [
    '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/documents/CV _SachinMalego.pdf',
    '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/documents/Profile_SachinMalego.pdf'
]

# Load every page of every PDF into one flat list of Document objects
documents = []
for doc in nlp_docs:
    loader = PyMuPDFLoader(doc)
    documents.extend(loader.load())

In [5]:
len(documents)

9

In [6]:
documents[1]

Document(page_content='SACHIN MALEGO  \n  \n+66-0910633538(M); e-mail: s.malego@gmail.com; st125171@ait.asia   \n2 | P a g e  \n  \nOrganization/Duty Station: National Disaster Risk Reduction and Management Authority  \n(NDRRMA), Government of Nepal and National Housing and Settlements Resilience Platform \n(NHSRP) Central Office, Lalitpur (Assigned districts: Kathmandu, Lalitpur, Bhaktapur, Makwanpur, \nNawalpur, Parasi, and Sindhuli) Job Responsibilities:  \n• \nCoordination with Relevant IM Actors: Collaborate with pertinent Information Management (IM) \nstakeholders to ensure the upkeep of common district-level IM tools. These tools facilitate various \nfunctions such as data collection, collation, analysis, and dissemination. It involves maintaining \narchives of baseline data, key datasets, key indicators, and other essential tools necessary for \neffective information management.  \n• \nManagement of IT Projects: Oversee IT projects, including software development initiatives, t

##### **2.2 Document Transformers**

This text splitter is the recommended one for generic text. It is parameterized by a list of separator characters and tries to split on them in order until the chunks are small enough.

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [8]:
doc[1]

Document(page_content='offering assistance in information system design, data visualization, data security, and data \nmanagement of Reconstruction portals.  \n• \nSupported the National Disaster Risk Reduction and Management Authority (NDRRMA) as a \nconsultant in the design and development of Bipad portals, along with creating training and \norientation packages.  \n• \nServed as Technical Lead for Asia Shelter Forum 2020 and Co-facilitator for Asia Shelter Forum \n2021.  \n  \nOBJECTIVE  \nTo leverage my extensive experience in Information Systems Design, Data Management, Data Science,  \nAI, and Disaster Risk Reduction to contribute effectively as an Expert to the field. With a proven track', metadata={'source': '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/documents/CV _SachinMalego.pdf', 'file_path': '/Users/sachinmalego/Documents/AIT Course Works/02. S

In [9]:
len(doc)

45
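To make the "split on separators in order" idea concrete, here is a simplified sketch in plain Python. It is not LangChain's implementation: the helper name `recursive_split` is hypothetical, and it leaves out `chunk_overlap` and other details of the real splitter.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters,
    trying coarse separators first and finer ones only when needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep growing the current chunk
            continue
        if current:
            chunks.append(current)       # current chunk is full
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa bbbb cccc\n\ndddd eeee"
chunks = recursive_split(sample, chunk_size=10)
print(chunks)
```

Paragraph breaks are tried first, so the second paragraph stays whole, while the first one is too long and falls back to splitting on spaces.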

##### **2.3 Text Embedding Models**
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [10]:
import warnings
warnings.simplefilter("ignore", FutureWarning)

import torch
from langchain_community.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer
max_seq_length  512
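As a rough illustration of "similar in the vector space", the sketch below compares hand-made three-dimensional vectors with cosine similarity. The vectors are invented for illustration and are not outputs of `hkunlp/instructor-base`, whose embeddings have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made "embeddings" (illustrative values only).
query = [1.0, 0.0, 1.0]
about_age = [0.9, 0.1, 0.8]
about_food = [0.0, 1.0, 0.1]

print(cosine_similarity(query, about_age))
print(cosine_similarity(query, about_food))
```

A text whose vector points in nearly the same direction as the query scores close to 1, while an unrelated one scores close to 0.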


##### **2.4 Vector Stores**

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.

In [11]:
#locate vectorstore
vector_path = './vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [12]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_db'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' #default index
)
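Conceptually, a vector store keeps (vector, text) pairs and returns the texts whose vectors are nearest to a query vector. The sketch below is a brute-force stand-in written for illustration; the class `TinyVectorStore` is hypothetical and does not mirror the FAISS API, and the two-dimensional vectors are made up.

```python
import math

class TinyVectorStore:
    """Brute-force in-memory store; a stand-in for FAISS, not its API."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, vector, text):
        self._vectors.append(vector)
        self._texts.append(text)

    def search(self, query, k=1):
        # Rank stored vectors by Euclidean distance to the query vector.
        ranked = sorted(
            zip(self._vectors, self._texts),
            key=lambda pair: math.dist(pair[0], query),
        )
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.add([0.9, 0.1], "Date of Birth: 23rd Dec 1988")
store.add([0.1, 0.9], "B.Sc. CSIT, 2013")
print(store.search([0.8, 0.2], k=1))
```

A real vector store replaces the linear scan with an index so that search stays fast as the number of chunks grows.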

##### **2.5 Retrievers**
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

In [13]:
#calling vector from local
vector_path = './vector-store'
db_file_name = 'nlp_db'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp' #default index
)   

In [14]:
#ready to use
retriever = vectordb.as_retriever()

In [15]:
retriever.get_relevant_documents("How old are you?")

[Document(page_content='record in technical leadership and consultancy roles, my aim is to drive digital transformation initiatives \nand enhance organizational capabilities in utilizing data-driven strategies, artificial intelligence, and \nmachine learning for impactful decision-making and sustainable development.  \n  \nPERSONAL DETAILS  \nDate of Birth:  \n23rd Dec 1988 A.D.  \nGender:  \nMale  \nPermanent Address:  \nAsan Tole, Tansen-2, Palpa, Nepal   \nCurrent Address:  \nAIT Thailand  \nNationality:  \nNepali  \nLanguage Known:  \nEnglish, Nepali  \n  \nPROFESSIONAL WORKING EXPERIENCE  \nDate: July 08, 2018 – July 15, 2024 \nJob Title: Information Management Officer - National', metadata={'source': '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/documents/CV _SachinMalego.pdf', 'file_path': '/Users/sachinmalego/Documents/AIT Course Works/02. Second Seme

In [16]:
retriever.get_relevant_documents("What is your highest level of education?")

[Document(page_content='(Division)  \nM.Sc. DSAI \nAIT \nAsian Institute of Technology, \nThailand \nCurrent \n \nB.Sc. CSIT  \nT.U.  \nSt. Xavier’s College Maitighar,  \nKathmandu  \n2013 A.D.  \n81.36%  \n(Distinction)  \n10+2  \nHSEB  \nMillennium H.S.S. Tansen, \nPalpa  \n2007 A.D.  \n64.1%  \n(Ist div.)  \nS.L.C.  \nHMG  \nNew Horizon Eng. Bo. Sec. School \nTansen, Palpa  \n2005 A.D.  \n71.37%  \n(Ist div.)  \n  \nACADEMIC PROJECTS  \n• \nFinal year project entitled “ICT in SMART Energy for Sustainable Empowerment” as per the \nrequirement for the partial fulfillment of B.Sc. CSIT degree.  \n• \nMini project on Database entitled “Vehicle Registration System” as per the requirement for the \npartial fulfillment of B.Sc. CSIT degree.  \n•', metadata={'source': '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/documents/CV _SachinMalego.pdf', 'file_path': '/Use
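Since a retriever only has to return documents for a query, it does not need a vector store at all. The sketch below is a hypothetical keyword-overlap retriever (the class `KeywordRetriever` and its toy texts are invented for illustration); it exposes a `get_relevant_documents` method only to mirror the call used above, and returns plain strings rather than `Document` objects.

```python
import re

class KeywordRetriever:
    """Returns the texts that share the most words with the query."""

    def __init__(self, texts, k=1):
        self.texts = texts
        self.k = k

    @staticmethod
    def _words(text):
        # Lowercase and drop punctuation so "education?" matches "education".
        return set(re.findall(r"\w+", text.lower()))

    def get_relevant_documents(self, query):
        query_words = self._words(query)
        scored = [(len(query_words & self._words(t)), t) for t in self.texts]
        # Keep only texts with at least one shared word, best match first.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in hits[: self.k]]

retriever_sketch = KeywordRetriever([
    "date of birth 23rd dec 1988",
    "highest education m.sc. dsai at ait",
])
found = retriever_sketch.get_relevant_documents(
    "What is your highest level of education?"
)
print(found)
```

Unlike the embedding-based retriever, this one cannot match paraphrases: a query such as "How old are you?" shares no words with the date-of-birth text and returns nothing.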

##### **3. Memory**

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [17]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

ChatMessageHistory(messages=[])

In [18]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')
history.add_user_message('I\'m fine')
history.add_ai_message('That\'s good to hear')
history.add_user_message('Bye')
history.add_ai_message('Bye')

In [19]:
history

ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?"), HumanMessage(content="I'm fine"), AIMessage(content="That's good to hear"), HumanMessage(content='Bye'), AIMessage(content='Bye')])

##### **3.1 Memory types**

There are many different types of memory. Each has its own parameters and return types, and each is useful in different scenarios. 
- Conversation Buffer
- Conversation Buffer Window
  
**What variables get returned from memory?**

Before going into the chain, various variables are read from memory. These have specific names which need to align with the variables the chain expects. You can see what these variables are by calling memory.load_memory_variables({}). Note that the empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon the input variables, you may need to pass some in.

In the examples below, `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this name through a parameter on the memory class. For example, to have the memory variables returned under the key `chat_history`, pass `memory_key = "chat_history"` when constructing the memory, as the final chain in this notebook does.

##### **Conversation Buffer**
This memory stores messages and then exposes them in a variable.

In [20]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [21]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

##### **Conversation Buffer Window**
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last K interactions. 
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

In [22]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}
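The sliding-window behaviour can be sketched with a bounded `deque`. This is an illustration of the idea, not LangChain's implementation; the class `WindowMemorySketch` and its method names are hypothetical.

```python
from collections import deque

class WindowMemorySketch:
    """Keeps only the last k (human, ai) exchanges."""

    def __init__(self, k):
        self.turns = deque(maxlen=k)  # older turns fall off automatically

    def save_context(self, user_input, ai_output):
        self.turns.append((user_input, ai_output))

    def load_history(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

memory_sketch = WindowMemorySketch(k=1)
memory_sketch.save_context("hi", "What's up?")
memory_sketch.save_context("How are you?", "I'm quite good. How about you?")
print(memory_sketch.load_history())
```

With `k=1` only the second exchange survives, which matches the output of the cell above.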

##### **4. Chain**

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- it consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- it formats the prompt template using the input key values provided (and also memory key values, if available), 
- it passes the formatted string to LLM and returns the LLM output.
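
The three steps above can be sketched without any model. In this illustration `fake_llm` is a made-up stand-in for a real language model and `SimpleChain` is a hypothetical class, not LangChain's `LLMChain`.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: just reports the prompt length."""
    return f"(model saw {len(prompt)} characters)"

class SimpleChain:
    def __init__(self, template, llm):
        self.template = template
        self.llm = llm

    def run(self, **inputs):
        prompt = self.template.format(**inputs)  # 1. fill the template
        text = self.llm(prompt)                  # 2. call the model
        return {**inputs, "text": text}          # 3. inputs plus the output

chain_sketch = SimpleChain("Question: {question}\nAnswer:", fake_llm)
result = chain_sketch.run(question="What is a RAGBot?")
print(result)
```

Returning the inputs together with a `text` key mirrors the shape of the dictionaries printed by the chains later in this notebook.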

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [23]:
#%cd /Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Code/06 - RAG/code-along/models
#!git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

In [24]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain import HuggingFacePipeline
import torch

model_id = '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/models/fastchat-t5-3b-v1.0'

tokenizer = AutoTokenizer.from_pretrained(
    model_id)

tokenizer.pad_token_id = tokenizer.eos_token_id

bitsandbyte_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
    bnb_4bit_use_double_quant = True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    #quantization_config = bitsandbyte_config, #caution Nvidia
    #device_map = 'auto',
    torch_dtype=torch.float16,
    device_map={"": device}
    #load_in_8bit = True
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


##### **[Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)**

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
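
How these pieces fit together can be sketched as a plain function. This is a simplified reading of the flow under the assumption that the question is condensed only when there is chat history; the function `conversational_retrieval` and the three toy callables are hypothetical stand-ins, not LangChain code.

```python
def conversational_retrieval(question, chat_history, condense, retrieve, combine):
    # 1. With history, rewrite the follow-up into a standalone question.
    standalone = condense(chat_history, question) if chat_history else question
    # 2. Fetch documents for the standalone question.
    docs = retrieve(standalone)
    # 3. Answer from the retrieved documents.
    answer = combine(docs, standalone)
    return {"question": question, "answer": answer, "source_documents": docs}

# Toy stand-ins so the flow can be run end to end.
def condense(history, question):
    return f"{question} (given: {history[-1][0]})"

def retrieve(query):
    return ["Date of Birth: 23rd Dec 1988"]

def combine(docs, query):
    return f"Q: {query} | A from {len(docs)} doc(s): {docs[0]}"

first = conversational_retrieval("When were you born?", [], condense, retrieve, combine)
follow_up = conversational_retrieval(
    "What about your nationality?",
    [("When were you born?", "23rd Dec 1988")],
    condense, retrieve, combine,
)
print(first["answer"])
print(follow_up["answer"])
```

The first call skips the question generator because the history is empty; the follow-up is rewritten before retrieval, so a poor rewrite changes which documents come back.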


##### **Question Generator**

In [25]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [26]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [27]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [28]:
query = 'Comparing both of them'
chat_history = "Human:How old are you?\nAI:\nHuman:What is your Date of Birth?\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:How old are you?
AI:
Human:What is your Date of Birth?
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:How old are you?\nAI:\nHuman:What is your Date of Birth?\nAI:',
 'question': 'Comparing both of them',
 'text': '<pad> What  is  your  Date  of  Birth?\n'}

##### **Combined Docs Chain**

In [29]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. \n    I ensure responses are accurate and cite relevant sources.\n    I'll be sure to answer you to the best of my ability!!\n    Context: {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x35120e8e0>)), document_variable_name='context')

In [30]:
query = "What is your age?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. 
    I ensure responses are accurate and cite relevant sources.
    I'll be sure to answer you to the best of my ability!!
    Context: record in technical leadership and consultancy roles, my aim is to drive digital transformation initiatives 
and enhance organizational capabilities in utilizing data-driven strategies, artificial intelligence, and 
machine learning for impactful decision-making and sustainable development.  
  
PERSONAL DETAILS  
Date of Birth:  
23rd Dec 1988 A.D.  
Gender:  
Male  
Permanent Address:  
Asan Tole, Tansen-2, Palpa, Nepal   
Current Address:  
AIT Thailand  
Nationality:  
Nepali  
Language Known:  
English, Nepali  
  
PROFESSIONAL WORKING EXPERIENCE  
Date: July 08, 2018 – July 15, 2024 
Job Title: Information Man

{'input_documents': [Document(page_content='record in technical leadership and consultancy roles, my aim is to drive digital transformation initiatives \nand enhance organizational capabilities in utilizing data-driven strategies, artificial intelligence, and \nmachine learning for impactful decision-making and sustainable development.  \n  \nPERSONAL DETAILS  \nDate of Birth:  \n23rd Dec 1988 A.D.  \nGender:  \nMale  \nPermanent Address:  \nAsan Tole, Tansen-2, Palpa, Nepal   \nCurrent Address:  \nAIT Thailand  \nNationality:  \nNepali  \nLanguage Known:  \nEnglish, Nepali  \n  \nPROFESSIONAL WORKING EXPERIENCE  \nDate: July 08, 2018 – July 15, 2024 \nJob Title: Information Management Officer - National', metadata={'source': '/Users/sachinmalego/Documents/AIT Course Works/02. Second Semester/02. AI-NLU/Lab Work/Github/Python-fo-Natural-Language-Processing/Assignment/NLP-A6-Lets-Talk-with-Yourself/documents/CV _SachinMalego.pdf', 'file_path': '/Users/sachinmalego/Documents/AIT Course W

In [31]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h
)
chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. \n    I ensure responses are accurate and cite relevant sources.\n    I'll be sure to answer you to the best of my ability!!\n    Context: {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x35120e8e0>)), document_variable_name='context'), question_generator=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a s

In [32]:
prompt_question = "Who are you by the way?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Hi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. 
    I ensure responses are accurate and cite relevant sources.
    I'll be sure to answer you to the best of my ability!!
    Context: record in technical leadership and consultancy roles, my aim is to drive digital transformation initiatives 
and enhance organizational capabilities in utilizing data-driven strategies, artificial intelligence, and 
machine learning for impactful decision-making and sustainable development.  
  
PERSONAL DETAILS  
Date of Birth:  
23rd Dec 1988 A.D.  
Gender:  
Male  
Permanent Address:  
Asan Tole, Tansen-2, Palpa, Nepal   
Current Address:  
AIT Thailand  
Nationality:  
Nepali  
Language Known:  
English, Nepali  
  
PROFESSIONAL WORKING EXPERIENCE  


{'question': 'Who are you by the way?',
 'chat_history': [],
 'answer': '<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n',
 'source_documents': [Document(page_content='record in technical leadership and consultancy roles, my aim is to drive digital transformation initiatives \nand enhance organizational capabilities in utilizing data-driven strategies, artificial intelligence, and \nmachine learning for impactful decision-making and sustainable development.  \n  \nPERSONAL DETAILS  \nDate of Birth:  \n23rd Dec 1988 A.D.  \nGender:  \nMale  \nPermanent Address:  \nAsan Tole, Tansen-2, Palpa, Nepal   \nCurrent Address:  \nAIT Thailand  \nNationality:  \nNepali  \nLanguage Known:  \nEnglish, Nepali  \n  \nPROFESSIONAL WORKING EXPERIENCE  \nDate: July 08, 20
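
The `answer` string above starts with a `<pad>` token and has doubled spaces between words. One possible way to tidy such text before showing it to a user is sketched below; `clean_generation` is a hypothetical helper, not part of the notebook's pipeline, and the `raw` string is a shortened version of the output above.

```python
import re

def clean_generation(text: str) -> str:
    """Drop the <pad> token and collapse repeated whitespace."""
    text = text.replace("<pad>", "")
    # Collapse runs of whitespace (including the trailing newline).
    return re.sub(r"\s+", " ", text).strip()

raw = "<pad>  I  am  an  AI  -  RAGBot  assistant.\n"
cleaned = clean_generation(raw)
print(cleaned)
```

This only changes presentation; it does not address what the model says.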

#### **Task 2. Analysis and Problem Solving**
1) Provide a list of the retriever and generator models you have utilized. (0.25 point)

2) Analyze any issues related to the models providing unrelated information. (0.25 point)

**Note:** RAG utilizes two models: a retriever model and a generator model. Therefore, when performing
your analysis, make sure to evaluate and analyze both models, not just one.

**Answers:** 
1. The retriever and generator models I used are:
    - **Retriever Models:**
        - The embedding model is `hkunlp/instructor-base`, loaded through `HuggingFaceInstructEmbeddings`.
        - The vector store is `faiss` (`FAISS` in LangChain).
    - **Generator Model:**
        - The generator model is FastChat-T5 (`lmsys/fastchat-t5-3b-v1.0`), wrapped in `HuggingFacePipeline`.
    
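2. **Issues related to the models:**
    - **Unrelated Information:**
        - The retriever returns the nearest chunks even when none of them answers the question, so the generator can receive off-topic context. For "Who are you by the way?" the retrieved chunk was the personal-details section, yet the generator answered as the RAGBot persona from the prompt instead of describing me.
    - **Inaccurate Information:**
        - The generator may state facts that are not in the retrieved context. For "How old are you?" the retriever returns only a date of birth, so the age has to be inferred by the generator and may be wrong.
    - **Missing Information:**
        - With `chunk_size = 700`, an answer can be split across chunks, and the retriever may return only part of it.
    - **Duplicate Information:**
        - Because chunks overlap by 100 characters, the same text can appear twice in the context and be repeated in the answer.
    - **Generation Errors:**
        - The raw generator output contains artifacts such as a leading `<pad>` token and doubled spaces.
    - **Inconsistent Responses:**
        - The question generator may rewrite a follow-up incorrectly. "Comparing both of them" was condensed to "What is your Date of Birth?", which changes what the retriever looks for.
    - **Lack of Contextual Information:**
        - The window memory keeps only the last three exchanges (`k=3`), so earlier parts of the conversation are not available to the question generator.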

Here’s a table summarizing the limitations of the retriever and generator models, along with possible solutions:  

| **Component**       | **Model Used**                  | **Limitations**                                                                                                                                   | **Possible Solutions**                                                                                               |
|---------------------|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| **Retriever Model** | `hkunlp/instructor-base`      | - May not generalize well to highly domain-specific queries. <br> - Embedding quality depends on training data. <br> - Struggles with complex queries requiring reasoning. | - Fine-tune the model on domain-specific data. <br> - Use hybrid retrieval (dense + keyword-based search).         |
| **Vector Store**    | `faiss`                        | - Doesn't support distributed indexing natively. <br> - Struggles with dynamic data updates (requires re-indexing). <br> - Lacks metadata filtering capabilities. | - Use alternatives like `Weaviate` or `Pinecone` for real-time updates. <br> - Implement approximate nearest neighbors (ANN) tuning. |
| **Generator Model** | `Fast Chat Model` (HuggingFacePipeline) | - May generate hallucinated responses. <br> - Performance varies with prompt engineering. <br> - Computationally expensive for long responses. | - Implement retrieval-augmented generation (RAG) for factual accuracy. <br> - Optimize inference with quantization techniques. |
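
The hybrid retrieval suggestion in the table (dense + keyword-based search) needs some way to merge two ranked lists. One common option is reciprocal rank fusion (RRF). The sketch below is an illustration under stated assumptions, not part of this notebook's pipeline: the two rankings are hard-coded, the chunk identifiers are hypothetical, and `k=60` is simply a commonly used smoothing constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are returned by descending total score (ties broken by id).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical chunk ids: one ranking from a dense retriever, one from keyword search
dense_ranking = ["cv_p2", "cv_p1", "profile_p1"]
keyword_ranking = ["cv_p1", "profile_p3", "cv_p2"]

fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
print(fused)
```

RRF only uses rank positions, so it avoids having to calibrate dense similarity scores against keyword scores, which live on different scales.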

#### **Task 3. Chatbot Development - Web Application Development - Develop a web application that demonstrates a chatbot.**

1) The application should feature a chat interface with an input box where users can type messages.

2) Based on the user input, the model should generate coherent responses and also provide relevant source documents that support the generated response. For example, if the user types “How old are you?”, the model might generate a concise summary along with links to related articles or documents. (0.5 point)

**Note:** You are encouraged to use any available resources related to your personal information, and ensure the chatbot provides accurate and relevant information.

**Answer:**

<h5><b>Application Interfaces</b></h5>

<p align="left">
  <img src="./screenshots/Screenshot_RAG1.png" width="30%">
  <img src="./screenshots/Screenshot_RAG2.png" width="30%">
  <img src="./screenshots/Screenshot_RAG3.png" width="30%">
</p>

<p align="left">
  <img src="./screenshots/Screenshot_RAG4.png" width="30%">
  <img src="./screenshots/Screenshot_RAG5.png" width="30%">
  <img src="./screenshots/Screenshot_RAG6.png" width="30%">
</p>

<p align="left">
  <img src="./screenshots/Screenshot_RAG7.png" width="30%">
  <img src="./screenshots/Screenshot_RAG8.png" width="30%">
  <img src="./screenshots/Screenshot_RAG9.png" width="30%">
</p>

<p align="left">
  <img src="./screenshots/Screenshot_RAG10.png" width="30%">
  <img src="./screenshots/Screenshot_RAG11.png" width="30%">
  <img src="./screenshots/Screenshot_RAG12.png" width="30%">
</p>

<p align="left">
  <img src="./screenshots/Screenshot_RAG13.png" width="30%">
</p>


<h5><b>Working of the Application</b></h5>

<a href="">Link to the application video</a>

<p align="left">
  <img src="./screenshots/Video.gif" width="50%">
</p>
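
Task 3 asks for a response together with its supporting source documents. One way a web backend could shape the chain result for the chat interface is sketched below. This is an assumption-laden illustration rather than the application's actual code: `Doc` is a stand-in for a LangChain `Document`, `build_chat_reply` is a hypothetical helper, and the `cv.pdf` source name and `page` metadata key are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_chat_reply(result: dict, max_chars: int = 80) -> dict:
    """Turn a chain result into a cleaned answer and a list of source snippets."""
    # Remove <pad> tokens and collapse repeated whitespace in the answer
    answer = " ".join(result["answer"].replace("<pad>", " ").split())
    sources = []
    for doc in result.get("source_documents", []):
        snippet = " ".join(doc.page_content.split())[:max_chars]
        sources.append({
            "source": doc.metadata.get("source", "unknown"),
            "page": doc.metadata.get("page"),
            "snippet": snippet,
        })
    return {"answer": answer, "sources": sources}

# Example result shaped like the chain outputs in this notebook (metadata is assumed)
result = {
    "answer": "<pad>  23rd  Dec  1988  A.D.\n",
    "source_documents": [
        Doc("PERSONAL DETAILS  \nDate of Birth:  \n23rd Dec 1988 A.D.",
            {"source": "cv.pdf", "page": 0}),
    ],
}
reply = build_chat_reply(result)
print(reply)
```

Returning plain dictionaries keeps the reply easy to serialize to JSON for whichever front end renders the chat.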

#### **Below are 10 questions your chatbot should be able to answer:**
1) How old are you?

2) What is your highest level of education?

3) What major or field of study did you pursue during your education?

4) How many years of work experience do you have?

5) What type of work or industry have you been involved in?

6) Can you describe your current role or job responsibilities?

7) What are your core beliefs regarding the role of technology in shaping society?

8) How do you think cultural values should influence technological advancements?

9) As a master’s student, what is the most challenging aspect of your studies so far?

10) What specific research interests or academic goals do you hope to achieve during your time as a master’s student?

**Submission Instructions:** For each question, your chatbot should generate a response. Please submit the question-answer pairs to your Github repository in the following JSON format:

`[
{  
"question": "How old are you?",  
"answer": "Your answer here"  
},  
{  
"question": "What is your highest level of education?",  
"answer": "Your answer here"  
},  
...  
]`

**Make sure that each question and corresponding answer is properly formatted in the JSON structure. This will be part of your deliverables. (0.5 point)**
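
Before submitting, the question-answer file can be checked against the format described above. The following is a minimal sketch: `validate_qa_pairs` is a hypothetical helper, and it assumes that each item must contain exactly the keys `question` and `answer`, both non-empty strings.

```python
import json

def validate_qa_pairs(raw: str) -> list:
    """Parse a JSON string and check it is a list of question/answer objects."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list")
    for i, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != {"question", "answer"}:
            raise ValueError(f"item {i} must have exactly 'question' and 'answer' keys")
        for key in ("question", "answer"):
            if not isinstance(item[key], str) or not item[key].strip():
                raise ValueError(f"item {i}: '{key}' must be a non-empty string")
    return data

sample = '[{"question": "How old are you?", "answer": "Your answer here"}]'
pairs = validate_qa_pairs(sample)
print(len(pairs), "pair(s) OK")
```

The same function can be pointed at the saved file by reading its contents first and passing the text in.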

In [33]:
prompt_question = "How old are you?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content='<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n')]
Follow Up Input: How old are you?
Standalone question:[0m

[1m> Finished chain.[0m


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mHi! I am an AI - RAGBot assistant that answers questions about the user based on the available documents. 
    I ensur

{'question': 'How old are you?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content='<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n')],
 'answer': '<pad>  23rd  Dec  1988  A.D.\n',
 'source_documents': [Document(page_content='record in technical leadership and consultancy roles, my aim is to drive digital transformation initiatives \nand enhance organizational capabilities in utilizing data-driven strategies, artificial intelligence, and \nmachine learning for impactful decision-making and sustainable development.  \n  \nPERSONAL DETAILS  \nDate of Birth:  \n23rd Dec 1988 A.D.  \nGender:  \nMale  \nPermanent Address:  \nAsan Tole, Tansen-2, Palpa, Nepal   \nCurrent Address:  \nAIT Thailand  \nNationality:  \nNepali  \

In [34]:
prompt_question = "What is your highest level of education?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content='<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n'), HumanMessage(content='How old are you?'), AIMessage(content='<pad>  23rd  Dec  1988  A.D.\n')]
Follow Up Input: What is your highest level of education?
Standalone question:[0m

[1m> Finished chain.[0m


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3

{'question': 'What is your highest level of education?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content='<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n'),
  HumanMessage(content='How old are you?'),
  AIMessage(content='<pad>  23rd  Dec  1988  A.D.\n')],
 'answer': '<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n',
 'source_documents': [Document(page_content='(Division)  \nM.Sc. DSAI \nAIT \nAsian Institute of Technology, \nThailand \nCurrent \n \nB.Sc. CSIT  \nT.U.  \nSt. Xavier’s College Maitighar,  \nKathmandu  \n2013 A.D.  \n81.36%  \n(Distinction)  \n10+2  \nHSEB  \nMillennium H.S.S. Tansen, \nPalpa  \n2007 A.D.  \n64.1%  \n(Ist div.)  \nS.L.C.  \nHMG  \nNew Horizon Eng. Bo. Sec. School

In [35]:
prompt_question = "What major or field of study did you pursue during your education?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content='<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n'), HumanMessage(content='How old are you?'), AIMessage(content='<pad>  23rd  Dec  1988  A.D.\n'), HumanMessage(content='What is your highest level of education?'), AIMessage(content='<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n')]
Follow Up Input: What major or field of study did you pursue during your education?
S

{'question': 'What major or field of study did you pursue during your education?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content='<pad>  I  am  an  AI  -  RAGBot  assistant  that  answers  questions  about  the  user  based  on  the  available  documents.  I  am  not  a  human  being  and  do  not  have  personal  experiences  or  identity.  I  am  here  to  assist  you  with  any  questions  you  may  have.\n'),
  HumanMessage(content='How old are you?'),
  AIMessage(content='<pad>  23rd  Dec  1988  A.D.\n'),
  HumanMessage(content='What is your highest level of education?'),
  AIMessage(content='<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n')],
 'answer': '<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n',
 'source_documents': [Document(page_content='(Divis

In [36]:
prompt_question = "Can you describe your current role or job responsibilities?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='How old are you?'), AIMessage(content='<pad>  23rd  Dec  1988  A.D.\n'), HumanMessage(content='What is your highest level of education?'), AIMessage(content='<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n'), HumanMessage(content='What major or field of study did you pursue during your education?'), AIMessage(content='<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n')]
Follow Up Input: Can you describe your current role or job responsibilities?
Standalone question:[0m

[1m> Fin

{'question': 'Can you describe your current role or job responsibilities?',
 'chat_history': [HumanMessage(content='How old are you?'),
  AIMessage(content='<pad>  23rd  Dec  1988  A.D.\n'),
  HumanMessage(content='What is your highest level of education?'),
  AIMessage(content='<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n'),
  HumanMessage(content='What major or field of study did you pursue during your education?'),
  AIMessage(content='<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n')],
 'answer': '<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  National  with  Catholic  Relief  Services  -  Nepal.  My  main  responsibilities  include  developing  and  administering  a  monitoring  tool  and  key  performance  indicators  for  Information  M

In [37]:
prompt_question = "What type of work or industry have you been involved in?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What is your highest level of education?'), AIMessage(content='<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n'), HumanMessage(content='What major or field of study did you pursue during your education?'), AIMessage(content='<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n'), HumanMessage(content='Can you describe your current role or job responsibilities?'), AIMessage(content='<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  N

{'question': 'What type of work or industry have you been involved in?',
 'chat_history': [HumanMessage(content='What is your highest level of education?'),
  AIMessage(content='<pad>  M.Sc.  DSAI  AIT  Asian  Institute  of  Technology,  Thailand\n'),
  HumanMessage(content='What major or field of study did you pursue during your education?'),
  AIMessage(content='<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n'),
  HumanMessage(content='Can you describe your current role or job responsibilities?'),
  AIMessage(content='<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  National  with  Catholic  Relief  Services  -  Nepal.  My  main  responsibilities  include  developing  and  administering  a  monitoring  tool  and  key  performance  indicators  for  Information  Manageme

In [38]:
prompt_question = "What are your core beliefs regarding the role of technology in shaping society?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What major or field of study did you pursue during your education?'), AIMessage(content='<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n'), HumanMessage(content='Can you describe your current role or job responsibilities?'), AIMessage(content='<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  National  with  Catholic  Relief  Services  -  Nepal.  My  main  responsibilities  include  developing  and  administering  a  monitoring  tool  and  key  perfo

{'question': 'What are your core beliefs regarding the role of technology in shaping society?',
 'chat_history': [HumanMessage(content='What major or field of study did you pursue during your education?'),
  AIMessage(content='<pad>  B.Sc.  CSIT  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  from  St.  Xavier’s  College,  Maitighar,  Kathmandu,  where  he  graduated  with  distinction,  securing  81.36%.\n'),
  HumanMessage(content='Can you describe your current role or job responsibilities?'),
  AIMessage(content='<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  National  with  Catholic  Relief  Services  -  Nepal.  My  main  responsibilities  include  developing  and  administering  a  monitoring  tool  and  key  performance  indicators  for  Information  Management  Officers  (IMOs),  ensuring  regular  capacity  appraisals  and  performance  evaluations  to  maintain  effectiveness.  I  also  implem

In [39]:
prompt_question = "How do you think cultural values should influence technological advancements?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Can you describe your current role or job responsibilities?'), AIMessage(content='<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  National  with  Catholic  Relief  Services  -  Nepal.  My  main  responsibilities  include  developing  and  administering  a  monitoring  tool  and  key  performance  indicators  for  Information  Management  Officers  (IMOs),  ensuring  regular  capacity  appraisals  and  performance  evaluations  to  maintain  effectiveness.  I  also  implement  training  sessions  to  support  staff  in  various  data-related  tasks,  including  data  processing,  analysis,  GIS  mapping,  and  p

{'question': 'How do you think cultural values should influence technological advancements?',
 'chat_history': [HumanMessage(content='Can you describe your current role or job responsibilities?'),
  AIMessage(content='<pad> My  current  role  or  job  responsibilities  are  as  an  Information  Management  Officer  -  National  with  Catholic  Relief  Services  -  Nepal.  My  main  responsibilities  include  developing  and  administering  a  monitoring  tool  and  key  performance  indicators  for  Information  Management  Officers  (IMOs),  ensuring  regular  capacity  appraisals  and  performance  evaluations  to  maintain  effectiveness.  I  also  implement  training  sessions  to  support  staff  in  various  data-related  tasks,  including  data  processing,  analysis,  GIS  mapping,  and  production  of  information  management  products.  I  also  engage  with  government  authorities,  Partner  Organizations  (POs),  and  other  stakeholders  to  facilitate  collaboration  and

In [40]:
prompt_question = "As a master’s student, what is the most challenging aspect of your studies so far?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What type of work or industry have you been involved in?'), AIMessage(content='<pad>  I  have  been  involved  in  the  field  of  software  development  and  data  management.  I  have  worked  in  various  organizations  such  as  Web  Fusion  Nepal,  NDRRMA,  and  Web  Fusion  Nepal  as  a  Web  Developer  and  Data  Management  Officer.  I  have  also  provided  consultancy  services  to  national  and  international  organizations  on  data  visualization,  system  documentation,  and  disaster  management.\n'), HumanMessage(content='What are your core beliefs regarding the role of technology in shaping society?'), AIMessage(content='<pad> Sachin  believes  i

{'question': 'As a master’s student, what is the most challenging aspect of your studies so far?',
 'chat_history': [HumanMessage(content='What type of work or industry have you been involved in?'),
  AIMessage(content='<pad>  I  have  been  involved  in  the  field  of  software  development  and  data  management.  I  have  worked  in  various  organizations  such  as  Web  Fusion  Nepal,  NDRRMA,  and  Web  Fusion  Nepal  as  a  Web  Developer  and  Data  Management  Officer.  I  have  also  provided  consultancy  services  to  national  and  international  organizations  on  data  visualization,  system  documentation,  and  disaster  management.\n'),
  HumanMessage(content='What are your core beliefs regarding the role of technology in shaping society?'),
  AIMessage(content='<pad> Sachin  believes  in  the  transformative  power  of  technology  in  shaping  society.  He  advocates  for  ethical  AI  practices,  ensuring  technological  advancements  align  with  cultural  values

In [41]:
prompt_question = "What specific research interests or academic goals do you hope to achieve during your time as a master’s student?"
answer = chain({"question":prompt_question})
answer



> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What are your core beliefs regarding the role of technology in shaping society?'), AIMessage(content='<pad> Sachin  believes  in  the  transformative  power  of  technology  in  shaping  society.  He  advocates  for  ethical  AI  practices,  ensuring  technological  advancements  align  with  cultural  values  and  community  needs.  He  believes  that  technology  can  be  used  to  build  resilient  communities,  develop  intelligent  systems  that  enhance  disaster  risk  assessment  and  response,  and  contribute  to  sustainable  development  and  community  resilience.  He  believes  that  technology  can  be  used  to  enhance  disaster  risk  assessment 

{'question': 'What specific research interests or academic goals do you hope to achieve during your time as a master’s student?',
 'chat_history': [HumanMessage(content='What are your core beliefs regarding the role of technology in shaping society?'),
  AIMessage(content='<pad> Sachin  believes  in  the  transformative  power  of  technology  in  shaping  society.  He  advocates  for  ethical  AI  practices,  ensuring  technological  advancements  align  with  cultural  values  and  community  needs.  He  believes  that  technology  can  be  used  to  build  resilient  communities,  develop  intelligent  systems  that  enhance  disaster  risk  assessment  and  response,  and  contribute  to  sustainable  development  and  community  resilience.  He  believes  that  technology  can  be  used  to  enhance  disaster  risk  assessment  and  response,  and  contribute  to  sustainable  development  and  community  resilience.  He  also  believes  that  technology  can  be  used  to  enhanc

In [44]:
from langchain.chains import RetrievalQA

# Create RAG pipeline
qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff",
    retriever = retriever,
    return_source_documents = True,
    chain_type_kwargs={"prompt": PROMPT}
)

# Function to clean the response
def clean_text(text):
    return text.replace("<pad>", "").replace("\n", " ").strip()

# Function to ask a question and clean the answer
def ask_question(question):
    response = qa_chain.invoke({"query": question})  # Use invoke instead of run
    cleaned_answer = clean_text(response["result"])  # Clean the answer
    return cleaned_answer


# Example usage
questions = [
    "How old is Sachin Malego?",
    "What is Sachin Malego's highest level of education?",
    "What major or field of study did Sachin Malego pursue during his education?",
    "How many years of work experience does Sachin Malego have?",
    "What type of work or industry have Sachin Malego been involved in?",
    "Can you describe Sachin Malego's current role or job responsibilities?",
    "Can you describe Sachin Malego's past role or job responsibilities?",
    "What are Sachin Malego's core beliefs regarding the role of technology in shaping society?",
    "How do you think cultural values should influence technological advancements?",
    "As a master’s student, what is the most challenging aspect of Sachin Malego's studies so far?",
    "What specific research interests or academic goals do you hope to achieve during your time as a master’s student?"
]

# Get answers and clean them
answers = [{"question": q, "answer": ask_question(q)} for q in questions]

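# Save responses as JSON
import json

# Write UTF-8 explicitly and keep non-ASCII characters (e.g. ’) readable in the file
with open("chatbot_responses.json", "w", encoding="utf-8") as f:
    json.dump(answers, f, indent=4, ensure_ascii=False)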

print("Chatbot responses saved.")

Chatbot responses saved.
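
The `clean_text` function removes `<pad>` tokens and newlines but leaves the doubled spaces that appear between words in the generated text. A possible variant that also collapses repeated whitespace is sketched here; `clean_text_collapsed` is a hypothetical name, and the sample string is modelled on the outputs in this notebook rather than copied from a specific run.

```python
import re

def clean_text_collapsed(text: str) -> str:
    """Remove <pad> tokens, then collapse any run of whitespace into one space."""
    text = text.replace("<pad>", " ")
    return re.sub(r"\s+", " ", text).strip()

raw = "<pad>  Sachin  Malego  is  30  years  old.\n"
print(clean_text_collapsed(raw))
```

Swapping this in for `clean_text` inside `ask_question` would be enough to normalize the spacing in the saved JSON.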


##### **JSON Responses**

In [45]:
answers

[{'question': 'How old is Sachin Malego?',
  'answer': 'Sachin  Malego  is  30  years  old.'},
 {'question': "What is Sachin Malego's highest level of education?",
  'answer': "Sachin  Malego's  highest  level  of  education  is  a  Master  of  Science  in  Data  Science  and  Artificial  Intelligence  from  the  Asian  Institute  of  Technology  (AIT),  Thailand."},
 {'question': 'What major or field of study did Sachin Malego pursue during his education?',
  'answer': 'Sachin  Malego  pursued  a  Bachelor  of  Science  in  Computer  Science  and  Information  Technology  (B.Sc.  CSIT)  during  his  education.'},
 {'question': 'How many years of work experience does Sachin Malego have?',
  'answer': 'Based  on  the  information  provided  in  the  resume,  Sachin  Malego  has  over  10  years  of  work  experience.'},
 {'question': 'What type of work or industry have Sachin Malego been involved in?',
  'answer': 'Sachin  Malego  has  been  involved  in  the  field  of  Information  Sy