# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about a wide range of topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the model's knowledge with the specific information it needs.

<img src="../figures/RAG-process.png" >

This notebook introduces `ChakyBot`, a chatbot designed to help Chaky (the instructor) and Gun (the TA) explain the lessons of the NLP course to students. Built with LangChain, ChakyBot retrieves information from documents to support students working through the NLP curriculum. It is assembled from the following components:

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [1]:
# #langchain library
# !pip install langchain==0.1.6
# !pip install langchain-community==0.0.19
# #LLM
# !pip install accelerate==0.25.0
# !pip install transformers==4.36.2
# !pip install bitsandbytes==0.43
# #Text Embedding
# !pip install sentence-transformers==2.2.2
# !pip install InstructorEmbedding==1.0.1
# #vectorstore
# !pip install pymupdf==1.23.8
# !pip install faiss-gpu==1.7.2
# !pip install faiss-cpu==1.7.4

In [2]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.

In [3]:
from langchain.prompts import PromptTemplate

prompt_template = """
    Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).
    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.
    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.
    I'll do my best to provide you with accurate and helpful answers to your inquiries.
    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!
    {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

# PromptTemplate fills its placeholders str.format-style;
# each placeholder is written in curly brackets, e.g. {context} and {question}.
PROMPT

PromptTemplate(input_variables=['context', 'question'], template="Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).\n    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.\n    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.\n    I'll do my best to provide you with accurate and helpful answers to your inquiries.\n    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!\n    {context}\n    Question: {question}\n    Answer:")

In [4]:
PROMPT.format(
    context = "Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.",
    question = "What is Machine Learning"
)

"Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).\n    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.\n    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.\n    I'll do my best to provide you with accurate and helpful answers to your inquiries.\n    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!\n    Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.\n    Question: What is Machine Learning\n    Answer:"

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)
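
As a rough illustration of what the template does when it is formatted, the sketch below reproduces the placeholder substitution with plain Python `str.format` and uses `string.Formatter` to list the placeholder names. It is not LangChain's implementation; the shortened `template` and the name `placeholder_names` are made up for this example.

```python
from string import Formatter

# A shortened, hypothetical template with the same two placeholders as PROMPT.
template = "{context}\nQuestion: {question}\nAnswer:"

# Formatter().parse yields (literal_text, field_name, format_spec, conversion);
# collecting the non-empty field names recovers the template's input variables.
placeholder_names = [name for _, name, _, _ in Formatter().parse(template) if name]

filled = template.format(
    context="Machine learning is a field of study in artificial intelligence.",
    question="What is Machine Learning",
)

print(placeholder_names)
print(filled)
```

Formatting fails with a `KeyError` if a placeholder has no matching keyword argument, which is why the variable names in the prompt and in the chain have to agree.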

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code). 
2. `Document transformers` : One of the essential steps in document retrieval is breaking down a large document into smaller, relevant chunks to enhance the retrieval process.
3. `Text embedding models` : Embeddings capture the semantic meaning of the text, allowing you to quickly and efficiently find other pieces of text that are similar.
4. `Vector stores` : Databases that support efficient storage and searching of these embeddings.
5. `Retrievers` : Once the data is in the database, you still need to retrieve it.

### 2.1 Document Loaders 
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text and its associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [5]:
from langchain.document_loaders import PyMuPDFLoader

# Path to the folder containing PDF documents
pdf_folder = 'AIT_database'

# List to hold the paths of PDF documents
pdf_paths = []

# Iterate through the PDF files in the folder and collect their paths
for filename in os.listdir(pdf_folder):
    if filename.endswith('.pdf'):
        pdf_paths.append(os.path.join(pdf_folder, filename))

# List to hold the loaded documents
documents = []

# Create a PyMuPDFLoader instance for each PDF file
for pdf_path in pdf_paths:
    loader = PyMuPDFLoader(pdf_path)
    loaded_doc = loader.load()
    for page in loaded_doc:
        documents.append(page)

In [6]:
# from langchain.document_loaders import PyMuPDFLoader

# nlp_docs = './AIT.pdf'

# loader = PyMuPDFLoader(nlp_docs)
# documents = loader.load()

In [7]:
# documents

In [8]:
len(documents)

47

In [9]:
documents[2]

Document(page_content='Link: http://giving.ait.ac.th/student-conference-fund-4/ Document: Student Conference Fund – AIT\nFundraising Campaign Skip to content AIT Fundraising Campaign Social Impact with Innovation Menu\nPrimary menu AIT Transforming AIT AIT Five Key Thematic Areas Single Naming of AIT Conference\nCenter (AITCC): A Complete Renovation Naming Opportunities & Donor Recognition How to Donate\nGTE laboratory campaign AIT Alumni Scholarships/Student Exchange/School Lab Fund AIT Alumni\nScholarship Fund Student Conference Fund Scholarships, Lab Equipment and Campus Facilities SOM\nAccreditation Fund Student Exchange Scholarship Program Upgrading of the AIT International School\nCanteen Donors Contact Us Student Conference Fund Donation Pledge for Student Conference Fund\nThe Student Conference Fund will be primarily used to support our PhD students who receive\ninvitations to attend prestigious conferences, workshop, seminar, etc. overseas or via webinars. This\nwill help the 

### 2.2 Document Transformers

`RecursiveCharacterTextSplitter` is the recommended text splitter for generic text. It is parameterized by a list of characters and tries to split on them in order until the chunks are small enough.

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [11]:
doc[2]

Document(page_content='1974 •Key activities • Graduation Ceremonies and Cultural Shows • International/Local\nConferences/Seminars/Training programs • Intergovernmental events •Current Equipment and\nFacilities • 500-Seat Auditorium • Continue Reading Naming Opportunities and Donor Recognition\nPosted on February 22, 2019 By alumniaffairs To express appreciation to our donors, AIT provides\nmany types of naming recognition opportunities. Naming of the building will be for the life of the\nbuilding. Same for facilities. Naming of endowed professorships will be perpetual. Naming of\nScholarships and Continue Reading OFFICE OF ADVANCEMENT AND ALUMNI AFFAIRS Message', metadata={'source': 'AIT_database\\Document_1.pdf', 'file_path': 'AIT_database\\Document_1.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.4', 'title': '(anonymous)', 'author': '(anonymous)', 'subject': '(unspecified)', 'keywords': '', 'creator': '(unspecified)', 'producer': 'ReportLab PDF Library - www.reportlab.com', 'c

In [12]:
len(doc)

237
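
To make the recursive splitting strategy described above more concrete, here is a minimal pure-Python sketch. It is an illustration under simplifying assumptions, not the library's code: the function name `recursive_split` is hypothetical, the separator order (`"\n\n"`, `"\n"`, `" "`, `""`) is assumed, and `chunk_overlap` is not modelled at all.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split on the first separator, retrying with finer ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # The empty separator is the last resort: cut at fixed character offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too long: split this piece with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two is a bit longer than the limit allows."
chunks = recursive_split(sample, chunk_size=30)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The short first paragraph survives as one chunk, while the second is broken at word boundaries because no coarser separator brings it under the limit.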

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [13]:
import torch
from langchain_community.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)



load INSTRUCTOR_Transformer
max_seq_length  512


### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors. At query time, the unstructured query is embedded as well, and the stored vectors that are 'most similar' to the embedded query are retrieved. A vector store takes care of storing embedded data and performing vector search for you.

In [14]:
#locate vectorstore
vector_path = './vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [15]:
#save vector locally
from langchain_community.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_ait'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' #default index
)
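
What a vector store does can be approximated in a few lines: keep each vector next to its text, then scan for the nearest ones at query time. The sketch below is a brute-force stand-in and says nothing about how FAISS indexes data internally; `TinyVectorStore`, `toy_embed`, and `VOCAB` are hypothetical names, and Euclidean distance is just one common choice of metric.

```python
class TinyVectorStore:
    """Brute-force in-memory store: keeps (vector, text) pairs and scans them all."""

    def __init__(self, embed):
        self.embed = embed  # function mapping text -> list of floats
        self.rows = []      # list of (vector, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((self.embed(text), text))

    def search(self, query, k=2):
        q = self.embed(query)

        # Squared Euclidean distance; smaller means more similar.
        def dist(vec):
            return sum((a - b) ** 2 for a, b in zip(q, vec))

        ranked = sorted(self.rows, key=lambda row: dist(row[0]))
        return [text for _, text in ranked[:k]]


# Hypothetical stand-in for an embedding model: counts of a few hand-picked words.
VOCAB = ["scholarship", "fund", "campus", "library"]


def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


store = TinyVectorStore(toy_embed)
store.add_texts([
    "the scholarship fund supports students",
    "the campus library opens daily",
    "donate to the conference fund",
])
print(store.search("scholarship fund", k=1))
```

A linear scan like this is fine for a few hundred chunks; dedicated vector stores exist because it stops scaling once the collection grows large.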

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
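
To illustrate that a retriever only needs to return documents for a query, here is a sketch of one that uses no vectors at all and ranks texts by word overlap. `KeywordRetriever` and `retriever_sketch` are hypothetical names; the method name `get_relevant_documents` is borrowed from the calls used later in this notebook, but the class is not a LangChain retriever.

```python
class KeywordRetriever:
    """Returns the texts sharing the most words with the query; no vectors involved."""

    def __init__(self, texts):
        self.texts = list(texts)

    def get_relevant_documents(self, query, k=1):
        query_words = set(query.lower().split())

        def overlap(text):
            return len(query_words & set(text.lower().split()))

        ranked = sorted(self.texts, key=overlap, reverse=True)
        # Drop texts that share no words with the query at all.
        return [text for text in ranked[:k] if overlap(text) > 0]


retriever_sketch = KeywordRetriever([
    "AIT offers scholarships for master students",
    "The conference center was renovated",
])
print(retriever_sketch.get_relevant_documents("which scholarships are offered"))
```

Because the calling code depends only on the query-in, documents-out contract, a keyword retriever and a vector-store-backed one could be swapped without touching the rest of a chain.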

In [16]:
#calling vector from local
vector_path = './vector-store'
db_file_name = 'nlp_ait'

from langchain_community.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp' #default index
)   

In [17]:
#ready to use
retriever = vectordb.as_retriever()

In [18]:
retriever.get_relevant_documents("What is Dependency Parsing")

[Document(page_content='Communications Co., Ltd. 1 CPU for SERD at the student’s computer lab. 4 URC Thailand Co., Ltd. 2\nbox each: Fun-O Emoji, Dewberry Strawberry, Magic Twin Jumbo Butter for students at Organizational\nQuarantine facility and AIT International School 5 Mr. Umer H. Al-Qureshi 50 Rapid Antigen test kits for\nCovid-19 6 i-bitz Co., Ltd. Vallaris software to RS&GIS; program Books S.N. DONORS TITLE OF\nBOOKS USAGE 1 AITAA Hong Kong & Macau 1. China’s Pan-Pearl River Delta: Regional Cooperation\nand Development 2. The Social Embeddedness of Industrial Ecology 3. Governance and planning of\nMega-City Regions: an international comparative perspective 4. AIT Celebration of The 30th (2 copies)', metadata={'source': 'AIT_database\\Document_27.pdf', 'file_path': 'AIT_database\\Document_27.pdf', 'page': 0, 'total_pages': 1, 'format': 'PDF 1.4', 'title': '(anonymous)', 'author': '(anonymous)', 'subject': '(unspecified)', 'keywords': '', 'creator': '(unspecified)', 'producer': 'R

In [19]:
retriever.get_relevant_documents("What is Transformers")

[Document(page_content='1 Sheila Jay Demafeliz Wee Eng (GTE’86) Purchase GTE Laboratory equipment 2 Mr. Shih-Yi George\nChen Purchase GTE Laboratory equipment 3 Mr. Chawalit Tanomtin Purchase GTE Laboratory\nequipment 4 Dr. Thumanoon Susumpow Purchase STE Laboratory equipment 5 Continue Reading\nCategories Transforming AIT AIT Faculty and Staff Donations Posted on July 3, 2020 June 30, 2022\nCategories Transforming AIT Search for: Search Office of Advancement and Alumni Affairs P.O. Box 4,\n58 M.9 Klong Neung Klong Luang, Pathumthani 12120 Thailand Tel: 0(+66)2524- 6318 and 6336\nEmail: oaaa@ait.ac.th/ aru@ait.ac.th Copyright © 2024 AIT Fundraising Campaign . All Rights', metadata={'source': 'AIT_database\\Document_34.pdf', 'file_path': 'AIT_database\\Document_34.pdf', 'page': 0, 'total_pages': 1, 'format': 'PDF 1.4', 'title': '(anonymous)', 'author': '(anonymous)', 'subject': '(unspecified)', 'keywords': '', 'creator': '(unspecified)', 'producer': 'ReportLab PDF Library - www.reportla

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [20]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

ChatMessageHistory(messages=[])

In [21]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [22]:
history

ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and is useful in different scenarios.
- Conversation Buffer
- Conversation Buffer Window

#### What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names which need to align with the variables the chain expects. You can see what these variables are by calling memory.load_memory_variables({}). Note that the empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon the input variables, you may need to pass some in.

In this case, you can see that `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this variable through parameters on the memory class. For example, to have the memory variables returned under the key `chat_history`, pass `memory_key="chat_history"` when constructing the memory.

#### Conversation Buffer
This memory allows for storing messages and then extracts the messages in a variable.

In [23]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [24]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last K interactions. 
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

In [25]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}
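
The sliding-window behaviour can be sketched with a bounded `collections.deque`. This is an illustrative stand-in rather than LangChain's implementation: `WindowMemory` is a hypothetical class that only mimics the `save_context` and `load_memory_variables` calls shown above, plus a configurable `memory_key`.

```python
from collections import deque


class WindowMemory:
    """Keeps only the last k (input, output) exchanges."""

    def __init__(self, k=1, memory_key="history"):
        self.memory_key = memory_key
        self.turns = deque(maxlen=k)  # older exchanges fall off automatically

    def save_context(self, inputs, outputs):
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        lines = []
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {self.memory_key: "\n".join(lines)}


window = WindowMemory(k=1, memory_key="chat_history")
window.save_context({"input": "hi"}, {"output": "What's up?"})
window.save_context({"input": "How are you?"}, {"output": "I'm quite good. How about you?"})
print(window.load_memory_variables({}))
```

With `k=1` only the second exchange is returned, and it comes back under the key `chat_history` instead of the default `history`.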

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- it consists of a `PromptTemplate` and a `LM` (either an LLM or chat model).
- it formats the prompt template using the input key values provided (and also memory key values, if available), 
- it passes the formatted string to LLM and returns the LLM output.
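
The three steps in the list above can be sketched as a single function. This is a simplified model of the idea, not the `LLMChain` class itself: `run_llm_chain` and `fake_llm` are hypothetical, and the fake model stands in for a real LLM so the example runs without any weights.

```python
def run_llm_chain(template, llm, inputs, memory_vars=None):
    """Format the template with inputs (plus memory values), call the LLM, return its text."""
    values = {**(memory_vars or {}), **inputs}
    prompt = template.format(**values)
    # Echo the inputs alongside the generated text under the key "text".
    return {**values, "text": llm(prompt)}


# Hypothetical stand-in for a language model: it only reports the prompt length.
def fake_llm(prompt):
    return f"(model saw {len(prompt)} characters)"


result = run_llm_chain(
    template="History: {history}\nQuestion: {question}\nAnswer:",
    llm=fake_llm,
    inputs={"question": "What is AIT?"},
    memory_vars={"history": ""},
)
print(result["text"])
```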

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [None]:
#locate models
model_path = './models'
if not os.path.exists(model_path):
    os.makedirs(model_path)
    print('create path done')

In [26]:
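# Clone straight into ./models so the working directory stays unchanged
# and the relative paths used in the other cells keep working.
# Git LFS must be installed to fetch the weight files.
!git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0 ./models/fastchat-t5-3b-v1.0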

In [27]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain_community.llms import HuggingFacePipeline
import torch

model_id = './models/fastchat-t5-3b-v1.0/'

tokenizer = AutoTokenizer.from_pretrained(
    model_id)

tokenizer.pad_token_id = tokenizer.eos_token_id

bitsandbyte_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
    bnb_4bit_use_double_quant = True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config = bitsandbyte_config, # 4-bit quantization via bitsandbytes; requires an NVIDIA GPU
    device_map = 'cuda:0'
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [28]:
# from transformers import AutoTokenizer, AutoModelForCausalLM
# from transformers import BitsAndBytesConfig
# from langchain import HuggingFacePipeline
# import torch

# model_id = './models/gpt2-span-head-few-shot-k-16-finetuned-squad-seed-0/'

# tokenizer = AutoTokenizer.from_pretrained(
#     model_id)

# tokenizer.pad_token_id = tokenizer.eos_token_id

# bitsandbyte_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.float16,
#     bnb_4bit_use_double_quant=True
# )

# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     quantization_config=bitsandbyte_config,
#     device_map='cuda:0',
#     load_in_8bit=True
# )

# pipe = pipeline(
#     model=model,
#     tokenizer=tokenizer,
#     task="text-generation",
#     max_new_tokens=256,
#     model_kwargs={
#         "temperature": 0,
#         "repetition_penalty": 1.5
#     }
# )

# llm = HuggingFacePipeline(pipeline=pipe)


### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
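
How these pieces fit together in one conversational turn can be sketched as follows. This is a simplified reading of the flow, not the chain's source code: the stand-in functions `condense`, `retrieve`, and `answer` are hypothetical, and passing the standalone question (rather than the original) to the answering step is an assumption of this sketch.

```python
def conversational_retrieval(question, chat_history, condense, retrieve, answer):
    """One turn: condense the question if there is history, retrieve, then answer."""
    # With no history yet, the question is already standalone.
    standalone = condense(chat_history, question) if chat_history else question
    docs = retrieve(standalone)
    # "Stuff" strategy: concatenate every retrieved document into one context string.
    context = "\n\n".join(docs)
    reply = answer(context, standalone)
    chat_history.append((question, reply))
    return {
        "question": question,
        "generated_question": standalone,
        "answer": reply,
        "source_documents": docs,
    }


# Hypothetical stand-ins for the question generator, retriever, and QA model.
def condense(history, question):
    return f"{question} (regarding: {history[-1][0]})"


def retrieve(query):
    return [f"document matching '{query}'"]


def answer(context, question):
    return f"answer based on {len(context)} characters of context"


history = []
first = conversational_retrieval("What is AIT?", history, condense, retrieve, answer)
second = conversational_retrieval("Where is it?", history, condense, retrieve, answer)
print(first["generated_question"])
print(second["generated_question"])
```

The first turn skips the condensing step because the history is empty, while the second turn rewrites the follow-up so that retrieval sees a question that makes sense on its own.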


`question_generator`

In [29]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [30]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [31]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [32]:
query = 'Comparing both of them'
chat_history = "Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:What is Machine Learning
AI:
Human:What is Deep Learning
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:',
 'question': 'Comparing both of them',
 'text': '<pad> What  is  the  difference  between  Machine  Learning  and  Deep  Learning  AI?\n'}

`combine_docs_chain`

In [33]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).\n    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.\n    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.\n    I'll do my best to provide you with accurate and helpful answers to your inquiries.\n    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x000002B279560F90>)), document_variable_name='context')

In [34]:
query = "What is Transformers?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).
    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.
    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.
    I'll do my best to provide you with accurate and helpful answers to your inquiries.
    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!
    Link: https://twitter.com/aitasia Document: This browser is no longer supported. Please switch to a
supported browser to continue using twitter.com. You can see a list of supported browsers in our Help
Center. Help Center Terms of Service Privacy Policy Cookie Policy Imprint Ads info © 2024 X Corp.

1 Sheila Jay Demafeliz Wee 

{'input_documents': [Document(page_content='Link: https://twitter.com/aitasia Document: This browser is no longer supported. Please switch to a\nsupported browser to continue using twitter.com. You can see a list of supported browsers in our Help\nCenter. Help Center Terms of Service Privacy Policy Cookie Policy Imprint Ads info © 2024 X Corp.', metadata={'source': 'AIT_database\\Document_2.pdf', 'file_path': 'AIT_database\\Document_2.pdf', 'page': 0, 'total_pages': 1, 'format': 'PDF 1.4', 'title': '(anonymous)', 'author': '(anonymous)', 'subject': '(unspecified)', 'keywords': '', 'creator': '(unspecified)', 'producer': 'ReportLab PDF Library - www.reportlab.com', 'creationDate': "D:20240320144950-07'00'", 'modDate': "D:20240320144950-07'00'", 'trapped': ''}),
  Document(page_content='1 Sheila Jay Demafeliz Wee Eng (GTE’86) Purchase GTE Laboratory equipment 2 Mr. Shih-Yi George\nChen Purchase GTE Laboratory equipment 3 Mr. Chawalit Tanomtin Purchase GTE Laboratory\nequipment 4 Dr. Thum

In [35]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h
)
chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).\n    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.\n    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.\n    I'll do my best to provide you with accurate and helpful answers to your inquiries.\n    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGen

## 5. Chatbot

In [38]:
prompt_question = "What is AIT?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Welcome to AITBot, your virtual assistant for all things related to the Asian Institute of Technology (AIT).
    Whether you're a student, faculty member, or simply curious about AIT, I'm here to provide you with information and assistance.
    Feel free to ask me anything about AIT, including its programs, research areas, campus facilities, and more.
    I'll do my best to provide you with accurate and helpful answers to your inquiries.
    So go ahead, ask me anything you'd like to know about AIT, and let's explore together!
    AIT places great emphasis on various issues of security for future population. Various research studies
on campus now focus on securing stable food, energy and water resources to ensure sustainable
future developments. Example a study in collaboration with the Brit

{'question': 'What is AIT?',
 'chat_history': [],
 'answer': '<pad>  Asian  Institute  of  Technology  (AIT)  is  a  private  university  located  in  Bangkok,  Thailand.  It  was  founded  in  1959  and  is  known  for  its  innovative  programs  in  engineering,  management,  and  technology.  AIT  offers  undergraduate  and  graduate  programs  in  various  fields,  including  engineering,  management,  and  technology.  The  university  also  offers  a  number  of  programs  in  business  and  management,  as  well  as  a  number  of  programs  in  the  arts  and  humanities.  AIT  is  known  for  its  commitment  to  social  impact  and  its  focus  on  sustainability  and  environmental  sustainability.\n',
 'source_documents': [Document(page_content='AIT places great emphasis on various issues of security for future population. Various research studies\non campus now focus on securing stable food, energy and water resources to ensure sustainable\nfuture developments. Example a s

In [39]:
prompt_question = "What scholarships does AIT offer?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What is AIT?'), AIMessage(content='<pad>  Asian  Institute  of  Technology  (AIT)  is  a  private  university  located  in  Bangkok,  Thailand.  It  was  founded  in  1959  and  is  known  for  its  innovative  programs  in  engineering,  management,  and  technology.  AIT  offers  undergraduate  and  graduate  programs  in  various  fields,  including  engineering,  management,  and  technology.  The  university  also  offers  a  number  of  programs  in  business  and  management,  as  well  as  a  number  of  programs  in  the  arts  and  humanities.  AIT  is  known  for  its  commitment  to  social  impact  and  its  focus  on  sustainability  and  environment

{'question': 'What scholarships does AIT offer?',
 'chat_history': [HumanMessage(content='What is AIT?'),
  AIMessage(content='<pad>  Asian  Institute  of  Technology  (AIT)  is  a  private  university  located  in  Bangkok,  Thailand.  It  was  founded  in  1959  and  is  known  for  its  innovative  programs  in  engineering,  management,  and  technology.  AIT  offers  undergraduate  and  graduate  programs  in  various  fields,  including  engineering,  management,  and  technology.  The  university  also  offers  a  number  of  programs  in  business  and  management,  as  well  as  a  number  of  programs  in  the  arts  and  humanities.  AIT  is  known  for  its  commitment  to  social  impact  and  its  focus  on  sustainability  and  environmental  sustainability.\n')],
 'answer': "<pad> < pad>  AIT  offers  several  scholarships  to  students  who  are  interested  in  pursuing  a  Master's  degree  program.  Some  of  the  scholarships  offered  by  AIT  include:\n 1.  Full 