Edvin Yang  
st124277@ait.asia


# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

1. Prompt
2. Retrieval
3. Memory
4. Chain
5. Chatbot

In [1]:
# Attaching cloud drive 
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
#langchain library
!pip install langchain==0.1.0
#LLM
!pip install accelerate==0.25.0
!pip install transformers==4.36.2
!pip install bitsandbytes==0.41.2
#Text Embedding
!pip install sentence-transformers==2.2.2
!pip install InstructorEmbedding==1.0.1
#vectorstore
!pip install pymupdf==1.23.8
!pip install faiss-gpu==1.7.2
!pip install faiss-cpu==1.7.4


In [3]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.

In [4]:
from langchain import PromptTemplate

prompt_template = """
    I'm your GTP bot named AIT GPT.
    I will assist you around to let you know more about AIT.
    Please feel free to ask any questions regarding AIT.

    Context: {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
# PromptTemplate fills its placeholders using str.format-style syntax.
# Each placeholder is a variable name in curly brackets, e.g. {context} and {question}.

PromptTemplate(input_variables=['context', 'question'], template="I'm your GTP bot named AIT GPT. \n    I will assist you around to let you know more about AIT.\n    Please feel free to ask any questions regarding AIT. \n\n    Context: {context}\n    Question: {question}\n    Answer:")

In [5]:
PROMPT.format(
    context = "Situated north of Bangkok, Thailand, AIT boasts a stunning main campus. It functions as a self-sufficient international community, embracing a cosmopolitan ethos in both lifestyle and education.",
    question = "What is the location of AIT"
)

"I'm your GTP bot named AIT GPT. \n    I will assist you around to let you know more about AIT.\n    Please feel free to ask any questions regarding AIT. \n\n    Context: Situated north of Bangkok, Thailand, AIT boasts a stunning main campus. It functions as a self-sufficient international community, embracing a cosmopolitan ethos in both lifestyle and education.\n    Question: What is the location of AIT\n    Answer:"

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)
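
The placeholder mechanism used by the prompt template can be sketched with the standard library alone. This is a minimal illustration, not LangChain's implementation: the helper `input_variables` is a hypothetical name, and the shortened template and context string are made up for the example.

```python
from string import Formatter

# Shortened, illustrative version of the template used above.
template = "Context: {context}\nQuestion: {question}\nAnswer:"

def input_variables(tmpl):
    """Return the placeholder names found in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    return sorted({name for _, name, _, _ in Formatter().parse(tmpl) if name})

variables = input_variables(template)

# Filling the template is a plain str.format call.
filled = template.format(
    context="AIT is north of Bangkok.",
    question="Where is AIT?",
)
print(variables)
print(filled)
```

Discovering the variable names from the template text is what lets a template object report its `input_variables` without the caller listing them by hand.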

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code).
2. `Document transformers` : Split a large document into smaller, relevant chunks, an essential step that improves retrieval.
3. `Text embedding models` : Produce embeddings that capture the semantic meaning of text, so that similar pieces of text can be found quickly and efficiently.
4. `Vector stores` : Databases that support efficient storage and searching of these embeddings.
5. `Retrievers` : Once the data is in the database, you still need to retrieve it.

### 2.1 Document Loaders
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text and associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [6]:
file_path = '/content/drive/MyDrive/Colab Notebooks/GPT'

In [7]:
# Load pdf document
from langchain.document_loaders import PyMuPDFLoader

nlp_docs = f'{file_path}/AIT.pdf'

loader = PyMuPDFLoader(nlp_docs)
documents = loader.load()

In [8]:
# documents

In [9]:
len(documents)

95

In [10]:
documents[1]

Document(page_content=' \n \n \nPREFACE \n \nThe Student Handbook will help you cope with your academic and social life at AIT. \nThere is no doubt that you will get much more if you are well informed on AIT \nregulations, services and facilities. \n \nThis handbook is up-to-date at the time of publishing. However, changes in regulations \nand procedures may be made before the next edition of the Student Handbook is \npublished. Important changes will be announced via email. \nBe well informed and make the best of your life at AIT.  \n \nOffice of Student Affairs  \nAugust 2022\n', metadata={'source': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'file_path': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'page': 1, 'total_pages': 95, 'format': 'PDF 1.6', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'RICOH MP C3004ex', 'producer': 'RICOH MP C3004ex', 'creationDate': "D:20220811112929+08'00'", 'modDate': "D:20220811112021+07'00'", 'trapped': ''})

### 2.2 Document Transformers

`RecursiveCharacterTextSplitter` is the recommended text splitter for generic text. It is parameterized by a list of characters and tries to split on them in order until the chunks are small enough.

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [12]:
doc[1]

Document(page_content='TABLE OF CONTENTS \n \n \nI. \nIntroducing AIT \n \nII. \nStudent Bill of Rights \n \nIII. \nStudent Code of Conduct \n \nIV. \nGuidance for New Students \n \nV. \nStudent Welfare Unit and the AIT Career Center \n \nVI. \nHarassment Policy \n \nVII. \nSubstance Abuse Policy \n \nVIII. \nEnvironment Policy \n \nIX. \nStudent Organizations and Student Participation in Institute Governance \n- Student Union \n- Nationality Associations \n- Student Participation in Institute Governance \n \nX. \nCampus Facilities and Services \n- Student Accommodation \n- Visas \n- Banking \n- Dining \n- Sports and Recreation \n- Movies on Campus \n- Religious Services \n- Mails \n- AIT Reception \n- Office of Public Affairs (OPA)', metadata={'source': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'file_path': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'page': 2, 'total_pages': 95, 'format': 'PDF 1.6', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'crea

In [13]:
len(doc)

367
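
The splitting strategy described in this section can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: the function `recursive_split` is hypothetical, it ignores `chunk_overlap`, and it uses the separator order `"\n\n"`, `"\n"`, `" "`, `""`.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator first; recurse with finer ones on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa bbbb cccc\n\ndddd eeee"
chunks = recursive_split(sample, chunk_size=10)
print(chunks)
```

Paragraph breaks are tried first, so chunks tend to follow the document's own structure; character-level cuts happen only when nothing coarser fits.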

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Hugging Face](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [14]:
import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer




max_seq_length  512


### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.

In [15]:
# Create vector-store folder
vector_path = '../vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('path created')

path created


In [16]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_stanford'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' # custom index name (the library default is 'index')
)
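
To make the embed, store, and search cycle concrete, here is a toy in-memory vector store. It is only a sketch: `toy_embed` (word counts) and `ToyVectorStore` are hypothetical stand-ins for a real embedding model and for FAISS, and the three sample texts are invented.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical embedding: lower-cased word counts instead of a dense vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    def __init__(self):
        self._items = []  # list of (vector, original text)

    def add_texts(self, texts):
        for text in texts:
            self._items.append((toy_embed(text), text))

    def similarity_search(self, query, k=1):
        # Embed the query, then rank stored texts by similarity to it.
        q = toy_embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add_texts([
    "AIT is located north of Bangkok",
    "The library opens at nine",
    "Students can join the Student Union",
])
top = store.similarity_search("Where is AIT located", k=1)
print(top)
```

A real vector store replaces the linear scan with an index so that search stays fast as the number of chunks grows.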

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

In [17]:
# Load the saved vector store from disk
vector_path = '../vector-store'
db_file_name = 'nlp_stanford'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp' # must match the index name used when saving
)

In [18]:
#ready to use
retriever = vectordb.as_retriever()

In [None]:
# retriever.get_relevant_documents("What is Dependency Parsing")

In [None]:
# retriever.get_relevant_documents("What is Transformers")
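
The point that a retriever only needs to return documents, not store vectors, can be illustrated with a small interface. This is a sketch under assumptions: the `Retriever` protocol, `KeywordRetriever`, and `answer_sources` are hypothetical names; only the method name `get_relevant_documents` mirrors the call used in this notebook.

```python
from typing import List, Protocol

class Retriever(Protocol):
    """Anything with this method can act as a retriever."""
    def get_relevant_documents(self, query: str) -> List[str]: ...

class KeywordRetriever:
    """Returns documents that share at least one word with the query; no vectors involved."""
    def __init__(self, docs: List[str]):
        self.docs = docs

    def get_relevant_documents(self, query: str) -> List[str]:
        words = set(query.lower().split())
        return [d for d in self.docs if words & set(d.lower().split())]

def answer_sources(retriever: Retriever, query: str) -> List[str]:
    # Downstream code depends only on the interface, not on how retrieval works.
    return retriever.get_relevant_documents(query)

docs = ["AIT campus map", "Visa renewal steps"]
hits = answer_sources(KeywordRetriever(docs), "visa office")
print(hits)
```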

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [19]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

ChatMessageHistory(messages=[])

In [20]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [21]:
history

ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and each is useful in different scenarios.
- Conversation Buffer
- Conversation Buffer Window

What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names which need to align with the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. Note that the empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon the input variables, you may need to pass some in.

For `ConversationBufferMemory`, `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this variable through parameters on the memory class. For example, to have the memory variables returned under the key `chat_history`, pass `memory_key = "chat_history"` when constructing the memory object.
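
Why the key name matters can be shown without any library. In this sketch the function `load_memory_variables` is a hypothetical stand-in for a memory class, and the template is an illustrative one that expects `{chat_history}`.

```python
# Illustrative prompt template that expects a variable named chat_history.
template = "Chat History:\n{chat_history}\nFollow Up Input: {question}"

def load_memory_variables(turns, memory_key="history"):
    """Hypothetical stand-in for a memory class: returns the transcript under `memory_key`."""
    transcript = "\n".join(f"{role}: {text}" for role, text in turns)
    return {memory_key: transcript}

turns = [("Human", "hi"), ("AI", "What's up?")]

# With the default key "history", the template cannot find {chat_history}.
try:
    template.format(question="How are you?", **load_memory_variables(turns))
    mismatch_detected = False
except KeyError:
    mismatch_detected = True

# Renaming the key so that it matches the template fixes the problem.
prompt = template.format(
    question="How are you?",
    **load_memory_variables(turns, memory_key="chat_history"),
)
print(mismatch_detected)
print(prompt)
```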

#### Conversation Buffer
This memory stores the messages and then exposes them in a single variable.

In [22]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [23]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- It keeps a list of the interactions of the conversation over time.
- It only uses the last `k` interactions.
- It is useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

In [24]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}
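
The sliding-window behaviour can be reproduced with a bounded queue. This is a minimal sketch, assuming a hypothetical class `ToyWindowMemory`; it is not how the library implements its window memory.

```python
from collections import deque

class ToyWindowMemory:
    """Keeps only the last k exchanges of a conversation."""
    def __init__(self, k):
        # A deque with maxlen drops the oldest exchange automatically.
        self._turns = deque(maxlen=k)

    def save_context(self, user_input, ai_output):
        self._turns.append((user_input, ai_output))

    def load_memory_variables(self):
        lines = []
        for human, ai in self._turns:
            lines += [f"Human: {human}", f"AI: {ai}"]
        return {"history": "\n".join(lines)}

window = ToyWindowMemory(k=1)
window.save_context("hi", "What's up?")
window.save_context("How are you?", "I'm quite good. How about you?")
window_vars = window.load_memory_variables()
print(window_vars)
```

With `k=1` only the most recent exchange survives, which matches the output of the cell above.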

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs, either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- It consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- It formats the prompt template using the input key values provided (and also memory key values, if available).
- It passes the formatted string to the LLM and returns the LLM output.
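
The three steps above can be sketched in a few lines. Everything here is a hypothetical stand-in: `fake_llm` simply echoes the last line of the prompt, and `ToyLLMChain` only imitates the format, call, return pattern.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a language model: echoes the last prompt line."""
    return "echo: " + prompt.splitlines()[-1]

class ToyLLMChain:
    def __init__(self, llm, template, memory=None):
        self.llm = llm
        self.template = template
        self.memory = memory or {}   # memory key values, if available

    def __call__(self, inputs):
        # 1. Merge memory values with the caller's inputs.
        values = {**self.memory, **inputs}
        # 2. Format the prompt template.
        prompt = self.template.format(**values)
        # 3. Pass the formatted string to the model and return its output with the inputs.
        return {**inputs, "text": self.llm(prompt)}

toy_chain = ToyLLMChain(
    llm=fake_llm,
    template="History: {history}\nQuestion: {question}",
    memory={"history": "(empty)"},
)
toy_result = toy_chain({"question": "Where is AIT?"})
print(toy_result)
```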

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [26]:
# Uploaded fastchat to google drive
# %cd /content/drive/MyDrive/Colab Notebooks/GPT/model
# !git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0



In [27]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain import HuggingFacePipeline
import torch

model_id = f'{file_path}/model/fastchat-t5-3b-v1.0/'

tokenizer = AutoTokenizer.from_pretrained(
    model_id)

tokenizer.pad_token_id = tokenizer.eos_token_id

bitsandbyte_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
    bnb_4bit_use_double_quant = True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config = bitsandbyte_config, # 4-bit NF4 quantization; bitsandbytes needs an NVIDIA GPU
    device_map = 'auto'
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)



### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
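
How these components fit together in one conversational turn can be sketched as follows. This is an illustration under assumptions, not the library's code: `run_turn` and the three `toy_*` helpers are hypothetical, and the corpus is invented. The question generator is skipped when the history is empty, which is consistent with the trace of the first chatbot turn later in this notebook.

```python
def run_turn(question, chat_history, question_generator, retriever, combine_docs):
    """One turn: condense the question, retrieve documents, combine them into an answer."""
    if chat_history:
        standalone = question_generator(chat_history, question)
    else:
        standalone = question  # nothing to condense on the first turn
    docs = retriever(standalone)
    answer = combine_docs(docs, standalone)
    return {
        "question": question,
        "generated_question": standalone,
        "answer": answer,
        "source_documents": docs,
    }

calls = []  # records when the question generator is actually used

def toy_question_generator(chat_history, question):
    calls.append(question)
    last_user_turn = chat_history[-1][0]
    return f"{last_user_turn} {question}"

CORPUS = ["AIT has 1,200+ students", "AIT is north of Bangkok"]

def toy_retriever(query):
    words = set(query.lower().split())
    return [d for d in CORPUS if words & set(d.lower().split())]

def toy_combine_docs(docs, question):
    return " | ".join(docs)

first = run_turn("How many students", [], toy_question_generator, toy_retriever, toy_combine_docs)
history_so_far = [(first["question"], first["answer"])]
second = run_turn("And in Bangkok", history_so_far, toy_question_generator, toy_retriever, toy_combine_docs)
print(first["answer"])
print(second["generated_question"])
```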


`question_generator`

In [28]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [29]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [30]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [31]:
query = 'Comparing both of them'
chat_history = "Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:What is Machine Learning
AI:
Human:What is Deep Learning
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:',
 'question': 'Comparing both of them',
 'text': '<pad> What  is  the  difference  between  Machine  Learning  and  Deep  Learning  AI?\n'}

`combine_docs_chain`

In [32]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="I'm your GTP bot named AIT GPT. \n    I will assist you around to let you know more about AIT.\n    Please feel free to ask any questions regarding AIT. \n\n    Context: {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7f6ffb4d3eb0>)), document_variable_name='context')

In [33]:
query = "Where is AIT?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your GTP bot named AIT GPT. 
    I will assist you around to let you know more about AIT.
    Please feel free to ask any questions regarding AIT. 

    Context: AIT’s main campus offers fast and easy access to Bangkok, a city at the crossroads of 
East, Southeast and South Asia. AIT has also established a key learning center in 
Vietnam. 
 
With friends all over the globe, a strong history of academic excellence, and an 
enduring reputation for responding to emerging regional and global challenges such as 
climate change and sustainability, AIT is advancing new understanding and applying 
relevant technological solutions across Asia through its knowledge hub in Thailand. 
 
AIT Quick Facts 
Students: 1,200 + from 40+ countries 
Faculty: 81 internationally recruited Faculty from 19 countries & 134 Adjunct faculty

General 
AIT is located 42 km north

{'input_documents': [Document(page_content='AIT’s main campus offers fast and easy access to Bangkok, a city at the crossroads of \nEast, Southeast and South Asia. AIT has also established a key learning center in \nVietnam. \n \nWith friends all over the globe, a strong history of academic excellence, and an \nenduring reputation for responding to emerging regional and global challenges such as \nclimate change and sustainability, AIT is advancing new understanding and applying \nrelevant technological solutions across Asia through its knowledge hub in Thailand. \n \nAIT Quick Facts \nStudents: 1,200 + from 40+ countries \nFaculty: 81 internationally recruited Faculty from 19 countries & 134 Adjunct faculty', metadata={'source': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'file_path': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'page': 3, 'total_pages': 95, 'format': 'PDF 1.6', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'RICOH MP C3004ex',

In [34]:
memory = ConversationBufferWindowMemory(
    k=3,
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h
)
chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="I'm your GTP bot named AIT GPT. \n    I will assist you around to let you know more about AIT.\n    Please feel free to ask any questions regarding AIT. \n\n    Context: {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7f6ffb4d3eb0>)), document_variable_name='context'), question_generator=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_hi

## 5. Chatbot

In [35]:
prompt_question = "What do you want to know?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your GTP bot named AIT GPT. 
    I will assist you around to let you know more about AIT.
    Please feel free to ask any questions regarding AIT. 

    Context: complainee(s). Findings and recommendations are confidential and shall not be 
made public by the Institute or by any participant in a hearing, including the

TABLE OF CONTENTS 
 
 
I. 
Introducing AIT 
 
II. 
Student Bill of Rights 
 
III. 
Student Code of Conduct 
 
IV. 
Guidance for New Students 
 
V. 
Student Welfare Unit and the AIT Career Center 
 
VI. 
Harassment Policy 
 
VII. 
Substance Abuse Policy 
 
VIII. 
Environment Policy 
 
IX. 
Student Organizations and Student Participation in Institute Governance 
- Student Union 
- Nationality Associations 
- Student Participation in Institute Governance 
 
X. 
Campus Facilit

{'question': 'What do you want to know?',
 'chat_history': [],
 'answer': '<pad> What  are  the  policies  and  procedures  for  handling  complaints  and  disputes  at  AIT?\n',
 'source_documents': [Document(page_content='complainee(s). Findings and recommendations are confidential and shall not be \nmade public by the Institute or by any participant in a hearing, including the', metadata={'source': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'file_path': '/content/drive/MyDrive/Colab Notebooks/GPT/AIT.pdf', 'page': 38, 'total_pages': 95, 'format': 'PDF 1.6', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'RICOH MP C3004ex', 'producer': 'RICOH MP C3004ex', 'creationDate': "D:20220811112929+08'00'", 'modDate': "D:20220811112021+07'00'", 'trapped': ''}),
  Document(page_content='TABLE OF CONTENTS \n \n \nI. \nIntroducing AIT \n \nII. \nStudent Bill of Rights \n \nIII. \nStudent Code of Conduct \n \nIV. \nGuidance for New Students \n \nV. \nStudent Welfar

In [36]:
prompt_question = "How many students are there in AIT?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What do you want to know?'), AIMessage(content='<pad> What  are  the  policies  and  procedures  for  handling  complaints  and  disputes  at  AIT?\n')]
Follow Up Input: How many students are there in AIT?
Standalone question:

> Finished chain.

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your GTP bot named AIT GPT. 
    I will assist you around to let you know more about AIT.
    Please feel free to ask any questions regarding AIT. 

    Context: a self-contained international community with a cosmopolitan approach to living and 
learning. 
 
Since 1959, AIT

{'question': 'How many students are there in AIT?',
 'chat_history': [HumanMessage(content='What do you want to know?'),
  AIMessage(content='<pad> What  are  the  policies  and  procedures  for  handling  complaints  and  disputes  at  AIT?\n')],
 'answer': '<pad>  1,200+\n',
 'source_documents': [Document(page_content='a self-contained international community with a cosmopolitan approach to living and \nlearning. \n \nSince 1959, AIT has carried out its mission “to develop highly qualified and committed \nprofessionals who play a leading role in the region’s sustainable development and its \nintegration into the global economy” by supporting technological change and \nsustainable development through higher learning, research, capacity building and \noutreach. \n \nAIT’s renowned degree programs are administered by its School of Engineering and \nTechnology; School of Environment, Resources and Development; and School of \nManagement. Students benefit from challenging academic program

In [37]:
prompt_question = "What courses does AIT provide?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What do you want to know?'), AIMessage(content='<pad> What  are  the  policies  and  procedures  for  handling  complaints  and  disputes  at  AIT?\n'), HumanMessage(content='How many students are there in AIT?'), AIMessage(content='<pad>  1,200+\n')]
Follow Up Input: What courses does AIT provide?
Standalone question:

> Finished chain.

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your GTP bot named AIT GPT. 
    I will assist you around to let you know more about AIT.
    Please feel free to ask any questions regarding AIT. 

    Context: a self-contained in

{'question': 'What courses does AIT provide?',
 'chat_history': [HumanMessage(content='What do you want to know?'),
  AIMessage(content='<pad> What  are  the  policies  and  procedures  for  handling  complaints  and  disputes  at  AIT?\n'),
  HumanMessage(content='How many students are there in AIT?'),
  AIMessage(content='<pad>  1,200+\n')],
 'answer': '<pad>  The  courses  offered  at  AIT  are  offered  by  its  School  of  Engineering  and  Technology,  School  of  Environment,  Resources  and  Development,  and  School  of  Management.  These  courses  cover  various  fields  such  as  engineering,  technology,  environment,  resources  and  development,  and  management.\n',
 'source_documents': [Document(page_content='a self-contained international community with a cosmopolitan approach to living and \nlearning. \n \nSince 1959, AIT has carried out its mission “to develop highly qualified and committed \nprofessionals who play a leading role in the region’s sustainable developm

In [38]:
answer['answer']

'<pad>  The  courses  offered  at  AIT  are  offered  by  its  School  of  Engineering  and  Technology,  School  of  Environment,  Resources  and  Development,  and  School  of  Management.  These  courses  cover  various  fields  such  as  engineering,  technology,  environment,  resources  and  development,  and  management.\n'
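
The raw answers above still carry the `<pad>` token and doubled spaces. One possible way to tidy them before showing them to a user is sketched below; `clean_answer` is a hypothetical helper, not part of the chain.

```python
def clean_answer(raw: str) -> str:
    """Drop the <pad> token and collapse runs of whitespace into single spaces."""
    return " ".join(raw.replace("<pad>", " ").split())

raw_answer = "<pad>  1,200+\n"
cleaned = clean_answer(raw_answer)
print(cleaned)
```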