# Part A: Building a simple RAG-Based Chatbot with LangChain <span style="color:green">**[25 marks]**</span>

<center>
<img src="https://images.ctfassets.net/kftzwdyauwt9/5ca4df8a-bd0c-47e0-7efe1b15187a/f891c43eaec1e52760e3cf7c9902a819/gpt-2-1-5b-release.jpg?w=1920&q=90&fm=webp">
</center>

In Part A of this assignment, you will explore how to build a chatbot using LangChain, Pinecone and HuggingFace APIs. The goal is to create a chatbot that can answer questions based on the content of a PDF document, in our case the LUMS Student Handbook.

---

## 0. Instructions

- Run the entire notebook to ensure everything is working correctly.
- Modify the chatbot's prompt template(s) to suit your specific use cases (e.g., a different context or more detailed instructions), unless stated otherwise.
- Do not use GPT or other AI tools to generate code. Refer to documentations instead. It is important you learn these tools yourself for your course projects.

## 1. Introduction to LangChain and RAG

LangChain is a framework and Python library designed to help you build applications powered by large language models (LLMs). It simplifies every stage of the LLM application lifecycle by providing abstractions for integrating various components in order to make applications easy to deploy and scalable. Various components like document loaders, embeddings, vector databases, chains and more, make up your application's cognitive architecture. You can read more about langchain [here](https://python.langchain.com/docs/introduction/).

RAG, or Retrieval-Augmented Generation, is a technique that enhances the capabilities of LLMs by combining information retrieval with text generation. When a query is posed, a retrieval system searches through a knowledge base, a vast corpus of documents or databases, to find relevant information that can address the query. This additional context is then fed into the language model along with the original query, in order to generate a coherent response. RAG allows us to incorporate up-to-date and specific information that may not have been present in the model's training data, making it particularly valuable for personalized applications.

### Installing Dependencies

Before we dive into the code, ensure that you have the necessary dependencies installed, by either running the cell below, or executing the following command from your preferred terminal.

```bash
pip install langchain langchain-community langchain-huggingface langchain-pinecone pinecone-client python-dotenv streamlit
```

In [11]:
%pip install langchain langchain-community
%pip install langchain-huggingface
%pip install langchain-pinecone pinecone-client
%pip install python-dotenv streamlit

Note: you may need to restart the kernel to use updated packages.
Collecting langchain-huggingfaceNote: you may need to restart the kernel to use updated packages.

  Using cached langchain_huggingface-0.1.0-py3-none-any.whl.metadata (1.3 kB)
Collecting huggingface-hub>=0.23.0 (from langchain-huggingface)
  Using cached huggingface_hub-0.25.1-py3-none-any.whl.metadata (13 kB)
Collecting sentence-transformers>=2.6.0 (from langchain-huggingface)
  Using cached sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Collecting tokenizers>=0.19.1 (from langchain-huggingface)
  Using cached tokenizers-0.20.0-cp312-none-win_amd64.whl.metadata (6.9 kB)
Collecting transformers>=4.39.0 (from langchain-huggingface)
  Using cached transformers-4.45.1-py3-none-any.whl.metadata (44 kB)
Collecting filelock (from huggingface-hub>=0.23.0->langchain-huggingface)
  Using cached filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Collecting fsspec>=2023.5.0 (from huggingface-hub>=0.23.0->langchain-h

### Setting Up API Keys

You will need to obtain one API key for [HuggingFace](https://huggingface.co/), and one for [Pinecone](https://www.pinecone.io/). Huggingface is where the LLM we will be using for this assignment is hosted, and Pincecone is the vector database we will be using to store our embeddings and create the knowledge base for our RAG chatbot.  You will be required to create an account to use both these serivces.

- Create a Huggingface account if you do not have one already. Then go to `Profile` > `Edit Profile` > `Access Tokens` > `Create new Token`. Give the token an appropriate name, set the Token type to `Read` and copy the token to your clipboard.
- Create a Pinecone account if you do not have one already. Click on `API keys` and create a key if one doesn't exist already.

Once done, run the following cell to save those API keys into a file called `.env` in the same directory as this notebook. You may create this file manually instead if you prefer.

In [2]:
HUGGINGFACE_API_KEY = "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"  # Replace with your Hugging Face API key
PINECONE_API_KEY = "da1cca19-c5e5-43be-abfc-ec8b7f43ba8c"        # Replace with your Pinecone API key

env_content = f"""
HUGGINGFACE_API_KEY={HUGGINGFACE_API_KEY}
PINECONE_API_KEY={PINECONE_API_KEY}
"""

with open(".env", "w") as file:
    file.write(env_content)

print("Environment variables are saved to .env file.")

Environment variables are saved to .env file.


### Loading the Environment File

Run the following snippet of code to load the environment file each time you use this notebook

In [5]:
import dotenv

dotenv.load_dotenv()

True

## 2. Document Loaders Embeddings

##### Document Loaders
LangChain supports various 'document loaders' which can be used to load data from a provided 'document' or source of data. Document loaders are available for various types of files, such as plaintext, webpages, and even more special use-cases like YouTube video transcripts.

In this assignment, we'll be using `PyMuPDF` to load the contents of a PDF document. This loader treats each page of a given PDF as a separate document and also attaches useful metadata such as the page title, page number and more.

You can read more about Document Loaders at: https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/

We will now load the student handbook present in the `handbook` folder. After loading the document, we use the `CharacterTextSplitter` function to split the documents into smaller chunks of text, with `chunk_size` set to 1000 and `chunk_overlap` set to 4.

In [1]:
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import CharacterTextSplitter

loader = PyMuPDFLoader('./handbook/Undergraduate Student Handbook 2021-2022.pdf')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=4)
docs = text_splitter.split_documents(documents)



Print a random page or loaded document to see what the extracted data looks like and judge whether the pages were loaded correctly.

In [2]:
print(documents[171].page_content)

 
Academic Advising at SBASSE
 
The SBASSE Undergraduate Student Academic Aﬀairs Oﬃce helps students in planning their academic 
career during their undergraduate degree programme from the start of freshman year to graduaƟon. The 
oﬃce helps the students clarify and implement individual educaƟonal plans that match their skills, interests 
and values, and guide them to achieve their personal, professional and educaƟonal goals. This advising can 
be about course planning and enrolment, major selecƟon, maintaining good academic standing, extra-
curricular acƟviƟes, career guidance or any other issues that require counselling. ParƟcularly in the ﬁrst-
year, students may have quesƟons about how to manage their course load if they want to take courses 
from other schools or do not get any
 
secƟon of a course etc. At SBASSE, students are provided academic 
advising through a faculty advisor and staﬀ advisors in the Dean’s Academic Aﬀairs Oﬃce. Our goal is to 
enable a Ɵmely graduaƟon and enc

## 3. Connecting to Pinecone Database

Pinecone is a vector database with broad functionality. You have already created a pinecone account in the beginning to obtain an API key. We will now be creating a remote vector store associated with your account which will contain the HuggingFace Embeddings.

You can read more about integrating pinecone with LangChain at: https://python.langchain.com/v0.2/docs/integrations/vectorstores/pinecone/

### Connecting to Pinecone

In [6]:
import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))

### Create Pinecone index if one doesn't exist

After connecting to Pinecone, create an index if one doesn't already exist. An index is the highest-level organizational unit of vector data in Pinecone. It can accept and store vectors and serve queries related to them.

You can read more about Pinecone indexes at: https://docs.pinecone.io/guides/indexes/understanding-indexes


In [7]:
# Defining Index Name
index_name = "handbook-chatbot"

# Checking Index
if index_name not in pc.list_indexes().names():
  # Creating new Index
  pc.create_index(
    name=index_name,
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(
      cloud="aws",
      region="us-east-1"
    )
  )

You should now see the 'handbook-chatbot' index present in your pinecone database in the browser. Currently there should be no records present in it.

## 4. Creating and Querying the Vector Store
After creating the Pinecone index, we'll store our document embeddings in it and query the database to retrieve relevant information.

### Embeddings
Embeddings are essentially vectors that are used to represent, in this case text, in a form that can be processed by LLMs. This allows us to map text into a vector space based on its semantic sense. This is very useful for us as it allows us to easily find pieces of text which are most similar to each other by using mathematical operations such as cosine similarity. All of the embeddings collectively make up the knowledge base for our LLM application.

There are various models which can be used to generate such embeddings. In this assignment, we use `HuggingFaceEmbeddings` to generate embeddings from the text.

### Adding Document Embeddings to the Vector Store

In [13]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from uuid import uuid4

embeddings = HuggingFaceEmbeddings()

index = pc.Index(index_name)

vector_store = PineconeVectorStore(index=index, embedding=embeddings)

uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)



['6feaa00d-bca2-4815-adb1-94dbf5e86c4a',
 'a0a93bb5-54a7-432c-b228-622b4e408990',
 'ca1910cc-e865-4ccc-bc58-6056f8d76b27',
 '88c94d06-f4e4-4df3-8e50-7ece4879aa50',
 '4c71a16a-a42c-4afa-8060-6f9a452d426c',
 'b335ac2f-efec-4f70-ab11-f26738d9b673',
 '748fb8cb-7987-4f4b-964a-75a4e7162c73',
 'e4bf3ae6-5cbe-4d35-a563-5269478c79ef',
 '553dec21-9b77-4d0f-9208-d8ffcf26915a',
 'b98ac557-ed7b-4589-802f-513999d08dcc',
 '09b9f8b4-db95-4962-98b7-c3a6188d9071',
 '215b897d-482f-4e39-8412-9987099bfd05',
 'a5bf6c1f-cd35-4cf6-8d9d-17cca270f86b',
 'c42a572a-5a96-4c02-b611-58834614fb0e',
 'a91a855d-60c7-4ab9-8bb2-ac6d9415163e',
 '63d60d44-74a7-46df-a5a7-fd0e83fe4e99',
 '2a13543a-a086-476e-ad34-739f773e0992',
 'f9598fa9-bff7-4dbc-ad11-b1df69317c88',
 '0ae65740-5392-4317-80ea-4f8375150ace',
 '58a6587d-bc1a-4af7-99ab-8552c7659957',
 '5fc167f7-2a8a-4f33-9cc8-422d757c113b',
 '8e338bb3-e27c-42a6-ab3f-c80bafcc9f9a',
 '9a424c4e-8490-4c6c-88e0-4f383e4819d4',
 '9b8413ff-68e9-4cf7-b80c-17a2b6c5af10',
 '6972a0cc-deb9-

There should now be quite a few embeddings in your Pinecone vector store, along with some metedata for each of them, such as the filepath, what page number the embeddings represent and what was the original text that was used to create the embedding.

### Querying the Vector Store
You can directly query the vector store to retrieve the most relevant pages of our document based on a provided search query. `k` is used to indicate how many records should be retrieved for the given request.

In [14]:
results = vector_store.similarity_search(
    "Grading Policy",
    k=2,
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

*  
The instructor informs students about the weightage
 
assigned to each instrument. This is menƟoned in the 
course outline, and it is used for evaluaƟng student performance in the course.
 
21.2.
 
Grading Policy
 
Course grades are based on cumulaƟve performance in deﬁned instruments. 
 
 
The ﬁnal grades are assigned as follows:
 
 
Table 2
 
 
LeƩer Grades and their Numeric Equivalents
 
 
 
 
LETTER GRADE
 
NUMERIC EQUIVALENT
 
ExcepƟonal
 
A+
 
4.0
 
Outstanding
 
A
 
4.0
 
Excellent
 
A-
 
3.7
 
Very Good
 
B+
 
3.3
 
Good
 
B
 
3.0
 
Average
 
B-
 
2.7
 
SaƟsfactory
 
C+ 
2.3
 
Low Pass
 
C 
2.0
 
Marginal Pass
 
C- 
1.7
 
UnsaƟsfactory
 
D 
1.0
 
Pass
 
*P 
-
 
Fail
 
F 
0.0
 
Withdrawn
 
**W  
-
 
Incomplete 
 
***I
 
-
 
Transfer
 
****T
 
-
 
 
Grading at LUMS is based on relaƟve performance. However, for some courses, absolute grading is used. This 
informaƟon is menƟoned in the course outline.
 
A+
 
and F
 
are absolute grades. The other grades (A
 
to D) may be award

## 5. Building the Chatbot
### Transforming the Vector Store into a Retriever
To simplify the process of retrieving relevant documents in our chatbot, we'll transform the vector store into a retriever. Retrievers are langchain components that will automatically fetch relevant data from our vector database given the user input. You can read more about retrievers here: https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/

Here, we will define our retriever to fetch the two most similar pages on the basis of the cosine similarity score.

In [15]:
retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 2, "score_threshold": 0.5},
)

### Setting up the HuggingFace Model
To provide the LLM, we'll use HuggingFaceHub. HuggingFaceHub is a platform we can connect to and call the model without having to deploy it on our machine. We just define the ID of the model we want to use. In this case, it’s `mistralai/Mixtral-8x7B-Instruct-v0.1`.

In [16]:
from langchain_huggingface import HuggingFaceEndpoint

# Defining the repo ID and connect to Mixtral model on Huggingface
repo_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
llm = HuggingFaceEndpoint(
  repo_id=repo_id,
  temperature= 0.8,
  top_k= 50,
  huggingfacehub_api_token=os.getenv('HUGGINGFACE_API_KEY')
)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\hp\.cache\huggingface\token
Login successful


## 6. Prompt Engineering and Template Design

Prompt engineering is a key part of working with LLMs. Langchain provides a user-friendly interface to construct complex prompts using 'prompt tempaltes'. This is especially useful when implementing advanced prompting techniques such as few-shot examples which you may have already learnt in the course.

For this section, we'll create a simple prompt template which incorporates the retrieved context and the user's question before sending it to the LLM.

In [17]:
from langchain import PromptTemplate

# DO NOT CHANGE THE FOLLOWING PROMPT TEMPLATE
template = """
You are a chatbot designed to answer questions from LUMS students. LUMS is a university and you have access to the student handbook.
Use following extract from the handbook to answer the question.
If the context doesn't contain any relevant information to the question, then just say "I don't know".
If you don't know the answer, then just say "I don't know".
Do NOT make something up.

Context: {context}
Question: {question}
Answer: 

"""

prompt = PromptTemplate(
  template=template, 
  input_variables=["context", "question"]
)

As you can see, we have two input variables defined: `context` and `question`. When a user prompts our chatbot, their question will replace the '{question}' in our prompt template, and the retrived context from our vector store will replace the '{context}' in our prompt template. Thus the fully completed prompt template will now be sent to the LLM for answering.

## 7. Chaining It All Together

Finally, we'll chain the components together using LangChain's LCEL (LangChain Expression Language) to create a fully functional chatbot. This is a proprietary language which simplifies the process of creating complex chains involving prompts and LLMS. LCEL follows the pipe architecture, where the output from the preceding element is used as the input for the next element using the `|` operator. Writing LCEL is as simple as listing which components we want to be excuted as part of our chain and in what order. For example,

```bash
rag_chain = ( 
            RunnableParallel(context = retriever | format_docs, question = RunnablePassthrough() ) |
            qa_prompt | 
            llm 
)
```

For the following code, a flowchart representation would look like this:

<center>
<img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*dM-V2AQYihP7-FHkjEmQKA.png">
</center>


Learn more about LCEL here: https://python.langchain.com/v0.1/docs/expression_language/

### Creating the ChatBot

In [18]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

rag_chain = (
  {"context": retriever | format_docs,  "question": RunnablePassthrough()} 
  | prompt 
  | llm
  | StrOutputParser() 
)

Let's break down our LCEL chain step-by-step:

1. In the first line, output from our vector store retriever is fed into a function for formatting the document page (so that the chatbot does not need to look at unnecessary metadata) which is then assigned to the `context` input variable we defined earlier. Simultaneously, the input query from the user is assigned to the `question` variable using `RunnablePassthrough`.
2. The output of the previous step is fed into the prompt template.
3. The prompt template from the previous step is passed to the LLM for inference.
4. The output is parsed as a string using `StrOutputParser()`.

### Using the RAG-based ChatBot

We will now test our ChatBot by asking it some highly contextualized questions related to University rules and policy.

In [19]:
question = "What is the grading policy for the university?"
result = rag_chain.invoke(question)
print(result)

The grading policy for LUMS is based on relative performance. However, for some courses, absolute grading is used. This information is mentioned in the course outline. The final grades are assigned as follows: A+ (4.0), A (4.0), A- (3.7), B+ (3.3), B (3.0), B- (2.7), C+ (2.3), C (2.0), C- (1.7), D (1.0), F (0.0), W (-), I (-), T (-). The A+ and F grades are absolute, while the other grades (A to D) may be awarded based on relative performance. The P grade contributes towards earned credits but does not affect the CGPA. The W grade has no numeric equivalence and the credit hours for withdrawn courses will not count towards the credit hours taken in the semester. The I grade is awarded if a student has completed 90% of the directed course work in the semester and the remaining is to be completed in 6-8 weeks into the next semester. If not repeated and replaced, the F grade will count towards Semester GPA, CGPA, and SCGPA. The semester GPA and CGPA are recomputed once the result of the di

In [20]:
question = "What are some minors offered at SBASSE?"
result = rag_chain.invoke(question)
print(result)

SBASSE offers minors in Biology, Chemistry, Computer Science, Mathematics, and Physics. To obtain a minor in any of these areas, students must accumulate a minimum of 18 credit hours by taking 6 courses in their area of interest and secure a cumulative GPA of 2.75 in them. The compulsory courses for each minor are as follows:

For Biology minor, students must take a minimum of 18 credits of courses, including at least two courses from the list of compulsory courses, in addition to BIO 101 and BIO 216, which are considered as minor "core" courses.

For Computer Science minor, students must take a minimum of 18 credits of Computer Science courses, including 10 credit hours of compulsory courses and three additional CS courses, at least two of which must be 300+ level.

For Mathematics minor, students must take a minimum of 18 credit hours, including at least one course from the list of compulsory courses and at least 3 credit hours from 300/400 level Mathematics courses.

Please note tha

### Testing its limitations

General questions that are not related to the university would not answered due to the way we structured the prompt template. Try asking such questions below to see.

In [21]:
question = "What are some gift ideas for Mothers Day?"
result = rag_chain.invoke(question)
print(result)

I don't have access to information about gift ideas for Mothers Day. However, I can suggest some general ideas such as personalized jewelry, a spa day, a sentimental photo album, or a handwritten letter expressing your gratitude.


In [22]:
question = "What is the fastest land animal?"
result = rag_chain.invoke(question)
print(result)

I don't know. The provided context only contains information about LUMS policies and procedures, and it does not include any information about the fastest land animal.


## 8. Dynamically route logic based on input

One of the key advantages of using LangChain is how easy it makes it to create non-deterministic chains where the output of a previous step defines the next step. Different instances of specialized LLMs can be used to make decisions along the way and perform different sub-tasks.

You can read more about routing in langchain here: https://python.langchain.com/v0.1/docs/expression_language/how_to/routing/

### Creating More Chains <span style="color:red">**[20 marks]**</span>

In this section, we will be adding another agent which will determine whether the user's query is related to university policy or not, and based on that, the user's query will be passed on to a separate chain with the corresponding prompt templates and RAG access.

In [26]:
from langchain_core.runnables import RunnableLambda

# 1. Define the classifier chain. This is the chain that takes in the user input to determine if it is related to education/academic policies or not. [5 marks]
classifier_chain = (
    prompt
    | llm  n
    | StrOutputParser()  
)

# 2. Create the General LLM chain. This is the chain that takes in the user input and responds to it. [5 marks]
general_chain = (
    prompt
    | llm  
    | StrOutputParser()  
)

# 3. Implement the routing logic using RunnableLambda in the full_chain and a routing function [10 marks]
def route(info):
    
    info.setdefault("context", "") 
    
    classification = classifier_chain.invoke(info)
   
    
    if "education" in classification.lower() or "academic" in classification.lower():
       
        return classifier_chain.invoke(info)
    else:
       
        return general_chain.invoke(info)


full_chain = RunnableLambda(route)

### Testing Our Improved ChatBot <span style="color:red">**[5 marks]**</span>

In [29]:
question = "What is the my name?"
answer = full_chain.invoke({"question": question})
print(answer)

I'm sorry, I don't know the answer to that question. Could you please provide more context or ask another question?


In [31]:
question = "What is the biggest school?"
answer = full_chain.invoke({"question": question})
print(answer)

The LUMS campus is home to five schools, including the Suleman Dawood School of Business (SDSB), the Mushtaq Ahmad Gurmani School of Humanities and Social Sciences (MGSHSS), the Syed Babar Ali School of Science and Engineering (SBASSE), the Shaikh Ahmad Hassan School of Law (SAHSOL), and the LUMS School of Education (LSOE). However, the largest school in terms of the number of students and faculty is the Syed Babar Ali School of Science and Engineering (SBASSE).


## End of Part A