# Part B: Building a RAG-Based Researcher Chatbot with LangChain <span style="color:orange">**[75 marks]**</span>

Now that you have worked with some of the fundamental components of LangChain, you will build a more complex chatbot: your own personal research assistant. It will have access to a collection of research papers and must answer specific questions based on their contents. The chatbot must also be able to retrieve information from Wikipedia to formulate a response when the user requests it. To standardise marking, a folder named `papers` containing the research papers your chatbot should run on has been provided.

---

## 0. Instructions

- Run the entire notebook to ensure everything is working correctly.
- Modify the chatbot's prompt template(s) to suit your specific use cases (e.g., a different context or more detailed instructions).
- Do not use GPT or other AI tools to generate code. Refer to the documentation instead. It is important that you learn these tools yourself for your course projects.

## 1. Importing libraries

Import all of the required libraries here.

In [4]:
import os
import langchain 
import langchain_community
import langchain_huggingface
import langchain_pinecone 
import pinecone
import dotenv
import streamlit as st



### Setting Up API Keys

Run the following cell to save any additional API keys into the `.env` file. You may edit this file manually instead if you prefer.

In [5]:
# Replace with the API keys you need
API_KEY = "enter_your_api_key"

env_content = f"""
API_KEY={API_KEY}
"""

with open(".env", "w") as file:
    file.write(env_content)

print("Environment variables are saved to .env file.")

Environment variables are saved to .env file.


### Loading the Environment File

Run the following snippet to load the environment file each time you use this notebook.

In [67]:
dotenv.load_dotenv()

True
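A `True` result from `load_dotenv()` does not by itself confirm that the specific keys you need are present, so an explicit check can save debugging time later. The following is a minimal sketch: the helper name `missing_keys` is hypothetical, and the two key names are the ones read in `ResearchChatbot.__init__` further down in this notebook.

```python
import os

def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Key names taken from ResearchChatbot.__init__ later in this notebook.
REQUIRED_KEYS = ["PINECONE_API_KEY", "HUGGINGFACE_API_KEY"]

if __name__ == "__main__":
    absent = missing_keys(REQUIRED_KEYS)
    if absent:
        print(f"Missing from environment: {', '.join(absent)}")
    else:
        print("All required keys are set.")
```

If any key is reported missing, add it to `.env` and run `dotenv.load_dotenv()` again.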

## 2. ResearchChatbot Class

This is the class where you will be implementing the chatbot. It contains the following methods:

- `__init__(self)`: Here you will initialize the chatbot and declare all the required variables/objects, such as the loader, retrievers, llm, prompt templates, etc. <span style="color:red">**[5 marks]**</span>

- `load_papers(self)`: Here you will load all the research paper documents. <span style="color:red">**[5 marks]**</span>

- `store_papers(self)`: Here you will connect to the vector database, initialize the vector store and store all the embeddings along with their metadata into the database. <span style="color:red">**[5 marks]**</span>

- `generate(self, query)`: The user will call this function to generate a response to their query. Here you will invoke the necessary chains and dynamically route the logic further based on the user input. You may assume that the user will make it explicitly clear in their question if they want the chatbot to retrieve a certain Wikipedia article to answer their question. The chatbot should be able to answer questions from the research papers otherwise and must also be capable of responding to general queries. <span style="color:red">**[10 + 10 + 10 marks]**</span>
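The routing described for `generate` can be sketched as a small decision function. This is only one possible approach, not a required design: `choose_route`, the `top_score` argument (the best similarity score from the paper retriever, assumed to grow with similarity), and the `min_score` cutoff of 0.5 are all assumptions made for illustration.

```python
def choose_route(query, top_score=None, min_score=0.5):
    """Pick a branch for a query: 'wikipedia', 'papers', or 'general'.

    `top_score` is the best similarity score returned by the paper retriever
    (None if nothing was retrieved); `min_score` is an assumed cutoff.
    """
    # The user is assumed to mention Wikipedia explicitly when they want it.
    if "wikipedia" in query.lower():
        return "wikipedia"
    # Use the papers only when the retriever found something relevant enough.
    if top_score is not None and top_score >= min_score:
        return "papers"
    # Otherwise fall back to answering as a general-purpose assistant.
    return "general"


print(choose_route("Summarise the Wikipedia article on dogs"))
print(choose_route("What does PubMedBERT pretrain on?", top_score=0.8))
print(choose_route("Hello there", top_score=0.1))
```

A score-based fallback like this is one way to avoid forcing unrelated paper context onto general questions; the cutoff would need tuning against your own index.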

__Some things to note:__
1. You may assume that the user will use the chatbot properly. `load_papers` and `store_papers` will always be called at least once before `generate`.

2. The chatbot should maintain memory/history of the previous user prompts and responses, thus allowing for follow-up questions to be asked. <span style="color:red">**[10 marks]**</span>

3. The chatbot should also cache previous prompts and responses. If it receives a prompt it has seen before, or one close to it, it should skip inference and reuse the prior response, thus improving response times. <span style="color:red">**[10 marks]**</span>

4. You may only use LangChain components, but you may create as many chains, retrievers, prompt templates, etc. as you want.

5. You may use a different document loader or LLM if you prefer.

6. You may create as many helper functions as needed.
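To make the caching requirement in note 3 concrete, here is a plain-Python illustration of the lookup logic only. It is not a LangChain component, so it would not satisfy note 4 as written. The class name `PromptCache` and the 0.9 threshold are hypothetical, and character-level similarity from `difflib` stands in for the embedding-based comparison a real implementation would more likely use.

```python
from difflib import SequenceMatcher


class PromptCache:
    """Tiny in-memory cache that also matches near-duplicate prompts."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cutoff in [0, 1]
        self.entries = {}           # normalised prompt -> response

    @staticmethod
    def _normalise(prompt):
        # Lowercase and collapse whitespace so trivial variations match.
        return " ".join(prompt.lower().split())

    def get(self, prompt):
        """Return a cached response for an identical or similar prompt, else None."""
        key = self._normalise(prompt)
        if key in self.entries:
            return self.entries[key]
        for cached, response in self.entries.items():
            if SequenceMatcher(None, key, cached).ratio() >= self.threshold:
                return response
        return None

    def put(self, prompt, response):
        self.entries[self._normalise(prompt)] = response


cache = PromptCache()
cache.put("Explain transformer architecture", "cached answer")
print(cache.get("explain   TRANSFORMER architecture"))  # hit after normalisation
print(cache.get("what is a dog"))                       # miss
```

In `generate`, the idea would be to call `get` before any retrieval or inference and `put` after a response is produced.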

In [59]:
%pip install huggingface_hub
%pip install wikipedia

Note: you may need to restart the kernel to use updated packages.

In [68]:
%pip install pypdf

Collecting pypdf
  Downloading pypdf-5.0.1-py3-none-any.whl.metadata (7.4 kB)
Downloading pypdf-5.0.1-py3-none-any.whl (294 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.0.1
Note: you may need to restart the kernel to use updated packages.


In [102]:
%pip install langchain

Note: you may need to restart the kernel to use updated packages.


In [164]:
# Standard library
import os
import pickle
from uuid import uuid4

# Vector database client
from pinecone import Pinecone, ServerlessSpec

# LangChain integrations
from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_community.document_loaders import PyPDFLoader, WikipediaLoader
from langchain_community.retrievers import WikipediaRetriever

# LangChain core building blocks
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain, LLMChain, RetrievalQA
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

In [163]:
%pip install langchain wikipedia-api


Collecting wikipedia-api
  Downloading wikipedia_api-0.7.1.tar.gz (17 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: wikipedia-api
  Building wheel for wikipedia-api (setup.py): started
  Building wheel for wikipedia-api (setup.py): finished with status 'done'
  Created wheel for wikipedia-api: filename=Wikipedia_API-0.7.1-py3-none-any.whl size=14398 sha256=e816a234e34cab4a93dd9593d4f4d2113431004d9a0e3b6886605469ee782388
  Stored in directory: c:\users\hp\appdata\local\pip\cache\wheels\48\93\2f\978da1e445cf17606445f4b47fd8454250f5440d5a10c677e9
Successfully built wikipedia-api
Installing collected packages: wikipedia-api
Successfully installed wikipedia-api-0.7.1
Note: you may need to restart the kernel to use updated packages.


In [210]:
class ResearchChatbot:

    def __init__(self):
       
        self.pinecone_api_key = os.getenv("PINECONE_API_KEY")
        self.huggingface_api_key = os.getenv("HUGGINGFACE_API_KEY")
        repo_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
        self.llm = HuggingFaceEndpoint(
            repo_id=repo_id,
            temperature=0.8,
            top_k=25,
            huggingfacehub_api_token=self.huggingface_api_key,
        )
        self.wiki_retriever = WikipediaRetriever()
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )
        # all-MiniLM-L6-v2 produces 384-dimensional vectors, matching the index dimension below
        self.embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2"
        )
        self.pc = Pinecone(api_key=self.pinecone_api_key)
        #self.cache_file = "response_cache.pkl"
        #self.load_cache()
        self.index_name = "research-papers"
        try : 
            existing_indices = [index.name for index in self.pc.list_indexes()]
            print(f"existing indexes: {existing_indices}")
            # index_name_to_delete = 'research-papers'
            # if index_name_to_delete in existing_indices:
            #     try:
            #         self.pc.delete_index(index_name_to_delete)
            #         print(f"Index '{index_name_to_delete}' deleted successfully.")
            #     except Exception as e:
            #         print(f"Error deleting index '{index_name_to_delete}': {e}")

            if self.index_name not in existing_indices:
                print(f"Creating new index: {self.index_name}")
                self.pc.create_index(
                    name=self.index_name,
                    dimension=384,
                    metric="cosine",
                    spec=ServerlessSpec(
                        cloud="aws",
                        region="us-east-1"
                    )
                )
                
                print("created successfully")
            print("vector store CREATING")
            self.index = self.pc.Index(self.index_name)
            self.vectorstore = PineconeVectorStore(index=self.index, embedding=self.embeddings)
            print("vector store CREATED")
        except Exception as e : 
            print(f"Error in creating pinecone index : {e}")

    

    def load_papers(self, folder_path):
        # Load every PDF in the folder, keeping track of which file each one came from
        self.pdf_documents = []
        for filename in os.listdir(folder_path):
            if filename.endswith('.pdf'):
                pdf_path = os.path.join(folder_path, filename)
                try:
                    pdf_loader = PyPDFLoader(pdf_path)
                    pages = pdf_loader.load()  # one Document per page
                    self.pdf_documents.append((filename, pages))
                    print(f"Loaded document: {filename}")
                except Exception as e:
                    print(f"Error loading document {filename}: {e}")

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        self.split_documents = []
        for filename, pages in self.pdf_documents:
            # Join the pages so that chunks can span page boundaries
            doc_text = " ".join(page.page_content for page in pages)
            for chunk in text_splitter.split_text(doc_text):
                # Tag each chunk with the file it was actually read from
                self.split_documents.append(Document(page_content=chunk, metadata={"source": filename}))

        print(f"Split into {len(self.split_documents)} chunks.")
    
    
    def store_papers(self):

        print("Storing the documents in the pinecone database")
        uuids = [str(uuid4()) for _ in range(len(self.split_documents))]
        try: 
            self.vectorstore.add_documents(documents=self.split_documents, ids=uuids)
            print("Success")
        except Exception as e : 
            raise RuntimeError(f"Error occurred: {str(e)}")
    
    def generate(self, query):
        try:
            # Route to Wikipedia only when the user explicitly asks for it
            if "wikipedia" in query.lower():
                docs = WikipediaLoader(query=query, load_max_docs=1).load()
                context = docs[0].page_content[:300] if docs else "No relevant Wikipedia information found."

                # load_memory_variables expects an inputs dict and returns {"chat_history": [...]}
                chat_history = self.memory.load_memory_variables({})["chat_history"]

                prompt = f"""You are a helpful research assistant. Use the following Wikipedia information to answer the question. If you can't answer based on the given information, say so.

                Wikipedia information: {context}
                Question: {query}
                Previous conversation: {chat_history}
                Assistant:"""

                response = self.llm.invoke(prompt)

                # This branch bypasses the chain, so record the turn in memory manually
                self.memory.save_context({"question": query}, {"response": response})
            else:
                retriever = self.vectorstore.as_retriever()

                qa_prompt_template = """You are a helpful research assistant. Use the following pieces of research paper context to answer the question. If you can't answer based on the given context, say so.

                Context: {context}

                Question: {question}

                Previous conversation:
                {chat_history}

                Assistant:"""

                QA_PROMPT = PromptTemplate(
                    template=qa_prompt_template,
                    input_variables=["context", "question", "chat_history"]
                )

                qa_chain = ConversationalRetrievalChain.from_llm(
                    llm=self.llm,
                    retriever=retriever,
                    memory=self.memory,
                    combine_docs_chain_kwargs={"prompt": QA_PROMPT}
                )

                # The chain was given memory=self.memory, so it records this turn itself;
                # saving it again here would duplicate the exchange in the history
                response = qa_chain.invoke({"question": query})["answer"]

            # Response caching is not implemented yet
            return response

        except Exception as e:
            return f"An error occurred while generating the response: {str(e)}"
    

### Testing your ResearchChatbot

Create an instance of your chatbot in the cell below. Showcase all the functionalities of your chatbot below by asking it various types of questions and printing the generated responses. <span style="color:red">**[5 marks]**</span>

In [211]:
bot = ResearchChatbot()
bot.load_papers("./papers")
bot.store_papers()

query = "What is going on?"
#response = bot.generate(query)
#print(response)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\hp\.cache\huggingface\token
Login successful




existing indexes: ['handbook-chatbot', 'research-papers-2', 'research-papers-4', 'research-papers-5', 'research-papers']
vector store CREATING
vector store CREATED
Loaded document: Can_Foundation_Models_Perform_Zero-Shot_Task_Specification_For_Robot_Manipulation.pdf
Loaded document: Domain-Specific_Language_Model_Pretraining_for_Biomedical_Natural_Language_Processing.pdf
Loaded document: Improving_Vision-Inspired_Keyword_Spotting_Using_Dynamic_Module_Skipping_in_Streaming_Conformer_Encoder.pdf
Loaded document: Large_Language_Models_Understand_and_Can_Be_Enhanced_by_Emotional_Stimuli.pdf
Split into 338 chunks.
Storing the documents in the pinecone database
Success
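Note that `store_papers` generates fresh `uuid4` IDs on every call, so running the cell above again adds the same chunks a second time under new IDs. One possible alternative, sketched here on the assumption that writing a vector under an existing ID overwrites it (Pinecone's documented upsert behaviour), is to derive each ID from the chunk's source and content. The helper `chunk_id` is hypothetical, and the file names and texts are placeholders.

```python
import hashlib


def chunk_id(source, text):
    """Derive a stable ID from a chunk's source file and content."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still effectively unique


# Placeholder chunks standing in for (metadata["source"], page_content) pairs.
chunks = [
    ("paper_a.pdf", "Transformers use self-attention."),
    ("paper_a.pdf", "Conformers add convolution modules."),
]
ids = [chunk_id(source, text) for source, text in chunks]
print(ids)
```

A list built this way could be passed as the `ids` argument of `add_documents` in place of the `uuid4` list, so that re-running the notebook refreshes existing vectors instead of duplicating them.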


In [212]:
query = "Explain transformer architecture"
response = bot.generate(query)
print(response)

 The Transformer architecture is a state-of-the-art choice for Automatic Speech Recognition (ASR). It uses a conformer architecture, which includes residual connections in a way that allows gates to dynamically skip modules. Recent work by Peng et al. [18] has shown that binary gates can be added inside a Transformer-based ASR architecture to dynamically adjust the network's depth and reduce the average number of computations while retaining the same word error-rate. They extend the input-dependent dynamic depth (I3D) method to the conformer by placing local gates to skip feedforward, convolution and attention modules based on characteristics of the input to the module itself. Skipping modules still requires the full model to be loaded and does not noticeably impact the user-perceived latency, nevertheless, it improves the efficiency and can lead to considerable power savings.


In [214]:
query = "what is dog wikipedia"
response = bot.generate(query)
print(response)

 The Transformer architecture is a state-of-the-art choice for Automatic Speech Recognition (ASR). It uses a conformer architecture, which includes residual connections in a way that allows gates to dynamically skip modules. Recent work by Peng et al. [18] has shown that binary gates can be added inside a Transformer-based ASR architecture to dynamically adjust the network's depth and reduce the average number of computations while retaining the same word error-rate. They extend the input-dependent dynamic depth (I3D) method to the conformer by placing local gates to skip feedforward, convolution and attention modules based on characteristics of the input to the module itself. Skipping modules still requires the full model to be loaded and does not noticeably impact the user-perceived latency, nevertheless, it improves the efficiency and can lead to considerable power savings.

The term "dog whistle" is a political term, not related to the Transformer architecture or Wikipedia. It seem

In [215]:
query = "what is a dog"
response = bot.generate(query)
print(response)

  Based on the provided context, the question appears to be unrelated to the content. The context discusses the use of domain-specific pretraining for natural language processing in the biomedical field, specifically the training of a language model named PubMedBERT. The model is trained on a large corpus of biomedical literature to improve its performance in downstream NLP tasks related to this field. The paper discusses the effectiveness of this approach compared to traditional pretraining methods, as well as potential future directions for this research.

Given this, I would respond to the question "What is a dog?" by saying that a dog is a common domestic mammal that is often kept as a pet. It is a member of the Canidae family, which also includes wolves, foxes, and other types of animals. Dogs are known for their loyalty and their ability to be trained, and they come in a wide variety of breeds with different characteristics and appearances.

However, I want to note that this answ

In [216]:
query = "Explain transformer architecture"
response = bot.generate(query)
print(response)

  The Transformer architecture is a type of model used in machine learning, specifically for Automatic Speech Recognition (ASR). It uses a conformer architecture, which includes residual connections in a way that allows gates to dynamically skip modules. Recent work by Peng et al. [18] has shown that binary gates can be added inside a Transformer-based ASR architecture to dynamically adjust the network's depth and reduce the average number of computations while retaining the same word error-rate. They extend the input-dependent dynamic depth (I3D) method to the conformer by placing local gates to skip feedforward, convolution and attention modules based on characteristics of the input to the module itself. Skipping modules still requires the full model to be loaded and does not noticeably impact the user-perceived latency, nevertheless, it improves the efficiency and can lead to considerable power savings.

The Transformer architecture is a state-of-the-art choice for ASR. It uses a co

## 3. Creating a User Interface with Streamlit

While you are working on your chatbot, you may use the provided code in `<RollNumber>_interface.py` to interact with your research assistant chatbot in a typical conversational frontend. To do so, copy the `ResearchChatbot` class in its current state to `<RollNumber>_interface.py`, navigate to the directory in a terminal, and run `python -m streamlit run <RollNumber>_interface.py`. Streamlit will open an interface for your chatbot in your browser.

## Congratulations!

You have taken a dive into using LangChain to build RAG chatbots. Give yourself a pat on the back. :)