Building RAG Chatbots with LangChain
In this example, we'll build an AI chatbot from start to finish. We will use LangChain, OpenAI, and a vector database to build a chatbot capable of learning from the external world using Retrieval Augmented Generation (RAG).

We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

By the end of the example we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.

### Chatbot (no RAG)

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a ChatOpenAI object. For this we do need an OpenAI API key.

In [1]:
# Setting up the openAI key
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()

In [2]:
import os
from langchain.chat_models import ChatOpenAI

# Create the chat model; the API key was stored in the environment above
chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)

Chats with *OpenAI's gpt-3.5-turbo and gpt-4 chat models* are typically structured (in plain text) like this:

System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
The final "Assistant:" without a response is what would prompt the model to continue the conversation. In the official OpenAI ChatCompletion endpoint these would be passed to the model in a format like:



[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
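
To make the link between the two representations concrete, here is a minimal sketch that renders a list of role/content dictionaries as the plain-text transcript shown earlier. The helper `render_transcript` and the `ROLE_LABELS` mapping are hypothetical names used for illustration only; they are not part of the OpenAI or LangChain APIs.

```python
# Hypothetical helper: renders role/content dicts as a plain-text transcript.
# Not part of OpenAI's or LangChain's API.
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_transcript(messages):
    """Join messages as 'Role: content' blocks and end with an open 'Assistant:' turn."""
    lines = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # the empty turn the model is asked to complete
    return "\n\n".join(lines)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]

transcript = render_transcript(example_messages)
print(transcript)
```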

In LangChain there is a slightly different format. We use three message objects like so:

In [4]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?")
]

In [5]:
res = chat(messages)
print(res.content)

Hello! As an AI, I don't have feelings, but I'm here to assist you. How can I help you today?


Because res is just another AIMessage object, we can append it to messages, add another HumanMessage, and generate the next response in the conversation.

In [6]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="I would like to understand string theory")

# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

String theory is a theoretical framework in physics that aims to describe the fundamental building blocks of the universe. It suggests that instead of point-like particles, the fundamental objects are tiny, one-dimensional "strings" that vibrate in different modes.

Here are some key points to understand about string theory:

1. Dimensions: String theory requires the universe to have more than the usual three spatial dimensions (length, width, and height). It typically posits that there are extra spatial dimensions, usually six or seven in total, curled up and hidden at microscopic scales.

2. Vibrating strings: In string theory, particles are not treated as point-like entities but as tiny strings. The way these strings vibrate determines their properties, such as mass and charge. Different vibrational patterns correspond to different particles.

3. Unification of forces: One of the major goals of string theory is to unify all the fundamental forces of nature, including gravity, electr
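
The append-and-resend pattern used above can be sketched without any API calls. In the sketch below, `echo_model` is a hypothetical stand-in for a real chat model and `ask` is an illustrative helper; neither comes from LangChain. The point it illustrates is that the full history is sent on every turn, so the model only "remembers" what is in the list.

```python
# Sketch of the append-and-resend pattern with plain dicts.
# `echo_model` is a hypothetical stand-in for a real chat model call.
def echo_model(history):
    """Pretend model: reports how many messages it was given."""
    return {
        "role": "assistant",
        "content": f"(reply after seeing {len(history)} messages)",
    }


def ask(history, user_text, model=echo_model):
    """Append the user turn, call the model on the full history, append its reply."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append(reply)
    return reply["content"]


history = [{"role": "system", "content": "You are a helpful assistant."}]
first = ask(history, "Hi AI, how are you today?")
second = ask(history, "I would like to understand string theory")
print(first)
print(second)
```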

### Dealing with Hallucinations

The knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the parametric knowledge of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, like about the new (and very popular) Llama 2 LLM.

In [10]:
# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)

# add to messages
messages = [
    SystemMessage(content="You are a helpful assistant."),
]
messages.append(prompt)

# send to OpenAI
res = chat(messages)
print(res.content)

Llama 2 is a popular online game that has gained a lot of attention and popularity for several reasons. Here are a few special features of Llama 2:

1. Engaging Gameplay: Llama 2 offers a unique and addictive gameplay experience. Players control a llama character and navigate through various obstacles and challenges. The game requires quick reflexes, strategy, and problem-solving skills to progress through different levels.

2. Stunning Visuals: Llama 2 boasts impressive graphics and visually appealing designs. The game features vibrant colors, detailed environments, and cute character animations, creating an immersive and visually pleasing experience for players.

3. Multiplayer Mode: Llama 2 offers a multiplayer mode where players can compete with their friends or other online players. This adds an extra layer of excitement and competitiveness to the game, allowing players to showcase their skills and challenge others.

4. Regular Updates: The developers of Llama 2 are dedicated to p

As we can see, the model's answer is totally wrong: Llama 2 is not a game. Let's see how the model responds to a question about LangChain.

In [12]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [13]:
print(res.content)

I apologize, but I'm not familiar with a specific entity called "LLMChain" in the context of LangChain. It's possible that it could be a term or acronym specific to the LangChain platform or project. However, without more information, I'm unable to provide specific details about LLMChain. It would be best to refer to official documentation, whitepapers, or announcements from LangChain to get accurate information about LLMChain or reach out to their support team for clarification.


Our chatbot can no longer help us because it doesn't contain the information we need to answer the question. It was very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond as if it does know the answer, and this can be very hard to detect.

##### Adding source knowledge

There is another way of feeding knowledge into LLMs. It is called source knowledge and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

In [11]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [14]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed the augmented prompt to our chatbot.

In [15]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)
print(res.content)

LLMChain is a type of chain within the LangChain framework. It is the most common type of chain and is used to connect a language model (either an LLM or a ChatModel) to other components within an application. 

The LLMChain consists of three main components: a PromptTemplate, a model, and an optional output parser. The PromptTemplate is responsible for formatting multiple input variables into a suitable prompt. This formatted prompt is then passed to the language model (LLM or ChatModel) within the chain. 

Once the model receives the prompt, it generates an output based on the provided input. The LLMChain also allows for the inclusion of an OutputParser, which can be used to parse and format the output of the language model into a final desired format.

Overall, the LLMChain plays a crucial role in enabling applications powered by language models within the LangChain framework. It facilitates the integration of language models with other data sources and allows for interactions betwe

The quality of this answer is much better: it closely follows the description we supplied. This is made possible by the augmentation of our query with external knowledge (source knowledge). We can use a vector database to retrieve this information automatically.
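
Automatic retrieval needs a way to plug any list of contexts into the same prompt template. A minimal sketch of that step follows; `build_augmented_prompt` is a hypothetical helper name, and the two context strings are made-up placeholders rather than real documentation text.

```python
def build_augmented_prompt(query, contexts):
    """Hypothetical helper: fill the 'contexts + query' template with any contexts."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


# Placeholder contexts, for illustration only
example_prompt = build_augmented_prompt(
    "What is a chain?",
    ["Chains combine modular components.", "An LLMChain is a common chain."],
)
print(example_prompt)
```

Keeping the template in one function means the retrieval step only has to produce a list of strings.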

### Importing the Data

In this task, we will import our data using the Hugging Face Datasets library. Specifically, we will use the <b>"jamescalam/llama-2-arxiv-papers-chunked"</b> dataset. This dataset contains a collection of ArXiv papers, already split into chunks, which will serve as the external knowledge base for our chatbot.

In [17]:
from datasets import load_dataset
dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)



In [20]:
dataset[2]

{'doi': '1102.0183',
 'chunk-id': '2',
 'chunk': 'promising architectures for such tasks. The most successful hierarchical object recognition systems\nall extract localized features from input images, convolving image patches with \x0clters. Filter\nresponses are then repeatedly sub-sampled and re-\x0cltered, resulting in a deep feed-forward network\narchitecture whose output feature vectors are eventually classi\x0ced. One of the \x0crst hierarchical\nneural systems was the Neocognitron (Fukushima, 1980) which inspired many of the more recent\nvariants.\nUnsupervised learning methods applied to patches of natural images tend to produce localized\n\x0clters that resemble o\x0b-center-on-surround \x0clters, orientation-sensitive bar detectors, Gabor \x0clters\n(Schmidhuber et al. , 1996; Olshausen and Field, 1997; Hoyer and Hyv\x7f arinen, 2000). These \x0cndings\nin conjunction with experimental studies of the visual cortex justify the use of such \x0clters in the\nso-called standard m

##### Dataset Overview

The dataset we are using is sourced from the Llama 2 ArXiv paper and related papers. It is a collection of academic papers from ArXiv, a repository of electronic preprints that are posted after moderation. Each entry in the dataset represents a "chunk" of text from these papers.

Because most Large Language Models (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2 — at least not without this data.

### Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

We use a Hugging Face sentence-transformers model to generate embeddings like so:

In [23]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

model_id = 'sentence-transformers/all-MiniLM-L6-v2'
model_kwargs = {'device': 'cpu'}
hf_embedding = HuggingFaceEmbeddings(
    model_name=model_id,
    model_kwargs=model_kwargs
)

texts = [
    'this is the first chunk of text',
    'my name is xyz'
]

res = hf_embedding.embed_documents(texts)
len(res), len(res[0])

(2, 384)

In [24]:
print(res[0])

[-0.011914796195924282, 0.08918038755655289, 0.0382428839802742, 0.01908116601407528, 0.05977559834718704, 0.0053163692355155945, 0.038781408220529556, -0.008798947557806969, 0.060234904289245605, -0.015470288693904877, 0.027178632095456123, 0.05887822061777115, -0.02538408152759075, -0.033832449465990067, -0.019123584032058716, -0.0002414310147287324, 0.06661207228899002, -0.06140352413058281, -0.00425687013193965, -0.0020314722787588835, 0.02755112014710903, 0.13663724064826965, 0.0045464178547263145, 0.02143850550055504, 0.04543565586209297, 0.07764163613319397, -0.09471825510263443, 0.0631301999092102, 0.06228697672486305, -0.03511490300297737, -0.00600011320784688, -0.029994618147611618, 0.155880868434906, 0.05987439304590225, 0.01639336161315441, 0.04131562262773514, 0.008261461742222309, 0.028696317225694656, 0.02570163644850254, 0.05118108168244362, 0.03644575923681259, -0.11936520785093307, -0.008073659613728523, 0.05804252624511719, 0.01930527202785015, -0.005999195855110884,

From this we get two (aligning to our two chunks of text) 384-dimensional embeddings.
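
Embeddings are useful because vectors for similar texts point in similar directions. One common way to measure this is cosine similarity, sketched below in plain Python on small made-up vectors. This is an illustration of the idea only; a vector store may use a different distance metric by default.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings
v_same_a = [1.0, 0.0, 1.0]
v_same_b = [1.0, 0.0, 1.0]
v_other = [0.0, 1.0, 0.0]

sim_same = cosine_similarity(v_same_a, v_same_b)    # close to 1: same direction
sim_other = cosine_similarity(v_same_a, v_other)    # 0: orthogonal vectors
print(sim_same, sim_other)
```

The same function could be applied to `res[0]` and `res[1]` from the cell above to compare the two example chunks.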

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding the text, and inserting everything in batches.

In [29]:
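import chromadb

# Define a vector database client
chroma_client = chromadb.Client()

# Drop any collection left over from an earlier run so the next cell can
# recreate it; the exception type varies across chromadb versions
try:
    chroma_client.delete_collection(name="my_collections")
except Exception:
    pass  # nothing to delete on a first run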

In [27]:
from tqdm.auto import tqdm  # for progress bar
import chromadb

# Define a vector database client and create the collection
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="my_collections")

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100
for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = hf_embedding.embed_documents(texts)
    # get metadata to store in Chroma
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]

    # add the batch to the collection; storing the raw text as documents
    # lets the vector store return it as the content of each result
    collection.add(
        documents=texts,
        metadatas=metadata,
        embeddings=embeds,
        ids=ids
    )


Creating a LangChain vector store on top of the local Chroma collection

In [30]:
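from langchain.vectorstores import Chroma
# pass the embedding model so queries are embedded the same way as the stored chunks
vectorstore = Chroma(
    client=chroma_client,
    collection_name="my_collections",
    embedding_function=hf_embedding
)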

print(vectorstore._collection.count())

4838
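
As a quick check on the batching arithmetic, the index ranges produced by the loop can be reproduced on their own. The helper name `batch_ranges` is hypothetical; the numbers come from the collection count above and the batch size of 100.

```python
def batch_ranges(n_items, batch_size):
    """Yield (start, end) index pairs covering range(n_items) in order."""
    for start in range(0, n_items, batch_size):
        yield start, min(n_items, start + batch_size)


ranges = list(batch_ranges(4838, 100))
print(len(ranges))   # 49 batches
print(ranges[-1])    # (4800, 4838): the final batch holds only 38 items
```

The `min(...)` call is what keeps the last batch from running past the end of the data.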


### Retrieval Augmented Generation
We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

We have already wrapped our Chroma collection in the LangChain abstraction for a vector index, called a vectorstore. From it we create a retriever that returns the three most similar chunks for a given query.

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
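
Conceptually, a similarity retriever with k=3 scores every stored vector against the query vector and keeps the three best matches. The sketch below shows that idea on a toy in-memory index with made-up 3-dimensional vectors. The names `top_k` and `toy_index` are hypothetical, and a real vector store uses optimized search and may use a different similarity metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, index, k=3):
    """index is a list of (text, vector) pairs; return the k most similar texts."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]


# Toy index: made-up vectors standing in for embedded chunks
toy_index = [
    ("chunk about llamas", [0.9, 0.1, 0.0]),
    ("chunk about strings", [0.0, 1.0, 0.0]),
    ("chunk about safety", [0.7, 0.0, 0.7]),
    ("chunk about chains", [0.0, 0.2, 0.9]),
]

query_vec = [1.0, 0.0, 0.1]
contexts = top_k(query_vec, toy_index, k=3)
print(contexts)
```

The returned texts play the role of the contexts that were pasted into the augmented prompt by hand earlier.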