[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb)

# Building RAG Chatbots with LangChain

In this example, we'll build an AI chatbot from start to finish. We will use LangChain, OpenAI, and the Pinecone vector database to build a chatbot capable of learning from the external world using **R**etrieval **A**ugmented **G**eneration (RAG).

We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

By the end of the example we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.

### Before you begin

You'll need to get an [OpenAI API key](https://platform.openai.com/account/api-keys) and [Pinecone API key](https://app.pinecone.io).
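Before running the later cells, it can help to confirm that both keys are visible to the notebook. The following is a minimal sketch, assuming the keys are stored in environment variables named `OPENAI_API_KEY` and `PINECONE_API_KEY` (the names read by the cells further down); the helper `missing_keys` is hypothetical and not part of any library.

```python
import os

def missing_keys(names, env=None):
    """Return the names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# variable names assumed to match the ones read later in this notebook
required = ["OPENAI_API_KEY", "PINECONE_API_KEY"]
missing = missing_keys(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All API keys found.")
```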

### Prerequisites

Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

- **langchain**: This is a library for GenAI. We'll use it to chain together different language models and components for our chatbot.
- **openai**: This is the official OpenAI Python client. We'll use it to interact with the OpenAI API and generate responses for our chatbot.
- **datasets**: This library provides a vast array of datasets for machine learning. We'll use it to load our knowledge base for the chatbot.
- **pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.

You can install these libraries using pip like so:

In [1]:
# !pip install -qU \
#     langchain==0.0.292 \
#     openai==0.28.0 \
#     datasets==2.10.1 \
#     pinecone-client==2.2.4 \
#     tiktoken==0.5.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flagembedding 1.1.8 requires transformers==4.34.0, but you have transformers 4.36.2 which is incompatible.
ragas 0.0.22 requires openai>1, but you have openai 0.28.0 which is incompatible.



### Building a Chatbot (no RAG)

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a `ChatOpenAI` object. For this we do need an [OpenAI API key](https://platform.openai.com/account/api-keys).

In [1]:
import os
OPENAI_API_KEY=os.getenv("OPENAI_API_KEY") 


In [2]:
import os
from langchain.chat_models import ChatOpenAI

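# fall back to a placeholder so a missing variable does not raise a
# TypeError when the value is written back to os.environ
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY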

chat = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model='gpt-3.5-turbo'
)




Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```

In LangChain there is a slightly different format. We use three _message_ objects like so:

In [3]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

The format is very similar. We've just swapped the role of `"user"` for `HumanMessage`, and the role of `"assistant"` for `AIMessage`.
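To make the correspondence between the two formats concrete, here is a minimal sketch that maps message objects onto OpenAI-style role dictionaries. It assumes each message exposes a `type` attribute (`"system"`, `"human"`, or `"ai"`) and a `content` attribute, which is how we understand LangChain's message classes to behave. The helper `to_openai_dicts` is hypothetical, and `SimpleNamespace` stand-ins are used so the sketch runs without LangChain installed.

```python
from types import SimpleNamespace

# assumed mapping from a message's `type` to the OpenAI role name
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}

def to_openai_dicts(messages):
    """Convert objects with `.type` and `.content` into OpenAI-style role dicts."""
    return [
        {"role": ROLE_BY_TYPE[m.type], "content": m.content}
        for m in messages
    ]

# stand-in objects that mimic the attributes of LangChain messages
demo = [
    SimpleNamespace(type="system", content="You are a helpful assistant."),
    SimpleNamespace(type="human", content="Hi AI, how are you today?"),
    SimpleNamespace(type="ai", content="I'm great thank you. How can I help you?"),
]
converted = to_openai_dicts(demo)
print(converted[1])
```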

We generate the next response from the AI by passing these messages to the `ChatOpenAI` object.

In [7]:
res = chat(messages)
res

AIMessage(content='String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and forces in the universe. It proposes that the most basic building blocks of the universe are not point-like particles, but rather tiny, vibrating strings. These strings can exist in various vibrational modes, which correspond to different particles and forces.\n\nString theory seeks to unify the laws of quantum mechanics and general relativity, which are currently described by separate theories. It also suggests the existence of extra dimensions beyond the three spatial dimensions and one time dimension that we experience.\n\nOverall, string theory is a complex and mathematically challenging theory that is still being actively researched and developed by physicists. It has the potential to revolutionize our understanding of the universe, but many aspects of the theory remain speculative and unproven.')

In response we get another AI message object. We can print it more clearly like so:

In [8]:
print(res.content)

String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and forces in the universe. It proposes that the most basic building blocks of the universe are not point-like particles, but rather tiny, vibrating strings. These strings can exist in various vibrational modes, which correspond to different particles and forces.

String theory seeks to unify the laws of quantum mechanics and general relativity, which are currently described by separate theories. It also suggests the existence of extra dimensions beyond the three spatial dimensions and one time dimension that we experience.

Overall, string theory is a complex and mathematically challenging theory that is still being actively researched and developed by physicists. It has the potential to revolutionize our understanding of the universe, but many aspects of the theory remain speculative and unproven.


Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [9]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe that string theory has the potential to produce a unified theory because it offers a way to reconcile and unify the two major pillars of modern physics: quantum mechanics and general relativity.

Quantum mechanics describes the behavior of particles at the smallest scales, while general relativity describes the force of gravity and the behavior of massive objects like planets and stars. These two theories are incredibly successful in their own domains but are fundamentally incompatible when it comes to describing the behavior of particles at extremely small scales or in the presence of strong gravitational fields.

String theory attempts to resolve this conflict by providing a framework that can encompass both quantum mechanics and general relativity. By describing particles as tiny vibrating strings rather than point-like particles, string theory naturally incorporates the principles of quantum mechanics. Additionally, string theory predicts the existence of gravito
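
The append, call, append pattern used above can be wrapped in a small helper. This is only a sketch: `take_turn` and `echo_chat` are hypothetical names, and the stand-in model takes the place of the `chat` object so the example runs without an API key. Unlike the cells above, the helper appends the reply immediately, so the two styles should not be mixed on the same `messages` list.

```python
def take_turn(chat_fn, messages, user_message):
    """Append the user message, call the model, then append and return its reply."""
    messages.append(user_message)
    reply = chat_fn(messages)
    messages.append(reply)
    return reply

# stand-in model that reports how many messages it received
def echo_chat(messages):
    return f"(reply after {len(messages)} messages)"

history = ["system prompt"]
first = take_turn(echo_chat, history, "first question")
second = take_turn(echo_chat, history, "second question")
print(first, second, len(history))
```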

### Dealing with Hallucinations

We have our chatbot, but the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, like about the new (and very popular) Llama 2 LLM.

In [6]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [8]:
print(res.content)

I'm sorry, but I'm not aware of any specific significance or special meaning associated with "Llama 2." It's possible that you may be referring to something specific, like a book, a movie, a game, or a particular reference in a context that I'm not aware of. Could you please provide more information or context so that I can better understand and assist you?


Our chatbot can no longer help us because it doesn't contain the information we need to answer the question. It was very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond as if it _does_ know the answer, and this can be very hard to detect.

OpenAI have since adjusted the behavior for this particular example as we can see below:

In [9]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [10]:
print(res.content)

I apologize, but I couldn't find any specific information about an "LLMChain" in relation to "LangChain." It's possible that these terms are specific to a particular context or project that I'm not familiar with. If you can provide more details or clarify the context, I'll do my best to assist you further.


There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

In [11]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [12]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as we did before.

In [13]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [14]:
print(res.content)

The LLMChain is a specific type of chain within the LangChain framework, which is used for developing applications powered by language models. A chain in this context refers to a sequence of modular components combined in a specific way to achieve a particular purpose.

The LLMChain, in particular, is the most common type of chain in LangChain. It consists of three main components: a PromptTemplate, a language model (either an LLM or a ChatModel), and an optional output parser. 

When using an LLMChain, multiple input variables are taken and formatted into a prompt using the PromptTemplate. This prompt is then passed to the language model, which generates an output based on the given input. Finally, the optional output parser can be used to parse and format the output of the language model into a final desired format.

The LangChain framework aims to enable powerful and differentiated applications by not only integrating language models through an API but also by making them data-aware

The quality of this answer is phenomenal. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get this information in the first place?

We learned in the previous chapters about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset.
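
As a reminder of what a vector database does for us, here is a toy sketch of similarity-based retrieval. It is not how Pinecone or OpenAI embeddings work internally: the `embed` function below is a simple word-count stand-in, and all names are hypothetical. The idea it illustrates is the same, though: turn the query and the documents into vectors, then return the documents whose vectors are closest to the query.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (not a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "chains combine modular components in sequence",
    "pinecone stores vectors for similarity search",
    "agents let a language model choose actions",
]
print(top_k("how are vectors stored for search", docs))
```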

### Importing the Data

We will use the Hugging Face Datasets library to load our data. This walkthrough was written around the `"jamescalam/llama-2-arxiv-papers"` dataset, a collection of ArXiv papers. The cell below instead loads `"jamescalam/langchain-docs"`, whose records are chunks of the LangChain documentation. Whichever dataset is loaded serves as the external knowledge base for our chatbot.

In [67]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/langchain-docs",
    split="train"
)

dataset

Downloading data: 100%|██████████| 2.76M/2.76M [00:01<00:00, 1.85MB/s]
Generating train split: 2212 examples [00:00, 3045.48 examples/s]


Dataset({
    features: ['id', 'text', 'source'],
    num_rows: 2212
})

In [73]:
dataset[0]

{'id': '9c04de564ed3-0',
 'text': '.rst\n.pdf\nWelcome to LangChain\n Contents \nGetting Started\nModules\nUse Cases\nReference Docs\nLangChain Ecosystem\nAdditional Resources\nWelcome to LangChain#\nLarge language models (LLMs) are emerging as a transformative technology, enabling\ndevelopers to build applications that they previously could not.\nBut using these LLMs in isolation is often not enough to\ncreate a truly powerful app - the real power comes when you are able to\ncombine them with other sources of computation or knowledge.\nThis library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include:\n❓ Question Answering over specific documents\nDocumentation\nEnd-to-end Example: Question Answering over Notion Database\n💬 Chatbots\nDocumentation\nEnd-to-end Example: Chat-LangChain\n🤖 Agents\nDocumentation\nEnd-to-end Example: GPT+WolframAlpha\nGetting Started#\nCheckout the below guide for a walkthrough of ho

#### Dataset Overview

The dataset loaded above contains LangChain documentation pages rather than ArXiv papers, as the `text` field of the first record shows. Each entry in the dataset represents a "chunk" of text from these pages, with `id`, `text`, and `source` fields.

Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about recent libraries such as LangChain, at least not without this data.
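
The dataset comes pre-chunked, and the source does not say how the chunks were produced. As an illustration only, here is a minimal sketch of one common approach: fixed-size character windows with a small overlap. The function name, the sizes, and the character-based splitting are all assumptions made for this example.

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # stop once a window has reached the end of the text, so the last
    # chunk is never fully contained in the one before it
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

sample = "abcdefghij" * 10  # 100 characters
chunks = chunk_text(sample, chunk_size=40, overlap=10)
print(len(chunks), [len(c) for c in chunks])
```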

### Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

We begin by initializing our connection to Pinecone. This requires a [free API key](https://app.pinecone.io).

In [23]:
import os
import pinecone

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY',
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'gcp-starter'
)

Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.

In [71]:
# import fitz  # PyMuPDF

# def extract_text_from_pdf(pdf_path):
#     text = ""
#     doc = fitz.open(pdf_path)
#     for page_num in range(doc.page_count):
#         page = doc.load_page(page_num)
#         text += page.get_text("text")
#     doc.close()
#     return text

# pdf_path = "../Dataset/10 Academy Cohort A - Weekly Challenge_ Week - 6.pdf"
# data = extract_text_from_pdf(pdf_path)
# print(data)


We create the index if it does not already exist, then connect to it:

In [74]:
import time

index_name = "langchain"

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)
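
The readiness loop above polls indefinitely. A bounded variant can be sketched as follows; `wait_until` is a hypothetical helper, and a stand-in readiness check is used so the example runs without a Pinecone connection.

```python
import time

def wait_until(is_ready, timeout=60.0, interval=1.0):
    """Poll `is_ready()` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval)
    return False

# stand-in readiness check that succeeds on the third call
calls = {"n": 0}

def fake_ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(fake_ready, timeout=5.0, interval=0.01))
```

In the notebook, the check passed in would be something like `lambda: pinecone.describe_index(index_name).status['ready']`, mirroring the condition in the loop above.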

In [75]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

Our index is now ready but it's empty. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which we can access via LangChain like so:

In [76]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Using this model we can create embeddings like so:

In [77]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

From this we get two (aligning to our two chunks of text) 1536-dimensional embeddings.

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding each batch of chunks, and inserting the results into the index.

In [80]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['id']}" for _, x in batch.iterrows()]

    # get text to embed
    texts = [x['text'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['text'],
         'source': x['source']
         } for _, x in batch.iterrows()  # `_` avoids reusing the loop variable `i`
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))

100%|██████████| 23/23 [03:05<00:00,  8.05s/it]
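
The batch boundaries used in the loop above can be checked in isolation. This small sketch (with a hypothetical helper name) reproduces the same `i` and `i_end` arithmetic and shows why 2212 rows with a batch size of 100 give the 23 iterations seen in the progress bar.

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) index pairs covering range(total) in batches."""
    for start in range(0, total, batch_size):
        yield start, min(total, start + batch_size)

ranges = list(batch_ranges(2212, 100))
print(len(ranges), ranges[0], ranges[-1])
# prints: 23 (0, 100) (2200, 2212)
```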


We can check that the vector index has been populated using `describe_index_stats` like before:

In [81]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.02212,
 'namespaces': {'': {'vector_count': 2212}},
 'total_vector_count': 2212}

In [69]:
# index_name = '10-academy'

# # Delete the entire index
# pinecone.delete_index(index_name)

In [70]:
# indexes = pinecone.list_indexes()

# # Check if the list of indexes is empty
# if not indexes:
#     print("The database is empty.")
# else:
#     print("The database is not empty.")



#### Retrieval Augmented Generation

We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [82]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)




In [83]:
def get_user_input():
    """
    Get user input from the console and return it as a string.
    """
    user_input = input("Enter your query: ")
    return str(user_input)



Using this `vectorstore` we can already query the index and see if we have any relevant information for a question typed in by the user.

In [84]:
query = get_user_input()

print(query)


what is langchain


In [85]:
vectorstore.similarity_search(query, k=3)

[Document(page_content='Chatbots\\n\\nDocumentation\\n\\nEnd-to-end Example: Chat-LangChain\\n\\nð\\x9f¤\\x96 Agents\\n\\nDocumentation\\n\\nEnd-to-end Example: GPT+WolframAlpha\\n\\nð\\x9f“\\x96 Documentation\\n\\nPlease see here for full documentation on:\\n\\nGetting started (installation, setting up the environment, simple examples)\\n\\nHow-To examples (demos, integrations, helper functions)\\n\\nReference (full API docs)\\n  Resources (high-level explanation of core concepts)\\n\\nð\\x9f\\x9a\\x80 What can this help with?\\n\\nThere are six main areas that LangChain is designed to help with.\\nThese are, in increasing order of complexity:\\n\\nð\\x9f“\\x83 LLMs and Prompts:\\n\\nThis includes prompt management, prompt optimization, generic interface for all LLMs, and common utilities for working with LLMs.\\n\\nð\\x9f”\\x97 Chains:\\n\\nChains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard int

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to connect the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

In [94]:
def augment_prompt(query: str, folder_path: str = "../prompts"):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""

    # save the source knowledge to a file in the specified folder,
    # creating the folder first if it does not exist
    os.makedirs(folder_path, exist_ok=True)
    file_path = os.path.join(folder_path, 'context.txt')
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(source_knowledge)

    return augmented_prompt
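
The function above joins every retrieved chunk into the prompt, however long the chunks are. If prompt length becomes a concern, one option is to cap the total size of the contexts before joining them. The following is a sketch under assumptions: `cap_contexts` is a hypothetical helper, and the budget is counted in characters rather than model tokens for simplicity.

```python
def cap_contexts(contexts, max_chars=3000):
    """Keep whole contexts, in order, until the character budget is used up."""
    kept, used = [], 0
    for text in contexts:
        if used + len(text) > max_chars:
            break
        kept.append(text)
        used += len(text)
    return kept

contexts = ["a" * 1000, "b" * 1500, "c" * 1000]
print([len(c) for c in cap_contexts(contexts, max_chars=3000)])
# prints: [1000, 1500]
```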



In [90]:
folder_path = "../prompts"
print(augment_prompt(query, folder_path))

Using the contexts below, answer the query.

    Contexts:
    Chatbots\n\nDocumentation\n\nEnd-to-end Example: Chat-LangChain\n\nð\x9f¤\x96 Agents\n\nDocumentation\n\nEnd-to-end Example: GPT+WolframAlpha\n\nð\x9f“\x96 Documentation\n\nPlease see here for full documentation on:\n\nGetting started (installation, setting up the environment, simple examples)\n\nHow-To examples (demos, integrations, helper functions)\n\nReference (full API docs)\n  Resources (high-level explanation of core concepts)\n\nð\x9f\x9a\x80 What can this help with?\n\nThere are six main areas that LangChain is designed to help with.\nThese are, in increasing order of complexity:\n\nð\x9f“\x83 LLMs and Prompts:\n\nThis includes prompt management, prompt optimization, generic interface for all LLMs, and common utilities for working with LLMs.\n\nð\x9f”\x97 Chains:\n\nChains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface

Using this we produce an augmented prompt:

In [95]:
print(augment_prompt(query))

Using the contexts below, answer the query.

    Contexts:
    Chatbots\n\nDocumentation\n\nEnd-to-end Example: Chat-LangChain\n\nð\x9f¤\x96 Agents\n\nDocumentation\n\nEnd-to-end Example: GPT+WolframAlpha\n\nð\x9f“\x96 Documentation\n\nPlease see here for full documentation on:\n\nGetting started (installation, setting up the environment, simple examples)\n\nHow-To examples (demos, integrations, helper functions)\n\nReference (full API docs)\n  Resources (high-level explanation of core concepts)\n\nð\x9f\x9a\x80 What can this help with?\n\nThere are six main areas that LangChain is designed to help with.\nThese are, in increasing order of complexity:\n\nð\x9f“\x83 LLMs and Prompts:\n\nThis includes prompt management, prompt optimization, generic interface for all LLMs, and common utilities for working with LLMs.\n\nð\x9f”\x97 Chains:\n\nChains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface

There is still a lot of text here, so let's pass it on to our chat model to see how it performs.

In [96]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

LangChain is a library aimed at assisting developers in building applications with Large Language Models (LLMs) through composability. LLMs are powerful language models that enable developers to create applications that combine them with other sources of computation or knowledge. LangChain provides a standard interface for various features such as prompt management, chains (sequences of calls), data augmented generation, agents (LLMs making decisions), memory (persisting state), and evaluation. It offers integrations with other tools and examples of end-to-end applications like chatbots and question answering over specific documents.


We can continue with more questions about LangChain. Let's try _without_ RAG first:

In [97]:
prompt = HumanMessage(
    content="what are Agents in langchain?"
)

res = chat(messages + [prompt])
print(res.content)

In the context of LangChain, Agents refer to a concept where a large language model (LLM) makes decisions about which actions to take, takes those actions, observes the outcomes, and repeats the process until a desired outcome is achieved. Agents essentially use LLMs to perform tasks that involve decision-making and interaction with the environment.

LangChain provides a standard interface for agents, a selection of pre-built agents to choose from, and examples of end-to-end agent applications. These agents can be used to build applications that require the LLM to perform sequential actions based on the observed outcomes.

By leveraging the capabilities of LLMs and the agent framework provided by LangChain, developers can create more complex and interactive applications that involve decision-making, such as chatbots, virtual assistants, game characters, and more.


The chatbot is able to respond about Llama 2 thanks to its conversational history stored in `messages`. However, it doesn't know anything about the safety measures themselves as we have not provided it with that information via the RAG pipeline. Let's try again but with RAG.

We get a much more informed response that includes several items missing in the previous non-RAG response, such as "red-teaming", "iterative evaluations", and the intention of the researchers to share this research to help "improve their safety, promoting responsible development in the field".
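
The "with RAG" step described above follows the same pattern each time: retrieve, build an augmented prompt, and call the model with the history plus that prompt. A minimal sketch of the flow is shown here. The function `rag_answer` and both stand-ins are hypothetical and do not reproduce the response discussed above; they only make the control flow runnable without a vector store or an API key.

```python
def rag_answer(query, retrieve, chat, history, k=3):
    """Retrieve contexts for `query`, build an augmented prompt, and call `chat`."""
    contexts = retrieve(query, k)
    prompt = (
        "Using the contexts below, answer the query.\n\n"
        "Contexts:\n" + "\n".join(contexts) + f"\n\nQuery: {query}"
    )
    # the history is not modified; the prompt is only added for this call
    return chat(history + [prompt])

# stand-ins for the vector store and the chat model
def fake_retrieve(query, k):
    return [f"context {i} for '{query}'" for i in range(k)]

def fake_chat(messages):
    return f"answer built from {len(messages)} messages"

print(rag_answer("what are Agents in langchain?", fake_retrieve, fake_chat, ["system"]))
```

In the notebook itself, the retrieval step would be `vectorstore.similarity_search(query, k=3)` with the text read from each result's `page_content`, and the prompt would be wrapped in a `HumanMessage` before being passed to `chat`.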