# Building RAG Chatbots with LangChain

In this project, we'll build an AI chatbot from start to finish. We will be using LangChain, OpenAI, and the Pinecone vector database to build a chatbot capable of learning from the external world using Retrieval Augmented Generation (RAG).

We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

By the end of the project we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.

### Prerequisites

You'll need to get an [OpenAI API key](https://platform.openai.com/account/api-keys) and [Pinecone API key](https://app.pinecone.io).

Apart from these, we need to install some Python libraries before we start building our chatbot. Here's a brief overview of what each library does:

**langchain:** This is a framework for developing applications powered by language models. We'll use it to chain together different language models and components for our chatbot.

**openai:** This is the official OpenAI Python client. We'll use it to interact with the OpenAI API and generate responses for our chatbot.

**datasets:** This is Hugging Face's library for loading and processing datasets, including the many datasets shared on the Hugging Face Hub. We'll use it to load our knowledge base for the chatbot.

**pinecone-client:** This is the official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.

**tiktoken:** This is OpenAI's tokenizer library. LangChain's OpenAI integrations use it to count tokens.

In [1]:
# Installing requirements
!pip install -qU \
    langchain==0.0.354 \
    openai==1.6.1 \
    datasets==2.10.1 \
    pinecone-client==2.2.4 \
    tiktoken==0.5.2


In [2]:
import os
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or "OPENAI_API_KEY"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)


In [3]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

In [4]:
res = chat(messages)
res


AIMessage(content='String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. In string theory, the fundamental building blocks of the universe are not particles, but rather one-dimensional strings that vibrate at different frequencies. These vibrations give rise to the different particles and forces we observe in the universe.\n\nString theory has led to new insights and ideas in theoretical physics, but it is still a work in progress and has not yet been experimentally confirmed. It is a complex and mathematically challenging theory that requires a deep understanding of quantum mechanics and advanced mathematics to fully grasp.\n\nIf you have any specific questions about string theory, feel free to ask!')

In [5]:
print(res.content)

String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. In string theory, the fundamental building blocks of the universe are not particles, but rather one-dimensional strings that vibrate at different frequencies. These vibrations give rise to the different particles and forces we observe in the universe.

String theory has led to new insights and ideas in theoretical physics, but it is still a work in progress and has not yet been experimentally confirmed. It is a complex and mathematically challenging theory that requires a deep understanding of quantum mechanics and advanced mathematics to fully grasp.

If you have any specific questions about string theory, feel free to ask!


In [6]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe that string theory has the potential to produce a unified theory because it aims to describe all fundamental forces and particles in a single, coherent framework. In traditional particle physics, there are separate theories for the electromagnetic, weak nuclear, and strong nuclear forces, as well as the particles that interact through these forces. General relativity, which describes gravity, is a separate theory altogether.

String theory attempts to unify all these forces and particles by treating them as different vibrations of the same underlying strings. This could potentially lead to a "theory of everything" that explains all known phenomena in the universe, from the behavior of subatomic particles to the dynamics of galaxies and the structure of spacetime.

While string theory has not yet been experimentally confirmed, its mathematical consistency and ability to address some of the fundamental questions in physics have led many physicists to believe that it ho

### Why use RAG? (Dealing with Hallucinations)

Although our chatbot handles general topics well, as shown above, its knowledge is restricted. This is because all of the knowledge that an LLM possesses is acquired during training. In essence, an LLM compresses the "world" as it appears in the training data into the model's internal parameters. This information is referred to as the model's _parametric knowledge_.

By default, LLMs can't reach the outside world.

The consequence becomes evident when we ask an LLM about more recent material, such as the (at the time) brand-new and much-discussed Llama 2 LLM.


In [7]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [8]:
print(res.content)

I'm not sure what you're referring to when you mention "Llama 2." It could be a specific topic, concept, or reference that I'm not familiar with. If you can provide more context or details, I'd be happy to try to help you understand or provide information about it.


Our chatbot can no longer help us: it doesn't contain the information we need to answer the question. This answer makes it very clear that the LLM doesn't know the information, but sometimes an LLM may respond as though it does know the answer, and this can be very hard to detect.

In [9]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [10]:
print(res.content)

I apologize, but I am not familiar with the terms "LLMChain" or "LangChain." It's possible that they may be specific to a certain field or concept that I don't have information about. If you can provide more context or details, I'll do my best to assist you or suggest where you might find more information on the topic.


In [11]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

In [12]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

In [13]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [14]:
print(res.content)

The LLMChain in LangChain is a common type of chain that is part of the LangChain framework for developing applications powered by language models. The LLMChain consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. 

This chain is designed to take multiple input variables, format them into a prompt using the PromptTemplate, and pass that prompt to the model. The model, which could be an LLM (Large Language Model) or a ChatModel, processes the prompt and generates an output. If an OutputParser is provided, it is used to parse the output of the model into a final format.

LangChain aims to enable applications that are data-aware and agentic, meaning they can connect a language model to other sources of data and allow the language model to interact with its environment. By using the LLMChain within the LangChain framework, developers can create powerful and differentiated applications that leverage language models in innovative ways.
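
The context we supplied describes an LLMChain as a pipeline: a PromptTemplate formats the input variables into a prompt, the model is called on that prompt, and an optional output parser turns the raw output into a final format. The plain-Python sketch below mirrors that flow, using `str.format` placeholders in place of a PromptTemplate, a hypothetical helper `run_chain`, and a hypothetical stand-in model `fake_model`. It only illustrates the idea; it is not LangChain's `LLMChain` implementation or API.

```python
from typing import Callable, Optional


def run_chain(
    template: str,
    model: Callable[[str], str],
    inputs: dict,
    parser: Optional[Callable[[str], object]] = None,
):
    """Format inputs into a prompt, call the model, optionally parse the output.

    Illustrative only: not LangChain's LLMChain implementation.
    """
    prompt = template.format(**inputs)       # 1. format the input variables into a prompt
    raw = model(prompt)                      # 2. pass the prompt to the model
    return parser(raw) if parser else raw    # 3. optionally parse the raw output


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: always returns a comma-separated answer
    return "retrieval, augmentation, generation"


result = run_chain(
    template="List the stages of {technique} as comma-separated words.",
    model=fake_model,
    inputs={"technique": "RAG"},
    parser=lambda text: [part.strip() for part in text.split(",")],
)
print(result)  # ['retrieval', 'augmentation', 'generation']
```

Keeping the parser optional means the same function returns the raw model text when no parsing is needed.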


### Importing Data


In this step, we import our data using the Hugging Face Datasets library. Specifically, we use the `jamescalam/llama-2-arxiv-papers-chunked` dataset, a collection of ArXiv papers that will serve as the external knowledge base for our chatbot.

The dataset is built from the Llama 2 ArXiv paper and related papers. ArXiv is an open-access repository of electronic preprints that are approved for posting after moderation, but not peer review. Each entry in the dataset represents one "chunk" of text from these papers.

Without this data, most large language models (LLMs) are unable to answer our questions about Llama 2, because they only know about the world as it was represented in their training data.

In [15]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [16]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.
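
Conceptually, the vector database stores one embedding per chunk and, at query time, returns the stored vectors most similar to the query's embedding; the index we create below uses `metric='cosine'`. The brute-force sketch that follows ranks a few made-up 3-dimensional vectors by cosine similarity. The vectors, ids, and the `cosine_similarity`/`top_k` helpers are hypothetical illustrations, and this linear scan only shows what the similarity metric computes, not how Pinecone searches internally.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (real text-embedding-ada-002 vectors have 1536 dimensions)
store = {
    "llama-2-chunk": [0.9, 0.1, 0.0],
    "string-theory-chunk": [0.0, 0.2, 0.9],
    "langchain-chunk": [0.6, 0.7, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

print(top_k(query_vec, store, k=2))  # llama-2-chunk first, then langchain-chunk
```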

We begin by initializing our connection to Pinecone; this requires a [free API key](https://app.pinecone.io).

In [19]:
import pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
pinecone.init(
  api_key = os.getenv("PINECONE_API_KEY") or "PINECONE_API_KEY",
  environment="gcp-starter"
)

In [21]:
import time
import pinecone  # Assuming you have pinecone-client installed

index_name = 'llama-2-rag'

# Check if the index already exists
if index_name not in pinecone.list_indexes():
    # Create the index if it doesn't exist
    pinecone.create_index(
        index_name,
        dimension=1536,  # Dimensionality of ada 002
        metric='cosine'
    )

    # Wait for the index to be initialized
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

# Connect to the index
index = pinecone.Index(index_name)
time.sleep(1)

# View index stats
index.describe_index_stats()


{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

Our index is now ready, but it's empty. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's text-embedding-ada-002 model, which we can access via LangChain like so:

In [22]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")


In [23]:
# creating Embeddings
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding each batch of chunks, and inserting the batch into the index.
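
The next cell slices the dataset into batches of 100 rows and builds an id of the form `<doi>-<chunk-id>` for each chunk. That bookkeeping can be checked on its own, without any API calls. `batch_ranges` and `make_ids` are hypothetical helpers that mirror the loop's `range`/`min` logic and its id f-string, applied here to the 4,838-row count reported for the dataset:

```python
from typing import Iterator, List, Sequence, Tuple


def batch_ranges(n_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs covering n_rows in steps of batch_size."""
    for start in range(0, n_rows, batch_size):
        # the last batch is shorter when n_rows is not a multiple of batch_size
        yield start, min(n_rows, start + batch_size)


def make_ids(rows: Sequence[dict]) -> List[str]:
    """Build '<doi>-<chunk-id>' ids, mirroring the f-string used in the loop."""
    return [f"{row['doi']}-{row['chunk-id']}" for row in rows]


ranges = list(batch_ranges(4838, 100))
print(len(ranges), ranges[-1])  # 49 (4800, 4838)
print(make_ids([{"doi": "1102.0183", "chunk-id": "0"}]))  # ['1102.0183-0']
```

Combining the DOI with the chunk id keeps ids unique across papers, since chunk ids restart from `0` for every paper.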

In [24]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone as a list of (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(ids, embeds, metadata)))


In [25]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.04838,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

We've built a fully-fledged knowledge base. Now we'll connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

In [26]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)


In [27]:
query = "What is so special about Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

In [28]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

In [29]:
print(augment_prompt(query))

Using the contexts below, answer the query.

    Contexts:
    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety
asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwith
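
In the printed prompt, the `Contexts:` line and the first context line start with four spaces because the triple-quoted f-string inside `augment_prompt` is indented along with the function body. This is unlikely to matter much to the model, but if you want a clean prompt, here is a minimal sketch that keeps the same wording. It assumes a hypothetical `build_augmented_prompt` helper that receives the retrieved texts directly (for example `[x.page_content for x in results]`):

```python
from typing import List


def build_augmented_prompt(query: str, contexts: List[str]) -> str:
    """Same wording as augment_prompt, but without the function-body indentation."""
    source_knowledge = "\n".join(contexts)
    # adjacent string literals are concatenated, so no stray leading whitespace
    return (
        "Using the contexts below, answer the query.\n\n"
        "Contexts:\n"
        f"{source_knowledge}\n\n"
        f"Query: {query}"
    )


prompt_text = build_augmented_prompt(
    "What is so special about Llama 2?",
    [
        "Llama 2 is a collection of pretrained and fine-tuned LLMs.",
        "The models range in scale from 7 billion to 70 billion parameters.",
    ],
)
print(prompt_text)
```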

In [30]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed in the context of the provided information. These LLMs range in scale from 7 billion to 70 billion parameters and are optimized for dialogue use cases. The fine-tuned LLMs in Llama 2, such as L/l.sc/a.sc/m.sc/a.sc/two.taboldstyle-C/h.sc/a.sc/t.sc, have been shown to outperform open-source chat models on various benchmarks tested.

One of the notable aspects of Llama 2 is that these models have been designed and evaluated for helpfulness and safety, potentially making them a suitable substitute for closed-source models. The development and release of Llama 2 contribute to advancing AI alignment research and provide a transparent and reproducible approach to fine-tuning large language models.

Overall, the special characteristics of Llama 2 lie in its performance in dialogue applications, its optimization for helpfulness and safety, and its potential as a viable alternative to closed-source models

To turn this into an interactive chatbot, this step needs to be modified so that the user can enter prompts and the chatbot answers them. For now, we ask another question directly.
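
A minimal sketch of such a loop, assuming a hypothetical `chat_loop` helper and a stand-in `echo_answer` function in place of the real RAG call. In the notebook, the `answer` callable could return `chat(messages + [HumanMessage(content=augment_prompt(text))]).content`, which is the same call used in the next cell:

```python
from typing import Callable, Iterable, List


def chat_loop(
    prompts: Iterable[str],
    answer: Callable[[str], str],
    exit_words: tuple = ("quit", "exit"),
) -> List[str]:
    """Answer each prompt until an exit word is entered; return the replies."""
    replies = []
    for text in prompts:
        if text.strip().lower() in exit_words:
            break  # stop as soon as the user types an exit word
        reply = answer(text)
        print(f"AI: {reply}")
        replies.append(reply)
    return replies


def echo_answer(text: str) -> str:
    # Hypothetical stand-in for the RAG call (augment_prompt + chat)
    return f"(answer to: {text})"


replies = chat_loop(["What is Llama 2?", "quit", "ignored"], echo_answer)
```

For interactive use, passing `iter(lambda: input("You: "), None)` as `prompts` keeps reading from the keyboard until the user types `quit` or `exit`. Keeping the conversation history (for example, appending each question and reply to `messages`) is left to the `answer` callable in this sketch.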

In [31]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, several safety measures were implemented to ensure the safety and reliability of the models. These measures include:

1. Safety-specific data annotation and tuning: The models underwent specific data annotation and tuning processes aimed at enhancing safety and mitigating potential risks associated with language model outputs.

2. Red-teaming: Red-teaming involves the practice of creating adversarial scenarios to test the robustness and security of the models. This process helps identify vulnerabilities and areas for improvement in the models' safety features.

3. Iterative evaluations: Continuous and iterative evaluations were conducted to assess the safety performance of the models at different stages of development. This approach allows for ongoing monitoring and refinement of safety measures.

By implementing these safety measures, the developers aimed to enhance the safety and responsible use of Llama 2 models, enabling the community to reproduce fin

In [32]:
# Delete the index to save resources
pinecone.delete_index(index_name)