In [8]:
!pip install -qU \
    langchain==0.0.354 \
    openai==1.6.1 \
    datasets==2.10.1 \
    pinecone-client==3.0.0 \
    tiktoken==0.5.2


**Building RAG Chatbots with LangChain**

In this demonstration, we're going to build an AI-powered chatbot from the ground up. We will use LangChain, OpenAI, and the Pinecone vector database to create a chatbot that draws on external sources at query time through Retrieval Augmented Generation (RAG).

Our knowledge source will be a dataset derived from the Llama 2 ArXiv paper and other related research papers. No model training is involved: the chatbot retrieves passages from this dataset to answer questions about recent advances in GenAI.

By the conclusion of this demonstration, we will have developed a fully operational chatbot and RAG system capable of engaging in conversation and delivering well-informed answers based on an extensive knowledge database.

**Building a Chatbot (no RAG)**

Our project will predominantly utilize the LangChain library to integrate the various elements required for our chatbot. Initially, we will develop a basic chatbot without incorporating retrieval augmentation. This is achieved by initializing a ChatOpenAI object, which necessitates an OpenAI API key.


In [9]:
import os
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "Enter Key"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)


In [10]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

In [11]:
res = chat(messages)
res


AIMessage(content='Sure! String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and their interactions. It proposes that particles are not point-like objects, but rather tiny, vibrating loops of energy called "strings."\n\nIn string theory, the different properties of particles, such as their mass and charge, arise from the different ways in which strings vibrate. Just like the different vibrational modes of a guitar string produce different musical notes, the various vibrational modes of a string correspond to different particles.\n\nString theory also incorporates the idea of extra dimensions beyond the three spatial dimensions we are familiar with. It suggests that there are six additional spatial dimensions that are curled up and invisible at everyday scales.\n\nOne of the intriguing aspects of string theory is that it aims to unify all the fundamental forces of nature, including gravity, electromagnetism, and the strong and

In [11]:
print(res.content)

In [12]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe that string theory has the potential to produce a unified theory because it incorporates all the fundamental forces of nature and provides a consistent framework for describing their interactions. Here are a few reasons why physicists are optimistic about string theory:

1. Unification of Forces: String theory naturally incorporates gravity, electromagnetism, and the strong and weak nuclear forces into a single framework. This is in contrast to the Standard Model of particle physics, which treats these forces as distinct and separate entities.

2. Consistency of Quantum Mechanics and General Relativity: String theory provides a way to reconcile the principles of quantum mechanics (which describe the behavior of particles on small scales) with general relativity (which describes the behavior of gravity on large scales). This is important because the two theories are currently incompatible in certain situations, such as near the center of a black hole or during the ear

**Dealing with Hallucinations**

Our chatbot is operational, but the knowledge of Large Language Models (LLMs) is limited. This limitation arises because LLMs acquire their entire knowledge base during their training phase. Essentially, an LLM condenses the "world" as depicted in its training data into its internal model parameters. This stored information is referred to as the model's parametric knowledge.

Inherently, LLMs lack the capability to access information from the external world.

This limitation becomes particularly evident when querying LLMs for more recent developments, such as information on the new and widely-discussed Llama 2 LLM.

In [13]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [14]:
print(res.content)

I'm sorry, but I don't have any information about a specific entity called "Llama 2." It's possible that you may be referring to something specific or using a term that I am not familiar with. Could you please provide more context or clarify your question?


In [15]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [16]:
print(res.content)

I apologize, but I am not familiar with the specific terms "LLMChain" or "LangChain." It's possible that these terms are specific to a particular context or domain that I am not aware of. If you can provide more information or context about what you are referring to, I will do my best to assist you.


Another method exists for inputting knowledge into LLMs, known as source knowledge. This pertains to any data that is introduced into the LLM through the prompt. We can experiment with this approach using the LLMChain question. For this, we can utilize a description taken from the LangChain documentation.

In [17]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which refers to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

In [18]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

In [19]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [20]:
print(res.content)

The LLMChain is a type of chain within the LangChain framework. Chains in LangChain are sequences of modular components or other chains combined in a specific way to achieve a particular purpose. The LLMChain, in particular, is the most common type of chain.

The LLMChain consists of three main components: a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables and utilizes the PromptTemplate to format them into a prompt. The formatted prompt is then passed to the language model (LLM or ChatModel) for processing. Finally, if an OutputParser is provided, it is used to parse the output of the language model into a final desired format.

LangChain itself is a framework for developing applications that leverage the power of language models. It goes beyond simply calling a language model via an API by also enabling data-awareness and agentic capabilities. Being data-aware means connecting a language model to other da

**Importing the Data**

For this activity, we're going to import our data using the Hugging Face Datasets library. Our focus will be on the "jamescalam/llama-2-arxiv-papers-chunked" dataset. This dataset is a collection of ArXiv papers, already split into chunks, which will serve as the external knowledge base for our chatbot.

In [21]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [22]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

**Dataset Overview**

Our chosen dataset is built from the Llama 2 ArXiv paper and related papers. ArXiv is an online archive of electronic preprints; submissions are moderated but not peer-reviewed. Every record in this dataset is a chunk of text extracted from these papers.

Since the majority of Large Language Models (LLMs) are limited to the knowledge available up to their training period, they are unable to provide answers about Llama 2 without access to this specific information.

**Building the Knowledge Base**

We have successfully acquired a dataset that will act as the knowledge base for our chatbot. The following step involves converting this dataset into a format that our chatbot can utilize. This process necessitates the use of an embedding model and a vector database.

Our initial action is to establish a connection with Pinecone, which necessitates obtaining a free API key.

In [23]:
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = "Enter API Key"

# configure client
pc = Pinecone(api_key=api_key)

Next, we will configure our index specification, which lets us select the cloud provider and region where the index is deployed. The list of available providers and regions is given in the Pinecone documentation.

In [24]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-west-2"
)

Following that, we create the index. To generate the embeddings we'll use OpenAI's text-embedding-ada-002 model, so we set the index dimension to 1536 to match that model's output size.

In [25]:
import time

index_name = 'llama-2-rag'
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of ada 002
        metric='dotproduct',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}
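
A short aside on the `metric='dotproduct'` setting used when creating the index. OpenAI documents its embeddings as normalized to unit length, and for unit-length vectors the dot product equals the cosine similarity, so either metric gives the same ranking. The sketch below illustrates this with toy 3-dimensional vectors standing in for real 1536-dimensional embeddings; the helper names (`dot`, `normalize`, `cosine`) are made up for this illustration and are not part of any library used in this notebook.

```python
import math

def dot(a, b):
    # plain dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length of a vector
    return math.sqrt(dot(a, a))

def normalize(a):
    # scale a vector to unit length
    n = norm(a)
    return [x / n for x in a]

def cosine(a, b):
    # cosine similarity: dot product divided by both lengths
    return dot(a, b) / (norm(a) * norm(b))

# toy vectors (assumption: stand-ins for real embeddings)
u = normalize([1.0, 2.0, 2.0])
v = normalize([2.0, 1.0, 2.0])

# for unit-length vectors the two scores agree
print(round(dot(u, v), 6), round(cosine(u, v), 6))
```

If the embeddings were not normalized, the two metrics could rank results differently, and the metric choice would matter more.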

Our index is set up but currently empty. Being a vector index, it needs to be populated with vectors. To generate these vector embeddings, we'll use OpenAI's text-embedding-ada-002 model. This model can be accessed through LangChain in the following way:

In [26]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")


In [27]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)


This process results in two 1536-dimensional embeddings, corresponding to our two segments of text.

We are now prepared to embed and index all of our data! This is accomplished by iterating through our dataset, embedding, and inserting the data in batches.

In [28]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))
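
To make the batching arithmetic in the loop above easier to check, here is a minimal standalone sketch that needs no API keys. `batch_ranges` and `make_id` are hypothetical helper names introduced only for this illustration; they mirror the `range(0, len(data), batch_size)` slicing and the `doi`-plus-`chunk-id` ID scheme used in the cell.

```python
def batch_ranges(n_rows, batch_size):
    """Yield (start, end) index pairs that cover range(n_rows) in batches."""
    for start in range(0, n_rows, batch_size):
        # the last batch may be shorter than batch_size
        yield start, min(n_rows, start + batch_size)

def make_id(record):
    # same ID scheme as the notebook: "<doi>-<chunk-id>"
    return f"{record['doi']}-{record['chunk-id']}"

# 4838 rows in batches of 100, as in the dataset above
ranges = list(batch_ranges(4838, 100))
print(len(ranges), ranges[0], ranges[-1])

# ID for the first record shown earlier
print(make_id({"doi": "1102.0183", "chunk-id": "0"}))
```

The final batch covers only the remaining 38 rows, which is why the loop clamps `i_end` with `min`.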

In [29]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}
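
Before wiring the index into the chatbot, it may help to see conceptually what a top-k similarity query does. The toy in-memory store below is only a sketch under simplifying assumptions: it scores every stored vector by brute-force dot product, whereas a real vector database such as Pinecone uses approximate search structures. `ToyVectorStore` and its 2-dimensional vectors are invented for this illustration and are not part of Pinecone or LangChain.

```python
def dot(a, b):
    # plain dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

class ToyVectorStore:
    """Brute-force stand-in for a vector index (illustration only)."""

    def __init__(self):
        self._items = {}  # id -> (values, metadata)

    def upsert(self, vectors):
        # insert new ids, overwrite existing ones
        for vec_id, values, metadata in vectors:
            self._items[vec_id] = (values, metadata)

    def similarity_search(self, query_vector, k=3):
        # score every stored vector, then keep the k best
        scored = sorted(
            self._items.values(),
            key=lambda item: dot(query_vector, item[0]),
            reverse=True,
        )
        return [metadata["text"] for _, metadata in scored[:k]]

store = ToyVectorStore()
store.upsert([
    ("a-0", [1.0, 0.0], {"text": "chunk about llamas"}),
    ("a-1", [0.0, 1.0], {"text": "chunk about strings"}),
    ("a-2", [0.8, 0.6], {"text": "chunk about both"}),
])

# the query vector points the same way as "a-0"
print(store.similarity_search([1.0, 0.0], k=2))
```

The `(id, values, metadata)` triples have the same shape as the tuples passed to `index.upsert` earlier, and the `"text"` metadata field plays the role of the text field read back at query time.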

**Retrieval Augmented Generation**

We have successfully created an extensive knowledge base. The next step is to integrate this knowledge base with our chatbot. For this, we'll revisit LangChain and utilize the template prompt we set up earlier.

In this phase, we'll employ LangChain's abstraction for a vector index, known as a vectorstore. To get this up and running, we input our vector index to initialize the object.

In [35]:
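# alias the import so it does not shadow the `Pinecone` client class
# imported from the `pinecone` package earlier
from langchain.vectorstores import Pinecone as PineconeVectorStore

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = PineconeVectorStore(
    index, embed_model.embed_query, text_field
)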

In [36]:
query = "What is so special about Llama 2 ?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

In this scenario, we're dealing with an abundance of text, and it may not be immediately obvious which parts are necessary or relevant. Luckily, our Large Language Model (LLM) can process this information more efficiently than we can. Our primary task is to link the output from our vectorstore to our chatbot. This can be achieved using the same methodology we applied previously.

In [37]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

In [38]:
print(augment_prompt(query))


Using the contexts below, answer the query.

    Contexts:
    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety
asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwith

In [39]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging from 7 billion to 70 billion parameters. These LLMs, named L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are specifically optimized for dialogue use cases. They have been developed to outperform open-source chat models on various benchmarks and have shown promising results in terms of helpfulness and safety.

The special aspect of Llama 2 is that it provides pretrained and fine-tuned LLMs that can potentially serve as suitable substitutes for closed-source models. Closed-source models are heavily fine-tuned to align with human preferences, which enhances their usability and safety. However, this fine-tuning process is often costly and not easily reproducible, hindering progress in AI alignment research within the community.

Llama 2 aims to bridge this gap by offering LLMs that perform well on benchmarks and human evaluations, potentially serving as alternatives to closed-source models.

In [40]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

According to the provided context, the safety measures used in the development of Llama 2 are mentioned in the abstract. It states that Llama 2 models are optimized for dialogue use cases and have undergone fine-tuning and safety measures similar to those employed in closed-source models like ChatGPT, BARD, and Claude. These closed-product LLMs are heavily fine-tuned to align with human preferences, enhancing their usability and safety. The abstract also mentions that the fine-tuning step can require significant costs in compute and human annotation, and may not always be transparent or easily reproducible. However, specific details about the safety measures implemented in Llama 2 are not provided in the given context.


The chatbot can still respond about Llama 2 by drawing on the conversation history stored in messages, which includes the earlier augmented prompt. However, it lacks knowledge about the safety measures, because this query was sent without retrieved context from the RAG (Retrieval Augmented Generation) pipeline. Let's try again, this time with RAG.

In [41]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, the paper mentions that safety measures were taken to increase the safety of the models. These measures include safety-specific data annotation and tuning, red-teaming, and iterative evaluations. These steps were implemented to ensure that the fine-tuned LLMs in Llama 2 are safe and suitable for dialogue use cases.

The paper also emphasizes the importance of improving the safety of LLMs and aims to enable the community to reproduce and further enhance the safety of these models. This openness and transparency in the development process are key to promoting responsible development and advancing the field of LLMs.

It's worth noting that the specific details of the safety measures, such as the exact methodologies employed, are not provided in the given context. For more comprehensive information, it would be beneficial to refer to the complete paper or any additional resources related to Llama 2.
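
The pattern used in the last few cells (retrieve, build an augmented prompt, call the model with the history plus the new prompt) can be wrapped into a single function. The sketch below is one possible arrangement, not the notebook's own code: `build_augmented_prompt`, `rag_turn`, `fake_retrieve`, and `fake_llm` are hypothetical names, and the retriever and model are passed in as plain callables so the example runs without OpenAI or Pinecone. In the notebook they would correspond to the vectorstore search and the `chat` object.

```python
from typing import Callable, List

def build_augmented_prompt(query: str, contexts: List[str]) -> str:
    # same prompt layout as the augmented prompt used earlier
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )

def rag_turn(
    history: List[str],
    query: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[List[str]], str],
) -> str:
    # retrieve contexts, augment the query, and call the model
    prompt = build_augmented_prompt(query, retrieve(query))
    # history + [prompt] builds a new list, so history is not modified
    return llm(history + [prompt])

# stand-ins for the real retriever and chat model (assumptions)
def fake_retrieve(query: str) -> List[str]:
    return ["Llama 2 safety work included red-teaming."]

def fake_llm(messages: List[str]) -> str:
    return f"(reply based on {len(messages)} messages)"

history = ["You are a helpful assistant."]
reply = rag_turn(history, "what safety measures were used?", fake_retrieve, fake_llm)
print(reply)
print(len(history))  # history is unchanged by the call
```

Passing `history + [prompt]` rather than appending keeps long retrieved contexts out of the stored conversation, which is the same choice made in the two cells above.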


In [42]:
pc.delete_index(index_name)