# Building interactive Chatbots with the OpenAI API and Pinecone using a RAG Workflow

In this project, we dive deep into the exciting realm of AI chatbots. Leveraging LangChain, OpenAI, and Pinecone vector DB, we aim to construct a chatbot that grounds its answers in an external knowledge base using **R**etrieval **A**ugmented **G**eneration (RAG).

Our journey begins with a dataset sourced from the Llama 2 ArXiv paper and other related research papers. This rich dataset will empower our chatbot to answer questions about the latest advancements in Generative AI (GenAI).

Throughout this project, we explore various concepts and workflows, including:

1. **Data Ingestion and Preprocessing**: We start by ingesting and preprocessing our dataset to ensure it is clean and ready for use.
2. **Vectorization and Indexing**: Using Pinecone, we vectorize our data and create an efficient index for fast retrieval.
3. **Integration with OpenAI API**: We integrate our system with the OpenAI API to generate responses based on the retrieved information.
4. **Building the RAG Pipeline**: We construct a robust RAG pipeline that combines retrieval and generation to provide accurate and informative responses.

The tech stack and skills we utilize include:
- **LangChain**: For building and managing the chatbot's conversational flow.
- **OpenAI API**: For generating human-like responses.
- **Pinecone**: For efficient vector storage and retrieval.
- **Python**: As our primary programming language.
- **NLP and Machine Learning**: Fundamental concepts that underpin our chatbot's functionality.

By the end of this project, we develop a fully functional chatbot and RAG pipeline capable of engaging in meaningful conversations and providing insightful answers based on a comprehensive knowledge base. This project serves as a stepping stone towards mastering more complex AI systems and gaining hands-on experience in the fields of AI, machine learning, and natural language processing.

## Setup

Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

- **openai**: This is the official OpenAI Python client. We'll use it to interact with the GPT large language model.
- **pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone vector DB where we will store our chatbot's knowledge base.
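- **langchain**, **langchain-openai**, **langchain-pinecone**: LangChain is a framework for building GenAI applications; the `langchain-openai` and `langchain-pinecone` packages provide its OpenAI and Pinecone integrations. We'll use them to chain together the language model and the other components of our chatbot.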
- **tiktoken**: This is a library from OpenAI that allows you to count the number of tokens in a text string without making an API call.
- **datasets**: This library provides a vast array of datasets for machine learning. We'll use it to load our knowledge base for the chatbot.

You can install these libraries using pip like so:

In [2]:
!pip install openai==1.27
!pip install pinecone-client==4.0.0
!pip install langchain==0.1.19
!pip install langchain-openai==0.1.6
!pip install langchain-pinecone==0.1.0
!pip install tiktoken==0.7.0
!pip install datasets==2.19.1
!pip install typing_extensions==4.11.0

Defaulting to user installation because normal site-packages is not writeable
Collecting openai==1.27
  Downloading openai-1.27.0-py3-none-any.whl.metadata (21 kB)
Downloading openai-1.27.0-py3-none-any.whl (314 kB)
Installing collected packages: openai
Successfully installed openai-1.27.0

Defaulting to user installation because normal site-packages is not writeable
Collecting pinecone-client==4.0.0
  Downloading pinecone_client-4.0.0-py3-none-any.whl.metadata (16 kB)
Downloading pinecone_client-4.0.0-py3-none-any.whl (214 kB)
Installing collected packages: pinecone-client
Successfully installed pinecone-client-4.0.0

## Building the Chatbot

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To get more familiar with the library, let's first create a chatbot _without_ RAG.


### Initialize the chat model object.

- *Make sure you have defined the `OPENAI_API_KEY` environment variable and connected it. See the 'Setting up DataLab Integrations' section of getting-started.ipynb.*
- From the `langchain_openai` package, import `ChatOpenAI`.
- Initialize a `ChatOpenAI` object with the `gpt-3.5-turbo` model. Assign to `chat`.

In [3]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-3.5-turbo")

### Instructions

Create a conversation.

- From langchain's schema module, import the three message types: `SystemMessage`, `HumanMessage`, and `AIMessage`.
- Create a conversation as a list of messages. Assign to `messages`.
    1. A system message with content `"You are a helpful assistant."`
    2. A human message with content `"Hi AI, how are you today?"`
    3. An AI message with content `"I'm great thank you. How can I help you?"`
    4. A human message with content `"I'd like to understand string theory."`


In [4]:
from langchain.schema import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

We generate the next response from the AI by passing these messages to the `ChatOpenAI` object through its `invoke` method.

### Chat with GPT.

- Invoke a chat with GPT, passing the messages, and get a response. Assign to `res`.
- Print the response.

In [5]:
res = chat.invoke(messages)
res

AIMessage(content="String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. According to string theory, the fundamental building blocks of the universe are not particles, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles and forces in the universe.\n\nString theory proposes that there are multiple dimensions beyond the familiar four dimensions of space and time. In fact, there are various versions of string theory, such as superstring theory and M-theory, that attempt to unify all known fundamental forces of nature.\n\nOne of the key ideas in string theory is the concept of supersymmetry, which posits a symmetry between particles with integer spin (bosons) and particles with half-integer spin (fermions). Supersymmetry could potentially explain the existence of dark matter and provide a framework for understanding the behavior of particles at very

Notice that the `AIMessage` object looks a bit like a dictionary. The most important element is `content`, which contains the chat text.

Print only the contents of the response.

In [6]:
# Print the contents of the response
print("\nContent Answer:\n")
print(res.content)


Content Answer:

String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. According to string theory, the fundamental building blocks of the universe are not particles, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles and forces in the universe.

String theory proposes that there are multiple dimensions beyond the familiar four dimensions of space and time. In fact, there are various versions of string theory, such as superstring theory and M-theory, that attempt to unify all known fundamental forces of nature.

One of the key ideas in string theory is the concept of supersymmetry, which posits a symmetry between particles with integer spin (bosons) and particles with half-integer spin (fermions). Supersymmetry could potentially explain the existence of dark matter and provide a framework for understanding the behavior of particles at very high

Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

### Continue the conversation with GPT.

- Append the latest AI response to `messages`.
- Create a new human message. Assign to `prompt`.
    - Use the content `"Why do physicists believe it can produce a 'unified theory'?"`
- Append the prompt to messages.

In [7]:
messages.append(res)
messages

[SystemMessage(content='You are a helpful assistant.'),
 HumanMessage(content='Hi AI, how are you today?'),
 AIMessage(content="I'm great thank you. How can I help you?"),
 HumanMessage(content="I'd like to understand string theory."),
 AIMessage(content="String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. According to string theory, the fundamental building blocks of the universe are not particles, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles and forces in the universe.\n\nString theory proposes that there are multiple dimensions beyond the familiar four dimensions of space and time. In fact, there are various versions of string theory, such as superstring theory and M-theory, that attempt to unify all known fundamental forces of nature.\n\nOne of the key ideas in string theory is the concept of supersymmetry, which posits a symmetry b

In [8]:
prompt = HumanMessage(content="Why do physicists believe it can produce a 'unified theory'?")
messages.append(prompt)

In [9]:
messages

[SystemMessage(content='You are a helpful assistant.'),
 HumanMessage(content='Hi AI, how are you today?'),
 AIMessage(content="I'm great thank you. How can I help you?"),
 HumanMessage(content="I'd like to understand string theory."),
 AIMessage(content="String theory is a theoretical framework in physics that attempts to reconcile quantum mechanics and general relativity. According to string theory, the fundamental building blocks of the universe are not particles, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles and forces in the universe.\n\nString theory proposes that there are multiple dimensions beyond the familiar four dimensions of space and time. In fact, there are various versions of string theory, such as superstring theory and M-theory, that attempt to unify all known fundamental forces of nature.\n\nOne of the key ideas in string theory is the concept of supersymmetry, which posits a symmetry b

### We keep going:

- Invoke the chat again to send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [10]:
res = chat.invoke(messages)
res

AIMessage(content='Physicists are interested in string theory as a potential candidate for a unified theory because it has the potential to encompass all known fundamental forces of nature within a single framework. Currently, the four fundamental forces of nature are gravity, electromagnetism, the weak nuclear force, and the strong nuclear force. These forces are described by different theories, such as general relativity for gravity and the Standard Model of particle physics for the other three forces.\n\nString theory offers the possibility of unifying these forces by describing them all in terms of the same underlying structure: vibrating strings. By incorporating gravity into the framework of quantum mechanics, string theory aims to provide a consistent description of the universe at both the smallest scales (quantum realm) and the largest scales (cosmological realm).\n\nIn addition, string theory naturally incorporates supersymmetry, which could help resolve some of the outstandi
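
The append-then-invoke pattern used above can be wrapped in a small helper. The following is a minimal sketch that runs without an API key: `Message`, `chat_turn`, and `echo_model` are hypothetical names introduced here for illustration, and `echo_model` is a toy stand-in for the real chat model, not part of LangChain or OpenAI.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's message classes."""
    role: str      # "system", "human", or "ai"
    content: str


def chat_turn(messages: List[Message], user_text: str,
              model: Callable[[List[Message]], Message]) -> Message:
    """Append the user's message, call the model, and append its reply."""
    messages.append(Message("human", user_text))
    reply = model(messages)
    messages.append(reply)
    return reply


def echo_model(messages: List[Message]) -> Message:
    """Toy model (assumption): reports how many messages it received."""
    return Message("ai", f"I have seen {len(messages)} messages so far.")


history = [Message("system", "You are a helpful assistant.")]
first = chat_turn(history, "Hi AI, how are you today?", echo_model)
second = chat_turn(history, "I'd like to understand string theory.", echo_model)

print(len(history))      # 5: the system message plus two human/ai pairs
print(second.content)
```

The point of the sketch is that the model only ever sees what is in the list, so every reply has to be appended before the next question is asked. With the real objects, the same shape would presumably apply, with `chat.invoke` playing the role of `model`.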

## Understanding the Limitations of LLMs and the Concept of Hallucinations

We have our chatbot, but it's important to understand that the knowledge of Large Language Models (LLMs) can be limited. The reason for this is that LLMs learn all they know during their training phase. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. This knowledge is referred to as the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world. This means that GPT (or any other LLM) will perform poorly on certain types of questions:

- **Recent Events**: The chatbot doesn't know about recent events. For example, if you ask about the weather in your city today, it won't be able to provide an accurate answer.
- **Recent Code or Products**: It can't answer questions about recent code or products. Try asking it `"Can you tell me about the latest features in LangChain?"` or `"What was the latest course released regarding LLM concepts?"` and you'll see it struggle.
- **Confidential Information**: It can't answer questions about confidential corporate information that hasn't been released on the internet.

### Hallucinations

One critical issue to be aware of is the phenomenon of _hallucinations_. Hallucinations occur when the model generates information that is not based on its training data or any real-world facts. This can lead to the model providing incorrect or misleading information confidently. For example, the model might fabricate details about a non-existent scientific theory or provide inaccurate statistics.

Understanding these limitations and the potential for hallucinations is crucial for effectively using LLMs and interpreting their responses.

### Append the AI response to the list of messages.

- Print the number of messages in the conversation.
- Append the latest AI response to `messages`.
- Print the number of messages in the conversation again.

In [11]:
print("Total number of messages so far:\n", len(messages))

messages.append(res)

print("Total number of messages so far:\n", len(messages))

Total number of messages so far:
 6
Total number of messages so far:
 7


### Ask GPT about Llama 3.

- Create a new human message. Assign to `prompt`.
    - Use the content `"What is so special about Llama 3?"`.
- Append the prompt to `messages`.
- Invoke the chat to send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [12]:
prompt = HumanMessage(content="What is so special about Llama 3?")
messages.append(prompt)
res = chat.invoke(messages)
print(res.content)

I'm not sure what you are referring to as "Llama 3." Could you provide more context or clarify your question so I can better assist you?


### Ask GPT about LangChain.

- Append the latest AI response to `messages`.
- Create a new human message. Assign to `prompt`.
    - Use the content `"Can you tell me about the LLMChain in LangChain?"`.
- Append the prompt to `messages`.
- Send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [13]:
messages.append(res)
prompt = HumanMessage(content="Can you tell me about the LLMChain in LangChain?")
messages.append(prompt)
res = chat.invoke(messages)
print(res.content)

I'm sorry, but I am not familiar with a specific technology or concept called "LLMChain" in LangChain. It's possible that it may be a specific term or project within a particular context that I am not aware of. If you can provide more information or context about LLMChain and LangChain, I may be able to assist you further.


There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

### Create a string of knowledge about chains.

- *Read the descriptions of LLMChains, Chains, and LangChain given in `llmchain_information`.*
- Combine the list of description strings into a single string. Assign to `source_knowledge`.

In [14]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]
len(llmchain_information)

3

In [15]:
source_knowledge = "\n".join(llmchain_information)
source_knowledge

'A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.\nChains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.\nLangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

### Feed in additional knowledge for better answers and context

- Define a question. Assign to `query`.
    - Use the text `"Can you tell me about the LLMChain in LangChain?"`
- Create an augmented prompt containing the context and query. Assign to `augmented_prompt`.

In [16]:
query = "Can you tell me about the LLMChain in LangChain?"


augmented_prompt = f"""Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

  Contexts: {source_knowledge}

  Query: {query}"""

In [17]:
print(augmented_prompt)

Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

  Contexts: A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.
Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-a

Now we feed this into our chatbot as we did before.

Don't append the previous AI message, since it wasn't a good answer.

### We will include the augmented prompt in the conversation.

- Print the last message in the list.
- Replace the last message with a human message containing the augmented prompt.

In [18]:
print(messages[-1])

content='Can you tell me about the LLMChain in LangChain?'


In [19]:
messages[-1] = HumanMessage(content=augmented_prompt)

### Ask GPT about LangChain again, this time providing source knowledge.

- Send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [20]:
res = chat.invoke(messages)
res.content

'Based on the provided context, the LLMChain in LangChain is the most common type of chain within the LangChain framework. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. The LLMChain takes multiple input variables, formats them into a prompt using the PromptTemplate, passes the prompt to the model, and then uses the OutputParser (if provided) to parse the output of the language model into a final format. This chain is a key component in the LangChain framework for developing applications powered by language models.'

This answer is far better than the previous one: instead of saying it is not familiar with the term, the model now describes the LLMChain accurately. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get this information in the first place?

We learned in the previous code-alongs about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset.

## Importing the Data

In this task, we will be importing our data. We will be using the Hugging Face Datasets library and [the `"jamescalam/llama-2-arxiv-papers-chunked"` dataset](https://huggingface.co/datasets/jamescalam/llama-2-arxiv-papers-chunked). This dataset contains a collection of ArXiv papers which will serve as the external knowledge base for our chatbot.

### Load the ArXiv papers dataset.

- From the *datasets* package, import `load_dataset`.
- Load the train split of the `jamescalam/llama-2-arxiv-papers-chunked` dataset. Assign to `dataset`.
- Print the dataset object to see the structure of the data.
- *Look at the structure. Which fields should we keep?*

In [21]:
from datasets import load_dataset

dataset = load_dataset("jamescalam/llama-2-arxiv-papers-chunked", split="train")

dataset

Downloading readme:   0%|          | 0.00/409 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/14.4M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/4838 [00:00<?, ? examples/s]

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

### Print a record from the dataset to get a feel for what it contains.

In [22]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

### Dataset Summary

The dataset we are using is built from the Llama 2 paper and related research papers on ArXiv, a repository of electronic preprints that are posted after moderation rather than formal peer review. Each entry in the dataset represents a "chunk" of text from these papers.

Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2—at least not without this data.

## Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

### The workflow for setting up a chatbot is much the same as for setting up semantic search and retrieval augmented generation, as seen in previous code-alongs.

- Initialize your connection to the Pinecone vector DB.
- Create an index (remember to consider the dimensionality of `text-embedding-ada-002`).
- Initialize OpenAI's `text-embedding-ada-002` model with LangChain.
- Populate the index with records (in this case from the Llama 2 dataset).

### Initialize Pinecone, getting setup details from Workspace environment variables.

- Import the os package.
- Import the pinecone package.
- Initialize pinecone, setting the API key. Assign to `pc`.

In [23]:
import os
import pinecone

pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])

Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.

### Create a vector index in the Pinecone database.

- Import the time package.
- Choose a name for the vector index. Assign to `index_name`.
- Check if `index_name` is not in Pinecone's list of existing indexes.
    -  Create an index named `index_name`, with dimension 1536 and cosine similarity as its metric.
    -  While the index status is not ready, sleep for one second.

In [24]:
import time

index_name = "vector-index"

existing_index_names  = [idx.name for idx in pc.list_indexes().indexes]

if index_name not in existing_index_names:
    pc.create_index(
        index_name,
        dimension=1536,
        metric="cosine",
        spec=pinecone.ServerlessSpec(cloud="aws", region="us-east-1")
    )
    
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)
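
The index above is created with `metric="cosine"`. As a rough illustration of what that metric measures, here is a pure-Python sketch of cosine similarity on toy vectors. The function name and the example vectors are assumptions made for this illustration; Pinecone computes the score internally, and this is not its implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 2- and 3-dimensional vectors; real embeddings here have 1536 dimensions
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way score close to 1 regardless of their length, unrelated (orthogonal) vectors score 0, and opposite vectors score -1. The dimension check mirrors why the index dimension must match the embedding model's output size.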

- Connect to the index using its name. Assign to `index`.
- View the index stats.

In [25]:
index = pc.Index(index_name)

index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

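Our index is now ready. A newly created index is empty; the count of 4838 shown above reflects a run against an index that had already been populated. Because it is a vector index, it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which we can access via LangChain.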

### Create an embeddings model.

- From the `langchain_openai` package, import `OpenAIEmbeddings`.
- Create an embeddings model object for `text-embedding-ada-002`. Assign to `embed_model`.

In [26]:
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Using this model we can create embeddings like so:

In [27]:
texts = [
    "this is a sentence",
    "this is another sentence"
]

res = embed_model.embed_documents(texts=texts)
len(res), len(res[0])

(2, 1536)

From this we get two 1536-dimensional embeddings, one for each of our two texts.

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding the text, and inserting everything in batches.

### Prepare the data for upserting to Pinecone.

- From *tqdm*, import `tqdm` (a progress bar).
- Select these columns: `doi`, `chunk-id`, `chunk`, `title`, `source`. Assign to `data_selected`.
- Convert `data_selected` to a pandas DataFrame in batch sizes of `100`. Assign to `data_batched`.

In [28]:
from tqdm import tqdm

data_selected = dataset.select_columns(["doi", "chunk-id", "chunk", "title", "source"])

data_batched = data_selected.to_pandas(batched=True, batch_size=100)

### Instructions

Split the dataset into batches and add it to the vector index.

- Loop over each batch in `data_batched`, adding a progress bar.
    - Concatenate the `doi` and `chunk-id` columns separated by `-`, then convert to a list. Assign to `ids`.
    - Get the `chunk` column and convert to a list. Assign to `texts`.
    - Use the embedding model to embed the texts. Assign to `embeds`.
    - Get the metadata from the batch. Assign to `metadata`.
        - Select the `chunk`, `title`, and `source` columns.
        - Apply the `dict` function to the columns axis.
        - Convert to a list.
    - Combine IDs, embeddings, and metadata as list of tuples. Assign to `to_upsert`.
    - Upsert to Pinecone.

In [29]:
from tqdm import tqdm

data_selected = dataset.select_columns(["doi", "chunk-id", "chunk", "title", "source"])
data_batched = data_selected.to_pandas(batched=True, batch_size=100)

for batch in tqdm(data_batched):
    ids = (batch["doi"] + "-" + batch["chunk-id"]).to_list()
    texts = batch["chunk"].to_list()
    embeds = embed_model.embed_documents(texts)
    metadata = batch[["chunk", "title", "source"]].apply(dict, axis="columns").to_list()
    to_upsert = zip(ids, embeds, metadata)
    index.upsert(vectors=to_upsert)

49it [01:49,  2.24s/it]
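
The progress bar reports 49 iterations because 4838 records split into batches of 100 give 48 full batches plus one partial batch. The sketch below reproduces that arithmetic with a plain-Python batching generator; `iter_batches` is a hypothetical helper written for this illustration and stands in for the batched DataFrames that `to_pandas(batched=True, batch_size=100)` yields.

```python
import math
from typing import Iterator, List, Sequence


def iter_batches(rows: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `batch_size` rows."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


num_rows = 4838          # num_rows reported by the dataset
batch_size = 100         # same batch size as the upsert loop

# range(num_rows) stands in for the real records
batch_lengths = [len(b) for b in iter_batches(range(num_rows), batch_size)]

print(len(batch_lengths))    # 49 batches in total
print(batch_lengths[-1])     # 38 rows in the final, partial batch
assert len(batch_lengths) == math.ceil(num_rows / batch_size)
```

Batching keeps each embedding request and each upsert to a bounded size, which is why the loop processes the data 100 records at a time instead of in a single call.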


We can check that the vector index has been populated using `describe_index_stats` like before:

### Check on updates to the vector index now that it contains the ArXiv dataset.

- View the index stats again.
- *What has changed since you last looked?*

In [30]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

## Retrieval Augmented Generation

In the previous steps, we built a comprehensive knowledge base. Now, it's time to connect that knowledge base to our chatbot. We'll dive back into LangChain and reuse our template prompt from earlier.

### Workflow

1. **Create a LangChain `vectorstore` object**:
   - Utilize our existing `index` and `embed_model` to instantiate the `vectorstore`.

2. **Search for relevant information**:
   - Perform a search within the `vectorstore` for information related to "Llama 2".

3. **Define the `augment_prompt` function**:
   - This function will take a user query, retrieve relevant information from the `vectorstore`, and merge the results into a single retrieval-augmented prompt.

4. **Compare chatbot responses**:
   - Ask the chatbot questions about Llama 2 with and without Retrieval-Augmented Generation (RAG) and compare the differences in responses.
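
Before wiring in the real components, the retrieve-then-augment steps of this workflow can be sketched without a live index. Everything in the block below is an assumption made for illustration: `keyword_retriever` ranks chunks by shared words as a crude stand-in for embedding-based similarity search, `build_augmented_prompt` is a hypothetical name, the prompt wording is abridged from the template used earlier, and the three-sentence corpus is toy data.

```python
from typing import Callable, List


def keyword_retriever(corpus: List[str]) -> Callable[[str, int], List[str]]:
    """Build a toy retriever (assumption) that ranks chunks by shared words."""
    def retrieve(query: str, k: int) -> List[str]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(chunk.lower().split())), chunk)
            for chunk in corpus
        ]
        # Highest score first; the sort is stable, so ties keep corpus order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep the top k, dropping chunks that share no words with the query
        return [chunk for score, chunk in scored[:k] if score > 0]
    return retrieve


def build_augmented_prompt(query: str,
                           retrieve: Callable[[str, int], List[str]],
                           k: int = 3) -> str:
    """Retrieve up to k chunks and merge them with the query into one prompt."""
    source_knowledge = "\n".join(retrieve(query, k))
    return (
        "Using the contexts below, answer the query. If the query cannot be "
        'answered with the information below, say "I don\'t know".\n\n'
        f"Contexts: {source_knowledge}\n\n"
        f"Query: {query}"
    )


corpus = [
    "Llama 2 is a collection of pretrained and fine-tuned large language models.",
    "String theory describes particles as vibrating strings.",
    "The Llama 2 chat models are optimized for dialogue use cases.",
]
retrieve = keyword_retriever(corpus)
prompt = build_augmented_prompt("What is Llama 2 optimized for?", retrieve, k=2)
print(prompt)
```

Passing the retriever in as a callable keeps the prompt-building logic independent of where the chunks come from, so the same function shape works whether the chunks come from a toy list or from a vector store.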

To use LangChain's RAG pipeline we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

### Initialize the vector store object.

- From the `langchain_pinecone` package, import `PineconeVectorStore`.
- State the metadata field that contains our text (`"chunk"`). Assign to `text_field`.
- Create a `PineconeVectorStore` from the index, the embedding model, and the text field. Assign to `vectorstore`.

In [31]:
from langchain_pinecone import PineconeVectorStore

text_field = "chunk"

vectorstore = PineconeVectorStore(index, embed_model, text_field)

Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Llama 2.

### We then perform similarity search against a question.

- Define a question. Assign to `query`.
    - Use the text `"What is so special about Llama 2?"`.
- Perform a similarity search for the query, returning the top 3 results.

In [32]:
query = "What is so special about Llama 2?"
vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

The search returns a lot of text, and it's not obvious which parts are relevant. Fortunately, our LLM can parse this information much faster than we can. All we need to do is connect the output from our `vectorstore` to our `chat` chatbot, using the same logic as before.

In [33]:
def augment_prompt(query: str):
    results = vectorstore.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augmented_prompt = f"""Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with 
                            the below information, say "I don't know".

Contexts: {source_knowledge}

Query: {query}
"""
    return augmented_prompt

Using this we produce an augmented prompt:

In [34]:
print(augment_prompt(query))

Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with 
                            the below information, say "I don't know".

Contexts: Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesub

There is still a lot of text here, so let's pass it on to our chat model to see how it performs.

### Now ask GPT about Llama 2, augmenting the prompt with source knowledge from the Pinecone vector index.

- Create a new human message. Assign to `prompt`.
    - Call `augment_prompt()` on the query and use this as the content.
- Append the prompt to `messages`.
- Send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [38]:
prompt = HumanMessage(content=augment_prompt(query))
messages.append(prompt)
res = chat.invoke(messages)
res.content

'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) that are optimized for dialogue use cases. These models range in scale from 7 billion to 70 billion parameters. Llama 2 models outperform open-source chat models on various benchmarks and have been evaluated positively for helpfulness and safety. They are considered to potentially be suitable substitutes for closed-source models. The detailed approach to fine-tuning and safety in Llama 2 models is highlighted, showcasing their performance and potential as advanced language models within the AI community.'

We can continue with more Llama 2 questions. Let's try _without_ RAG first:

### Ask GPT about Llama 2.

- Create a new human message. Assign to `prompt`.
    - Use the content `"What safety measures were used in the development of llama 2?"`.
- Invoke a chat with GPT sending the messages plus the prompt. Assign to `res`.
    - *Don't use `.append()` here, as we don't want to store the latest message in the conversation. The prompt needs to be converted to a list to add it to the existing list.*
- Print the contents of the response.
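
The distinction in the note above comes down to how Python lists behave: `.append()` mutates the list in place, while `+` builds a new list and leaves the original untouched. A quick illustration, with plain strings standing in for message objects (the names `history`, `one_off`, and `sent_once` are illustrative):

```python
history = ["system message", "first question", "first answer"]
one_off = "temporary question"

# `+` creates a new list; `history` itself is unchanged.
sent_once = history + [one_off]
length_after_concat = len(history)

# `.append()` mutates `history` in place, so the message persists.
history.append(one_off)
length_after_append = len(history)

print(length_after_concat, length_after_append)
```

Sending `messages + [prompt]` therefore lets the model see the question once without it becoming part of the stored conversation.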

In [39]:
prompt = HumanMessage(content="What safety measures were used in the development of llama 2?")
res = chat.invoke(messages + [prompt])
res.content

'Safety measures used in the development of Llama 2 included conducting human evaluations for helpfulness and safety. The Llama 2 models were fine-tuned to align with human preferences, enhancing their usability and safety. The models outperformed open-source chat models on various benchmarks tested. Additionally, the approach to fine-tuning and safety was detailed, and comparisons were made with closed-source models in terms of performance and human evaluations. These safety measures aimed to ensure that the Llama 2 language models were effective, safe, and aligned with human expectations.'

The chatbot is able to respond about Llama 2 thanks to its conversational history stored in `messages`. However, it doesn't know anything about the safety measures themselves, as we have not provided it with that information via the RAG pipeline. Let's try again, but with RAG.

### Ask GPT about Llama 2 again.

- Do the same thing again, but this time augment the prompt using `augment_prompt()`.

In [37]:
prompt = HumanMessage(content=augment_prompt("What safety measures were used in the development of llama 2?"))
res = chat.invoke(messages + [prompt])
res.content

'The safety measures used in the development of Llama 2 included:\n\n1. Safety-specific data annotation and tuning: This involved annotating and tuning the data specifically to enhance the safety aspects of the models.\n2. Conducting red-teaming: Red-teaming involves simulating attacks or adversarial scenarios to test the robustness and security of the models.\n3. Employing iterative evaluations: Continuously evaluating the models through iterations to identify and address any potential safety concerns or vulnerabilities.\n\nThese measures were implemented to increase the safety of the Llama 2 models and ensure responsible development of large language models (LLMs).'

We get a much better-informed response that includes several items missing from the previous non-RAG response, such as "safety-specific data annotation and tuning", "red-teaming", and "iterative evaluations", along with the stated aim of ensuring responsible development of LLMs.

## Summary & Conclusion

This project built a chatbot capable of answering questions about cutting-edge large language models, with a focus on Llama 2. It covers a significant part of a Generative AI workflow and integrates several technologies and methods. Here are the major insights and skills demonstrated throughout this project:

### Key Insights

1. **Conversational AI Development**:
   - Learned how to maintain a coherent conversation with GPT by appending messages, ensuring context is preserved across interactions.
   - This involved understanding the structure of conversational history and effectively managing it to improve the chatbot's responses.

2. **Contextual Prompting**:
   - Explored the importance of providing context within prompts to enhance the quality of GPT's answers.
   - By augmenting prompts with relevant information, the chatbot's ability to provide accurate and detailed responses was significantly improved.

3. **Vector Databases and Retrieval-Augmented Generation (RAG)**:
   - Set up a Pinecone database, a vector database that allows for efficient storage and retrieval of high-dimensional data.
   - Added data to a vector index, enabling the chatbot to retrieve relevant text chunks based on user queries.
   - This integration of RAG techniques allowed the chatbot to access external knowledge, thereby answering questions that GPT alone could not handle.

4. **Data Handling and Preprocessing**:
   - Demonstrated proficiency in handling and preprocessing data, as evidenced by the creation and management of a DataFrame containing relevant information.
   - This skill is crucial for ensuring that the data fed into the model is clean, relevant, and structured appropriately.
     

### Skills and Thought Process

1. **Technical Proficiency**:
   - The ability to work with advanced AI models like GPT, and with research literature on models such as Llama 2, highlights technical expertise in the field of generative AI.
   - Setting up and managing a Pinecone database showcases skills in working with modern data storage solutions.

2. **Problem-Solving and Innovation**:
   - The project required innovative problem-solving, particularly in enhancing the chatbot's responses through contextual prompting and RAG.
   - The approach to augmenting prompts and integrating external data sources demonstrates a deep understanding of how to leverage AI capabilities effectively.

3. **Attention to Detail**:
   - The meticulous management of conversational history and the careful augmentation of prompts reflect attention to detail.
   - Ensuring that the chatbot provides accurate and contextually relevant answers required a thorough and thoughtful approach.

4. **Project Management**:
   - Successfully combining various components—conversational AI, vector databases, and RAG—into a cohesive project highlights project management skills.
   - This project serves as a testament to the ability to plan, execute, and refine complex AI systems.

### Final Thoughts

This project not only showcases technical skills and innovative thinking but also underscores the ability to manage and execute complex AI projects. Building a chatbot that can answer questions on any topic covered by its knowledge base demonstrates a comprehensive understanding of conversational AI, contextual prompting, and the integration of external data sources.