# RAG Chatbot with Pinecone VectorDB

In [None]:
!pip install -qU \
    langchain==0.0.292 \
    openai==0.28.0 \
    datasets==2.14.5 \
    pinecone-client==2.2.4 \
    tiktoken==0.5.1 \
    cohere==4.27

### Building a Chatbot (no RAG)

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a `ChatOpenAI` object. For this we do need an [OpenAI API key](https://platform.openai.com/account/api-keys).

In [None]:
import os
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "MY OPENAI KEY"  # Changed due to security reasons

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)
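
Hard-coding a key in a notebook cell makes it easy to leak. As a minimal sketch, assuming the key was exported in the shell before the notebook was started, the helper below reads it from the environment and fails early if it is missing. `require_env` is a hypothetical name introduced here for illustration; it is not part of LangChain or the OpenAI client.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper; `env` defaults to os.environ
    and can be replaced with a plain dict when experimenting.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example with a stand-in dict instead of the real environment
demo_env = {"OPENAI_API_KEY": "sk-demo"}
print(require_env("OPENAI_API_KEY", env=demo_env))
```

The returned string could then be passed as `openai_api_key` when creating `ChatOpenAI`.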

Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```
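
To make the link between the two layouts concrete, here is a small sketch that renders the role/content dictionaries as the plain-text transcript shown earlier. It is only an illustration of the structure: `to_transcript` and `ROLE_LABELS` are names made up for this example, and this is not a description of how the OpenAI endpoint processes messages internally.

```python
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def to_transcript(messages):
    """Render role/content dicts as the plain-text chat layout."""
    turns = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    # The trailing "Assistant:" is the cue for the model to write the next turn.
    turns.append("Assistant:")
    return "\n\n".join(turns)


messages_as_dicts = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]

print(to_transcript(messages_as_dicts))
```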



In [None]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

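The format mirrors the OpenAI one: the `"system"` role becomes `SystemMessage`, `"user"` becomes `HumanMessage`, and `"assistant"` becomes `AIMessage`.

We generate the next response by passing these messages to the `ChatOpenAI` object: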

In [None]:
res = chat(messages)
res

In response we get another AI message object. We can print it more clearly like so:

In [None]:
print(res.content)

Because `res` is just another `AIMessage`, we can append it to `messages`, add a new `HumanMessage`, and generate the next response:

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)
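
The chat model keeps no memory between calls, which is why the whole `messages` list is sent each time. The sketch below shows that append, call, append pattern in isolation. It assumes a stand-in `fake_chat` function instead of a real model, and represents turns as plain `(role, text)` tuples; `ask`, `fake_chat`, and `history` are hypothetical names used only here.

```python
def ask(history, user_text, chat_fn):
    """Append a user turn, call the model, then append and return its reply.

    `history` is a list of (role, text) tuples and `chat_fn` is any callable
    that maps the history to a reply string; both are illustrative stand-ins.
    """
    history.append(("user", user_text))
    reply = chat_fn(history)
    history.append(("assistant", reply))
    return reply


def fake_chat(history):
    # Stand-in for a real chat model: reports how many turns it received.
    return f"(reply after seeing {len(history)} turns)"


history = [("system", "You are a helpful assistant.")]
ask(history, "I'd like to understand string theory.", fake_chat)
ask(history, "Why do physicists believe it can produce a 'unified theory'?", fake_chat)
print(len(history))
```

With the real model, `chat_fn` would wrap the `chat(messages)` call and the tuples would be `HumanMessage` and `AIMessage` objects.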

### Importing the Data

We use the Hugging Face Datasets library to load our data. The `scientific_papers` dataset (`arxiv` configuration) will serve as the external knowledge base for the chatbot.

Log in to Hugging Face:

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
!pip install -U datasets

In [None]:
from datasets import load_dataset
dataset = load_dataset("scientific_papers","arxiv")
#dataset = load_dataset("arxiv_dataset","arxiv")

dataset

In [None]:

!pip install -qU \
  transformers==4.33.1 \
  sentence-transformers==2.2.2 \
  pinecone-client==2.2.2 \
  datasets==2.14.0 \
  accelerate==0.21.0 \
  einops==0.6.1 \
  langchain==0.0.240 \
  xformers==0.0.20 \
  bitsandbytes==0.41.0

In [None]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)

In [None]:

docs = [
    "this is one document",
    "and another document"
]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with "
      f"a dimensionality of {len(embeddings[0])}.")

### Use Pinecone as the Vector DB

In [None]:
import os
import pinecone

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key='My Pinecone Key', # Changed due to security reasons
    environment='gcp-starter'
)

In [None]:
import time

index_name = 'llama-2-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

In [None]:

index = pinecone.Index(index_name)
index.describe_index_stats()

In [None]:
!pip install datasets==2.15.0

In [None]:
#!pip install -U transformers --no-index --find-links=file:///kaggle/input/huggingfaces/transformers

In [None]:
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"

### List Hugging Face datasets to verify

In [None]:
from huggingface_hub import list_datasets
print([dataset.id for dataset in list_datasets()])

In [None]:
import torchtext
print(torchtext.__version__)


In [None]:
!pip install torchtext==0.16.2

#### Dataset 

On their own, **L**arge **L**anguage **M**odels (LLMs) cannot answer queries that are specific to our dataset, the scientific papers loaded from Hugging Face.

### Task 4: Building the Knowledge Base

We use Pinecone as the vector database.

First, set up the Pinecone client:

In [None]:
import pinecone

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key="My PineCone Key", # Changed due to security reasons
    environment="gcp-starter"
)

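Then we initialize the index. We will use OpenAI's `text-embedding-ada-002` model to create the embeddings, so we set the `dimension` to `1536`. The index created earlier was sized for the 384-dimensional `all-MiniLM-L6-v2` embeddings, so it is deleted and recreated here.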
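
A Pinecone index has a fixed dimension, so every vector we insert must have exactly that length. As a rough sketch, the check below catches a mismatch before any upload is attempted. `EMBEDDING_DIMS` and `check_dimension` are hypothetical names for this example, and the vectors are zero-filled placeholders rather than real embeddings.

```python
# Output sizes of the two embedding models used in this notebook
EMBEDDING_DIMS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "text-embedding-ada-002": 1536,
}


def check_dimension(vectors, index_dimension):
    """Raise ValueError if any vector does not match the index dimension."""
    for position, vector in enumerate(vectors):
        if len(vector) != index_dimension:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"but the index expects {index_dimension}"
            )


# Placeholder vectors, not real embeddings
fake_vectors = [[0.0] * 1536, [0.0] * 1536]
check_dimension(fake_vectors, EMBEDDING_DIMS["text-embedding-ada-002"])
print("dimensions match")
```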

In [None]:
import time

index_name = 'llama-2-rag'
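# drop the earlier index (if it exists) so it can be recreated with the new dimension
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)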
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)

Check that the new index is available:

In [None]:
index.describe_index_stats()

Load OpenAI's embedding model through LangChain:

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Embedding creation

In [None]:
!pip install tiktoken

In [None]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

From this we get two (aligning to our two chunks of text) 1536-dimensional embeddings.

We're now ready to embed and index all of our data. We do this by looping through the dataset, embedding the abstracts, and inserting them into the index in batches.

In [None]:
import uuid
from tqdm import tqdm

documents = dataset["train"]["abstract"]
batch_size = 20
for i in tqdm(range(0, len(documents), batch_size)):
    i_end = min(len(documents), i+batch_size)
    batch = documents[i:i_end]
    embeddings = embed_model.embed_documents(batch)
    ids = [uuid.uuid4().hex for _ in batch]
    metadata = [
        {'text': x} for x in batch
    ]
    print(i_end)
    index.upsert(vectors=zip(ids, embeddings, metadata))
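
The batching logic in the loop above can be looked at on its own. The sketch below uses placeholder strings instead of real abstracts and shows how a list of 45 items is split with a batch size of 20. `batched` is a hypothetical helper name; since Python slices stop at the end of the list, the explicit `min(...)` is not needed in this variant.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


abstracts = [f"abstract {n}" for n in range(45)]  # placeholder data
sizes = [len(batch) for batch in batched(abstracts, 20)]
print(sizes)  # → [20, 20, 5]
```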

Even though the upload loop stopped with an error, a small amount of data was loaded, which is enough for our current chatbot. `describe_index_stats` confirms that the vector index has been populated:

In [None]:
index.describe_index_stats()

### RAG Chatbot - Retrieval Augmented Generation

Load the LangChain abstraction for a vector index, called a `vectorstore`:

In [None]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field)

Query the index to see whether it holds any information from the scientific papers that is relevant to our question:

In [None]:
query = "What is leptonic decay?"

vectorstore.similarity_search(query, k=3)
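
Conceptually, a similarity search embeds the query and returns the stored texts whose vectors are closest to it under the index metric (cosine in our case). The brute-force sketch below illustrates only that ranking idea with made-up two-dimensional vectors. Pinecone performs the search on its own servers, and nothing here reflects its internal implementation; `top_k` and `toy_store` are hypothetical names.

```python
import math


def top_k(query_vec, stored, k=3):
    """Return the k stored texts whose vectors are most similar to the query.

    `stored` is a list of (text, vector) pairs; similarity is cosine.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 2-dimensional "embeddings", made up for illustration
toy_store = [
    ("particle decays", [0.9, 0.1]),
    ("galaxy surveys", [0.1, 0.9]),
    ("lepton masses", [0.8, 0.3]),
]
print(top_k([1.0, 0.0], toy_store, k=2))
```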

The raw results are hard to read. Our LLM will be able to parse this information much faster than we can, so let's connect the output from our `vectorstore` to our `chat` chatbot. To do that, we build an augmented prompt that places the retrieved text ahead of the query.

In [None]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt
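
It can help to separate prompt assembly from retrieval so that the prompt layout can be inspected without a live Pinecone index. The sketch below does this with made-up passages standing in for the `vectorstore` results. `build_prompt` and `fake_contexts` are hypothetical names, and the prompt is built from concatenated strings so that no code indentation ends up inside the prompt text.

```python
def build_prompt(query, contexts):
    """Join retrieved passages and the query into a single prompt string."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


# Made-up passages standing in for vectorstore results
fake_contexts = ["First retrieved passage.", "Second retrieved passage."]
print(build_prompt("What is leptonic decay?", fake_contexts))
```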

Create our augmented prompt:

In [None]:
print(augment_prompt(query))

There is still a lot of text here, so let's pass it on to our chat model by wrapping it in a `HumanMessage`:

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

## WITHOUT RAG 

In [None]:
prompt = HumanMessage(
    content="what is leptonic decay? give study details about xmath"
)

res = chat(messages + [prompt])
print(res.content)

The chatbot is able to answer the general question about leptonic decay. However, it doesn't know anything about the xmath parameters described in the scientific papers. Let's use the RAG chatbot for this purpose.

## WITH RAG

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "what is leptonic decay? give study details about xmath"
    )
)

res = chat(messages + [prompt])
print(res.content)

Unlike the plain LLM response, this response highlights the xmath parameters specific to the scientific papers. The RAG chatbot is therefore working as expected, giving a detailed response about leptonic decay and the xmath parameters.