[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb)

## Using RAG with LangChain

Installing the necessary libraries:

In [1]:
%pip install -qU \
    langchain==0.0.292 \
    openai==0.28.0 \
    datasets==2.10.1 \
    pinecone-client==2.2.4 \
    tiktoken==0.5.1
print("DONE")

Note: you may need to restart the kernel to use updated packages.
DONE


##### Building a Chatbot with an LLM (No RAG)

In [2]:
import os
from getpass import getpass
from langchain.chat_models import ChatOpenAI

# read the key from the environment (or prompt for it) instead of hardcoding it
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY") or getpass("OpenAI API key: ")

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)
print("Chat model initialized successfully")

Chat model initialized successfully


Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.
User: Hi AI, how are you today?
Assistant: I'm great thank you. How can I help you?
User: I'd like to understand string theory.
Assistant:
```
The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```
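To make the correspondence between the two formats concrete, here is a small sketch that renders a list of role/content dicts back into the plain-text transcript shown earlier. `to_transcript` and `example_messages` are hypothetical names used only for illustration; the API itself takes the list of dicts directly and does not need this helper.

```python
def to_transcript(messages: list[dict[str, str]]) -> str:
    """Render ChatCompletion-style messages as a plain-text transcript.

    A trailing "Assistant:" line is appended so the transcript ends
    where the model is expected to continue.
    """
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")
    return "\n".join(lines)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]
print(to_transcript(example_messages))
```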

In [15]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a sarcastic assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to learn driving in short.")
]

In [6]:
res = chat(messages)
print(res.content)

Oh, I see. So you're looking for a crash course in driving? Well, I hate to break it to you, but driving isn't exactly something you can learn in a short amount of time. It takes practice, patience, and a lot of trial and error. But hey, if you're up for the challenge, I'm here to help. What specifically would you like to know about driving?


Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.
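The same bookkeeping can be sketched without LangChain or an API key, using plain dicts in the ChatCompletion format and a stub in place of the model. `chat_turn` and `echo_model` are hypothetical names; in the notebook, `chat(messages)` plays the model's role and `messages.append(...)` does the bookkeeping. The point is that the model receives the whole history on every call, which is how it "remembers" earlier turns.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(history: list[Message], user_text: str,
              model: Callable[[list[Message]], str]) -> str:
    """Append the user's message, call the model, and store its reply in history."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)  # the model sees the whole conversation so far
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_model(history: list[Message]) -> str:
    # hypothetical stand-in for a real chat model: reports how many messages it received
    return f"I have seen {len(history)} messages."


history: list[Message] = [{"role": "system", "content": "You are a sarcastic assistant."}]
print(chat_turn(history, "Hi AI, how are you today?", echo_model))
print(chat_turn(history, "I'd like to learn driving in short.", echo_model))
```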

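##### Checking for Hallucinations

By default, LLMs have no access to the external world: they only know what was in their training data. If we ask about something outside that data, such as the `LLMChain` in LangChain, the chatbot can't reliably help us because it doesn't contain the information needed to answer. Ideally it tells us it doesn't know, but an LLM may also answer as if it does know (a hallucination), which can be very hard to detect. OpenAI have since adjusted the model's behavior for this particular example, so the response you get may differ.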

In [5]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)
print(res.content)

##### Feeding Knowledge into LLMs

One way to give an LLM knowledge it lacks is _source knowledge_: any information fed into the LLM via the prompt.

In [None]:
llmchain_information = [
    "An LLMChain is the most common type of chain developed by COCA_COLA.",
    "Chains is an incredibly generic concept which refers to a sequence of modular components (or other chains)",
    "LangChain is a framework for developing applications powered by language models"
]

source_knowledge = "\n".join(llmchain_information)
print(source_knowledge)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [None]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as we did before.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

##### Importing Data & Building the Knowledge Base

In [17]:
import os
from getpass import getpass
import pinecone

# get API key from app.pinecone.io and environment from console;
# read them from environment variables (or prompt) rather than hardcoding the key
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or getpass("Pinecone API key: "),
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'gcp-starter'
)
print("DONE")

DONE


Initializing the index: we will use OpenAI's `text-embedding-ada-002` model to create the embeddings, so we set the `dimension` to `1536`, the length of the vectors that model produces.

In [18]:
# Creating an index is necessary for efficient retrieval of data from a large dataset.
# The index is a data structure that lets us quickly locate specific records based on
# certain criteria (here, similarity between embedding vectors).

import time

index_name = 'llama-2-rag'
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)
print("DONE")

DONE
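The index above is created with `metric='cosine'`, so query results are ranked by cosine similarity, and every vector must have the same length as the index `dimension`. As a rough illustration of what that metric computes (not how Pinecone implements it internally), here is a plain-Python version; `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        # vectors of different dimension cannot be compared (or stored in the same index)
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction, different length
```

Because only the angle matters, two vectors pointing the same way score close to 1 regardless of their lengths.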


In [19]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

Our index is now ready, but it's empty. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which we can access via LangChain like so:

In [20]:
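import os
import warnings
warnings.filterwarnings('ignore')  # silence warnings from the pinned library versions

from langchain.embeddings.openai import OpenAIEmbeddings

# reuse the OpenAI key set earlier instead of hardcoding it again;
# openai_api_key is the parameter name OpenAIEmbeddings expects
embed_model = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=os.environ["OPENAI_API_KEY"]
)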
print("DONE")

DONE


We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding and inserting everything in batches.
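The batching pattern, and the tuple shape that `index.upsert()` accepts in `pinecone-client` 2.x (a unique ID, the vector values, and a metadata dict), can be sketched in plain Python before touching the real index. `make_upsert_batches` and the `row_{j}` ID scheme are hypothetical, and toy 2-dimensional vectors stand in for the 1536-dimensional ada-002 embeddings. Storing the source text under a `text` metadata key is an assumed convention here, not something the loop below does.

```python
from typing import Iterator


def make_upsert_batches(
    texts: list[str],
    vectors: list[list[float]],
    metadata: list[dict],
    batch_size: int = 100,
    dimension: int = 1536,
) -> Iterator[list[tuple[str, list[float], dict]]]:
    """Yield lists of (id, values, metadata) tuples, at most batch_size long."""
    if not (len(texts) == len(vectors) == len(metadata)):
        raise ValueError("texts, vectors and metadata must have the same length")
    for start in range(0, len(texts), batch_size):
        batch = []
        for j in range(start, min(start + batch_size, len(texts))):
            # every vector must match the dimension the index was created with
            if len(vectors[j]) != dimension:
                raise ValueError(f"vector {j} has {len(vectors[j])} dims, expected {dimension}")
            # hypothetical ID scheme: the global row number keeps IDs unique across batches
            batch.append((f"row_{j}", vectors[j], {**metadata[j], "text": texts[j]}))
        yield batch


# toy 2-dimensional vectors stand in for real 1536-dimensional embeddings
toy_texts = ["a", "b", "c"]
toy_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_meta = [{"geography": "UK"}, {"geography": "USA"}, {"geography": "Tokyo"}]

batches = list(make_upsert_batches(toy_texts, toy_vectors, toy_meta, batch_size=2, dimension=2))
print(len(batches))   # 2: one batch of two rows, then one batch of one row
print(batches[0][0])  # ('row_0', [0.1, 0.2], {'geography': 'UK', 'text': 'a'})
```

With a real index, each batch would then be sent with `index.upsert(vectors=batch)`.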

In [21]:
import pandas as pd
from tqdm.auto import tqdm

macro_data = pd.read_csv("macro_1.csv")
weather_data = pd.read_csv("weather_1.csv")
business_performance_data = pd.read_csv("business_performance_1.csv")
media_data = pd.read_csv("media_1.csv")

# Combine all CSV data into a single DataFrame
data = pd.concat([macro_data, weather_data, business_performance_data, media_data], axis = 1)
duplicate_columns = data.columns.duplicated()
data = data.loc[:, ~duplicate_columns]


# Select columns to use for text embedding
text_columns = ["state_gdp($_million_(USD))", "population", "pcc_category", 
                "temperature (Celsius)", "rainfall (mm)", 
                "value_sales", "volume_sales", "spend($)"]

# Columns to include as explicit metadata
explicit_metadata_columns = ["geography", "days", "platform", "business_entity"]


batch_size = 100  # Adjust batch size as needed

for batch_id, i in enumerate(tqdm(range(0, len(data), batch_size))):
    i_end = min(len(data), i + batch_size)
    
    # Get batch of data
    batch = data.iloc[i:i_end]
    
    # Generate one text per row by joining the selected column values with spaces
    texts = batch[text_columns].apply(lambda row: " ".join(row.values.astype(str)), axis=1).tolist()
    
    # Generate embeddings for the batch of texts
    embeddings = embed_model.embed_documents(texts=texts)
    
    print(f"Columns considered for metadata: {', '.join([col for col in data.columns if col not in text_columns])}")
   
    # Generate explicit metadata for the batch
    metadata = [
        {
            column: row[column]
            for column in explicit_metadata_columns
        }
        for _, row in batch.iterrows()
    ]

    # Print each row's metadata
    print(f"Metadata for Batch {batch_id}:")
    for idx, meta in enumerate(metadata):
        print(f"Row {idx + i}: {meta}")

    # Add embeddings and metadata to the Pinecone index as (id, values, metadata) tuples;
    # the batch number in each ID keeps IDs unique across batches
    ids = [f"Batch_{batch_id}_{j}" for j in range(len(texts))]
    index.upsert(vectors=list(zip(ids, embeddings, metadata)))

print("\nEmbeddings and metadata successfully added to Pinecone index.")


  0%|          | 0/2 [00:00<?, ?it/s]

Columns considered for metadata: days, geography, business_entity, platform
Metadata for Batch 0:
Row 0: {'geography': 'UK', 'days': 'Tuesday', 'platform': 'Facebook', 'business_entity': 'Marketing'}
Row 1: {'geography': 'Tokyo', 'days': 'Monday', 'platform': 'Facebook', 'business_entity': 'Distribution'}
Row 2: {'geography': 'UK', 'days': 'Thursday', 'platform': 'Youtube', 'business_entity': 'Packaging'}
Row 3: {'geography': 'USA', 'days': 'Monday', 'platform': 'Youtube', 'business_entity': 'Marketing'}
Row 4: {'geography': 'Sydney', 'days': 'Friday', 'platform': 'Twitter', 'business_entity': 'Bottling'}
Row 5: {'geography': 'Brazil', 'days': 'Thursday', 'platform': 'Facebook', 'business_entity': 'Logistics'}
Row 6: {'geography': 'Sydney', 'days': 'Saturday', 'platform': 'Facebook', 'business_entity': 'R&D'}
Row 7: {'geography': 'Tokyo', 'days': 'Friday', 'platform': 'Instagram', 'business_entity': 'Packaging'}
Row 8: {'geography': 'Germany', 'days': 'Sunday', 'platform': 'Twitter', '
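The loop above joins raw values with spaces, so each embedded text is a run of numbers and labels with no indication of which column each value came from. One alternative worth trying (an assumption, not what the index above was built with) is to prefix each value with its column name before embedding. `row_to_text` and `example_row` are hypothetical; with pandas, `batch[text_columns].to_dict("records")` would yield rows in this dict form.

```python
def row_to_text(row: dict, columns: list[str]) -> str:
    """Serialize selected fields as 'column: value' pairs separated by semicolons."""
    # columns missing from the row are skipped rather than raising KeyError
    return "; ".join(f"{col}: {row[col]}" for col in columns if col in row)


# hypothetical row with made-up values, for illustration only
example_row = {"geography": "UK", "value_sales": 1200.5, "volume_sales": 300, "platform": "Facebook"}
print(row_to_text(example_row, ["value_sales", "volume_sales"]))
# value_sales: 1200.5; volume_sales: 300
```

Labelled text gives the embedding model some context for each number, which may make queries such as "highest sales volume" land closer to the relevant rows.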

In [22]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.002,
 'namespaces': {'': {'vector_count': 200}},
 'total_vector_count': 200}

In [23]:
for embedding in embeddings[:5]:
    print(embedding)

print("\nDimensions of each embedding (should match the dimension set when creating the Pinecone index):")
print(len(embedding))

[-0.017077334158274, 0.003922260816675285, 0.011153507934028048, -0.036284282661384754, -0.039761748009117845, 0.009165420161043127, -0.027361474824046203, -0.023210078386929518, -0.0272266900962856, -0.00464336339427835, 0.004218788800996988, 0.012164399911490584, -0.004704017173696422, 0.004282812105476818, 0.0005286122302304434, 0.04296964762662001, 0.017980398539792566, -0.011942003806833985, 0.019382167531612458, -0.0219296160226232, -0.020123489122231213, 0.013694216443592705, -0.006789823967440038, -0.004562492464489729, -0.008633017506156509, -0.0039795446053703136, 0.007723214540175712, -0.020676109858810948, -0.0032820291688608412, -0.036284282661384754, -0.003036045457449366, 0.00954282000647602, -0.024598371141147518, 0.0018482474770631432, -0.01965173884977881, -0.0016384873242187806, 0.00701559006281968, -0.029437173042604545, 0.026229275287747524, -0.008545406129260516, 0.022414844395322635, 0.010506536770428801, -0.009940437002279461, 0.014529886598585826, -0.0124541883

##### Retrieval Augmented Generation

We now have a fully-fledged knowledge base; it's time to connect it to our chatbot.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [24]:
from langchain.vectorstores import Pinecone

# The data has no free-text descriptions, only categorical and numerical fields,
# so we use the business_entity metadata field as each document's text
text_field = "business_entity"

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)
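The `text_field` argument controls which metadata key becomes a result's text. As far as we can tell from LangChain 0.0.x's Pinecone wrapper, each match is turned into a `Document` by popping that key out of the match's metadata and using it as `page_content`, with the remaining fields left in `metadata`. The sketch below mimics that step with a simplified stand-in class; `SimpleDocument` and `match_to_document` are hypothetical, and the sample metadata is Row 0 from the ingestion output above.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    # simplified stand-in for langchain.schema.Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def match_to_document(match_metadata: dict, text_key: str) -> SimpleDocument:
    """Move metadata[text_key] into page_content; keep the other fields as metadata."""
    metadata = dict(match_metadata)  # copy so the caller's dict is not mutated
    text = metadata.pop(text_key, "")
    return SimpleDocument(page_content=str(text), metadata=metadata)


doc = match_to_document(
    {"geography": "UK", "days": "Tuesday", "platform": "Facebook", "business_entity": "Marketing"},
    text_key="business_entity",
)
print(doc.page_content)  # Marketing
print(doc.metadata)      # {'geography': 'UK', 'days': 'Tuesday', 'platform': 'Facebook'}
```

This is why, after a search, the business entity is read from `page_content` rather than from `metadata`.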

Using this `vectorstore` we can already query the index and check whether it holds relevant information for our question about business entities and sales volume.

In [None]:
# Candidate queries to try.
#
# Based on the text_columns:
#   "Sales on maximum number of days"
#   "Geography with highest sales volume"
#   "Platform with most sales value"
#
# Based on the metadata columns (explicit_metadata_columns):
#   "Geography with highest population"
#   "Days with highest value sales"
#   "Business entity with the highest sales volume"

In [26]:
query = "business_entity with highest sales volume"

# similarity_search() returns Document objects without scores;
# similarity_search_with_score() returns (Document, score) tuples instead
results = vectorstore.similarity_search_with_score(query, k=5)

for doc, score in results:
    # the vector store moves the text_field ("business_entity") out of the
    # metadata and into page_content
    entity = doc.page_content
    print(f"Entity: {entity}, Score: {score}")

The search returns the closest matches and their similarity scores, but it's not clear from these alone what is relevant to the question. Fortunately, our LLM can parse this information much faster than we can. All we need to do is connect the output from our `vectorstore` to our `chat` chatbot, using the same logic as we used earlier.

In [None]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt
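Because the f-string in `augment_prompt` is indented inside the function body, several lines of the prompt (including `Contexts:` and `Query:`) reach the model with four leading spaces. That is usually harmless, but if you want the prompt to match the earlier `augmented_prompt` exactly, one option is to assemble it from a list of lines. `build_augmented_prompt` is a hypothetical helper; in practice the contexts would come from `[x.page_content for x in results]`, and the example contexts below are business entities seen in the ingestion output.

```python
def build_augmented_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the RAG prompt line by line so no source-code indentation leaks in."""
    lines = [
        "Using the contexts below, answer the query.",
        "",
        "Contexts:",
        *contexts,  # one retrieved context per line
        "",
        f"Query: {query}",
    ]
    return "\n".join(lines)


prompt_text = build_augmented_prompt(
    "business_entity with highest sales volume",
    ["Marketing", "Distribution", "Packaging"],
)
print(prompt_text)
```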

Using this we produce an augmented prompt:

In [None]:
print(augment_prompt(query))

Let's pass the augmented prompt on to our chat model to see how it performs.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

We can compare responses with and without RAG using a question about Llama 2. Let's try _without_ RAG first:

In [None]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

Without retrieved context, the chatbot can only draw on its training data and the conversation history stored in `messages`, and neither is likely to contain information about the safety measures used in developing Llama 2. Let's try again, but with RAG.

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

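With RAG, the answer can only be as good as the retrieved contexts. The knowledge base built here holds business data rather than Llama 2 documentation, so the retrieved contexts are unlikely to cover this question. In the original Llama 2 version of this example, where the knowledge base did contain that material, the RAG response included several items missing from the non-RAG response, such as "red-teaming", "iterative evaluations", and the researchers' intention to share this research to help "improve their safety, promoting responsible development in the field".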