[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/gpt-4-langchain-docs.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/gpt-4-langchain-docs.ipynb)

# GPT-4 with Retrieval Augmentation over LangChain Docs

In this notebook we'll work through an example of using GPT-4 with retrieval augmentation to answer questions about the LangChain Python library.

[![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/full-link.svg)](https://github.com/pinecone-io/examples/blob/master/learn/generation/openai/gpt-4-langchain-docs.ipynb)

To begin we must install the prerequisite libraries:

In [14]:
!pip install -qU \
    openai==1.66.3 \
    pinecone==5.4.2 \
    pinecone-datasets==1.0.2 \
    pinecone-notebooks==0.1.1 \
    tqdm

---

🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._

---

In this example, we will use a pre-embedded dataset of the LangChain docs from [python.langchain.com](https://python.langchain.com/en/latest/). If you'd like to see how we perform the data preparation, refer to [this notebook]().

The embeddings were produced with OpenAI's `text-embedding-ada-002` model which outputs embeddings with dimension `1536`.

Let's go ahead and download the dataset.

In [15]:
from pinecone_datasets import load_dataset

dataset = load_dataset('langchain-python-docs-text-embedding-ada-002')

In [16]:
# We drop the sparse_values column since it is not needed in this demo
dataset.documents.drop(['sparse_values'], axis=1, inplace=True)

# We drop the existing metadata column and rename the blob column to metadata
dataset.documents.drop(['metadata'], axis=1, inplace=True)
dataset.documents.rename(columns={'blob': 'metadata'}, inplace=True)

dataset.head()

Loading documents parquet files: 100%|██████████| 2/2 [00:32<00:00, 16.08s/it]


| | id | values | metadata |
|---|---|---|---|
| 0 | 417ede5d-39be-498f-b518-f47ed4e53b90 | [0.005949743557721376, 0.01983247883617878, -0... | {'chunk': 0, 'text': '.rst .pdf Welcome to Lan... |
| 1 | 110f550d-110b-4378-b95e-141397fa21bc | [0.009401749819517136, 0.02443608082830906, 0.... | {'chunk': 1, 'text': 'Use Cases# Best practice... |
| 2 | d5f00f02-3295-4567-b297-5e3262dc2728 | [-0.005517194513231516, 0.0208403542637825, 0.... | {'chunk': 2, 'text': 'Gallery: A collection of... |
| 3 | 0b6fe3c6-1f0e-4608-a950-43231e46b08a | [-0.006499645300209522, 0.0011573900701478124,... | {'chunk': 0, 'text': 'Search Error Please acti... |
| 4 | 39d5f15f-b973-42c0-8c9b-a2df49b627dc | [-0.005658374633640051, 0.00817849114537239, 0... | {'chunk': 0, 'text': '.md .pdf Dependents Depe... |
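
Because the index we create later must match the embedding dimension, it can help to sanity-check the vector lengths first. The following is a minimal sketch, not part of the original notebook: `find_bad_dimensions` is a hypothetical helper, and the toy vectors use a dimension of 4 in place of the real 1536.

```python
# Toy stand-in for the `values` column; real vectors have 1536 entries.
EXPECTED_DIM = 4  # would be 1536 for text-embedding-ada-002


def find_bad_dimensions(vectors, expected_dim):
    """Return the positions of vectors whose length differs from expected_dim."""
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]


toy_vectors = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7],  # deliberately one entry short
    [0.9, 1.0, 1.1, 1.2],
]

# Prints the positions of any vectors with the wrong length
print(find_bad_dimensions(toy_vectors, EXPECTED_DIM))
```

With the real data, a call along the lines of `find_bad_dimensions(dataset.documents['values'], 1536)` would be expected to return an empty list, assuming every row was embedded with the same model.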


Let's take a look at what sort of metadata we're working with in this dataset.

In [18]:
from pprint import pprint

print("Here are some example entries in our Knowledge Base:\n")
for r in dataset.documents.iloc[0:3].to_dict(orient="records"):
    pprint(r['metadata'])

Here are some example entries in our Knowledge Base:

{'chunk': 0,
 'text': '.rst\n'
         '.pdf\n'
         'Welcome to LangChain\n'
         ' Contents \n'
         'Getting Started\n'
         'Modules\n'
         'Use Cases\n'
         'Reference Docs\n'
         'Ecosystem\n'
         'Additional Resources\n'
         'Welcome to LangChain#\n'
         'LangChain is a framework for developing applications powered by '
         'language models. We believe that the most powerful and '
         'differentiated applications will not only call out to a language '
         'model, but will also be:\n'
         'Data-aware: connect a language model to other sources of data\n'
         'Agentic: allow a language model to interact with its environment\n'
         'The LangChain framework is designed around these principles.\n'
         'This is the Python specific portion of the documentation. For a '
         'purely conceptual guide to LangChain, see here. For the JavaScript '
      

Our chunks are already embedded, so now we move on to indexing everything.

## Initializing the Pinecone client

Now that the data is ready, we can set up our index to store it.

We begin by instantiating the Pinecone client. To do this we need a [free API key](https://app.pinecone.io).

In [20]:
import os

if not os.environ.get("PINECONE_API_KEY"):
    from pinecone_notebooks.colab import Authenticate
    Authenticate()

In [22]:
from pinecone import Pinecone

api_key = os.environ.get("PINECONE_API_KEY")

# Configure client
pc = Pinecone(api_key=api_key)

### Creating a Pinecone Index

When creating the index we need to define several configuration properties. 

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on. 
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. Here we use `1536` to match `text-embedding-ada-002`.
- `spec` holds a specification which tells Pinecone how we would like to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

There are more configurations available, but this minimal set will get us started.
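
To make the `metric` setting more concrete, here is a small pure-Python sketch of cosine similarity, the metric used for this index. It is for intuition only and is not how Pinecone computes scores internally; the function name and the toy vectors are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score 1.0 regardless of their length
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
# Orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
```

Because the score depends only on direction, two chunks with similar meaning can score highly even if their embedding magnitudes differ.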

In [23]:
from pinecone import ServerlessSpec

index_name = 'gpt-4-langchain-docs-fast'

# check if the index already exists (it shouldn't on the first run)
if not pc.has_index(name=index_name):
    # if it does not exist, create the index
    pc.create_index(
        name=index_name,
        dimension=1536,  # dimensionality of text-embedding-ada-002
        metric='cosine',
        spec=ServerlessSpec(cloud='aws', region='us-east-1')
    )

pc.describe_index(name=index_name)

{
    "name": "gpt-4-langchain-docs-fast",
    "dimension": 1536,
    "metric": "cosine",
    "host": "gpt-4-langchain-docs-fast-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "deletion_protection": "disabled"
}

## Storing data in the Index

First we need to instantiate an Index client that can interact with the index we just created.

In [24]:
# Instantiate an Index client
index = pc.Index(name=index_name)

# View index stats for the new index
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We can see the index is currently empty, with a `total_vector_count` of `0`. We can begin populating it with the precomputed `text-embedding-ada-002` embeddings like so:

In [26]:
index.upsert_from_dataframe(
    df=dataset.documents, 
    batch_size=100
)

sending upsert requests: 100%|██████████| 6952/6952 [01:00<00:00, 114.03it/s]


{'upserted_count': 6952}
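
The `batch_size=100` argument means the rows are sent in groups rather than all at once. The sketch below illustrates the general idea of batching with a hypothetical `batched` helper; it is not the Pinecone client's actual implementation.

```python
def batched(records, batch_size):
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Stand-in for the 6952 rows upserted above
records = list(range(6952))
batches = list(batched(records, 100))

# Number of batches, and the size of the final (partial) batch
print(len(batches), len(batches[-1]))
```

For 6952 rows this gives 70 batches, the last holding the remaining 52 rows.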

Now we've added all of our LangChain docs to the index. With that, we can move on to retrieval and then answer generation using GPT-4.

## Retrieval

To search through our documents we first need to create a query vector `xq`. Using `xq` we will retrieve the most relevant chunks from the LangChain docs. To create that query vector we must initialize a `text-embedding-ada-002` embedding model with OpenAI. For this, you need an [OpenAI API key](https://platform.openai.com/).

In [33]:
def create_embedding(query):
    from openai import OpenAI

    # Get OpenAI api key from platform.openai.com
    openai_api_key = os.getenv('OPENAI_API_KEY') or 'sk-...'

    # Instantiate the OpenAI client
    client = OpenAI(api_key=openai_api_key)

    # Create an embedding
    res = client.embeddings.create(
        model="text-embedding-ada-002",
        input=[query],
    )
    return res.data[0].embedding
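
The `or 'sk-...'` fallback above is only a placeholder, so a missing key would surface later as an authentication error. One alternative is to fail fast with a clear message. This is a minimal sketch; `require_env` and the `DEMO_API_KEY` variable are hypothetical names used for illustration.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before continuing")
    return value


# Demonstration with a made-up variable name, so no real key is touched
os.environ["DEMO_API_KEY"] = "example-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called as `require_env('OPENAI_API_KEY')` in place of the placeholder fallback.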

In [35]:
query = "how do I use the LLMChain in LangChain?"

# embed the query with OpenAI
xq = create_embedding(query)

# retrieve the most relevant contexts from Pinecone
res = index.query(vector=xq, top_k=5, include_metadata=True)
res

{'matches': [{'id': '2f66a6a5-c829-4118-acb8-f08667f3f95d',
              'metadata': {'chunk': 2.0,
                           'text': 'for full documentation on:\\n\\nGetting '
                                   'started (installation, setting up the '
                                   'environment, simple examples)\\n\\nHow-To '
                                   'examples (demos, integrations, helper '
                                   'functions)\\n\\nReference (full API '
                                   'docs)\\n\\nResources (high-level '
                                   'explanation of core '
                                   'concepts)\\n\\nð\\x9f\\x9a\\x80 What can '
                                   'this help with?\\n\\nThere are six main '
                                   'areas that LangChain is designed to help '
                                   'with.\\nThese are, in increasing order of '
                                   'complexity:\\n\\nð\\x9f“\\x83 LLMs

With retrieval complete, we move on to feeding these into GPT-4 to produce answers.
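
For intuition about what a `top_k` query returns, the sketch below ranks a handful of toy vectors against a query by cosine similarity using a brute-force scan. This is an illustration under simplifying assumptions (2-dimensional made-up vectors, hypothetical helper names), not a description of how Pinecone searches an index.

```python
import heapq
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query, vectors, k):
    """Return the k (score, id) pairs with the highest cosine similarity.

    `vectors` maps an id to its embedding. Every entry is scanned, which
    is exactly the work a vector index is designed to avoid.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(v))), vec_id)
        for vec_id, v in vectors.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical 2-dimensional "embeddings" for illustration only
toy_index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.7, 0.7],
    "doc-c": [0.0, 1.0],
}

for score, vec_id in top_k([1.0, 0.1], toy_index, k=2):
    print(vec_id, round(score, 3))
```

The query points mostly along the first axis, so `doc-a` ranks first and `doc-b` second, while `doc-c` is left out.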

## Retrieval Augmented Generation

GPT-4 class models are accessed via OpenAI's Chat Completions endpoint (`client.chat.completions.create`). The code in this section uses `gpt-4o-mini-2024-07-18`.

To get a richer response from the LLM that includes context from our knowledge base, we need to retrieve context relevant to the query and then include it in the chat completion prompt.

In [37]:
def retrieval_augmented_prompt(query):
    context_limit = 3750
    xq = create_embedding(query)

    # Get relevant contexts
    query_results = index.query(vector=xq, top_k=3, include_metadata=True)
    contexts = [
        x.metadata['text'] for x in query_results.matches
    ]

    # Build our prompt with the retrieved contexts included
    prompt_start = (
        "Answer the question based on the context below.\n\n"+
        "Context:\n"
    )
    prompt_end = (
        f"\n\nQuestion: {query}\nAnswer:"
    )
    context_separator = "\n\n---\n\n"

    # Join contexts and trim to fit within limit
    combined_contexts = []
    total_length = 0
    
    for context in contexts:
        new_length = total_length + len(context) + len(context_separator)
        if new_length >= context_limit:
            break
        combined_contexts.append(context)
        total_length = new_length
    
    return prompt_start + context_separator.join(combined_contexts) + prompt_end
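
The trimming loop can be checked in isolation. The sketch below mirrors the logic of `retrieval_augmented_prompt` in a standalone, hypothetical helper (`fit_contexts`) and runs it on toy strings. Note that the limit counts characters, not tokens.

```python
def fit_contexts(contexts, limit, separator="\n\n---\n\n"):
    """Keep leading contexts while their running length stays under `limit`.

    Each context is charged its own length plus one separator, and the
    first context that would reach the limit ends the loop.
    """
    kept, total = [], 0
    for context in contexts:
        new_total = total + len(context) + len(separator)
        if new_total >= limit:
            break
        kept.append(context)
        total = new_total
    return kept


# Three 10-character contexts; each costs 10 + 7 (separator) = 17 characters
toy_contexts = ["a" * 10, "b" * 10, "c" * 10]
print(len(fit_contexts(toy_contexts, limit=40)))
```

With a limit of 40, the first two contexts fit (34 characters charged) and the third is dropped. Because the loop stops at the first context that does not fit, a shorter context further down the list is never considered.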

In [39]:
prompt = retrieval_augmented_prompt(query)
print(prompt)

Answer the question based on the context below.

Context:
for full documentation on:\n\nGetting started (installation, setting up the environment, simple examples)\n\nHow-To examples (demos, integrations, helper functions)\n\nReference (full API docs)\n\nResources (high-level explanation of core concepts)\n\nð\x9f\x9a\x80 What can this help with?\n\nThere are six main areas that LangChain is designed to help with.\nThese are, in increasing order of complexity:\n\nð\x9f“\x83 LLMs and Prompts:\n\nThis includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.\n\nð\x9f”\x97 Chains:\n\nChains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.\n\nð\x9f“\x9a Data Augmented Generation:\n\nData Augmented Generation involves specific types o

Now we ask the question of the LLM using chat completion:

In [41]:
def chat_completion(prompt):
    from openai import OpenAI

    # Get OpenAI api key from platform.openai.com
    openai_api_key = os.getenv('OPENAI_API_KEY') or 'sk-...'

    # Instantiate the OpenAI client
    client = OpenAI(api_key=openai_api_key)
    
    # Instructions
    sys_prompt = """You are a Q&A bot. A highly intelligent system that answers
    user questions based on the information provided by the user above
    each question. If the information cannot be found in the information
    provided by the user you truthfully say "I don't know".
    """
    
    res = client.chat.completions.create(
        model='gpt-4o-mini-2024-07-18',
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=0
    )
    return res.choices[0].message.content.strip()

def rag(query):
    prompt = retrieval_augmented_prompt(query)
    return chat_completion(prompt)

In [45]:
answer = rag("How do I build a chatbot with LangChain?")
answer

"To build a chatbot with LangChain, you can follow these general steps:\n\n1. **Set Up Your Environment**: Ensure you have Python or JavaScript installed, as LangChain supports these programming languages.\n\n2. **Install LangChain**: Use pip for Python or npm for JavaScript to install the LangChain framework.\n\n3. **Define Your Chatbot Logic**: Utilize LangChain's abstractions to define how your chatbot will process user inputs and generate responses. This may involve setting up a conversational flow and integrating with a language model like OpenAI's GPT-3.\n\n4. **Integrate Data Sources**: If your chatbot needs to access specific information, you can connect it to databases or APIs using LangChain's capabilities.\n\n5. **Test Your Chatbot**: Run your application and test the chatbot's responses to ensure it behaves as expected.\n\n6. **Deploy Your Chatbot**: Once satisfied with its performance, deploy your chatbot to a platform where users can interact with it.\n\nLangChain provide

To display this response nicely, we render it as Markdown.

In [46]:
from IPython.display import Markdown

display(Markdown(answer))

To build a chatbot with LangChain, you can follow these general steps:

1. **Set Up Your Environment**: Ensure you have Python or JavaScript installed, as LangChain supports these programming languages.

2. **Install LangChain**: Use pip for Python or npm for JavaScript to install the LangChain framework.

3. **Define Your Chatbot Logic**: Utilize LangChain's abstractions to define how your chatbot will process user inputs and generate responses. This may involve setting up a conversational flow and integrating with a language model like OpenAI's GPT-3.

4. **Integrate Data Sources**: If your chatbot needs to access specific information, you can connect it to databases or APIs using LangChain's capabilities.

5. **Test Your Chatbot**: Run your application and test the chatbot's responses to ensure it behaves as expected.

6. **Deploy Your Chatbot**: Once satisfied with its performance, deploy your chatbot to a platform where users can interact with it.

LangChain provides the necessary tools and frameworks to simplify these steps, making it easier to develop a chatbot powered by large language models.

Let's compare this to a non-augmented query...

In [55]:
def non_augmented_prompt(query):
    return f"""
Question: {query}
Answer:
"""

answer2 = chat_completion(non_augmented_prompt("How do I create a chatbot with Langchain?"))

display(Markdown(answer2))

I don't know.

If we drop the `"I don't know"` part of the `sys_prompt`, the LLM will try to answer from what it learned during training. Such answers may or may not be correct.

In [None]:
def hallucinating_chat_completion(prompt):
    from openai import OpenAI

    # Get OpenAI api key from platform.openai.com
    openai_api_key = os.getenv('OPENAI_API_KEY') or 'sk-...'

    # Instantiate the OpenAI client
    client = OpenAI(api_key=openai_api_key)
    
    # Instructions
    sys_prompt = """You are a helpful Q&A bot."""
    
    res = client.chat.completions.create(
        model='gpt-4o-mini-2024-07-18',
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=0
    )
    return res.choices[0].message.content.strip()

answer3 = hallucinating_chat_completion(non_augmented_prompt("How do I create a chatbot with Langchain?"))
display(Markdown(answer3))

Then we see something even worse than `"I don't know"`: hallucinations. Clearly, augmenting our queries with additional context can make a huge difference to the performance of our system and ensure that trusted information is given priority when composing a response.

Great, we've seen how to augment GPT-4 with semantic search to allow us to answer LangChain-specific queries.

## Demo cleanup

Once you're finished, delete the index to save resources.

In [34]:
pc.delete_index(name=index_name)
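
If this cell might be run more than once, the deletion can be guarded with the same `has_index` check used when creating the index. The sketch below uses a hypothetical `delete_index_if_exists` helper and an in-memory `FakeClient` stand-in so that it runs without a Pinecone account; only `has_index` and `delete_index` are taken from the notebook.

```python
def delete_index_if_exists(client, name):
    """Delete index `name` if the client reports it exists; return True if deleted."""
    if client.has_index(name=name):
        client.delete_index(name=name)
        return True
    return False


class FakeClient:
    """Hypothetical in-memory stand-in for the Pinecone client, for illustration."""

    def __init__(self, names):
        self.names = set(names)

    def has_index(self, name):
        return name in self.names

    def delete_index(self, name):
        self.names.remove(name)


fake = FakeClient(["gpt-4-langchain-docs-fast"])
print(delete_index_if_exists(fake, "gpt-4-langchain-docs-fast"))  # first call deletes
print(delete_index_if_exists(fake, "gpt-4-langchain-docs-fast"))  # second call is a no-op
```

In the notebook, the equivalent call would be `delete_index_if_exists(pc, index_name)`.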

---