[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-augmentation.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-augmentation.ipynb)

# Retrieval-Augmented Generation with Pinecone, LangChain and OpenAI

## Fixing LLMs that Hallucinate

In this notebook, you'll learn one of the most common applications of retrieval-augmented generation: giving large language models access to up-to-date information. 


### Demo Data: Pinecone Documentation

RAG is especially useful for augmenting LLMs with information that may not exist in their training data. This could be private data, internal company information, or data that has been updated after a training cutoff. In our case, many modern LLMs are trained on Pinecone data that has since been updated, such as release notes, quickstart guides, blog posts, docs, and code.

In this example, we'll show the differences in generation from OpenAI's LLMs when asked about Pinecone's release notes! We'll orchestrate our RAG workflow using LangChain, a popular framework for AI applications.

In [1]:
!pip install -qU \
    "langchain[openai]"\
    langchain-text-splitters==0.3.8 \
    langchain-pinecone==0.2.1 \
    pinecone-notebooks==0.1.1

---

## Building a Knowledge Base with our Vector Database

Building more reliable LLM tools requires an external _"Knowledge Base"_, a database that we can query and update periodically with information.

Specifically, we will need to retrieve information that is relevant to our queries. To do this we need to use _"dense vector embeddings"_. These can be thought of as numerical representations of the *meaning* behind our sentences.
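To make "numerical representations of meaning" a bit more concrete, here is a minimal sketch that compares made-up vectors with cosine similarity. The three-dimensional vectors and the `cosine_similarity` helper are hypothetical stand-ins for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", hand-written for illustration only
toy_embeddings = {
    "How do I create an index?": [0.9, 0.1, 0.0],
    "Steps for making a new index": [0.8, 0.2, 0.1],
    "What is the weather today?": [0.0, 0.2, 0.9],
}

query_vec = toy_embeddings["How do I create an index?"]
for sentence, vec in toy_embeddings.items():
    print(f"{cosine_similarity(query_vec, vec):.3f}  {sentence}")
```

Sentences with similar meaning should score close to 1, while unrelated sentences should score much lower.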

There are many options for creating these dense vectors, like open source [sentence transformers embedding models](https://www.pinecone.io/learn/series/nlp/) or OpenAI's [text-embedding-3-small model](https://platform.openai.com/docs/models/text-embedding-3-small). We will use OpenAI's offering in this example.

### Before you begin...

Be sure to grab a [free Pinecone account](https://app.pinecone.io/?sessionType=signup) and an OpenAI API key, [located here](https://platform.openai.com/api-keys)!

In [2]:
## Getting our Dataset: 

# These are markdown versions of our release notes from 2025 and 2024
release_notes_2025 = "https://docs.pinecone.io/release-notes/2025.md"
release_notes_2024 = "https://docs.pinecone.io/release-notes/2024.md"


## Preprocessing our data

We'll use Requests and LangChain to pull down the release notes and process the associated Markdown. The splitter keys on three header levels, which correspond to the release year, the month and year, and the individual feature.
This will take us from raw Markdown files to LangChain Documents, which we'll embed and store in Pinecone.
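To illustrate what header-based splitting does, here is a minimal, standard-library-only sketch. It is a simplified illustration of the idea, not LangChain's implementation; the `split_by_headers` function and the sample text are hypothetical.

```python
def split_by_headers(markdown_text, headers):
    """Split Markdown into chunks, tagging each chunk with the headers above it.

    `headers` maps a header prefix (e.g. "##") to a metadata key.
    """
    # Check longer prefixes first so "###" is not mistaken for "#"
    ordered = sorted(headers.items(), key=lambda kv: len(kv[0]), reverse=True)
    chunks, current_meta, current_lines = [], {}, []

    def flush():
        # Emit the text collected so far, if any, with a copy of the metadata
        text = "\n".join(current_lines).strip()
        if text:
            chunks.append({"page_content": text, "metadata": dict(current_meta)})
        current_lines.clear()

    for line in markdown_text.splitlines():
        for prefix, key in ordered:
            if line.startswith(prefix + " "):
                flush()
                # A new header invalidates any deeper-level headers
                for other_prefix, other_key in headers.items():
                    if len(other_prefix) > len(prefix):
                        current_meta.pop(other_key, None)
                current_meta[key] = line[len(prefix):].strip()
                break
        else:
            current_lines.append(line)
    flush()
    return chunks


sample = """# 2025 releases
## July 2025
### Feature A
Details about feature A.
### Feature B
Details about feature B.
## June 2025
### Feature C
Details about feature C.
"""

header_map = {"#": "release", "##": "month_year", "###": "feature"}
for chunk in split_by_headers(sample, header_map):
    print(chunk["metadata"], "->", chunk["page_content"])
```

Each chunk carries the headers it sits under as metadata, which is what later lets us inspect and filter by release, month, or feature.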

In [3]:
# We'll download these URLs and parse them using LangChain's Markdown header text splitter
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_core.documents import Document
import requests

splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "release"), ("##", "month_year"), ("###", "feature")])

def download_link(url):
    # A timeout keeps the notebook from hanging if the server does not respond
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def add_document_metadata(doc, new_metadata):
    # Returns a new Document whose metadata merges the old and new values
    old_metadata = doc.metadata
    new_metadata = {**old_metadata, **new_metadata}
    return Document(page_content=doc.page_content, metadata=new_metadata)


def preprocess_pinecone_docs(urls):

    pinecone_docs = []
    for url in urls:
        # download the markdown
        response = download_link(url)
        split_text = splitter.split_text(response)
        # Update metadata to include url as source  
        split_text = [add_document_metadata(doc, {"source": url, "chunk_num": num}) for num, doc in enumerate(split_text)]
        pinecone_docs.extend(split_text)
    return pinecone_docs


pinecone_docs = preprocess_pinecone_docs([release_notes_2024, release_notes_2025])

Let's take a closer look at a couple of these notes:

In [4]:
print("Document content: ", pinecone_docs[2].page_content)
print("Document metadata: ", pinecone_docs[2].metadata)

Document content:  Pinecone Assistant can now [return a JSON response](/guides/assistant/chat-with-assistant#json-response).  
***  
You can now [create an assistant](/reference/api/2025-01/assistant/create_assistant) in the `eu` region.
</Update>  
<Update label="2024-12-17" tags={["Database"]}>
Document metadata:  {'release': '2024 releases', 'month_year': 'December 2024', 'feature': 'Pinecone Assistant JSON mode and EU region deployment', 'source': 'https://docs.pinecone.io/release-notes/2024.md', 'chunk_num': 2}


In [5]:
print("Document content: ", pinecone_docs[-1].page_content)
print("Document metadata: ", pinecone_docs[-1].metadata)

Document content:  Released [`v2.2.0`](https://github.com/pinecone-io/go-pinecone/releases/tag/v2.2.0) of the [Pinecone Go SDK](/reference/go-sdk). This version adds support for [index tags](/guides/manage-data/manage-indexes#configure-index-tags) when creating or configuring indexes.
</Update>
Document metadata:  {'release': '2025 releases', 'month_year': 'January 2025', 'feature': 'Released Go SDK v2.2.0', 'source': 'https://docs.pinecone.io/release-notes/2025.md', 'chunk_num': 61}


## Setting up Pinecone


Next, we'll set up our API keys. For notebooks in Colab environments, we've included a handy block that helps set a Pinecone API key in your environment. In all other contexts, it's sufficient to save your Pinecone and OpenAI keys in your local environment.

Run the next two blocks and enter your Pinecone and OpenAI keys as needed:

In [6]:

import os
from getpass import getpass

def get_pinecone_api_key():
    """
    Get Pinecone API key from environment variable or prompt user for input.
    Returns the API key as a string.

    Only necessary for notebooks. When using Pinecone yourself, 
    you can use environment variables or the like to set your API key.
    """
    api_key = os.environ.get("PINECONE_API_KEY")
    
    if api_key is None:
        try:
            # Try Colab authentication if available
            from pinecone_notebooks.colab import Authenticate
            Authenticate()
            # If successful, key will now be in environment
            api_key = os.environ.get("PINECONE_API_KEY")
        except ImportError:
            # pinecone_notebooks is not importable (e.g. outside Colab): prompt the user for the API key
            print("Pinecone API key not found in environment.")
            api_key = getpass("Please enter your Pinecone API key: ")
            # Save to environment for future use in session
            os.environ["PINECONE_API_KEY"] = api_key
    
    return api_key

PINECONE_API_KEY = get_pinecone_api_key()


Pinecone API key not found in environment.


## Setting up OpenAI API Key



In [7]:
def get_openai_api_key():
    """
    Get OpenAI API key from environment variable or prompt user for input.
    Returns the API key as a string.
    """

    api_key = os.environ.get("OPENAI_API_KEY")
    
    if api_key is None:
        try:
            api_key = getpass("Please enter your OpenAI API key: ")
            # Save to environment for future use in session
            os.environ["OPENAI_API_KEY"] = api_key
        except Exception as e:
            print(f"Error getting OpenAI API key: {e}")
            return None
    
    return api_key

In [8]:
OPENAI_API_KEY = get_openai_api_key()

## Setting up Pinecone Index
We'll instantiate a Pinecone client, and create an index with a few key properties:
- index_name, which identifies our index
- dimension, which corresponds to the OpenAI embedding model vector size we'll use
- metric, which corresponds to the way "closeness" is evaluated with our vectors
- a spec, which determines the kind of index we are setting up. In this case, it's a free tier Pinecone serverless index

In [9]:
from pinecone import Pinecone

pc = Pinecone(
        api_key=PINECONE_API_KEY,
        # You can remove this parameter for your own projects
        source_tag="pinecone_examples:docs:langchain_retrieval_augmentation"
    )


In [10]:
from pinecone import ServerlessSpec

index_name = "langchain-pinecone-rag"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        # dimension of the vector embeddings produced by OpenAI's text-embedding-3-small
        dimension=1536,
        metric="cosine",
        # parameters for the free tier index
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

# Initialize index client
index = pc.Index(name=index_name)

# View index stats
index.describe_index_stats()


{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

## Embedding our documents and upserting into Pinecone

Next, we'll set up our OpenAI embedding model and Pinecone vector database within LangChain. To do this, we import the related abstractions from LangChain and pass in our API keys and model names.

After that, we generate ids for each document-chunk to better manage them within Pinecone.
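One reason to generate deterministic ids is that writing a chunk again under the same id replaces the earlier record instead of adding a duplicate. Here is a minimal sketch of that idea, using a plain dict as a stand-in for the index. The `make_id` and `upsert` helpers and the sample records are hypothetical, and this is not the Pinecone client API.

```python
def make_id(source_name, chunk_num):
    """Build a stable id from the document name and chunk number."""
    return f"{source_name}#chunk{chunk_num}"


def upsert(store, records):
    """Insert records keyed by id; an existing id is overwritten, not duplicated."""
    for record in records:
        store[record["id"]] = record
    return store


store = {}  # stand-in for a vector index, for illustration only
first_pass = [
    {"id": make_id("2025", 0), "text": "original text of chunk 0"},
    {"id": make_id("2025", 1), "text": "original text of chunk 1"},
]
upsert(store, first_pass)

# Re-processing the same document yields the same ids, so this replaces chunk 0
second_pass = [{"id": make_id("2025", 0), "text": "revised text of chunk 0"}]
upsert(store, second_pass)

print(len(store))
print(store["2025#chunk0"]["text"])
```

Because the ids are derived from the document and chunk position rather than generated randomly, re-running the ingestion step keeps the store the same size and refreshes the content in place.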

In [11]:
[doc.metadata for doc in pinecone_docs]

[{'release': '2024 releases',
  'month_year': 'December 2024',
  'source': 'https://docs.pinecone.io/release-notes/2024.md',
  'chunk_num': 0},
 {'release': '2024 releases',
  'month_year': 'December 2024',
  'feature': 'Increased namespaces limit',
  'source': 'https://docs.pinecone.io/release-notes/2024.md',
  'chunk_num': 1},
 {'release': '2024 releases',
  'month_year': 'December 2024',
  'feature': 'Pinecone Assistant JSON mode and EU region deployment',
  'source': 'https://docs.pinecone.io/release-notes/2024.md',
  'chunk_num': 2},
 {'release': '2024 releases',
  'month_year': 'December 2024',
  'feature': 'Released Spark-Pinecone connector v1.2.0',
  'source': 'https://docs.pinecone.io/release-notes/2024.md',
  'chunk_num': 3},
 {'release': '2024 releases',
  'month_year': 'December 2024',
  'feature': 'New integration with HoneyHive',
  'source': 'https://docs.pinecone.io/release-notes/2024.md',
  'chunk_num': 4},
 {'release': '2024 releases',
  'month_year': 'December 2024',


In [None]:

from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

vector_store = PineconeVectorStore(index=index, embedding=embeddings)


# Build ids from the URL title and chunk number so that chunks can be looked up and replaced by id

def clean_url_for_title(url):
    # grabs the end of the url minus .md
    return url.split("/")[-1].replace(".md", "")

# Here, we follow a schema that puts the document name, and the chunk number together, like doc1#chunk1

def generate_ids(doc_chunk):

    title = clean_url_for_title(doc_chunk.metadata['source'])
    chunk_num = doc_chunk.metadata['chunk_num']
    feature = doc_chunk.metadata.get('feature', "na")
    return f"release_{title}#feature_{feature}#chunk_num{chunk_num}"

ids = [generate_ids(doc) for doc in pinecone_docs]


# To learn more, look here: https://docs.pinecone.io/guides/index-data/data-modeling

vector_store.add_documents(documents=pinecone_docs, ids=ids)


['release_2024#feature_na#chunk_num0',
 'release_2024#feature_Increased namespaces limit#chunk_num1',
 'release_2024#feature_Pinecone Assistant JSON mode and EU region deployment#chunk_num2',
 'release_2024#feature_Released Spark-Pinecone connector v1.2.0#chunk_num3',
 'release_2024#feature_New integration with HoneyHive#chunk_num4',
 'release_2024#feature_Released Python SDK v5.4.2#chunk_num5',
 'release_2024#feature_Launch week: Pinecone Local#chunk_num6',
 'release_2024#feature_Launch week: Enhanced security and access controls#chunk_num7',
 'release_2024#feature_Launch week: `pinecone-rerank-v0` and `cohere-rerank-3.5` on Pinecone Inference#chunk_num8',
 'release_2024#feature_Launch week: Integrated Inference#chunk_num9',
 'release_2024#feature_Released .NET SDK v2.1.0#chunk_num10',
 'release_2024#feature_Improved batch deletion guidance#chunk_num11',
 'release_2024#feature_Launch week: Released `pinecone-sparse-english-v0`#chunk_num12',
 'release_2024#feature_na#chunk_num13',
 're

## Bringing it all together: Using OpenAI to learn about our releases

Finally, we'll set up an OpenAI LLM endpoint to generate responses given a user's query about our release notes!

In [13]:
from langchain.chat_models import init_chat_model


# Note that as of 08-12-2025, the cutoff for GPT-5 is Sept 29, 2024.
# If you don't have access yet, gpt-4o-mini works great!

llm = init_chat_model("gpt-5", model_provider="openai")

Next, let's run a query and retrieve some documents from Pinecone. These are what we'll ultimately pass to our LLM as context for answering the query.
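Conceptually, a similarity search scores the query's embedding against the stored vectors and returns the `k` best matches. The brute-force sketch below illustrates that idea with made-up three-dimensional vectors. The `top_k` helper and the toy data are hypothetical, and production vector databases typically avoid a full scan like this one.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, stored, k=2):
    """Return the k (score, doc_id) pairs most similar to query_vec, best first."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in stored.items())
    return heapq.nlargest(k, scored)


# Hypothetical stored vectors keyed by document id, for illustration only
stored = {
    "python-sdk-v7": [0.9, 0.1, 0.1],
    "go-sdk-v2": [0.6, 0.7, 0.1],
    "assistant-json": [0.1, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.1]  # pretend this is the embedded query

for score, doc_id in top_k(query_vec, stored, k=2):
    print(f"{score:.3f}  {doc_id}")
```

The text of the top-scoring documents is what gets joined together and handed to the LLM as context.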

In [20]:
# OpenAI models will be unable to answer this due to training cutoffs in 2023-2024

query = "Tell me about version 7.0 of the Pinecone Python SDK"

retrieved_docs = vector_store.similarity_search(query, k=5)
docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

We can peek at the retrieved documents to confirm that up-to-date information is being passed in:

In [21]:
for num, d in enumerate(retrieved_docs):
    print(f"Doc number: {num+1}")
    print(d.page_content)
    print("Metadata:")
    print(d.metadata)
    print("-"*100)

Doc number: 1
Released [`v7.0.1`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.0.1) and [`v7.0.2`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.0.2) of the [Pinecone Python SDK](/reference/python-sdk). These versions fix minor bugs discovered since the release of the `v7.0.0` major version.
</Update>  
<Update label="2025-05-29" tags={["SDK"]}>
Metadata:
{'chunk_num': 18.0, 'feature': 'Released Python SDK v7.0.1 and v7.0.2', 'month_year': 'May 2025', 'release': '2025 releases', 'source': 'https://docs.pinecone.io/release-notes/2025.md'}
----------------------------------------------------------------------------------------------------
Doc number: 2
Released [`v7.1.0`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.1.0), [`v7.2.0`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.2.0), and [`v7.3.0`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.3.0) of the [Pinecone Py

### Comparing responses with/without RAG

Without our RAG pipeline and its indexed release notes, OpenAI models will be unable to answer questions about new versions of Pinecone. They may even "hallucinate", fabricating information about versions that may not exist!

In [22]:
print(llm.invoke(query).content)

I don’t have release notes for Pinecone’s Python SDK version 7.0 in my training data (my knowledge goes up to Oct 2024), and I don’t want to guess. A couple of quick clarifications and ways to get the exact details:

Questions
- Do you mean the “pinecone” package or “pinecone-client”? They are different packages, and their version numbers may not match.
- Are you looking for new features, breaking changes, or migration guidance?

How to find the v7.0 details right now
- PyPI release notes:
  - pinecone: https://pypi.org/project/pinecone/
  - pinecone-client: https://pypi.org/project/pinecone-client/
- GitHub changelog (pinecone-io org): look for the repository of the package you use (commonly “pinecone-python”) and open CHANGELOG.md or Releases.
- Docs: https://docs.pinecone.io (search “Python SDK” and “migration guide”).

Commands to check and upgrade locally
- Check which package/version you have:
  - pip show pinecone
  - pip show pinecone-client
  - python -c "import pinecone, sys;

However, with our pipeline in place, we get an answer that is grounded in the retrieved release notes and therefore more likely to be correct.

In [23]:
prompt = f'''You are an assistant that answers questions exclusively about the 
Pinecone SDK release notes:

Here's a question: {query}

Here's some context from the release notes:

{docs_content}

Answer:
'''

# This will take a few seconds to run, due to the generation of the response from OpenAI
answer = llm.invoke(prompt)

print(answer.content)

Here’s what’s in the Pinecone Python SDK 7.0 line:

- v7.0.0 (2025-06-16): Moves to API version 2025-04 and adds:
  - Creating and managing backups
  - Restoring indexes from backups
  - Listing and describing Pinecone-hosted embedding and reranking models
  - Creating Bring Your Own Cloud (BYOC) indexes
  - pinecone-plugin-assistant included by default (no separate install needed)
- v7.0.1 and v7.0.2: Patch releases that fix minor bugs discovered after v7.0.0.


## Wrapping it all in a function

In [33]:
def generate_response(query, use_pinecone=False):
    # Function to easily generate a response with and without Pinecone data

    if use_pinecone:
        retrieved_docs = vector_store.similarity_search(query, k=5)
        docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)
        prompt = f'''
        You are an assistant that answers questions exclusively about the 
        Pinecone SDK release notes:

        Here's a question: {query}

        Here's some context from the release notes:

        {docs_content}

        Answer: '''

        # Print out the retrieved documents
        print("Retrieved documents:.....")
        for num, d in enumerate(retrieved_docs):
            print(f"Doc number: {num+1}")
            print(d.page_content)
            print("Metadata:")
            print(d.metadata)
            print("-"*100)
        print("Chatbot response:.....")

        return llm.invoke(prompt).content
    else:
        # no context is passed
        return llm.invoke(query).content



In [34]:
query = "Tell me about recent changes to the pinecone-sparse embedding model context window"

print(generate_response(query, use_pinecone=False))

Short answer: Pinecone’s sparse encoder used to be effectively limited to ~512 tokens (BERT-era limit). Recently, Pinecone raised the usable context window by adding server-side chunk-and-merge, so you can pass much longer inputs and get a single sparse vector back. In practice, this means you no longer need to manually chunk long documents just to get a usable sparse embedding.

What changed
- Longer inputs accepted: You can now send texts far beyond 512 tokens; the service splits them internally and merges the term weights into one sparse vector. This improves recall on long documents and simplifies pipelines.
- Smarter merging: The merger reduces double-counting of repeated terms across chunks and normalizes weights, helping stability vs. naive summation.
- Progressive limits: The maximum accepted input length has been increased from the original 512-token ceiling to a much larger cap (the exact hard limit can change as they roll out updates).

How to confirm the current limit
- Che

In [35]:
print(generate_response(query, use_pinecone=True))

Retrieved documents:.....
Doc number: 1
You can now raise the context window for Pinecone's hosted [`pinecone-sparse-english-v0`](/guides/index-data/create-an-index#pinecone-sparse-english-v0) embedding model from `512` to `2048` using the `max_tokens_per_sequence` parameter.
</Update>  
<Update label="2025-07-23" tags={["SDK"]}>
Metadata:
{'chunk_num': 5.0, 'feature': 'Increased context window for `pinecone-sparse-english-v0`', 'month_year': 'July 2025', 'release': '2025 releases', 'source': 'https://docs.pinecone.io/release-notes/2025.md'}
----------------------------------------------------------------------------------------------------
Doc number: 2
Pinecone Inference now supports [`pinecone-sparse-english-v0`](/guides/search/rerank-results#pinecone-sparse-english-v0), Pinecone's sparse embedding model, which estimates the lexical importance of tokens by leveraging their context, unlike traditional retrieval models like BM25, which rely solely on term frequency. This model is in [

## Wrapping up

And that's that! You've successfully implemented retrieval-augmented generation with Pinecone, OpenAI and LangChain. Wanna learn more? Try implementing the following:

- Sparse search to enable precise time, date and feature recognition in query results
- Expanding the set of documents to encompass all Pinecone documentation
- Learning how to chunk and process code data, to build your own code assistant

To finish, let's delete our index:

In [36]:
pc.delete_index(name=index_name)

---