[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/gen-qa-openai.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/gen-qa-openai.ipynb)

# Retrieval Augmented Generation with Pinecone, LangChain and OpenAI

#### Fixing LLMs that Hallucinate

In this notebook we will learn how to retrieve contexts relevant to our queries from Pinecone and pass them to a generative OpenAI model, which then generates an answer backed by real data sources. The required installs for this notebook are:

In [138]:
!pip install -qU \
    "langchain[openai]" \
    langchain-text-splitters==0.3.8 \
    langchain-pinecone==0.2.1 \
    pinecone-notebooks==0.1.1

---

## Building a Knowledge Base

Building more reliable LLM tools requires an external _"Knowledge Base"_, a place where we can store information and efficiently retrieve it. We can think of this as the external _long-term memory_ of our LLM. Typically, this takes the form of a vector database like Pinecone.

We will need to retrieve information that is semantically related to our queries. To do this, we use _"dense vector embeddings"_, which can be thought of as numerical representations of the *meaning* behind our sentences.
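
To make the idea of "semantically related" concrete, here is a minimal sketch of cosine similarity, the metric used for the index later in this notebook. The three-dimensional vectors are hand-made illustrative values, not real model output (real `text-embedding-3-small` vectors have 1536 dimensions), and `cosine_similarity` is a helper written for this example only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional "embeddings" (illustrative values only).
toy_embeddings = {
    "How do I create an index?": [0.9, 0.1, 0.0],
    "Steps for making a new index": [0.8, 0.2, 0.1],
    "What is the weather today?": [0.0, 0.2, 0.9],
}

query_vector = toy_embeddings["How do I create an index?"]
scores = {
    text: cosine_similarity(query_vector, vector)
    for text, vector in toy_embeddings.items()
}

# Print sentences from most to least similar to the query sentence.
for text, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.3f}  {text}")
```

Sentences with similar meaning are assigned vectors that point in similar directions, so their cosine similarity is close to 1, while unrelated sentences score close to 0.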

There are many options for creating these dense vectors, like open source [sentence transformers embedding models](https://pinecone.io/learn/nlp/) or OpenAI's [text-embedding-3-small model](https://platform.openai.com/docs/models/text-embedding-3-small). We will use OpenAI's offering in this example.

### Demo Data: Pinecone Documentation

A great use case for RAG is augmenting LLMs with information that may not exist in their training data. This could be private data, internal company information, or data that has been updated after a training cutoff. In our case, many modern LLMs were trained on Pinecone documentation that has since been updated, such as release notes or quickstart guides.

In this example, we'll compare how one of OpenAI's LLMs answers questions about Pinecone with and without retrieved context. We'll orchestrate our RAG workflow using LangChain, a popular framework for AI applications.

In [192]:
# Getting our dataset

release_notes_2025 = "https://docs.pinecone.io/release-notes/2025.md"
release_notes_2024 = "https://docs.pinecone.io/release-notes/2024.md"


In [193]:
# We'll download these URLs and split them using LangChain's Markdown header text splitter
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_core.documents import Document
import requests

splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "release"), ("##", "month_year"), ("###", "feature")])

def download_link(url):
    response = requests.get(url)
    response.raise_for_status()
    return response.text


def add_document_metadata(doc, new_metadata):
    # returns new documents with updated metadata
    old_metadata = doc.metadata
    new_metadata = {**old_metadata, **new_metadata}
    return Document(page_content=doc.page_content, metadata=new_metadata)


def preprocess_pinecone_docs(urls):

    pinecone_docs = []
    for url in urls:
        # download the markdown
        response = download_link(url)
        split_text = splitter.split_text(response)
        # Update metadata to include url as source  
        split_text = [add_document_metadata(doc, {"source": url, "chunk_num": num}) for num, doc in enumerate(split_text)]
        pinecone_docs.extend(split_text)
    return pinecone_docs


pinecone_docs = preprocess_pinecone_docs([release_notes_2024,release_notes_2025])

Let's take a closer look at one of these chunks:

In [197]:
print("Document content: ", pinecone_docs[2].page_content)
print("Document metadata: ", pinecone_docs[2].metadata)

Document content:  Pinecone Assistant can now [return a JSON response](/guides/assistant/chat-with-assistant#json-response).  
***  
You can now [create an assistant](/reference/api/2025-01/assistant/create_assistant) in the `eu` region.
</Update>  
<Update label="2024-12-17" tags={["Database"]}>
Document metadata:  {'release': '2024 releases', 'month_year': 'December 2024', 'feature': 'Pinecone Assistant JSON mode and EU region deployment', 'source': 'https://docs.pinecone.io/release-notes/2024.md', 'chunk_num': 2}


In [195]:
print("Document content: ", pinecone_docs[-1].page_content)
print("Document metadata: ", pinecone_docs[-1].metadata)

Document content:  Released [`v2.2.0`](https://github.com/pinecone-io/go-pinecone/releases/tag/v2.2.0) of the [Pinecone Go SDK](/reference/go-sdk). This version adds support for [index tags](/guides/manage-data/manage-indexes#configure-index-tags) when creating or configuring indexes.
</Update>
Document metadata:  {'release': '2025 releases', 'month_year': 'January 2025', 'feature': 'Released Go SDK v2.2.0', 'source': 'https://docs.pinecone.io/release-notes/2025.md', 'chunk_num': 47}
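
The metadata printed above comes from the Markdown headers that precede each chunk. The sketch below illustrates the idea in plain Python. It is a simplified stand-in, not LangChain's implementation: `split_by_headers` is a hypothetical helper, the sample text is made up for illustration, and edge cases such as headers inside code blocks are ignored.

```python
import re

# Same header-to-metadata mapping as used with MarkdownHeaderTextSplitter above.
HEADERS = {"#": "release", "##": "month_year", "###": "feature"}

def split_by_headers(markdown_text):
    """Split text at #, ## and ### headers, recording the headers as metadata."""
    chunks = []
    current_meta = {}
    current_lines = []

    def flush():
        # Emit the text collected so far as one chunk, if there is any.
        text = "\n".join(current_lines).strip()
        if text:
            chunks.append({"page_content": text, "metadata": dict(current_meta)})
        current_lines.clear()

    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,3})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces headers at the same level or deeper.
            for marks, key in HEADERS.items():
                if len(marks) >= level:
                    current_meta.pop(key, None)
            current_meta[HEADERS[match.group(1)]] = match.group(2).strip()
        else:
            current_lines.append(line)
    flush()
    return chunks

# Made-up sample in the same shape as the release notes.
sample = """# 2025 releases
## January 2025
### Released Go SDK v2.2.0
Adds support for index tags.
### Another feature
Some other note.
"""

sample_chunks = split_by_headers(sample)
for num, chunk in enumerate(sample_chunks):
    print(num, chunk["metadata"], "->", chunk["page_content"])
```

Each chunk inherits the most recent header at every level, which is why the real chunks carry `release`, `month_year`, and `feature` keys alongside the `source` and `chunk_num` values added in `preprocess_pinecone_docs`.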


## Setting up Pinecone


In [77]:

import os
from getpass import getpass

def get_pinecone_api_key():
    """
    Get Pinecone API key from environment variable or prompt user for input.
    Returns the API key as a string.

    Only necessary for notebooks. When using Pinecone yourself, 
    you can use environment variables or the like to set your API key.
    """
    api_key = os.environ.get("PINECONE_API_KEY")
    
    if api_key is None:
        try:
            # Try Colab authentication if available
            from pinecone_notebooks.colab import Authenticate
            Authenticate()
            # If successful, key will now be in environment
            api_key = os.environ.get("PINECONE_API_KEY")
        except ImportError:
            # If not in Colab or authentication fails, prompt user for API key
            print("Pinecone API key not found in environment.")
            api_key = getpass("Please enter your Pinecone API key: ")
            # Save to environment for future use in session
            os.environ["PINECONE_API_KEY"] = api_key
    
    return api_key

PINECONE_API_KEY = get_pinecone_api_key()


## Setting up the OpenAI API Key



In [78]:
def get_openai_api_key():
    """
    Get OpenAI API key from environment variable or prompt user for input.
    Returns the API key as a string.
    """

    api_key = os.environ.get("OPENAI_API_KEY")
    
    if api_key is None:
        try:
            api_key = getpass("Please enter your OpenAI API key: ")
            # Save to environment for future use in session
            os.environ["OPENAI_API_KEY"] = api_key
        except Exception as e:
            print(f"Error getting OpenAI API key: {e}")
            return None
    
    return api_key

In [79]:
OPENAI_API_KEY = get_openai_api_key()

In [80]:
from pinecone import Pinecone

pc = Pinecone(
        api_key=PINECONE_API_KEY,
        # You can remove this parameter for your own projects
        source_tag="pinecone_examples:docs:langchain_retrieval_augmentation"
    )


In [198]:
from pinecone import ServerlessSpec

index_name = "langchain-pinecone-rag"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        # dimension of the vector embeddings produced by OpenAI's text-embedding-3-small
        dimension=1536,
        metric="cosine",
        # parameters for the free tier index
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

# Initialize index client
index = pc.Index(name=index_name)

# View index stats
index.describe_index_stats()


{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 18}},
 'total_vector_count': 18}

## Embedding our documents and upserting into Pinecone

In [199]:

from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

vector_store = PineconeVectorStore(index=index, embedding=embeddings)


# Build deterministic IDs from the document name and chunk number, so that
# re-running this cell replaces existing records instead of duplicating them

def clean_url_for_title(url):
    # grabs the end of the url minus .md
    return url.split("/")[-1].replace(".md", "")

# Here, we follow a schema that puts the document name and the chunk number together, like doc1#chunk1

def generate_id(doc):
    return f"{clean_url_for_title(doc.metadata['source'])}#{doc.metadata['chunk_num']}"

ids = [generate_id(doc) for doc in pinecone_docs]


# To learn more, look here: https://docs.pinecone.io/guides/manage-data/manage-document-chunks

vector_store.add_documents(documents=pinecone_docs, ids=ids)


['2024#0',
 '2024#1',
 '2024#2',
 '2024#3',
 '2024#4',
 '2024#5',
 '2024#6',
 '2024#7',
 '2024#8',
 '2024#9',
 '2024#10',
 '2024#11',
 '2024#12',
 '2024#13',
 '2024#14',
 '2024#15',
 '2024#16',
 '2024#17',
 '2024#18',
 '2024#19',
 '2024#20',
 '2024#21',
 '2024#22',
 '2024#23',
 '2024#24',
 '2024#25',
 '2024#26',
 '2024#27',
 '2024#28',
 '2024#29',
 '2024#30',
 '2024#31',
 '2024#32',
 '2024#33',
 '2024#34',
 '2024#35',
 '2024#36',
 '2024#37',
 '2024#38',
 '2024#39',
 '2024#40',
 '2024#41',
 '2025#0',
 '2025#1',
 '2025#2',
 '2025#3',
 '2025#4',
 '2025#5',
 '2025#6',
 '2025#7',
 '2025#8',
 '2025#9',
 '2025#10',
 '2025#11',
 '2025#12',
 '2025#13',
 '2025#14',
 '2025#15',
 '2025#16',
 '2025#17',
 '2025#18',
 '2025#19',
 '2025#20',
 '2025#21',
 '2025#22',
 '2025#23',
 '2025#24',
 '2025#25',
 '2025#26',
 '2025#27',
 '2025#28',
 '2025#29',
 '2025#30',
 '2025#31',
 '2025#32',
 '2025#33',
 '2025#34',
 '2025#35',
 '2025#36',
 '2025#37',
 '2025#38',
 '2025#39',
 '2025#40',
 '2025#41',
 '2025#42',
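
IDs in the `document#chunk` form shown in the output above are deterministic, which is useful because upserting a record under an existing ID overwrites it rather than creating a duplicate. The following is a minimal sketch of that behavior, assuming a plain dictionary (`fake_index`) as a stand-in for the vector index. `make_chunk_id` and `upsert` are hypothetical helpers written for this example, not Pinecone or LangChain calls.

```python
def make_chunk_id(source_url, chunk_num):
    """Build an ID like '2025#3' from a source URL and a chunk number."""
    doc_name = source_url.split("/")[-1].replace(".md", "")
    return f"{doc_name}#{chunk_num}"

# A plain dict standing in for a vector index keyed by ID (illustrative only).
fake_index = {}

def upsert(chunks):
    """Insert each chunk, overwriting any entry that already has the same ID."""
    for chunk in chunks:
        chunk_id = make_chunk_id(chunk["source"], chunk["chunk_num"])
        fake_index[chunk_id] = chunk["text"]

url = "https://docs.pinecone.io/release-notes/2025.md"
upsert([
    {"source": url, "chunk_num": 0, "text": "first version"},
    {"source": url, "chunk_num": 1, "text": "second chunk"},
])

# Re-running with edited text reuses the same ID, so the entry is replaced.
upsert([{"source": url, "chunk_num": 0, "text": "edited version"}])

# The shared prefix also makes it easy to find every chunk of one document.
ids_for_2025 = sorted(i for i in fake_index if i.startswith("2025#"))
print(len(fake_index), ids_for_2025)
```

The same prefix convention is what allows all chunks of a single document to be located, updated, or deleted together, as described in the Pinecone guide linked above.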


## Building a chat completion prompt with relevant context

Next, we retrieve the contexts relevant to a query from Pinecone and incorporate them into a richer chat completion prompt.
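
The prompt-building step can also be wrapped in a small reusable function. The sketch below is one possible shape for it, not part of LangChain. `build_rag_prompt` and its `max_context_chars` character budget are hypothetical names chosen for this example, and the context strings are shortened placeholders.

```python
def build_rag_prompt(query, contexts, max_context_chars=6000):
    """Assemble a prompt from a question and a list of retrieved text chunks.

    Chunks are added in order until the (assumed) character budget is reached,
    so the most relevant chunks should come first.
    """
    selected = []
    used = 0
    for text in contexts:
        if used + len(text) > max_context_chars:
            break
        selected.append(text)
        used += len(text)

    context_block = "\n\n".join(selected)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

# Shortened placeholder contexts standing in for retrieved chunks.
example_contexts = [
    "Released v7.0.0 of the Pinecone Python SDK.",
    "Released v7.0.1 and v7.0.2 of the Pinecone Python SDK.",
]
example_prompt = build_rag_prompt("What changed in v7.0?", example_contexts)
print(example_prompt)
```

With real results, the `contexts` argument could be built as `[doc.page_content for doc in retrieved_docs]`. Capping the context length is a design choice that keeps the prompt within the model's input limit when many chunks are retrieved.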

In [200]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

In [205]:
query = "Tell me about versions 7.0 of the Pinecone Python SDK"

retrieved_docs = vector_store.similarity_search(query)
docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

prompt = f'''You are an assistant that answers questions exclusively about the Pinecone SDK release notes:

Here's a question: {query}

Here's some context from the release notes:

{docs_content}


Question: {query}

Answer:
'''

answer = llm.invoke(prompt)

Before reading the answer, let's inspect the documents that were retrieved and passed to the model as context:

In [206]:
for d in retrieved_docs:
    print(d.page_content)
    print(d.metadata)
    print("-"*100)

Released [`v7.0.1`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.0.1) and [`v7.0.2`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.0.2) of the [Pinecone Python SDK](/reference/python-sdk). These versions fix minor bugs discovered since the release of the `v7.0.0` major version.
</Update>  
<Update label="2025-05-29" tags={["SDK"]}>
{'chunk_num': 4.0, 'feature': 'Released Python SDK v7.0.1 and v7.0.2', 'month_year': 'May 2025', 'release': '2025 releases', 'source': 'https://docs.pinecone.io/release-notes/2025.md'}
----------------------------------------------------------------------------------------------------
Released [`v7.0.0`](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.0.0) of the [Pinecone Python SDK](/reference/python-sdk). This version uses the latest stable API version, `2025-04`, and includes support for the following:  
* [Creating and managing backups](/guides/manage-data/back-up-an-index)
* [Restor

In [207]:
print(answer.content)

Versions 7.0 of the Pinecone Python SDK include the following releases:

1. **v7.0.0** (Released on May 29, 2025):
   - This major version uses the latest stable API version, `2025-04`.
   - New features include:
     - Support for creating and managing backups.
     - Ability to restore indexes from backups.
     - Listing embedding and reranking models hosted by Pinecone.
     - Getting details about a model hosted by Pinecone.
     - Creating a Bring Your Own Cloud (BYOC) index.
   - The `pinecone-plugin-assistant` package is now included by default, eliminating the need for separate installation.

2. **v7.0.1** and **v7.0.2** (Released after v7.0.0):
   - These versions fix minor bugs discovered since the release of v7.0.0.

For further details, you can reference the links to the release notes for each version:
- [v7.0.0 Release Notes](https://github.com/pinecone-io/pinecone-python-client/releases/tag/v7.0.0)
- [v7.0.1 Release Notes](https://github.com/pinecone-io/pinecone-python-c

For comparison, here is the same model answering a similar question without any retrieved context:

In [208]:
print(llm.invoke("Tell about the most recent major feature release in the Pinecone Python SDK").content)

As of my last update in October 2023, the Pinecone Python SDK had undergone several updates and feature releases that improved its usability and functionality. However, for the most recent specific feature release details, I would recommend checking the official Pinecone GitHub repository or the Pinecone documentation site, as these resources would provide the latest updates, feature announcements, and version release notes.

Typically, significant feature releases in SDKs like Pinecone might include improvements in performance, additional methods for data querying, enhanced support for vector similarity searches, or new integrations with other machine learning tools and frameworks. Always refer to the official communication channels for the most accurate and up-to-date information.

Once we're done with the index, we can delete it to save resources:


In [209]:
pc.delete_index(name=index_name)

---