[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-pinecone/blob/main/examples/embedding-and-reranking.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/langchain-ai/langchain-pinecone/blob/main/examples/embedding-and-reranking.ipynb)

# Embedding and Reranking

This notebook acts as an introduction to using the Pinecone vectorstore implementation in LangChain.

Alongside basic usage, we also show you how to build a two-step retrieval pipeline using embedding and rerank models.

Both models are hosted by Pinecone — Pinecone is _not_ solely a vector database. The service can be used to create embeddings that power your vector search, and even rerank results to enhance result precision.

# Install necessary libraries

In [None]:
!pip install -qU \
    "langchain-pinecone==0.2.13" \
    "datasets==4.3.0"

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.1/62.1 kB[0m [31m2.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m506.8/506.8 kB[0m [31m13.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m81.9/81.9 kB[0m [31m6.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m469.3/469.3 kB[0m [31m29.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.6/16.6 MB[0m [31m75.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m587.6/587.6 kB[0m [31m36.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m47.7/47.7 MB[0m [31m17.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m259.3/259.3 kB[0m [31m18.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━

# Load dataset
For this demo we will use jamescalam/ai-arxiv2-semantic-chunks which research paper data chunked

In [31]:
# Download a pre-chunked dataset using the 'datasets' library
from datasets import load_dataset

# Load a public dataset containing semantic chunks of AI research papers
data = load_dataset(
    "jamescalam/ai-arxiv2-semantic-chunks",
    split="train[:100]",  # Load only the first 100 samples for demo purposes
)

print(f"Loaded {len(data)} documents.")

# Preview the first 3 documents to understand the structure
for i, doc in enumerate(data.select(range(3))):
    print(f"Document {i + 1}: {doc['content'][:200]}\n---\n")

Loaded 100 documents.
Document 1: 4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Sin
---

Document 2: Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/mixtral-of-experts/ # Introduction In this paper, we present Mixtral 8x7B, a sparse mixture of experts model (SMoE) with
---

Document 3: expertsâ ) to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction 
---



Now we transform the dataset to match the expected format for Pinecone and LangChain. That means mapping each document to have `id`, `page_content`, and `metadata` fields.


In [32]:
# Transform the dataset to match the expected format for Pinecone and LangChain
# Map each document to have 'id', 'page_content', and 'metadata' fields
data = data.map(
    lambda x: {
        "id": x["id"],
        "page_content": x["content"],
        "metadata": {
            "title": str(x["title"]),
            "prechunk_id": str(x["prechunk_id"]),
            "postchunk_id": str(x["postchunk_id"]),
            "arxiv_id": str(x["arxiv_id"]),
        },
    }
)
# Remove columns that are no longer needed after mapping
data = data.remove_columns(
    ["title", "content", "prechunk_id", "postchunk_id", "arxiv_id", "references"]
)
data  # Display the processed dataset

Dataset({
    features: ['id', 'page_content', 'metadata'],
    num_rows: 100
})

## Create Pinecone Index
A Pinecone index is a data structure that stores vector embeddings and allows for efficient similarity search. Before you can store or query embeddings, you need to create an index with the appropriate configuration (such as dimension and metric).

The following code will connect to Pinecone using your API key, check if an index with the specified name exists, and create it if necessary. This step is essential for managing and querying your vector data.

In [33]:
# Import Pinecone classes and utilities for index management
from pinecone import ServerlessSpec, Pinecone
import os
import getpass

# Retrieve Pinecone API key from environment or prompt user if not set
os.environ["PINECONE_API_KEY"] = os.getenv("PINECONE_API_KEY") or getpass.getpass(
    "Enter your Pinecone API key: "
)
# Initialize Pinecone client
pc = Pinecone()

# Define serverless deployment specification (cloud provider and region)
spec = ServerlessSpec(
    cloud="aws",
    region="us-west-2",  # You can change region as needed
)

Before initializing our vector store, let's connect to a Pinecone index. If one named index_name doesn't exist, it will be created.

In [34]:
import time

index_name = "langchain-embedding-and-reranking"
# List all existing indexes in your Pinecone project
existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

# Check if the index already exists; if not, create it
if index_name not in existing_indexes:
    # Create a new index with specified dimension and metric
    pc.create_index(
        index_name,
        dimension=1024,  # Must match embedding output size
        metric="dotproduct",  # Similarity metric
        spec=spec,
    )

# Connect to the index for further operations
index = pc.Index(index_name)
# View index statistics to confirm connection
index.describe_index_stats()

{'dimension': 1024,
 'index_fullness': 0.0,
 'metric': 'dotproduct',
 'namespaces': {},
 'total_vector_count': 0,
 'vector_type': 'dense'}

## Let's define the embedding engine
An embedding engine converts text data into high-dimensional vectors that capture semantic meaning. These vectors are used for similarity search in Pinecone.

In the next code cell, we initialize the embedding model that will be used to generate vector representations for our documents. You can choose from several supported models depending on your use case.

In [35]:
# Import PineconeEmbeddings to generate vector representations of text
from langchain_pinecone.embeddings import PineconeEmbeddings

# Initialize the embedding model (default: multilingual-e5-large)
# You can specify a different model if needed
embedder = PineconeEmbeddings()

**Tip:** Before using any embedding model, you can call `list_supported_models()` to see all available models.

In [36]:
# List all supported embedding models for PineconeEmbeddings
PineconeEmbeddings().list_supported_models()

{
    "models": [
        {
            "model": "llama-text-embed-v2",
            "short_description": "A high performance dense embedding model optimized for multilingual and cross-lingual text question-answering retrieval with support for long documents (up to 2048 tokens) and dynamic embedding size (Matryoshka Embeddings).",
            "type": "embed",
            "supported_parameters": [
                {
                    "parameter": "input_type",
                    "type": "one_of",
                    "value_type": "string",
                    "required": true,
                    "allowed_values": [
                        "query",
                        "passage"
                    ]
                },
                {
                    "parameter": "truncate",
                    "type": "one_of",
                    "value_type": "string",
                    "required": false,
                    "default": "END",
                    "allowed_values": [
      

## Building an Index
After creating a Pinecone index and initializing the embedding engine, the next step is to build a vector store. A vector store is an interface that allows you to add, search, and manage vectorized documents within your Pinecone index.

The following code demonstrates how to create a vector store using the Pinecone index and embedding engine.

In [37]:
# Import PineconeVectorStore to manage vector data in Pinecone
from langchain_pinecone import PineconeVectorStore

# Create a vector store using the connected index and embedding engine
vector_store = PineconeVectorStore(index=index, embedding=embedder)

## Manage vector store
Managing a vector store involves adding, updating, and deleting documents. Once your vector store is set up, you can interact with it to store new documents, remove outdated ones, or update existing entries.

The next code cell shows how to add documents to your Pinecone vector store, preparing them for efficient similarity search and retrieval.

In [38]:
# Import utilities for document creation and unique IDs
from uuid import uuid4

from langchain_core.documents import Document

documents = []
for i in data:
    # Print document details for debugging and inspection
    print(f"Processing document {i['id']}...")
    print(
        "page_content:", i["page_content"][:100], "..."
    )  # Preview first 100 characters
    print("metadata:", i["metadata"])
    print("metadata:", type(i["metadata"]))
    # Create a Document object for each entry
    documents.append(
        Document(
            page_content=str(data["page_content"]),
            metadata=data["metadata"]
            if isinstance(data["metadata"], dict)
            else data["metadata"][0],
        )
    )

# Generate unique IDs for each document
uuids = [str(uuid4()) for _ in range(len(data))]
# Add documents to the Pinecone vector store
vector_store.add_documents(documents=documents, ids=uuids)

Processing document 2401.04088#0...
page_content: 4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jia ...
metadata: {'arxiv_id': '2401.04088', 'postchunk_id': '2401.04088#1', 'prechunk_id': '', 'title': 'Mixtral of Experts'}
metadata: <class 'dict'>
Processing document 2401.04088#1...
page_content: Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/mixtral-of-experts/  ...
metadata: {'arxiv_id': '2401.04088', 'postchunk_id': '2401.04088#2', 'prechunk_id': '2401.04088#0', 'title': 'Mixtral of Experts'}
metadata: <class 'dict'>
Processing document 2401.04088#2...
page_content: expertsâ ) to process the token and combine their output additively. This technique increases the nu ...
metadata: {'arxiv_id': '2401.04088', 'postchunk_id': '2401.04088#3', 'prechunk_id': '2401.04088#1', 'title': 'Mixtral of Experts'}
metadata: <class 'dict'>
Processing document 2401.04088#3...
page_content: Instruct, a chat model fine-t

['6a492517-50f7-4c7a-a010-7366fcd0d427',
 '5fc47369-c665-4049-847c-207dc575773b',
 '7a33b3f0-28d2-420f-8e55-22b16011c6a7',
 '56810ccb-2b90-4466-bbe8-b6b8f4bb9d8d',
 'f7369f1d-ff07-4882-b729-e4a5958cd0a4',
 'f6374c0b-5147-41a8-8373-e23703b8d687',
 'bd9c27f0-373b-4362-a255-84e7a05ba10d',
 'c5c4a097-2135-4e68-bee2-5073dc19f7a0',
 'a052acf0-c241-4a40-aa81-7b0066e9a7a2',
 '1364dc05-b4de-4aee-82fd-711ec6bac5a9',
 '42cc7a97-5ef5-4be1-adf0-e49320f9dc32',
 '35a620ad-a794-4a96-9a7a-3ebdb26603b4',
 '275d9335-98dc-4ddf-8362-de616e0b932a',
 '36920e8f-dc23-4482-a0f5-9f8eba1b395f',
 '2a1631fa-f26f-433c-95b4-700991e0f044',
 '860bad44-ea46-4d23-b5b1-d2c4c8664b26',
 'cf3236f0-040f-43ca-8ab1-5f2ecedc9e88',
 'c9985e0e-8f96-49d7-b40c-bc0632e77783',
 '7701bd23-9382-449a-98ea-270ce7738aa2',
 '1733d9de-a806-4c25-8009-465e61f14ff4',
 'd3145b3c-2c07-4144-9cc3-38d602b0642c',
 '40dcefe8-66a1-488c-8f43-f0510ec91957',
 '24f5f6d5-a251-4cf3-aaff-1cd53f392344',
 'fa06c7cf-3e13-4353-a925-2ca148c25907',
 '132d9da7-bdb4-

## Retrieval
Retrieval is the process of searching for documents in your vector store that are most similar to a given query. This is done using vector similarity search, which finds documents whose embeddings are closest to the query embedding.

The following section demonstrates how to retrieve documents using similarity search. First, we do _without_ reranking.

In [39]:
# Define a function to retrieve documents from the vector store using similarity search
def get_docs(query: str, top_k: int) -> list[str]:
    # Perform similarity search for the query
    docs = vector_store.similarity_search(query, k=top_k)
    print(f"Found {len(docs)} documents for query '{query}'")
    response_content = []
    for doc in docs:
        # Format the output for each document
        response_content.append(
            f"""
                Arxiv ID: {doc.metadata.get("arxiv_id", "N/A")}
                Title: {doc.metadata.get("title", "")}
                Content: {doc.page_content}
            """
        )

    return response_content

In [40]:
# Example query to retrieve documents about Mistral LLM
query = "can you tell me about mistral LLM?"
# Retrieve top 5 documents using similarity search
docs = get_docs(query, top_k=5)
# Print the retrieved documents
print("\n---\n".join(docs))

Found 5 documents for query 'can you tell me about mistral LLM?'

                Arxiv ID: 2401.04088
                Title: Mixtral of Experts
                Content: Column(['4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, LÃ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, ThÃ©ophile Gervet, Thibaut Lavril, Thomas Wang, TimothÃ©e Lacroix, William El Sayed Abstract We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects t

### Adding Reranking
Reranking is a process that improves the relevance of search results by reordering them based on a specialized model. After retrieving documents using vector similarity, reranking models (such as BGE or Pinecone's default) analyze the results and score them according to their relevance to the query.

This step is especially useful when you want to ensure that the most relevant documents appear at the top, even if the initial vector search returns many similar items.

In the following code, we show how to initialize a reranker and use it to enhance your search results.

In [41]:
# Import PineconeRerank to improve search result relevance
from langchain_pinecone import PineconeRerank

# Initialize the reranker (default model or specify one)
# reranker = PineconeRerank(model="bge-reranker-v2-m3")
reranker = PineconeRerank()

**Tip:** Before using any rerank model, you can call `list_supported_models()` to see all available models.

In [42]:
# List all supported rerank models for PineconeRerank
PineconeRerank().list_supported_models()

{
    "models": [
        {
            "model": "bge-reranker-v2-m3",
            "short_description": "A high-performance, multilingual reranking model that works well on messy data and short queries expected to return medium-length passages of text (1-2 paragraphs)",
            "type": "rerank",
            "supported_parameters": [
                {
                    "parameter": "truncate",
                    "type": "one_of",
                    "value_type": "string",
                    "required": false,
                    "default": "NONE",
                    "allowed_values": [
                        "END",
                        "NONE"
                    ]
                }
            ],
            "modality": "text",
            "max_sequence_length": 1024,
            "max_batch_size": 100,
            "provider_name": "BAAI",
            "supported_metrics": []
        },
        {
            "model": "cohere-rerank-3.5",
            "short_description": "Coh

In [43]:
# Define a function to retrieve and rerank documents for improved relevance
def get_docs_rerank(query: str) -> list[str]:
    # Retrieve more documents than needed for reranking
    docs = vector_store.similarity_search(query, k=25)
    # Rerank the retrieved documents and select the top N
    top5_docs = reranker.rerank(docs, query, top_n=5)
    print(f"Found {len(top5_docs)} documents for query '{query}'")
    response_content = []
    for doc in top5_docs:
        # Format the output for each reranked document
        response_content.append(
            f"""
                Score: {doc.get("score", "0")}
                Arxiv ID: {doc.get("document").get("arxiv_id", "N/A")}
                Title: {doc.get("document").get("title", "")}
                Content: {doc.get("document").get("text", "N/A")}
            """
        )

    return response_content

In [44]:
# Example query to retrieve and rerank documents about Mistral LLM
query = "can you tell me about mistral LLM?"
# Retrieve top 5 documents using reranking
docs = get_docs_rerank(query)
# Print the reranked documents
print("\n---\n".join(docs))

Found 5 documents for query 'can you tell me about mistral LLM?'

                Score: 0.055823144
                Arxiv ID: 2401.04088
                Title: Mixtral of Experts
                Content: Column(['4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, LÃ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, ThÃ©ophile Gervet, Thibaut Lavril, Thomas Wang, TimothÃ©e Lacroix, William El Sayed Abstract We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at eac

## Async Client

This section demonstrates how to use the async API for embedding, upserting, retrieval, and reranking. Async is recommended for best performance in production and large-scale scenarios.

### Async Embedding and Upsert

This section demonstrates how to use Pinecone's async API for embedding and upserting documents. Async operations are recommended for production and large-scale scenarios, as they provide better performance and scalability compared to synchronous calls.

- **Embedding:** Converts text data into vector representations using the selected model.
- **Upsert:** Adds or updates documents in the Pinecone index asynchronously.

You should use async workflows when working with large datasets or when you need non-blocking operations in your application.

In [45]:
from pinecone import PineconeAsyncio

index_name = "langchain-embedding-and-reranking-async"

# Use async context manager to interact with Pinecone
async with PineconeAsyncio() as pc:
    # Create index if it does not exist
    if not await pc.has_index(index_name):
        await pc.create_index(
            name=index_name,
            dimension=1024,
            metric="dotproduct",
            spec=ServerlessSpec(
                cloud="aws",
                region="us-west-2"
            ),
        )
    # Get index description and host info
    pc_desc = await pc.describe_index(name=index_name)
    print(pc_desc)

{'deletion_protection': 'disabled',
 'dimension': 1024,
 'host': 'langchain-embedding-and-reranking-async-swepyyp.svc.apw5-4e34-81fa.pinecone.io',
 'metric': 'dotproduct',
 'name': 'langchain-embedding-and-reranking-async',
 'spec': {'serverless': {'cloud': 'aws', 'region': 'us-west-2'}},
 'status': {'ready': True, 'state': 'Ready'},
 'tags': None,
 'vector_type': 'dense'}


We use the async context manager for index operations like so:

In [46]:
async with pc.IndexAsyncio(host=pc_desc.host) as idx:
    aembedder = PineconeEmbeddings()
    vector_store = PineconeVectorStore(index=idx, embedding=aembedder)
    documents = []
    for i in data:
        # Create Document objects for each entry
        documents.append(
            Document(
                page_content=str(i["page_content"]),
                metadata=i["metadata"]
                if isinstance(i["metadata"], dict)
                else i["metadata"][0],
            )
        )
    # Generate unique IDs and upsert documents asynchronously
    uuids = [str(uuid4()) for _ in range(len(data))]
    # Use async context manager for vector_store to ensure proper session cleanup
    async with vector_store:
        await vector_store.aadd_documents(documents=documents, ids=uuids)

### Async Retrieval and Reranking

Now let's see how to retrieve documents and apply reranking using Pinecone's async client. Async retrieval is ideal for applications that require high throughput and responsiveness while avoiding blocking a full thread — making this ideal for agentic AI applications.

In [47]:
# Async example: retrieval and reranking for best performance
# Create a managed PineconeAsyncio client for the reranker to avoid unclosed client sessions
async with PineconeAsyncio() as pc_rerank:
    async with pc.IndexAsyncio(host=pc_desc.host) as idx:
        aembedder = PineconeEmbeddings()
        # Pass the managed async client to PineconeRerank to ensure proper cleanup
        reranker = PineconeRerank(async_client=pc_rerank)
        vector_store = PineconeVectorStore(index=idx, embedding=aembedder)
        query = "can you tell me about mistral LLM?"
        # Use async context manager for vector_store to ensure proper session cleanup
        async with vector_store:
            # Retrieve documents asynchronously
            docs = await vector_store.asimilarity_search(query, k=25)
            # Rerank the retrieved documents asynchronously
            top5_docs = await reranker.arerank(docs, query, top_n=5)
            print(f"Found {len(top5_docs)} documents for query '{query}'")
            response_content = []
            for doc in top5_docs:
                # Format and print each document's details
                response_content.append(
                    f"""
                    Arxiv ID: {doc["document"].get("arxiv_id", "N/A")}
                    Title: {doc["document"].get("title", "")}
                    Content: {doc["document"].get("text")}
                """
                )
            print("\n---\n".join(response_content))

Found 5 documents for query 'can you tell me about mistral LLM?'

                    Arxiv ID: 2401.04088
                    Title: Mixtral of Experts
                    Content: We compare Mixtral to Llama, and re-run all benchmarks with our own evaluation pipeline for fair comparison. We measure performance on a wide variety of tasks categorized as follow: â ¢ Commonsense Reasoning (0-shot): Hellaswag [32], Winogrande [26], PIQA [3], SIQA [27], OpenbookQA [22], ARC-Easy, ARC-Challenge [8], CommonsenseQA [30] World Knowledge (5-shot): NaturalQuestions [20], TriviaQA [19] â ¢ Reading Comprehension (0-shot): BoolQ [7], QuAC [5] â ¢ Math: GSM8K [9] (8-shot) with maj@8 and MATH [17] (4-shot) with maj@4 â ¢ Code: Humaneval [4] (0-shot) and MBPP [1] (3-shot) â ¢ Popular aggregated results: MMLU [16] (5-shot), BBH [29] (3-shot), and AGI Eval [34] (3-5-shot, English multiple-choice questions only) 80 SE Mistral 78 = LLaMA27B = Sl LLaMA134B, jam Mistral 78 = LlaMA27B Ss LLAMA 1348, cee Mixt

That covers everything we need for using the Pinecone vectorstore in LangChain.

---