# Load the environment variables from the .env file
## Credentials
Create a new Pinecone account, or sign into your existing one, and create an API key to use in this notebook.

In [None]:
# Import the load_dotenv function to read environment variables from a .env file
from dotenv import load_dotenv

# Load environment variables (such as API keys) from the .env file in the current directory.
load_dotenv()

True
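
A `True` return value from `load_dotenv()` does not tell you which variables were loaded. Here is a minimal sketch of a check that avoids printing the secret in full. `mask_secret` is a hypothetical helper introduced for illustration, and `PINECONE_API_KEY` is the variable name read later in this notebook:

```python
import os
from typing import Optional


def mask_secret(value: Optional[str], visible: int = 4) -> str:
    """Return a display-safe version of a secret (hypothetical helper)."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Shows only the first few characters, or "<not set>" if the variable is missing
print("PINECONE_API_KEY:", mask_secret(os.getenv("PINECONE_API_KEY")))
```

If this prints `<not set>`, check that the `.env` file is in the current directory and defines the variable.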

# Embedding and Reranking

Pinecone is not solely a vector database. The service can be used to create embeddings that power your vector search, and even rerank results to enhance result precision.

# Install necessary libraries

In [None]:
# Install required libraries for Pinecone integration and dataset loading
# langchain-pinecone: Pinecone integration for LangChain
# datasets: HuggingFace datasets library for loading and processing datasets
!pip install -qU \
    "langchain-pinecone==0.2.12" \
    "datasets==4.0.0"

# Load dataset
For this demo we will use `jamescalam/ai-arxiv2-semantic-chunks`, a dataset of AI research papers from arXiv that has already been split into semantic chunks.

In [None]:
# Download a pre-chunked dataset using the 'datasets' library
from datasets import load_dataset

# Load a public dataset containing semantic chunks of AI research papers
data = load_dataset(
    "jamescalam/ai-arxiv2-semantic-chunks",
    split="train[:100]" # Load only the first 100 samples for demo purposes
)

print(f"Loaded {len(data)} documents.")

# Preview the first 3 documents to understand the structure
for i, doc in enumerate(data.select(range(3))):
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    print(f"Document {i+1}: {doc['content'][:200]}\n---\n")

Loaded 100 documents.
Document 1: 4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Sin
---

Document 2: Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/mixtral-of-experts/ # Introduction In this paper, we present Mixtral 8x7B, a sparse mixture of experts model (SMoE) with
---

Document 3: expertsâ ) to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction 
---



In [None]:
# Transform the dataset to match the expected format for Pinecone and LangChain
# Map each document to have 'id', 'page_content', and 'metadata' fields
data = data.map(lambda x: {
    "id": x["id"],
    "page_content": x["content"],
    "metadata": {
        "title": str(x["title"]),
        "prechunk_id": str(x["prechunk_id"]),
        "postchunk_id": str(x["postchunk_id"]),
        "arxiv_id": str(x["arxiv_id"]),
    }
})
# Remove columns that are no longer needed after mapping
data = data.remove_columns([
    "title", "content", "prechunk_id",
    "postchunk_id", "arxiv_id", "references"
])
data # Display the processed dataset

Dataset({
    features: ['id', 'page_content', 'metadata'],
    num_rows: 100
})
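
To see what the mapping does to a single row, the same transformation can be written as a plain function and applied to a hand-written record. This is only a sketch: `to_langchain_record` is a hypothetical name, and the sample values are illustrative rather than taken from the dataset.

```python
def to_langchain_record(row: dict) -> dict:
    """Reshape one raw dataset row into id / page_content / metadata."""
    return {
        "id": row["id"],
        "page_content": row["content"],
        "metadata": {
            # Values are cast to str, as in the cell above
            "title": str(row["title"]),
            "prechunk_id": str(row["prechunk_id"]),
            "postchunk_id": str(row["postchunk_id"]),
            "arxiv_id": str(row["arxiv_id"]),
        },
    }


# Illustrative record (values are made up for the example)
sample = {
    "id": "2401.04088#1",
    "content": "Example chunk text",
    "title": "Mixtral of Experts",
    "prechunk_id": "2401.04088#0",
    "postchunk_id": "2401.04088#2",
    "arxiv_id": "2401.04088",
    "references": [],
}

record = to_langchain_record(sample)
print(sorted(record))        # top-level keys after mapping
print(record["metadata"])    # only the four metadata fields are kept
```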

## Create Pinecone Index
A Pinecone index is a data structure that stores vector embeddings and allows for efficient similarity search. Before you can store or query embeddings, you need to create an index with the appropriate configuration (such as dimension and metric).

The following code will connect to Pinecone using your API key, check if an index with the specified name exists, and create it if necessary. This step is essential for managing and querying your vector data.

In [None]:
# Import Pinecone classes and utilities for index management
from pinecone import ServerlessSpec, Pinecone
import os
import getpass

# Retrieve Pinecone API key from environment or prompt user if not set
api_key = os.getenv("PINECONE_API_KEY") or getpass.getpass("Enter your Pinecone API key: ")
# Initialize Pinecone client
pc = Pinecone(api_key=api_key)

# Define serverless deployment specification (cloud provider and region)
spec = ServerlessSpec(
    cloud="aws", region="us-west-2" # You can change region as needed
)
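
The metric chosen for an index determines how similarity between vectors is scored. As a rough illustration in plain Python (a sketch, not Pinecone's implementation), the dot product is sensitive to vector length while cosine similarity is not. The helper names and toy vectors below are made up for the example:

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: the dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


short = [1.0, 0.0]
long_ = [2.0, 0.0]  # same direction, twice the length

print("dot:", dot(short, long_))        # grows with vector length
print("cosine:", cosine(short, long_))  # depends only on direction
```

The dimension check in `dot` mirrors the constraint on the index itself: vectors of a different length than the index dimension cannot be compared.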


Before initializing our vector store, let's connect to a Pinecone index. If an index with the name stored in `index_name` doesn't exist, it will be created.

In [None]:
import time

index_name = "rag-embedding-index"
# List all existing indexes in your Pinecone project
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# Check if the index already exists; if not, create it
if index_name not in existing_indexes:
    # Create a new index with specified dimension and metric
    pc.create_index(
        index_name,
        dimension=1024, # Must match embedding output size
        metric='dotproduct', # Similarity metric
        spec=spec
    )
    # Wait for the index to be ready before proceeding
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# Connect to the index for further operations
index = pc.Index(index_name)
time.sleep(1)
# View index statistics to confirm connection
index.describe_index_stats()

{'dimension': 1024,
 'index_fullness': 0.0,
 'metric': 'dotproduct',
 'namespaces': {},
 'total_vector_count': 0,
 'vector_type': 'dense'}
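
The readiness loop in the cell above polls indefinitely. A possible variation, sketched here with a hypothetical `wait_until` helper, gives up after a timeout so a stuck index does not hang the notebook. The commented usage assumes the `pc` and `index_name` objects defined earlier.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Assumed usage with the objects from the cell above (requires a live index):
# ready = wait_until(lambda: pc.describe_index(index_name).status["ready"])
# if not ready:
#     raise TimeoutError(f"Index {index_name} was not ready in time")
```

The `sleep` parameter is injectable only to make the helper easy to test without real delays.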

## Let's define the embedding engine
An embedding engine converts text data into high-dimensional vectors that capture semantic meaning. These vectors are used for similarity search in Pinecone.

In the next code cell, we initialize the embedding model that will be used to generate vector representations for our documents. You can choose from several supported models depending on your use case.

In [None]:
# Import PineconeEmbeddings to generate vector representations of text
from langchain_pinecone.embeddings import PineconeEmbeddings

# Initialize the embedding model (default: multilingual-e5-large)
# You can specify a different model if needed
embedder = PineconeEmbeddings()

Setting default config for model: multilingual-e5-large
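
The index was created with `dimension=1024`, so every vector written to it must have that length. Below is a small sketch of a guard for this. `check_dimension` is a hypothetical helper, and the commented usage line assumes the `embedder` object from the previous cell together with LangChain's standard `embed_query` method.

```python
from typing import Sequence


def check_dimension(vector: Sequence[float], expected_dim: int = 1024) -> int:
    """Raise if a vector's length does not match the index dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected_dim}"
        )
    return len(vector)


# Assumed usage (makes a network call, so it is left commented out):
# check_dimension(embedder.embed_query("hello world"))

# Offline demonstration with a placeholder vector
print(check_dimension([0.0] * 1024))
```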


**Tip:** Before using any embedding model, you can call `list_supported_models()` to see all available models.

In [None]:
# List all supported embedding models for PineconeEmbeddings
PineconeEmbeddings().list_supported_models()

{
    "models": [
        {
            "model": "llama-text-embed-v2",
            "short_description": "A high performance dense embedding model optimized for multilingual and cross-lingual text question-answering retrieval with support for long documents (up to 2048 tokens) and dynamic embedding size (Matryoshka Embeddings).",
            "type": "embed",
            "supported_parameters": [
                {
                    "parameter": "input_type",
                    "type": "one_of",
                    "value_type": "string",
                    "required": true,
                    "allowed_values": [
                        "query",
                        "passage"
                    ]
                },
                {
                    "parameter": "truncate",
                    "type": "one_of",
                    "value_type": "string",
                    "required": false,
                    "default": "END",
                    "allowed_values": [
      

## Building an Index
After creating a Pinecone index and initializing the embedding engine, the next step is to build a vector store. A vector store is an interface that allows you to add, search, and manage vectorized documents within your Pinecone index.

The following code demonstrates how to create a vector store using the Pinecone index and embedding engine.

In [None]:
# Import PineconeVectorStore to manage vector data in Pinecone
from langchain_pinecone import PineconeVectorStore

# Create a vector store using the connected index and embedding engine
vector_store = PineconeVectorStore(index=index, embedding=embedder)

## Manage vector store
Managing a vector store involves adding, updating, and deleting documents. Once your vector store is set up, you can interact with it to store new documents, remove outdated ones, or update existing entries.

The next code cell shows how to add documents to your Pinecone vector store, preparing them for efficient similarity search and retrieval.

In [None]:
# Import utilities for document creation and unique IDs
from uuid import uuid4

from langchain_core.documents import Document

documents = []
for i in data:
    # Print document details for debugging and inspection
    print(f"Processing document {i['id']}...")
    print("page_content:", i["page_content"][:100], "...")  # Preview first 100 characters
    print("metadata:", i["metadata"])
    print("metadata:", type(i["metadata"]))
    # Create a Document object from the current row (not from the whole dataset column)
    documents.append(
        Document(
            page_content=str(i["page_content"]),
            metadata=i["metadata"] if isinstance(i["metadata"], dict) else i["metadata"][0]
        )
    )

# Generate unique IDs for each document
uuids = [str(uuid4()) for _ in range(len(data))]
# Add documents to the Pinecone vector store
vector_store.add_documents(documents=documents, ids=uuids)

Processing document 2401.04088#0...
page_content: 4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jia ...
metadata: {'arxiv_id': '2401.04088', 'postchunk_id': '2401.04088#1', 'prechunk_id': '', 'title': 'Mixtral of Experts'}
metadata: <class 'dict'>
Processing document 2401.04088#1...
page_content: Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/mixtral-of-experts/  ...
metadata: {'arxiv_id': '2401.04088', 'postchunk_id': '2401.04088#2', 'prechunk_id': '2401.04088#0', 'title': 'Mixtral of Experts'}
metadata: <class 'dict'>
Processing document 2401.04088#2...
page_content: expertsâ ) to process the token and combine their output additively. This technique increases the nu ...
metadata: {'arxiv_id': '2401.04088', 'postchunk_id': '2401.04088#3', 'prechunk_id': '2401.04088#1', 'title': 'Mixtral of Experts'}
metadata: <class 'dict'>
Processing document 2401.04088#3...
page_content: Instruct, a chat model fine-t

['2e5b889d-3898-40d7-b5d9-9b13f9b79a5a',
 '6a2b82d8-818a-4fb5-996b-afd39af5cdbf',
 '76c55b18-1691-44a7-b5b8-0f0ee388a4d2',
 'cc78c83c-26ef-4eea-af6a-492b646fd46f',
 '14e06793-5c32-42dd-9a06-3a8e0e0748bf',
 'cb27a503-915b-46ec-b8f2-08a9ccbae627',
 '668d04bb-b466-4477-98e5-f217e52fd132',
 '3775d6f5-0f53-4e4b-800e-a147c7f67541',
 'ba8c8260-a32b-410b-9ab5-4188b2786a0c',
 '358815bf-cffa-4fe9-809d-6cb3be54f0bf',
 '2ae55522-4608-48a3-aefe-f00eca760941',
 '894f223b-5d33-492b-bb8c-698be0c6a6a8',
 '4f072b9d-a2e5-4d4f-a4b5-242286e07e0d',
 '43ac41e6-22a8-4605-b7ad-03567c19d2cb',
 '426494b1-0005-458f-802b-1c6d24c9c36e',
 'dbe3e0e0-4c48-42ad-92bb-fcb715c341c1',
 'dac1f181-45a5-458a-8c99-07796f4b4202',
 '31ca6222-5e44-4a0e-a839-0279cf9e8d2f',
 'd0a34feb-c350-4350-8661-e18719f53a6d',
 '9409ffea-962a-411b-a811-84b11740b0ec',
 'ce5696dc-8675-4499-a02a-f5bcfce352ce',
 'a0bc08df-aa37-4ff8-8ec0-0757da3ecb34',
 'dc6724b6-f35a-42af-a9b3-63a2a03afbbf',
 'f1068a32-59a2-4b22-ae0e-7acbe582437d',
 'ee000b7f-0900-

Before this step the index was empty, with a `total_vector_count` of 0. The cell above populates it with embeddings generated by the `multilingual-e5-large` model configured earlier.

⚠️ WARNING: Embedding costs for the full dataset were ~$5.70 as of 3 Jan 2024. This demo embeds only the first 100 chunks.
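
Because the add-documents cell generates a fresh `uuid4` for each document, running it twice stores every chunk twice. One possible alternative, sketched here on made-up rows, is to reuse the dataset's own `id` field as the vector ID (so a re-run overwrites instead of duplicating) and to send documents in fixed-size batches. `batched` is a hypothetical helper, and the batch size of 2 is arbitrary.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up rows standing in for the processed dataset
rows = [{"id": f"2401.04088#{n}"} for n in range(5)]
stable_ids = [row["id"] for row in rows]

for id_batch in batched(stable_ids, batch_size=2):
    # Assumed usage: pass the matching slice of documents with these ids
    # to vector_store.add_documents(documents=..., ids=list(id_batch))
    print(id_batch)
```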

## Retrieval
Retrieval is the process of searching for documents in your vector store that are most similar to a given query. This is done using vector similarity search, which finds documents whose embeddings are closest to the query embedding.

The following section demonstrates how to retrieve documents using similarity search, both with and without reranking.

First, we do retrieval _without_ reranking.

In [None]:
# Define a function to retrieve documents from the vector store using similarity search
def get_docs(query: str, top_k: int) -> list[str]:
    # Perform similarity search for the query
    docs = vector_store.similarity_search(query, k=top_k)
    print(f"Found {len(docs)} documents for query '{query}'")
    response_content = []
    for doc in docs:
        # Format the output for each document
        response_content.append(
            f"""
                Arxiv ID: {doc.metadata.get('arxiv_id', 'N/A')}
                Title: {doc.metadata.get('title','')}
                Content: {doc.page_content}
            """
        )

    return response_content

In [None]:
# Example query to retrieve documents about Mistral LLM
query = "can you tell me about mistral LLM?"
# Retrieve top 5 documents using similarity search
docs = get_docs(query, top_k=5)
# Print the retrieved documents
print("\n---\n".join(docs))

Found 5 documents for query 'can you tell me about mistral LLM?'

                Arxiv ID: 2401.04088
                Title: Mixtral of Experts
                Content: Column(['4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, LÃ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, ThÃ©ophile Gervet, Thibaut Lavril, Thomas Wang, TimothÃ©e Lacroix, William El Sayed Abstract We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects t
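
To make the idea of similarity search concrete, here is a brute-force sketch in plain Python. It scores every stored vector against the query with the dot product and keeps the best `k`. This is only an illustration of the concept on toy two-dimensional vectors, not how Pinecone searches an index internally, and all names below are made up.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_by_dot(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs with the highest dot product."""
    scored = ((dot(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


toy_vectors = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [1.0, 1.0],
}

print(top_k_by_dot([1.0, 0.5], toy_vectors, k=2))
# → [('doc-c', 1.5), ('doc-a', 1.0)]
```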

## Adding Reranking

Reranking is a process that improves the relevance of search results by reordering them based on a specialized model. After retrieving documents using vector similarity, reranking models (such as BGE or Pinecone's default) analyze the results and score them according to their relevance to the query.

This step is especially useful when you want to ensure that the most relevant documents appear at the top, even if the initial vector search returns many similar items.

In the following code, we show how to initialize a reranker and use it to enhance your search results.

In [None]:
# Import PineconeRerank to improve search result relevance
from langchain_pinecone import PineconeRerank

# Initialize the reranker (default model or specify one)
# reranker = PineconeRerank(model="bge-reranker-v2-m3")
reranker = PineconeRerank()

**Tip:** Before using any rerank model, you can call `list_supported_models()` to see all available models.

In [None]:
# List all supported rerank models for PineconeRerank
PineconeRerank().list_supported_models()

{
    "models": [
        {
            "model": "bge-reranker-v2-m3",
            "short_description": "A high-performance, multilingual reranking model that works well on messy data and short queries expected to return medium-length passages of text (1-2 paragraphs)",
            "type": "rerank",
            "supported_parameters": [
                {
                    "parameter": "truncate",
                    "type": "one_of",
                    "value_type": "string",
                    "required": false,
                    "default": "NONE",
                    "allowed_values": [
                        "END",
                        "NONE"
                    ]
                }
            ],
            "modality": "text",
            "max_sequence_length": 1024,
            "max_batch_size": 100,
            "provider_name": "BAAI",
            "supported_metrics": []
        },
        {
            "model": "cohere-rerank-3.5",
            "short_description": "Coh

In [None]:
# Define a function to retrieve and rerank documents for improved relevance
def get_docs_rerank(query: str, top_k: int) -> list[str]:
    # Retrieve more candidates than needed so the reranker has a larger pool to choose from
    docs = vector_store.similarity_search(query, k=top_k + 10)

    # Rerank the candidates and keep the top_k most relevant
    reranked_docs = reranker.rerank(docs, query, top_n=top_k)

    print(f"Found {len(reranked_docs)} documents for query '{query}'")
    response_content = []
    for doc in reranked_docs:
        # Format the output for each reranked document
        response_content.append(
            f"""
                Score: {doc.get("score", "0")}
                Arxiv ID: {doc.get("document").get('arxiv_id', 'N/A')}
                Title: {doc.get("document").get('title','')}
                Content: {doc.get("document").get("text", "N/A")}
            """
        )

    return response_content

In [None]:
# Example query to retrieve and rerank documents about Mistral LLM
query = "can you tell me about mistral LLM?"
# Retrieve top 5 documents using reranking
docs = get_docs_rerank(query, top_k=5)
# Print the reranked documents
print("\n---\n".join(docs))

Found 5 documents for query 'can you tell me about mistral LLM?'

                Score: 0.05572029
                Arxiv ID: 2401.04088
                Title: Mixtral of Experts
                Content: Column(['4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, LÃ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, ThÃ©ophile Gervet, Thibaut Lavril, Thomas Wang, TimothÃ©e Lacroix, William El Sayed Abstract We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each
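
The retrieve-then-rerank pattern can be sketched without any external service: a first stage returns a pool of candidates, and a second, more careful scorer reorders them. The word-overlap scorer below is a deliberately simple stand-in for a real reranking model, and every name in the block is hypothetical.

```python
from typing import Callable, Dict, List


def overlap_score(query: str, text: str) -> float:
    """Fraction of query words that also appear in the text (toy scorer)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank_candidates(
    query: str,
    candidates: List[str],
    top_n: int,
    scorer: Callable[[str, str], float] = overlap_score,
) -> List[Dict]:
    """Score each candidate against the query and return the top_n, best first."""
    scored = [
        {"index": i, "score": scorer(query, text), "text": text}
        for i, text in enumerate(candidates)
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_n]


# Pretend these came back from the first-stage vector search
pool = [
    "llama models",
    "mistral release notes",
    "about mistral llm architecture",
]

for item in rerank_candidates("about mistral llm", pool, top_n=2):
    print(round(item["score"], 2), item["text"])
```

Each result keeps the candidate's original `index`, which makes it easy to see how far the second stage moved it.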

## Async Embedding and Reranking Example
This section repeats the workflow above with the async API: embedding, upserting, retrieval, and reranking.

### Async Embedding and Upsert
This cell uses Pinecone's async API to embed and upsert documents. Async calls do not block while waiting on network I/O, which makes them a better fit than synchronous calls for production and large-scale workloads.

- **Embedding:** Converts text data into vector representations using the selected model.
- **Upsert:** Adds or updates documents in the Pinecone index asynchronously.

You should use async workflows when working with large datasets or when you need non-blocking operations in your application.

In [None]:
# Async example: embedding, upserting, and index management for large-scale scenarios
import os
import getpass
from pinecone import PineconeAsyncio, ServerlessSpec
from uuid import uuid4
from langchain_core.documents import Document
from langchain_pinecone import PineconeVectorStore
from langchain_pinecone.embeddings import PineconeEmbeddings
from langchain_pinecone import PineconeRerank

# Get Pinecone API key and set index name
api_key = os.getenv("PINECONE_API_KEY") or getpass.getpass("Enter your Pinecone API key: ")
index_name = "rag-embedding-index"

# Use async context manager to interact with Pinecone
async with PineconeAsyncio(api_key=api_key) as pc:
        # Create index if it does not exist
        if not await pc.has_index(index_name):
            await pc.create_index(
                name=index_name,
                dimension=1024,
                metric="dotproduct",
                spec=ServerlessSpec(
                    cloud="aws",
                    region="us-west-2"  # same region as the synchronous example
                ),
                deletion_protection="disabled",
                tags={
                    "environment": "development"
                }
            )

        # Get index description and host info
        pc_desc = await pc.describe_index(name=index_name)
        print(pc_desc)
        # aindex = pc.IndexAsyncio(host=pc_desc.host)

# Use async context manager for index operations
async with pc.IndexAsyncio(host=pc_desc.host) as idx:
        aembedder = PineconeEmbeddings()
        vector_store = PineconeVectorStore(index=idx, embedding=aembedder)

        documents = []
        for i in data:
            # Create Document objects for each entry
            documents.append(
                Document(
                    page_content=str(i["page_content"]),
                    metadata=i["metadata"] if isinstance(i["metadata"], dict) else i["metadata"][0]
                )
            )

        # Generate unique IDs and upsert documents asynchronously
        uuids = [str(uuid4()) for _ in range(len(data))]
        await vector_store.aadd_documents(documents=documents, ids=uuids)

{'deletion_protection': 'disabled',
 'dimension': 1024,
 'host': 'rag-embedding-index-yrrgefy.svc.apw5-4e34-81fa.pinecone.io',
 'metric': 'dotproduct',
 'name': 'rag-embedding-index',
 'spec': {'serverless': {'cloud': 'aws', 'region': 'us-west-2'}},
 'status': {'ready': True, 'state': 'Ready'},
 'tags': None,
 'vector_type': 'dense'}
Setting default config for model: multilingual-e5-large
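
The benefit of non-blocking calls can be seen in a small standard-library sketch. `fake_embed` is a stand-in for a network request (it only sleeps briefly and returns the text length), and `asyncio.gather` runs all the calls concurrently while preserving input order. None of these names belong to Pinecone or LangChain.

```python
import asyncio
from typing import List


async def fake_embed(text: str, delay_s: float = 0.01) -> int:
    """Stand-in for a network call: wait briefly, then return the text length."""
    await asyncio.sleep(delay_s)
    return len(text)


async def embed_all(texts: List[str]) -> List[int]:
    # gather schedules every coroutine at once and returns results in input order
    return await asyncio.gather(*(fake_embed(t) for t in texts))


# In a notebook cell, write `await embed_all([...])` instead,
# because an event loop is already running there.
print(asyncio.run(embed_all(["mistral", "mixtral", "llm"])))
# → [7, 7, 3]
```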


### Async Retrieval and Reranking
This section shows how to retrieve documents and apply reranking using Pinecone's async API. Async retrieval is ideal for applications that require high throughput and responsiveness.

- **Retrieval:** Finds documents similar to a query using vector search.
- **Reranking:** Improves the relevance of results by reordering them based on a rerank model.

Async operations allow you to efficiently handle large queries and datasets, making your search and ranking processes scalable and fast.

In [None]:
# Async example: retrieval and reranking for best performance
async with pc.IndexAsyncio(host=pc_desc.host) as idx:
        aembedder = PineconeEmbeddings()
        reranker = PineconeRerank()
        vector_store = PineconeVectorStore(index=idx, embedding=aembedder)
        query = "can you tell me about mistral LLM?"
        # Retrieve documents asynchronously
        docs = await vector_store.asimilarity_search(query, k=5)

        # Rerank the retrieved documents asynchronously (arerank is a coroutine, so it must be awaited)
        reranked_docs = await reranker.arerank(docs, query, top_n=5)

        print(f"Found {len(docs)} documents for query '{query}'")
        response_content = []
        for doc in docs:
            # Format each retrieved document's details
            response_content.append(
                f"""
                    Arxiv ID: {doc.metadata.get('arxiv_id', 'N/A')}
                    Title: {doc.metadata.get('title','')}
                    Content: {doc.page_content}
                """
            )
        print("\n---\n".join(response_content))

        # Show the reranked order with relevance scores
        for item in reranked_docs:
            print(item.get("score"), (item.get("document") or {}).get("title", ""))



Setting default config for model: multilingual-e5-large
Found 5 documents for query 'can you tell me about mistral LLM?'

                    Arxiv ID: 2401.04088
                    Title: Mixtral of Experts
                    Content: Code: https://github.com/mistralai/mistral-src Webpage: https://mistral.ai/news/mixtral-of-experts/ # Introduction In this paper, we present Mixtral 8x7B, a sparse mixture of experts model (SMoE) with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B and GPT-3.5 on most benchmarks. As it only uses a subset of its parameters for every token, Mixtral allows faster inference speed at low batch-sizes, and higher throughput at large batch-sizes. Mixtral is a sparse mixture-of-experts network. It is a decoder-only model where the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the â
                
---

                    Arxiv ID: 2