In [3]:
!pip install pinecone openai python-dotenv

In [4]:
!pip install vec2text

In [5]:

from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load PINECONE_API_KEY and OPENAI_API_KEY from a local .env file
load_dotenv()
api_key = os.getenv("PINECONE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Create Pinecone client
pc = Pinecone(api_key=api_key)

# Create index if it doesn't exist
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1536,  # Must match the embedding model's output size (1536 for text-embedding-ada-002)
        metric="cosine",  # Or 'euclidean', 'dotproduct'
        spec=ServerlessSpec(
            cloud="aws",  # Or 'gcp'
            region="us-east-1"
        )
    )

# Connect to the index and create the OpenAI client
index = pc.Index("my-index")
client = OpenAI(api_key=openai_api_key)

# Embed one text with OpenAI; returns a list of 1536 floats
def embed(text):
    response = client.embeddings.create(
        input=[text],
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding

# Upsert two vectors as (id, values, metadata) tuples
index.upsert([
    ("concept1", embed("The theory of relativity and its implications"), {"topic": "physics"}),
    ("concept2", embed("The role of empathy in leadership"), {"topic": "psychology"})
])

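# Query with an unrelated sentence (Pinecone is eventually consistent, so freshly
# upserted vectors can take a moment to appear in query results)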
query_vector = embed("My cat is so cute")
results = index.query(vector=query_vector, top_k=5, include_metadata=True)
results


{'matches': [{'id': 'concept1',
              'metadata': {'topic': 'physics'},
              'score': 0.718628049,
              'values': []},
             {'id': 'concept2',
              'metadata': {'topic': 'psychology'},
              'score': 0.712097168,
              'values': []}],
 'namespace': '',
 'usage': {'read_units': 1}}
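The query "My cat is so cute" has nothing to do with either stored sentence, yet both came back with scores around 0.71: `query` always returns the `top_k` nearest neighbors, relevant or not. A common remedy is to discard matches below a minimum score. The sketch below shows the cosine formula behind these scores and a hypothetical `filter_by_score` helper; the 0.8 cutoff is an arbitrary illustrative value, and the matches are plain dicts copied from the output above rather than the live response object.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_by_score(matches, min_score):
    """Hypothetical helper: keep only matches whose score reaches min_score."""
    return [m for m in matches if m["score"] >= min_score]

# Scores copied from the "My cat is so cute" query output above.
matches = [
    {"id": "concept1", "metadata": {"topic": "physics"}, "score": 0.718628049},
    {"id": "concept2", "metadata": {"topic": "psychology"}, "score": 0.712097168},
]

MIN_SCORE = 0.8  # hypothetical cutoff; tune it on your own data
print(filter_by_score(matches, MIN_SCORE))  # → []
```

With a live response, the same cutoff can be applied to `results.matches`, whose items expose the score as the `.score` attribute.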

In [6]:
query_vector = embed("The theory of relativity")
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True,
    filter={"topic": "physics"}  # Only return vectors tagged with topic 'physics'
)
print(results)


{'matches': [{'id': 'concept1',
              'metadata': {'topic': 'physics'},
              'score': 0.94769305,
              'values': []}],
 'namespace': '',
 'usage': {'read_units': 1}}
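The filter `{"topic": "physics"}` is shorthand for `{"topic": {"$eq": "physics"}}`. Pinecone's filter language also offers operators such as `$ne`, `$in`, `$nin`, `$gt`, `$gte`, `$lt`, `$lte`, and `$and`/`$or` for combining conditions. The toy evaluator below only illustrates how such a filter is matched against one record's metadata; `matches_filter` is a hypothetical helper covering a subset of the operators, not Pinecone's implementation, since the server applies the filter itself during `query`.

```python
# Hypothetical local evaluator for a subset of Pinecone-style metadata filters.
OPS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$lt": lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
}

def matches_filter(metadata, flt):
    """Return True if a metadata dict satisfies every condition in flt."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches_filter(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches_filter(metadata, sub) for sub in cond):
                return False
        else:
            # Missing keys are read as None here; this is a simplification.
            value = metadata.get(key)
            # A bare value such as {"topic": "physics"} means {"topic": {"$eq": "physics"}}.
            conditions = cond if isinstance(cond, dict) else {"$eq": cond}
            if not all(OPS[op](value, arg) for op, arg in conditions.items()):
                return False
    return True

records = {"concept1": {"topic": "physics"}, "concept2": {"topic": "psychology"}}
print([rid for rid, md in records.items() if matches_filter(md, {"topic": "physics"})])
# → ['concept1']
print([rid for rid, md in records.items()
       if matches_filter(md, {"topic": {"$in": ["physics", "psychology"]}})])
# → ['concept1', 'concept2']
```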


In [8]:
# Upserting an existing ID overwrites its stored vector and metadata
index.upsert([
    ("concept1", embed("Updated theory of relativity and its modern implications"), {"topic": "physics"})
])

# Deleting a vector by ID
index.delete(ids=["concept2"])

# query returns the top_k nearest matches, not a listing of every vector in the index
remaining = index.query(
    vector=embed("The theory of relativity"),
    top_k=10,
    include_metadata=True
)
print(remaining)


{'matches': [{'id': 'concept1',
              'metadata': {'topic': 'physics'},
              'score': 0.922424316,
              'values': []}],
 'namespace': '',
 'usage': {'read_units': 1}}
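Because `upsert` writes by ID, re-sending `concept1` replaced its stored values and metadata rather than adding a second record, which is why the relativity query now scores 0.922 instead of 0.947; `delete(ids=[...])` removes records by ID in the same way. The hypothetical `ToyIndex` class below is a plain-dict model of those ID-keyed semantics with a brute-force cosine search. It is not part of the Pinecone SDK and ignores everything else Pinecone handles (namespaces, metadata filters, persistence).

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ToyIndex:
    """Hypothetical in-memory index: records keyed by ID, searched by brute force."""

    def __init__(self):
        self.records = {}  # id -> (values, metadata)

    def upsert(self, vectors):
        # An existing ID is overwritten; a new ID is inserted.
        for vec_id, values, metadata in vectors:
            self.records[vec_id] = (values, metadata)

    def delete(self, ids):
        for vec_id in ids:
            self.records.pop(vec_id, None)

    def query(self, vector, top_k):
        # Score every stored vector and return the top_k (id, score) pairs.
        scored = [(vec_id, cosine(vector, values))
                  for vec_id, (values, _) in self.records.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

toy = ToyIndex()
toy.upsert([("concept1", [1.0, 0.0], {"topic": "physics"}),
            ("concept2", [0.0, 1.0], {"topic": "psychology"})])
toy.upsert([("concept1", [0.6, 0.8], {"topic": "physics"})])  # overwrites concept1
toy.delete(ids=["concept2"])
print(len(toy.records))  # → 1
print(toy.query([1.0, 0.0], top_k=10))
```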


Batch Upserts and Queries: Explore how to efficiently upsert and query large batches of vectors, which is essential for scaling your application.
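One way to batch both the embedding and the upsert step is sketched below, under two assumptions: the OpenAI embeddings endpoint accepts a list of strings as `input` and returns one embedding per item (each tagged with its input position in `.index`), and Pinecone's `Index.upsert(vectors=...)` is called once per chunk. The helpers `chunked`, `embed_batch`, and `upsert_in_batches` are hypothetical names, and the default batch size of 100 is an illustrative choice; both APIs cap request size, so check their documented limits before raising it.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch

def embed_batch(client, texts, model="text-embedding-ada-002"):
    """Embed a list of texts with a single embeddings request."""
    response = client.embeddings.create(input=texts, model=model)
    # Each returned item records the position of its input text in `index`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

def upsert_in_batches(index, client, texts, id_prefix="concept", start_id=0,
                      batch_size=100, metadata=None):
    """Hypothetical helper: one embeddings call and one upsert call per chunk."""
    next_id = start_id
    for batch in chunked(texts, batch_size):
        vectors = []
        for emb in embed_batch(client, batch):
            vec_id = f"{id_prefix}{next_id}"
            # (id, values) or (id, values, metadata) tuples, as in the cells above
            vectors.append((vec_id, emb, metadata) if metadata else (vec_id, emb))
            next_id += 1
        index.upsert(vectors=vectors)
    return next_id - start_id  # number of vectors sent

# Usage with this notebook's objects (makes live API calls):
# upsert_in_batches(index, client, texts, start_id=3, metadata={"topic": "mixed"})
```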

Handling Different Similarity Metrics: Experiment with different similarity metrics (cosine, euclidean, dotproduct) and understand how they affect search results.
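The sketch below computes all three metrics on two made-up 2-dimensional vectors to show how they can rank the same documents differently: cosine ignores vector length, dot product rewards longer vectors, and Euclidean is a distance, so smaller means closer. For unit-length vectors the three agree, because ||a - b||² = 2 - 2(a · b) when ||a|| = ||b|| = 1; OpenAI documents its embeddings as normalized to length 1, so with `text-embedding-ada-002` the metric should not change the ranking. In Pinecone the metric is set by the `metric` argument of `create_index`, so trying another metric means creating another index.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

query = [1.0, 1.0]
docs = {
    "short_aligned": [1.0, 1.0],  # same direction and length as the query
    "long_offset": [4.0, 2.0],    # different direction, much longer
}

# Higher is better for cosine and dot product; lower is better for Euclidean distance.
rank_cosine = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
rank_dot = sorted(docs, key=lambda d: dot(query, docs[d]), reverse=True)
rank_euclid = sorted(docs, key=lambda d: euclidean(query, docs[d]))
print(rank_cosine)  # → ['short_aligned', 'long_offset']
print(rank_dot)     # → ['long_offset', 'short_aligned']
print(rank_euclid)  # → ['short_aligned', 'long_offset']

# After normalizing to unit length, dot product ranks exactly like cosine.
unit_q = normalize(query)
rank_dot_unit = sorted(docs, key=lambda d: dot(unit_q, normalize(docs[d])), reverse=True)
print(rank_dot_unit)  # → ['short_aligned', 'long_offset']
```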

Integrating Pinecone with Applications: Build simple applications or APIs that use Pinecone for semantic search or recommendations, combining it with OpenAI for embedding generation.
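A minimal sketch of such a wrapper, assuming the `index` and `embed` objects defined earlier in this notebook and a query response that exposes `.matches` items with `.id`, `.score`, and `.metadata` (as in recent versions of the Pinecone Python SDK). `semantic_search` is a hypothetical helper that embeds the query, runs `index.query`, and returns plain `(id, score, metadata)` tuples an API layer could serialize. It takes the index and embedding function as arguments, so the `FakeIndex` stand-in with made-up scores can exercise it without network access.

```python
from types import SimpleNamespace

def semantic_search(index, embed_fn, query_text, top_k=5,
                    metadata_filter=None, min_score=None):
    """Embed query_text, query the index, and return (id, score, metadata) tuples."""
    kwargs = {"vector": embed_fn(query_text), "top_k": top_k, "include_metadata": True}
    if metadata_filter is not None:
        kwargs["filter"] = metadata_filter
    response = index.query(**kwargs)
    hits = [(m.id, m.score, m.metadata) for m in response.matches]
    if min_score is not None:
        hits = [hit for hit in hits if hit[1] >= min_score]
    return hits

class FakeIndex:
    """Offline stand-in for a Pinecone index that returns canned, made-up matches."""
    def query(self, **kwargs):
        return SimpleNamespace(matches=[
            SimpleNamespace(id="doc-a", score=0.91, metadata={"topic": "mixed"}),
            SimpleNamespace(id="doc-b", score=0.74, metadata={"topic": "mixed"}),
        ])

print(semantic_search(FakeIndex(), lambda text: [0.0] * 1536, "renewable energy", min_score=0.8))
# → [('doc-a', 0.91, {'topic': 'mixed'})]

# With the live objects from this notebook:
# semantic_search(index, embed, "renewable energy", top_k=3)
```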

Performance and Cost Optimization: Learn best practices for index dimension sizing, vector pruning, and query tuning to optimize speed and cost.
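Dimension sizing has a direct storage cost. Assuming each dimension is stored as a 32-bit float, a 1536-dimensional `text-embedding-ada-002` vector takes 1536 × 4 = 6,144 bytes before IDs, metadata, and index overhead. The back-of-the-envelope helper below only counts raw vector bytes and does not model Pinecone's actual storage or billing units; the one-million-vector count and the 512-dimension alternative are arbitrary examples (OpenAI's newer `text-embedding-3-*` models accept a `dimensions` argument for shorter vectors, in which case the index `dimension` must match).

```python
BYTES_PER_FLOAT32 = 4

def raw_vector_bytes(num_vectors, dimension, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of the vector values alone (no IDs, metadata, or index structures)."""
    return num_vectors * dimension * bytes_per_value

for dim in (1536, 512):
    size = raw_vector_bytes(1_000_000, dim)
    print(f"{dim:>5} dims x 1M vectors: {size / 1024**3:.2f} GiB")
```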

In [None]:
# Batch upsert example
texts = [
    "Quantum mechanics and its applications",
    "The psychology of motivation",
    "Advances in renewable energy technology"
]

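# Build (id, values, metadata) tuples; IDs continue from concept3
# (embed() still makes one OpenAI request per text)
vectors_to_upsert = []
for i, text in enumerate(texts, start=3):
    vectors_to_upsert.append((f"concept{i}", embed(text), {"topic": "mixed"}))

# Send all three vectors in a single upsert request
index.upsert(vectors=vectors_to_upsert)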

# Query after batch upsert
query_vector = embed("renewable energy")
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True
)
print(results)