<a href="https://colab.research.google.com/github/solomontessema/Generative-AI-with-Python/blob/main/notebooks/Vector_Databases_with_Pinecone.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install pinecone openai python-dotenv

In [None]:
!pip install vec2text

In [None]:

import os

from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

load_dotenv()
api_key = os.getenv("PINECONE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Create Pinecone client
pc = Pinecone(api_key=api_key)

# Create index if it doesn't exist
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1536,  # Match your embedding model output
        metric="cosine",  # Or 'euclidean', 'dotproduct'
        spec=ServerlessSpec(
            cloud="aws",  # Or 'gcp'
            region="us-east-1"
        )
    )

# Connect to index
index = pc.Index("my-index")
client = OpenAI(api_key=openai_api_key)

# Embed text using OpenAI
def embed(text):
    response = client.embeddings.create(
        input=[text],
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding

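# Upsert vectors as (id, values, metadata) tuples
index.upsert([
    ("concept1", embed("The theory of relativity and its implications"), {"topic": "physics"}),
    ("concept2", embed("The role of empathy in leadership"), {"topic": "psychology"})
])

# Query with an unrelated sentence; both stored vectors should score low.
# Freshly upserted vectors can take a few seconds to become queryable.
query_vector = embed("My cat is so cute")
results = index.query(vector=query_vector, top_k=5, include_metadata=True)
results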


In [None]:
query_vector = embed("The theory of relativity")
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True,
    filter={"topic": "physics"}  # Only return vectors tagged with topic 'physics'
)
print(results)


In [None]:
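# Update an existing vector: upserting with an existing ID overwrites its values and metadata
index.upsert([
    ("concept1", embed("Updated theory of relativity and its modern implications"), {"topic": "physics"})
])

# Delete a vector by ID
index.delete(ids=["concept2"])

# Check the result. A query returns the top_k nearest vectors, not every vector in the index,
# so this only approximates a full listing while the index holds fewer than top_k vectors.
nearest_vectors = index.query(
    vector=embed("The theory of relativity"),  # Use a relevant query string here
    top_k=10,
    include_metadata=True
)
print(nearest_vectors)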


Batch Upserts and Queries: Explore how to efficiently upsert and query large batches of vectors, which is essential for scaling your application.

Handling Different Similarity Metrics: Experiment with different similarity metrics (cosine, euclidean, dotproduct) and understand how they affect search results.
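
To make the difference between the three metrics concrete, here is a minimal sketch in plain Python with no Pinecone calls. The helper names and the toy two-dimensional vectors are made up for illustration. It shows that the metrics can rank the same candidates differently when vector magnitudes differ.

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity: depends only on the angle between the vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (illustrative only, not real embeddings)
query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, larger magnitude
nearby = [1.0, 0.5]          # slightly different direction, similar magnitude

for name, vec in [("same_direction", same_direction), ("nearby", nearby)]:
    print(
        name,
        "cosine:", round(cosine_similarity(query, vec), 3),
        "euclidean:", round(euclidean_distance(query, vec), 3),
        "dot:", round(dot(query, vec), 3),
    )
```

Cosine and dot product both prefer `same_direction`, while Euclidean distance prefers `nearby`. For unit-length vectors, which OpenAI embeddings are documented to be, cosine and dot product produce the same ranking, so the choice of metric matters most when magnitudes vary.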

Integrating Pinecone with Applications: Build simple applications or APIs that use Pinecone for semantic search or recommendations, combining it with OpenAI for embedding generation.
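
One possible starting point for such an application is a small wrapper that embeds a query string and returns the matches in a simple shape. This is a sketch, not a prescribed design. `semantic_search` is a hypothetical helper name, and it assumes an `index` and an `embed` function like the ones defined earlier in this notebook, plus metadata on every stored vector.

```python
def semantic_search(index, embed_fn, text, top_k=3, metadata_filter=None):
    """Embed `text`, query the index, and return (id, score, metadata) tuples.

    `index` is expected to expose a Pinecone-style `query` method and
    `embed_fn` to map a string to a list of floats (both are assumptions).
    """
    query_args = {
        "vector": embed_fn(text),
        "top_k": top_k,
        "include_metadata": True,
    }
    # Only pass a filter when one was requested
    if metadata_filter is not None:
        query_args["filter"] = metadata_filter

    results = index.query(**query_args)
    # Assumes every stored vector was upserted with metadata
    return [(m["id"], m["score"], m["metadata"]) for m in results["matches"]]

# Example call, using the notebook's `index` and `embed`:
# hits = semantic_search(index, embed, "The theory of relativity",
#                        metadata_filter={"topic": "physics"})
```

Passing the index and embedding function as arguments keeps the helper easy to test with stand-ins and easy to reuse behind an API endpoint.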

Performance and Cost Optimization: Learn best practices for index dimension sizing, vector pruning, and query tuning to optimize speed and cost.

In [None]:
# Batch upsert example
texts = [
    "Quantum mechanics and its applications",
    "The psychology of motivation",
    "Advances in renewable energy technology"
]

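# Build (id, values, metadata) tuples; IDs start at concept3 to follow concept1 and concept2
vectors_to_upsert = [
    (f"concept{i}", embed(text), {"topic": "mixed"})
    for i, text in enumerate(texts, start=3)
]

# Send all vectors in a single upsert request
index.upsert(vectors_to_upsert)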

# Query after batch upsert
query_vector = embed("renewable energy")
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True
)
print(results)
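
The batch upsert example above sends every vector in one request, which works for three items but not for thousands. A common approach is to split the vectors into fixed-size chunks and upsert each chunk separately. The sketch below shows one way to do that. `chunked` and `upsert_in_batches` are hypothetical helper names, and the default `batch_size=100` is an assumption, so check Pinecone's current request limits before relying on it.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def upsert_in_batches(index, vectors, batch_size=100):
    """Upsert `vectors` in chunks and return the number of vectors sent.

    `index` is expected to expose a Pinecone-style `upsert(vectors=...)` method.
    `batch_size=100` is an illustrative default, not a documented limit.
    """
    sent = 0
    for batch in chunked(vectors, batch_size):
        index.upsert(vectors=batch)
        sent += len(batch)
    return sent

# Example call, using the notebook's `index` and `vectors_to_upsert`:
# upsert_in_batches(index, vectors_to_upsert, batch_size=100)
```

Because `chunked` accepts any iterable, the vectors can come from a generator that embeds texts lazily instead of building the full list in memory first.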