# Pinecone Quickstart

This notebook introduces the basics of working with Pinecone, a serverless vector database. You'll learn how to:

1. Connect to Pinecone
2. Create an index
3. Upsert vectors
4. Query vectors
5. Use metadata filtering
6. Work with namespaces
7. Delete vectors and indexes

## Setup

First, make sure you have your Pinecone API key set as an environment variable named `PINECONE_API_KEY`. 

Let's import the necessary libraries and initialize the client:

In [None]:
import os
import time
import uuid
import numpy as np

from pinecone import Pinecone, ServerlessSpec

# Check if the API key is set
api_key = os.environ.get("PINECONE_API_KEY")
if not api_key:
    raise ValueError("PINECONE_API_KEY environment variable not set")

# Initialize the Pinecone client
pc = Pinecone(api_key=api_key)
print("Successfully connected to Pinecone!")

## 1. Creating an Index

Let's create a serverless index with the following parameters:
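- Dimension: 768 (the output size of many common embedding models, such as BERT-base-style sentence encoders; it must match the length of the vectors you upsert)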
- Metric: cosine similarity
- Cloud: AWS
- Region: us-east-1
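
The cosine metric scores two vectors by the cosine of the angle between them, dot(a, b) / (|a| |b|), so scores range from -1 to 1 and a higher score means more similar. Pinecone computes this server-side. The pure-Python sketch below only illustrates the arithmetic; the helper name `cosine_similarity` is our own and is not part of the Pinecone SDK.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same direction, orthogonal, and opposite directions
# (about 1.0, 0.0, and -1.0 respectively, up to float rounding).
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
print(cosine_similarity([1.0, 1.0], [-1.0, -1.0]))
```

The dimension check mirrors a constraint the index enforces too: every vector you upsert or query with must have exactly `DIMENSION` values.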

In [None]:
# Create a unique index name
unique_id = str(uuid.uuid4())[:8]
index_name = f"quickstart-{unique_id}"

# Default settings
DIMENSION = 768
METRIC = "cosine"
CLOUD = "aws"
REGION = "us-east-1"

# Check if the index already exists
if pc.has_index(index_name):
    print(f"Index {index_name} already exists")
else:
    # Create the index
    print(f"Creating index: {index_name}")
    pc.create_index(
        name=index_name,
        vector_type="dense",
        dimension=DIMENSION,
        metric=METRIC,
        spec=ServerlessSpec(
            cloud=CLOUD,
            region=REGION
        )
    )
    
    # create_index() normally blocks until the index is ready; poll as an extra check
    print("Waiting for index to be ready...")
    while True:
        try:
            status = pc.describe_index(index_name).status
            if status['ready']:
                print(f"Index {index_name} is ready")
                break
            else:
                print("Still waiting...")
                time.sleep(5)
        except Exception as e:
            print(f"Error checking index status: {e}")
            time.sleep(2)
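
The readiness loop above has no upper bound, so a persistent error would keep the cell spinning forever. A small polling helper with a deadline is one way to bound it. This is a sketch: `wait_until` is a hypothetical name (not a Pinecone SDK function), and the 300-second default timeout is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 300.0,
               interval: float = 5.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout. Exceptions raised by the
    predicate are reported and treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if predicate():
                return True
        except Exception as exc:  # e.g. a transient network error
            print(f"Check failed, retrying: {exc}")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# With a live client, the readiness check could be written as:
# ready = wait_until(lambda: pc.describe_index(index_name).status['ready'])

# Offline demo: a fake check that reports ready on its third call.
calls = {"n": 0}


def fake_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_ready, timeout=5.0, interval=0.01))  # True
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.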

## 2. Creating Sample Vectors

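Let's create some random vectors to use in our examples, normalized to unit length. Because `np.random.rand` draws only non-negative values, these vectors all lie in the same orthant, so even unrelated vectors will have fairly high cosine scores (roughly 0.75 at 768 dimensions).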
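
Normalizing matters for the cosine metric: for unit-length vectors the denominator |a||b| equals 1, so cosine similarity reduces to a plain dot product. The standard-library sketch below checks that, and also shows why scores run high in this notebook: with components drawn uniformly from [0, 1), the expected cosine of two unrelated vectors approaches (1/4)/(1/3) = 0.75 as the dimension grows, whereas zero-mean Gaussian components give scores near 0. The helper names are our own, and Python's `random` module stands in for NumPy.

```python
import math
import random


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)  # seeded so reruns give the same numbers
dim = 768

# Uniform draws from [0, 1), like np.random.rand: every component is non-negative.
a = normalize([rng.random() for _ in range(dim)])
b = normalize([rng.random() for _ in range(dim)])

# Zero-mean Gaussian draws, like np.random.randn: components take both signs.
c = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
d = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

# For unit vectors the cosine denominator is 1, so the dot product is the cosine.
print(f"|a| = {math.sqrt(dot(a, a)):.6f}")
print(f"uniform pair:  cosine = {dot(a, b):.3f}")  # expected near 0.75
print(f"gaussian pair: cosine = {dot(c, d):.3f}")  # expected near 0
```

If you want the example query scores to spread out more, drawing the notebook's vectors with `np.random.randn` instead of `np.random.rand` is one option.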

In [None]:
def create_random_vectors(count, dim):
    """Create random normalized vectors for testing."""
    vectors = np.random.rand(count, dim).astype(np.float32)
    # Normalize vectors to unit length
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors.tolist()

# Generate 100 random vectors
vector_count = 100
vectors = create_random_vectors(vector_count, DIMENSION)

# Create vector records with IDs and metadata
vector_data = [
    {
        "id": f"vec-{i}",
        "values": vectors[i],
        "metadata": {
            "category": f"category-{i % 5}",  # Creates 5 categories
            "score": round(np.random.random(), 2),  # Random score between 0 and 1
            "is_valid": bool(i % 2)  # Alternates between True and False
        }
    }
    for i in range(vector_count)
]

print(f"Created {vector_count} sample vectors")
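# Show one record's ID and metadata rather than printing all of its values
print(f"Sample record: id={vector_data[0]['id']}, dimension={len(vector_data[0]['values'])}, metadata={vector_data[0]['metadata']}")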

## 3. Upserting Vectors

Now let's upsert (insert or update) the vectors into our index.
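
Upsert means insert-or-overwrite: a record with a new ID is added, and a record whose ID already exists replaces the stored one. The sketch below models that with a plain dict and splits records into fixed-size batches, as the cell below does. `batched`, `toy_upsert`, and the in-memory `store` are hypothetical stand-ins for illustration, not Pinecone APIs.

```python
from typing import Iterator


def batched(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Hypothetical in-memory stand-in for an index: record ID -> record.
store: dict[str, dict] = {}


def toy_upsert(batch: list[dict]) -> None:
    """Add new IDs and overwrite existing ones (insert-or-update)."""
    for record in batch:
        store[record["id"]] = record


records = [{"id": f"vec-{i}", "values": [float(i)]} for i in range(250)]
print([len(batch) for batch in batched(records, 100)])  # [100, 100, 50]
for batch in batched(records, 100):
    toy_upsert(batch)

# Re-upserting an existing ID replaces the record instead of adding a duplicate.
toy_upsert([{"id": "vec-0", "values": [42.0]}])
print(len(store), store["vec-0"]["values"])  # 250 [42.0]
```

On Python 3.12 and later, `itertools.batched` offers similar chunking, though it yields tuples rather than lists.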

In [None]:
# Get a handle to the index for upserts, queries, and deletes
index = pc.Index(index_name)

# Upsert in batches of 100 records, a conservative size that stays well under Pinecone's per-request limits
batch_size = 100
for i in range(0, len(vector_data), batch_size):
    batch = vector_data[i:i+batch_size]
    index.upsert(vectors=batch)
    print(f"Upserted vectors {i} to {i+len(batch)-1}")

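# Allow time for indexing (index stats are eventually consistent and may lag briefly)
time.sleep(2)

# Check vector count. Only the default namespace holds data so far,
# so the index-wide total equals its count.
stats = index.describe_index_stats()
print(f"Total vectors in index: {stats.total_vector_count}")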

## 4. Basic Vector Query

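Let's run a basic similarity query. Because the query vector is one of the vectors we just upserted, the top match should be that vector itself, with a score very close to 1.0.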
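
Conceptually, a query scores the query vector against stored vectors and returns the `top_k` highest-scoring records. Pinecone uses an approximate nearest-neighbor index rather than scanning every record; the brute-force sketch below computes the exact answer and is only meant to show what `top_k` selects. The helper names `cosine` and `exact_top_k` are our own.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: list[float], records: list[dict], k: int) -> list[tuple[str, float]]:
    """Score every record against the query and keep the k highest (exact search)."""
    scored = ((r["id"], cosine(query, r["values"])) for r in records)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


records = [
    {"id": "a", "values": [1.0, 0.0]},
    {"id": "b", "values": [0.8, 0.6]},
    {"id": "c", "values": [0.0, 1.0]},
    {"id": "d", "values": [-1.0, 0.0]},
]
for rid, score in exact_top_k([1.0, 0.0], records, k=2):
    print(rid, round(score, 4))
```

Exact search costs one score per stored vector per query, which is why approximate indexes are used at scale.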

In [None]:
# Pick a random vector to query with
query_idx = np.random.randint(0, len(vector_data))
query_vector = vector_data[query_idx]["values"]
query_id = vector_data[query_idx]["id"]

print(f"Querying with vector {query_id}")

# Perform the query
results = index.query(
    vector=query_vector,
    top_k=5,  # Return top 5 matches
    include_metadata=True
)

# Print results
print(f"\nQuery results for {query_id}:")
for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score:.4f}")
    if match.metadata:
        print(f"Metadata: {match.metadata}")
    print()

## 5. Metadata Filtering

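Pinecone lets you restrict a query to records whose metadata matches a filter. Filters use MongoDB-style operators such as `$eq`, `$gt`, and `$and`, and a bare value like `{"is_valid": True}` is shorthand for `$eq`. The filter is applied as part of the search, so `top_k` returns up to that many matches drawn only from records that satisfy it.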
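
To see how a filter such as the `$and` example below selects records, here is a small evaluator for a subset of the filter syntax (`$eq`, `$gt`, `$gte`, `$lt`, `$lte`, `$and`, `$or`, and bare-value equality). It is a hypothetical illustration of the matching logic, not Pinecone's implementation; Pinecone supports further operators, such as `$ne`, `$in`, and `$nin`, which this sketch rejects with a `KeyError`.

```python
import operator
from typing import Any

# Comparison operators covered by this sketch.
OPS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every clause in `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:
                return False  # in this sketch a missing field never matches
            # A bare value such as {"is_valid": True} is shorthand for {"$eq": True}.
            clauses = cond if isinstance(cond, dict) else {"$eq": cond}
            for op, target in clauses.items():
                if not OPS[op](metadata[key], target):
                    return False
    return True


record = {"category": "category-1", "score": 0.73, "is_valid": True}
print(matches(record, {"category": {"$eq": "category-0"}}))  # False
print(matches(record, {"$and": [{"score": {"$gt": 0.5}}, {"is_valid": True}]}))  # True
```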

In [None]:
# Query with metadata filter for category-0
filter_results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True,
    filter={
        "category": {"$eq": "category-0"}
    }
)

print("\nQuery results with filter for category-0:")
for match in filter_results.matches:
    print(f"ID: {match.id}, Score: {match.score:.4f}")
    if match.metadata:
        print(f"Metadata: {match.metadata}")
    print()

# More complex filter: score > 0.5 AND is_valid = True
complex_filter_results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True,
    filter={
        "$and": [
            {"score": {"$gt": 0.5}},
            {"is_valid": True}
        ]
    }
)

print("\nQuery results with complex filter (score > 0.5 AND is_valid = True):")
for match in complex_filter_results.matches:
    print(f"ID: {match.id}, Score: {match.score:.4f}")
    if match.metadata:
        print(f"Metadata: {match.metadata}")
    print()

## 6. Working with Namespaces

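Namespaces partition the records within an index. Every upsert, query, and delete targets a single namespace (the default namespace when none is given), so a query in one namespace never returns records from another, and the same record ID can exist independently in several namespaces.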
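
The isolation rules can be modeled with a dict of dicts, one inner dict per namespace. The sketch below is a hypothetical in-memory model (`index_data`, `toy_upsert`, and `toy_ids` are our own names, and `""` stands for the default namespace only in this sketch); it shows that writes and lookups never cross namespace boundaries.

```python
from collections import defaultdict

# Hypothetical in-memory model: namespace name -> {record ID -> vector values}.
index_data = defaultdict(dict)


def toy_upsert(records: list[dict], namespace: str = "") -> None:
    """Write records into exactly one namespace."""
    for record in records:
        index_data[namespace][record["id"]] = record["values"]


def toy_ids(namespace: str = "") -> list[str]:
    """List record IDs in one namespace; other namespaces are never consulted."""
    return sorted(index_data.get(namespace, {}))


toy_upsert([{"id": "vec-0", "values": [0.1, 0.2]}])
toy_upsert([{"id": "ns-vec-0", "values": [0.3, 0.4]}], namespace="test-namespace")

# The same ID can live independently in two namespaces.
toy_upsert([{"id": "shared", "values": [1.0, 0.0]}])
toy_upsert([{"id": "shared", "values": [0.0, 1.0]}], namespace="test-namespace")

print(toy_ids())                  # ['shared', 'vec-0']
print(toy_ids("test-namespace"))  # ['ns-vec-0', 'shared']
```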

In [None]:
# Create new vectors for a different namespace
namespace_vectors = create_random_vectors(50, DIMENSION)
namespace_data = [
    {
        "id": f"ns-vec-{i}",
        "values": namespace_vectors[i],
        "metadata": {
            "namespace": "test-namespace",
            "value": round(np.random.random(), 2)
        }
    }
    for i in range(50)
]

# Upsert to the new namespace
index.upsert(vectors=namespace_data, namespace="test-namespace")
print("Upserted vectors to 'test-namespace'")

# Allow time for indexing
time.sleep(2)

# Check namespace statistics
stats = index.describe_index_stats()
print("\nNamespace Statistics:")
for ns_name, ns_stats in stats.namespaces.items():
    ns_display = ns_name if ns_name else "default"
    print(f"Namespace: {ns_display}, Vector Count: {ns_stats.vector_count}")
    
# Query in the new namespace
query_vector = namespace_data[0]["values"]
query_id = namespace_data[0]["id"]

namespace_results = index.query(
    vector=query_vector,
    top_k=3,
    include_metadata=True,
    namespace="test-namespace"
)

print(f"\nQuery results for {query_id} in 'test-namespace':")
for match in namespace_results.matches:
    print(f"ID: {match.id}, Score: {match.score:.4f}")
    if match.metadata:
        print(f"Metadata: {match.metadata}")
    print()

## 7. Deleting Vectors

Now let's delete vectors: first a few by ID from the default namespace, then every record in `test-namespace`.
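
The two kinds of delete call in the next cell differ in scope: `ids=` removes specific records, while `delete_all=True` empties a whole namespace. Both act on one namespace at a time. The in-memory sketch below mirrors those semantics so you can predict the counts printed afterwards; `index_data` and `toy_delete` are hypothetical, and `""` stands for the default namespace only in this sketch.

```python
import random
from typing import Optional

# Hypothetical in-memory model populated like this notebook:
# namespace name -> {record ID -> vector values}.
index_data = {
    "": {f"vec-{i}": [float(i)] for i in range(100)},
    "test-namespace": {f"ns-vec-{i}": [float(i)] for i in range(50)},
}


def toy_delete(ids: Optional[list[str]] = None, delete_all: bool = False,
               namespace: str = "") -> None:
    """Remove the given IDs, or every record, from a single namespace."""
    records = index_data.get(namespace, {})
    if delete_all:
        records.clear()
    else:
        for record_id in ids or []:
            records.pop(record_id, None)  # unknown IDs are ignored in this sketch


rng = random.Random(0)
# Pick 5 distinct IDs, like np.random.choice(..., replace=False) in the notebook.
ids_to_delete = rng.sample(sorted(index_data[""]), 5)
toy_delete(ids=ids_to_delete)
toy_delete(delete_all=True, namespace="test-namespace")

print({ns: len(records) for ns, records in index_data.items()})
# {'': 95, 'test-namespace': 0}
```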

In [None]:
# Delete 5 random vectors from the default namespace
delete_indices = np.random.choice(len(vector_data), 5, replace=False)
ids_to_delete = [vector_data[i]["id"] for i in delete_indices]

print(f"Deleting vectors: {ids_to_delete}")
index.delete(ids=ids_to_delete)

# Allow time for deletion to process
time.sleep(2)

# Check vector count after deletion
stats = index.describe_index_stats()
print(f"Vectors in default namespace after deletion: {stats.namespaces.get('', {}).get('vector_count', 0)}")

# Delete all vectors in the test namespace
print("\nDeleting all vectors in 'test-namespace'")
index.delete(delete_all=True, namespace="test-namespace")

# Allow time for deletion to process
time.sleep(2)

# Check namespace statistics after deletion
stats = index.describe_index_stats()
print("\nNamespace Statistics after deletion:")
for ns_name, ns_stats in stats.namespaces.items():
    ns_display = ns_name if ns_name else "default"
    print(f"Namespace: {ns_display}, Vector Count: {ns_stats.vector_count}")

## 8. Cleaning Up

Finally, let's delete our index to clean up resources. Uncomment the cell below to run it.

In [None]:
# Uncomment to delete the index
# print(f"Deleting index: {index_name}")
# pc.delete_index(index_name)
# print(f"Index {index_name} deleted")

## Summary

In this notebook, you've learned the basics of working with Pinecone:

- Creating a serverless index
- Upserting vectors with metadata
- Querying vectors and understanding similarity scores
- Filtering query results using metadata filters
- Working with namespaces to partition your data
- Deleting vectors and indexes

These foundational concepts will help you build more complex applications with Pinecone.