# Pinecone Vector Database

## GenAI Foundation Training - Day 2

---

### What You'll Learn

In this notebook, you'll learn Pinecone-specific concepts that differ from ChromaDB:

1. **When to use Pinecone vs ChromaDB** - Decision framework for choosing the right database
2. **Serverless architecture** - Auto-scaling, managed infrastructure
3. **Namespaces** - Multi-tenancy within a single index (game-changer!)
4. **Production-ready patterns** - Building scalable search engines
5. **Free tier optimization** - Maximize 100K vector limit

### Prerequisites

✅ **Completed Notebook 03** - Vector Databases and Embeddings  
✅ **OpenAI API key** - For generating embeddings  
✅ **Pinecone account** - Free Starter tier (create at [pinecone.io](https://pinecone.io))

### What We WON'T Repeat from Notebook 03

- ❌ Embeddings fundamentals → Already covered in Section 3
- ❌ Chunking strategies → Already covered in Section 7
- ❌ Similarity metrics → Already covered in Section 6

**Focus**: Pinecone's unique features and when to use them.

### Duration

This notebook takes approximately **1 hour** to complete.

Let's get started!

---

## Section 1: Setup & Installation

### Package Installation

**Important**: As of 2025, the official package is `pinecone` (renamed from the deprecated `pinecone-client` in v5.1.0)

We'll need:
- **pinecone** - Pinecone Python SDK (v8.0.0, API version 2025-10)
- **openai** - For generating embeddings (same as notebook 03)

**Requirements**:
- Python 3.10 or later (Python 3.9 is no longer supported)

In [None]:
# Install required packages
!pip install pinecone openai -q

print("✅ Packages installed successfully!")

### Import Libraries

In [None]:
import time
from typing import List, Dict

# Pinecone
from pinecone import Pinecone, ServerlessSpec

# OpenAI for embeddings
import openai

print("✅ All imports successful!")

### Setup API Keys (Using Google Colab Secrets)

**Setting Up Google Colab Secrets:**

1. Click the **üîë (key icon)** in the left sidebar
2. Add these secrets:
   - `PINECONE_API_KEY` - From your Pinecone dashboard ([pinecone.io](https://pinecone.io))
   - `OPENAI_API_KEY` - From OpenAI platform
3. Toggle **"Notebook access"** ON for each key

**Getting your Pinecone API Key:**
1. Go to [https://www.pinecone.io/](https://www.pinecone.io/)
2. Sign up (free, no credit card required)
3. Navigate to **API Keys** in the dashboard
4. Copy your API key

In [None]:
# Import userdata for Colab secrets
from google.colab import userdata

# Retrieve API keys; stop early if either one is missing
def load_secret(name: str) -> str:
    """Return a Colab secret, raising a clear error if it is unavailable."""
    try:
        value = userdata.get(name)
    except Exception as e:
        raise RuntimeError(f"❌ Could not load {name}: {e}. Set it in Google Colab Secrets.") from e
    if not value:
        raise RuntimeError(f"❌ {name} is empty. Set it in Google Colab Secrets.")
    print(f"✅ {name} loaded!")
    return value

PINECONE_API_KEY = load_secret('PINECONE_API_KEY')
OPENAI_API_KEY = load_secret('OPENAI_API_KEY')

print("\n✅ API keys configured!")

### ⚠️ Free Starter Tier Limits

Pinecone's free tier includes:

- **1 serverless index** (cannot create multiple)
- **100,000 vectors** (~614MB of raw float32 values at 1536 dimensions: 100,000 × 1536 × 4 bytes)
- **5 queries/second**
- **No credit card required**

This is perfect for:
- Learning and experimentation
- Small production applications
- Prototyping before scaling

**Pro tip**: Use **namespaces** (covered in Section 3) to organize multiple projects within your single free index!

---

## Section 2: ChromaDB vs Pinecone - Decision Framework

### When Should You Use Each?

Let's understand the key differences to make informed decisions.

### Comparison Table

| Criterion | ChromaDB | Pinecone |
|-----------|----------|----------|
| **Deployment** | Self-hosted (local/server) | Fully managed cloud |
| **Cost** | Free (you host) | Free tier + paid ($0.096/hr starter) |
| **Setup** | `pip install` → instant | Account + index creation (~2 min) |
| **Scalability** | Manual (upgrade RAM/server) | Automatic (serverless auto-scaling) |
| **Operations** | You manage backups/monitoring | Built-in monitoring, backups, SLA |
| **Latency** | Local: <10ms, Server: 50-100ms | 50-100ms (cloud API) |
| **Privacy** | 100% local/your infrastructure | Data in Pinecone cloud |
| **Multi-tenancy** | Multiple collections | **Namespaces** (unique feature!) |
| **Metadata Filtering** | Basic filtering | Rich operators ($eq, $gte, $in, etc.) |
| **Production Ready** | Requires setup | Managed, production-grade |

### Key Takeaway

**ChromaDB** = Local/self-hosted, full control, zero cost  
**Pinecone** = Managed cloud, auto-scaling, production features

### API Pattern Comparison

Let's see how the APIs differ:

In [None]:
# ChromaDB pattern (from notebook 03)
print("ChromaDB API Pattern:")
print("="*50)
print("""
from chromadb import Client
client = Client()
collection = client.create_collection("docs")
collection.add(
    documents=[...], 
    embeddings=[...], 
    ids=[...]
)
results = collection.query(
    query_embeddings=[...], 
    n_results=3
)
""")

print("\nPinecone API Pattern:")
print("="*50)
print("""
from pinecone import Pinecone
pc = Pinecone(api_key="...")
index = pc.Index("docs")
index.upsert(
    vectors=[
        (id, embedding, metadata), 
        ...
    ],
    namespace="project-a"  # Multi-tenancy!
)
results = index.query(
    vector=[...], 
    top_k=3, 
    namespace="project-a"
)
""")

print("\nKey Differences:")
print("1. Pinecone uses upsert() instead of add()")
print("2. Pinecone has namespaces for multi-tenancy")
print("3. Pinecone returns matches with similarity scores")
print("4. Pinecone vector format: (id, embedding, metadata) tuples")

### Decision Tree: When to Use Each

**Choose ChromaDB When:**

✅ Prototyping or learning  
✅ Small datasets (<100K vectors)  
✅ Local development environment  
✅ Privacy-sensitive data (healthcare, legal, finance)  
✅ Budget constraints (free forever)  
✅ Need fastest local queries (<10ms)  
✅ Want full control over infrastructure

**Choose Pinecone When:**

✅ Production applications at scale  
✅ Unpredictable traffic (need auto-scaling)  
✅ Multi-tenant applications (SaaS products)  
✅ Want managed infrastructure (no DevOps overhead)  
✅ Need advanced features (hybrid search, rich filtering)  
✅ Global deployment with low latency  
✅ Enterprise SLA and support

**Hybrid Approach (Common in Production):**

Use **both**!
- **ChromaDB** for development/testing (fast iteration, no costs)
- **Pinecone** for production (managed, scalable, reliable)

This is what many companies do - develop with ChromaDB, deploy with Pinecone.
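
The checklists above can be condensed into a rough helper. This is only a sketch: the function name `suggest_vector_db`, its parameters, and the order in which the criteria are checked are illustrative assumptions drawn from the lists above, not an official rule.

```python
def suggest_vector_db(
    vector_count: int,
    data_must_stay_local: bool = False,
    multi_tenant: bool = False,
    want_managed_infra: bool = False,
) -> str:
    """Return 'chromadb' or 'pinecone' based on the checklist above (hypothetical helper)."""
    # Data that cannot leave your own infrastructure rules out a managed cloud service
    if data_must_stay_local:
        return "chromadb"
    # Multi-tenant SaaS workloads and zero-ops requirements favor the managed service
    if multi_tenant or want_managed_infra:
        return "pinecone"
    # Otherwise decide on size: small datasets fit comfortably in a local ChromaDB
    return "chromadb" if vector_count < 100_000 else "pinecone"


print(suggest_vector_db(5_000))
print(suggest_vector_db(5_000, multi_tenant=True))
print(suggest_vector_db(2_000_000, data_must_stay_local=True))
```

Real decisions usually weigh more factors (budget, latency targets, team skills), so treat the output as a starting point for discussion.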

### Real-World Example

**Scenario**: Building a RAG chatbot for a SaaS product

**Development Phase**:  
→ Use ChromaDB locally  
→ Fast iteration, no costs  
→ Test chunking strategies, retrieval quality

**Production Phase**:  
→ Deploy with Pinecone  
→ Auto-scaling for traffic spikes  
→ Namespaces for multi-tenant isolation  
→ Managed backups and monitoring

**Cost**: ~$70/month for Pinecone Starter (production) + $0 for ChromaDB (dev)

This is a common and cost-effective pattern!

---

## Section 3: Pinecone Architecture & Core Concepts

Now let's get hands-on with Pinecone-specific features.

### Initialize Pinecone Client

In [None]:
# Initialize Pinecone
pc = Pinecone(api_key=PINECONE_API_KEY)

print("✅ Pinecone client initialized!")
print(f"\nExisting indexes: {[idx.name for idx in pc.list_indexes()]}")

### Understanding Indexes

An **index** is a container for vectors with specific configuration:

- **Dimension**: Must match your embedding model (1536 for OpenAI text-embedding-3-small)
- **Metric**: cosine, euclidean, or dotproduct
- **Spec**: Serverless (auto-scaling) or Pods (fixed capacity)

**Free tier**: You can create **1 serverless index** only.

### Create a Serverless Index

In [None]:
INDEX_NAME = "rag-demo"
DIMENSION = 1536  # OpenAI text-embedding-3-small

# Check if index exists
existing = [idx.name for idx in pc.list_indexes()]

if INDEX_NAME in existing:
    print(f"⚠️  Index '{INDEX_NAME}' already exists. Deleting to start fresh...")
    pc.delete_index(INDEX_NAME)
    # Deletion is asynchronous: wait until the name disappears before recreating it
    while INDEX_NAME in [idx.name for idx in pc.list_indexes()]:
        time.sleep(1)

# Create serverless index
print(f"📦 Creating serverless index '{INDEX_NAME}'...")
pc.create_index(
    name=INDEX_NAME,
    dimension=DIMENSION,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

# Wait until ready
while not pc.describe_index(INDEX_NAME).status['ready']:
    print("⏳ Waiting for index to be ready...")
    time.sleep(1)

print(f"\n✅ Index '{INDEX_NAME}' is ready!")

# Connect to index
index = pc.Index(INDEX_NAME)

# Display stats
stats = index.describe_index_stats()
print(f"\n📊 Index Stats:")
print(f"   Total vectors: {stats.get('total_vector_count', 0)}")
print(f"   Dimension: {DIMENSION}")
print(f"   Metric: cosine")

### Namespaces: The Multi-Tenancy Superpower

**Problem**: Free tier = 1 index. How do you separate different projects or customers?

**Solution**: **Namespaces!**

A **namespace** is a logical partition within an index:
- Same index, different "folders"
- Queries only search the specified namespace
- **Perfect for multi-tenant SaaS applications**

**Examples**:
```python
# E-commerce
namespace = "customer-123"

# SaaS product
namespace = "org-acme-corp"

# Multi-project
namespace = "project-research"
```

**Free tier hack**: 1 index with **unlimited namespaces**!

### Visual Example

```
Index: "rag-demo"
├── namespace: "customer-1"     (10K vectors)
├── namespace: "customer-2"     (15K vectors)
├── namespace: "customer-3"     (8K vectors)
└── namespace: "test"           (1K vectors)

Total: 34K vectors in 1 index
```

Each customer's data is isolated, but you only use 1 index!
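
To make the isolation concrete, here is a toy in-memory model of an index with namespaces. It is not the Pinecone SDK: `ToyIndex` and `cosine` are hypothetical names, and the brute-force scan only mimics the behavior described above, where a query sees nothing outside its namespace.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class ToyIndex:
    """In-memory stand-in for an index with namespaces (illustration only)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # namespace -> {id: vector}

    def upsert(self, vectors, namespace):
        for vec_id, values in vectors:
            self._partitions[namespace][vec_id] = values

    def query(self, vector, top_k, namespace):
        # Only the requested partition is scanned; other tenants are invisible
        candidates = self._partitions.get(namespace, {})
        scored = [(vec_id, cosine(vector, values)) for vec_id, values in candidates.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


toy = ToyIndex()
toy.upsert([("a", [1.0, 0.0]), ("b", [0.0, 1.0])], namespace="customer-1")
toy.upsert([("c", [1.0, 0.0])], namespace="customer-2")

hits = toy.query([1.0, 0.0], top_k=5, namespace="customer-2")
print([vec_id for vec_id, _ in hits])  # only IDs stored under customer-2
```

Even though vector `a` is identical to the query, it never appears in the results for `customer-2` because it lives in a different partition.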

### Metadata Filtering

Pinecone supports rich metadata queries with powerful operators:

**Available Operators**:
- `$eq`: Equal to
- `$ne`: Not equal to
- `$gt`, `$gte`: Greater than (or equal)
- `$lt`, `$lte`: Less than (or equal)
- `$in`: In a list
- `$nin`: Not in a list

**Examples**:
```python
# Equal to
filter = {"category": {"$eq": "ai"}}

# Greater than or equal
filter = {"year": {"$gte": 2020}}

# Multiple conditions (AND)
filter = {
    "category": "ai",
    "year": {"$gte": 2020}
}

# In a list
filter = {"author": {"$in": ["Brown", "Vaswani"]}}
```

We'll see these in action in Section 4!
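
To build intuition for how these operators combine, the sketch below re-implements their semantics locally in plain Python. It is an illustration, not Pinecone's implementation: the `matches` function is hypothetical, and treating a missing field as a non-match is a simplifying assumption.

```python
import operator
from typing import Any, Dict

# Operators listed above, mapped to Python equivalents
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in flt (implicit AND)."""
    for field, condition in flt.items():
        if field not in metadata:
            return False  # simplification: missing fields never match
        # A bare value such as {"category": "ai"} is shorthand for $eq
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op_name, expected in condition.items():
            if not _OPS[op_name](metadata[field], expected):
                return False
    return True


paper = {"category": "llm", "year": 2020, "author": "Brown"}
print(matches(paper, {"category": "llm", "year": {"$gte": 2020}}))
print(matches(paper, {"author": {"$in": ["Vaswani", "Devlin"]}}))
```

The first filter passes because both conditions hold; the second fails because the author is not in the list.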

---

## Section 4: Hands-On - Building a Pinecone Search Engine

Let's build a production-ready search engine using Pinecone.

### Sample Dataset: AI Research Summaries

We'll use 8 AI research summaries to demonstrate:
- Similarity search
- Namespace isolation
- Metadata filtering

In [None]:
# Sample documents with rich metadata
documents = [
    {
        "text": "Transformers revolutionized NLP using self-attention mechanisms.",
        "metadata": {"category": "nlp", "year": 2017, "author": "Vaswani"}
    },
    {
        "text": "BERT uses bidirectional transformers for language understanding.",
        "metadata": {"category": "nlp", "year": 2018, "author": "Devlin"}
    },
    {
        "text": "GPT-3 demonstrated few-shot learning with 175B parameters.",
        "metadata": {"category": "llm", "year": 2020, "author": "Brown"}
    },
    {
        "text": "Stable Diffusion enables text-to-image generation using latent diffusion.",
        "metadata": {"category": "cv", "year": 2022, "author": "Rombach"}
    },
    {
        "text": "RLHF aligns LLMs with human preferences through reward modeling.",
        "metadata": {"category": "llm", "year": 2022, "author": "Ouyang"}
    },
    {
        "text": "AlphaFold 2 predicts protein structures with atomic accuracy.",
        "metadata": {"category": "biology", "year": 2021, "author": "Jumper"}
    },
    {
        "text": "RAG combines retrieval with generation for knowledge-grounded responses.",
        "metadata": {"category": "llm", "year": 2020, "author": "Lewis"}
    },
    {
        "text": "Vision Transformers apply transformers to image classification.",
        "metadata": {"category": "cv", "year": 2020, "author": "Dosovitskiy"}
    },
]

print(f"✅ Loaded {len(documents)} research summaries")
print(f"\nCategories: {set(d['metadata']['category'] for d in documents)}")
print(f"Year range: {min(d['metadata']['year'] for d in documents)} - {max(d['metadata']['year'] for d in documents)}")

### Generate Embeddings

We'll reuse the embedding pattern from notebook 03:

In [None]:
# Initialize OpenAI client
client = openai.OpenAI(api_key=OPENAI_API_KEY)

def get_embeddings(texts: List[str]) -> List[List[float]]:
    """
    Generate embeddings using OpenAI (same as notebook 03).
    
    Args:
        texts: List of texts to embed
    
    Returns:
        List of embeddings
    """
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [item.embedding for item in response.data]

# Generate embeddings for all documents
texts = [doc["text"] for doc in documents]
embeddings = get_embeddings(texts)

print(f"✅ Generated {len(embeddings)} embeddings")
print(f"Embedding dimension: {len(embeddings[0])}")

### Upsert Vectors to Pinecone

**Upsert** = Insert or Update (same API for both operations)

**Vector format**: List of tuples `(id, embedding, metadata)`

In [None]:
# Prepare vectors for Pinecone
vectors = [
    (
        f"doc_{i}",                    # ID
        embeddings[i],                  # Embedding vector
        {
            **documents[i]["metadata"],  # Spread existing metadata (category, year, author)
            "text": documents[i]["text"]  # Add the text! (Critical for RAG)
        }
    )
    for i in range(len(documents))
]

# Upsert with namespace
NAMESPACE = "research"
index.upsert(vectors=vectors, namespace=NAMESPACE)

print(f"✅ Upserted {len(vectors)} vectors to namespace '{NAMESPACE}'")
print("   Each vector includes: embedding + metadata (category, year, author, TEXT)")

# Check stats
time.sleep(1)  # Wait for index to update
stats = index.describe_index_stats()
print(f"\n📊 Index Stats:")
print(f"   Total vectors: {stats.get('total_vector_count', 0)}")
print(f"   Namespaces: {stats.get('namespaces', {})}")

In [None]:
def search(query: str, top_k: int = 3, namespace: str = NAMESPACE, filter_dict: Dict = None) -> Dict:
    """
    Search Pinecone for similar documents.
    
    Args:
        query: Search query
        top_k: Number of results to return
        namespace: Namespace to search in
        filter_dict: Optional metadata filter
    
    Returns:
        Query results
    """
    # Generate query embedding
    query_emb = get_embeddings([query])[0]
    
    # Search
    results = index.query(
        vector=query_emb,
        top_k=top_k,
        namespace=namespace,
        filter=filter_dict,
        include_metadata=True
    )
    
    return results

# Test search
query = "How do attention mechanisms work in neural networks?"
results = search(query)

print(f"Query: {query}\n")
print("Top 3 results:\n")
for i, match in enumerate(results['matches'], 1):
    print(f"{i}. Score: {match['score']:.4f}")
    print(f"   Text: {match['metadata']['text']}")
    print(f"   Category: {match['metadata']['category']}")
    print(f"   Author: {match['metadata']['author']} ({match['metadata']['year']})")
    print(f"   ID: {match['id']}")
    print()

In [None]:
# Search only in LLM papers
query = "What are recent advances in language models?"
results = search(query, filter_dict={"category": {"$eq": "llm"}})

print(f"Query: {query}")
print("Filter: category = 'llm'\n")
print("Results:\n")
for i, match in enumerate(results['matches'], 1):
    print(f"{i}. {match['metadata']['author']} ({match['metadata']['year']})")
    print(f"   Text: {match['metadata']['text']}")
    print(f"   Score: {match['score']:.4f}")
    print()

In [None]:
# Search papers from 2020 or later
query = "Recent AI research"
results = search(query, top_k=5, filter_dict={"year": {"$gte": 2020}})

print(f"Query: {query}")
print("Filter: year >= 2020\n")
print("Results:\n")
for i, match in enumerate(results['matches'], 1):
    print(f"{i}. {match['metadata']['author']} ({match['metadata']['year']}) - {match['metadata']['category']}")
    print(f"   Text: {match['metadata']['text']}")
    print(f"   Score: {match['score']:.4f}")
    print()

In [None]:
# Combined filters (AND logic)
query = "Language model research"
results = search(
    query, 
    filter_dict={
        "category": "llm",
        "year": {"$gte": 2020}
    }
)

print(f"Query: {query}")
print("Filter: category='llm' AND year >= 2020\n")
print("Results:\n")
for i, match in enumerate(results['matches'], 1):
    print(f"{i}. {match['metadata']['author']} ({match['metadata']['year']})")
    print(f"   Text: {match['metadata']['text']}")
    print(f"   Score: {match['score']:.4f}")
    print()

In [None]:
class PineconeSearchEngine:
    """
    Production-ready search engine using Pinecone.
    
    Features:
    - Automatic embedding generation
    - Namespace support for multi-tenancy
    - Metadata filtering
    - Batch operations
    """
    
    def __init__(self, pinecone_key: str, openai_key: str, index_name: str, namespace: str = "default"):
        """
        Initialize the search engine.
        
        Args:
            pinecone_key: Pinecone API key
            openai_key: OpenAI API key
            index_name: Name of Pinecone index
            namespace: Default namespace
        """
        self.pc = Pinecone(api_key=pinecone_key)
        self.index = self.pc.Index(index_name)
        self.client = openai.OpenAI(api_key=openai_key)
        self.namespace = namespace
    
    def _embed(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for texts."""
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=texts
        )
        return [item.embedding for item in response.data]
    
    def add_documents(self, documents: List[Dict], namespace: str = None) -> None:
        """
        Add documents with automatic embedding.
        
        Args:
            documents: List of dicts with 'text' and optional 'metadata'
            namespace: Namespace to use (defaults to self.namespace)
        """
        namespace = namespace or self.namespace
        
        # Extract texts and generate embeddings
        texts = [doc["text"] for doc in documents]
        embeddings = self._embed(texts)
        
        # Prepare vectors
        vectors = [
            (
                f"doc_{i}_{namespace}",
                embeddings[i],
                {
                    **doc.get("metadata", {}),
                    "text": doc["text"]  # Always include the text!
                }
            )
            for i, doc in enumerate(documents)
        ]
        
        # Upsert
        self.index.upsert(vectors=vectors, namespace=namespace)
        print(f"✅ Added {len(vectors)} documents to namespace '{namespace}'")
    
    def search(self, query: str, top_k: int = 5, filter_dict: Dict = None, namespace: str = None) -> Dict:
        """
        Search for similar documents.
        
        Args:
            query: Search query
            top_k: Number of results
            filter_dict: Optional metadata filter
            namespace: Namespace to search (defaults to self.namespace)
        
        Returns:
            Dict with query and formatted results
        """
        namespace = namespace or self.namespace
        
        # Generate query embedding
        query_emb = self._embed([query])[0]
        
        # Search
        results = self.index.query(
            vector=query_emb,
            top_k=top_k,
            namespace=namespace,
            filter=filter_dict,
            include_metadata=True
        )
        
        # Format results
        return {
            "query": query,
            "namespace": namespace,
            "results": [
                {
                    "id": match['id'],
                    "score": match['score'],
                    "metadata": match.get('metadata', {})
                }
                for match in results['matches']
            ]
        }
    
    def get_stats(self) -> Dict:
        """Get index statistics."""
        return self.index.describe_index_stats()

print("✅ PineconeSearchEngine class ready!")

In [None]:
# Initialize search engine
engine = PineconeSearchEngine(
    pinecone_key=PINECONE_API_KEY,
    openai_key=OPENAI_API_KEY,
    index_name=INDEX_NAME,
    namespace="research"
)

# Search
result = engine.search("transformers in AI", top_k=3)

print(f"Query: {result['query']}")
print(f"Namespace: {result['namespace']}\n")
print("Results:\n")
for i, r in enumerate(result['results'], 1):
    print(f"{i}. {r['metadata']['author']} ({r['metadata']['year']})")
    print(f"   Category: {r['metadata']['category']}")
    print(f"   Text: {r['metadata']['text']}")
    print(f"   Score: {r['score']:.4f}")
    print()

In [None]:
# Search with filter
result = engine.search(
    "computer vision research",
    top_k=3,
    filter_dict={"category": "cv"}
)

print(f"Query: {result['query']}")
print("Filter: category='cv'\n")
print("Results:\n")
for i, r in enumerate(result['results'], 1):
    print(f"{i}. {r['metadata']['author']} ({r['metadata']['year']})")
    print(f"   Text: {r['metadata']['text']}")
    print(f"   Score: {r['score']:.4f}")
    print()

In [None]:
# Get stats
stats = engine.get_stats()
print("📊 Index Statistics:\n")
print(f"Total vectors: {stats.get('total_vector_count', 0)}")
print(f"Namespaces: {list(stats.get('namespaces', {}).keys())}")
for ns, info in stats.get('namespaces', {}).items():
    print(f"  - {ns}: {info.get('vector_count', 0)} vectors")

---

## Section 5: Advanced Features & Free Tier Optimization

### Hybrid Search (Preview)

Pinecone supports **hybrid search** - combining semantic and keyword search:

**Semantic search** (what we did):
- Finds by meaning
- "ML" matches "machine learning"
- Uses dense vectors (OpenAI embeddings)

**Keyword search**:
- Exact terms: "GPT-3" finds documents with "GPT-3"
- Uses sparse vectors (BM25 weights)

**Hybrid = Semantic + Keyword**

Pinecone's hybrid search combines:
- **Dense vectors**: OpenAI embedding (semantic similarity)
- **Sparse vectors**: BM25 keyword weights (exact matches)

**Note**: Hybrid search requires the `pinecone-text` library and additional setup. The free tier supports it!

**When to use**:
- Legal/medical documents (exact terminology matters)
- Code search (function names, variable names)
- Product search (model numbers, SKUs)

**Learn more**: [Pinecone Hybrid Search Guide](https://docs.pinecone.io/guides/data/understanding-hybrid-search)
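
One common way to balance the two signals is a convex combination: scale the dense vector by a weight `alpha` and the sparse values by `1 - alpha` before querying. The sketch below shows only that arithmetic. The function name `weight_hybrid` and the example numbers are assumptions for illustration, and the sparse vector is assumed to use an `indices`/`values` dictionary layout.

```python
from typing import Dict, List, Tuple


def weight_hybrid(
    dense: List[float], sparse: Dict[str, list], alpha: float
) -> Tuple[List[float], Dict[str, list]]:
    """Scale dense values by alpha and sparse values by (1 - alpha).

    alpha=1.0 means purely semantic; alpha=0.0 means purely keyword.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense, sparse = weight_hybrid(
    [0.2, 0.4], {"indices": [10, 45], "values": [2.0, 4.0]}, alpha=0.5
)
print(dense, sparse)
```

A higher `alpha` suits conversational queries; a lower one suits queries dominated by exact identifiers such as SKUs or function names.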

### Free Tier Optimization Strategies

**Limits**: 1 index, 100K vectors, 5 QPS

**How to maximize your free tier**:

#### 1. Use Namespaces (Not Multiple Indexes)

❌ **Wrong**: Create multiple indexes
```python
# This won't work on free tier!
pc.create_index("customer-1")
pc.create_index("customer-2")  # Error: free tier = 1 index
```

✅ **Right**: Use namespaces
```python
# Single index, multiple namespaces
index.upsert(vectors=[...], namespace="customer-1")
index.upsert(vectors=[...], namespace="customer-2")
```

#### 2. Chunk Size Optimization

**Calculate your capacity**:
```
100K vectors ÷ chunks per document = number of documents

Example:
- 512 chars/chunk → ~4K chunks per 2MB doc
- 100K ÷ 4K = ~25 documents (2MB each)
```

**Adjust based on your use case**:
- Smaller chunks (256 chars) = more precise, fewer docs
- Larger chunks (1024 chars) = less precise, more docs
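
The capacity arithmetic above can be checked with a few lines of Python. This sketch assumes a 2MB document is roughly 2,000,000 characters and that chunks do not overlap; `max_documents` is a hypothetical helper name.

```python
import math


def max_documents(vector_limit: int, doc_chars: int, chunk_chars: int) -> int:
    """How many documents fit under the vector limit at a given chunk size."""
    chunks_per_doc = math.ceil(doc_chars / chunk_chars)  # one vector per chunk
    return vector_limit // chunks_per_doc


for chunk_chars in (256, 512, 1024):
    print(chunk_chars, max_documents(100_000, 2_000_000, chunk_chars))
```

Chunk overlap increases the number of chunks per document, so real capacity will be somewhat lower than these figures.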

#### 3. Selective Metadata

❌ **Don't store large metadata**:
```python
metadata = {
    "full_text": "...",  # Avoid storing whole documents; a short snippet is enough
    "large_field": "..."  # Keep <40KB per vector
}
```
```

✅ **Only filterable fields**:
```python
metadata = {
    "source": "paper.pdf",
    "page": 5,
    "category": "ai",
    "year": 2023
}
```

#### 4. Batch Upserts

Batch up to **100 vectors** per upsert for better performance:

In [None]:
def batch_upsert(index, vectors: List, namespace: str, batch_size: int = 100):
    """
    Efficiently upsert vectors in batches.
    
    Args:
        index: Pinecone index
        vectors: List of (id, embedding, metadata) tuples
        namespace: Namespace to use
        batch_size: Vectors per batch (max 100)
    """
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size]
        index.upsert(vectors=batch, namespace=namespace)
        print(f"✅ Batch {i//batch_size + 1}: {len(batch)} vectors")

print("💡 Use batch operations to improve performance and reduce API calls")

#### 5. Cache Frequent Queries

For frequently asked questions, cache results:

```python
# Pseudocode
cache = {}  # Or use Redis/Memcached

def cached_search(query):
    if query in cache:
        return cache[query]
    
    results = engine.search(query)
    cache[query] = results
    return results
```

This reduces QPS usage and improves response time!
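
A slightly fuller version of this idea adds expiry and key normalization, so stale answers age out and trivially different spellings of a query share one entry. This is a sketch under assumptions: `QueryCache` is a hypothetical class, and `fake_search` stands in for a real call such as `engine.search`.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Small in-memory cache with expiry (single-process sketch)."""

    def __init__(self, search_fn: Callable[[str], Any], ttl_seconds: float = 300.0):
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def search(self, query: str) -> Any:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        now = time.monotonic()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: no embedding or index call needed
        result = self._search_fn(query)
        self._entries[key] = (now, result)
        return result


calls = []


def fake_search(query: str) -> dict:
    """Stand-in for the real search call; records how often it is invoked."""
    calls.append(query)
    return {"query": query, "results": []}


cache = QueryCache(fake_search, ttl_seconds=60)
cache.search("What is RAG?")
cache.search("  what is   rag? ")  # same normalized key, served from cache
print(len(calls))
```

For multi-process deployments, a shared store such as Redis would replace the dictionary, as the pseudocode comment suggests.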

### Performance Comparison: ChromaDB vs Pinecone

| Metric | ChromaDB (local) | Pinecone (cloud) |
|--------|------------------|------------------|
| **Query Latency** | <10ms | 50-100ms |
| **Scalability** | Manual (upgrade server) | Automatic (serverless) |
| **Concurrent Users** | Limited by your server | Excellent (auto-scaling) |
| **Cold Start** | None | ~100ms (first query) |
| **Setup Time** | Instant | ~2 minutes |
| **Maintenance** | You manage | Fully managed |

### Key Takeaway

**ChromaDB**: Faster queries, but you handle scaling  
**Pinecone**: Slightly higher latency, but zero-ops scaling

For most production apps, the **50-100ms latency** is acceptable given the operational benefits!

---

## Section 6: Summary & Best Practices

### What We Learned

✅ **Decision framework** - When to use ChromaDB vs Pinecone vs hybrid  
✅ **Serverless architecture** - Auto-scaling managed infrastructure  
✅ **Namespaces** - Multi-tenancy within single index (free tier hack!)  
✅ **Metadata filtering** - Rich query operators ($eq, $gte, $in)  
✅ **Production patterns** - PineconeSearchEngine class  
✅ **Free tier optimization** - Maximize 100K vector limit

### Key Differences Recap

| Feature | ChromaDB | Pinecone |
|---------|----------|----------|
| **Architecture** | Self-hosted | Managed cloud |
| **Multi-tenancy** | Collections | **Namespaces** |
| **Scaling** | Manual | Automatic |
| **Cost** | Free (DIY) | Free tier + paid |
| **Best For** | Dev/prototyping | Production/scale |

### Best Practices

#### 1. Index Configuration

✅ Match the index dimension to your embedding model (e.g. 1536 for OpenAI's `text-embedding-3-small`)  
✅ Use `cosine` metric for most cases  
✅ Start with serverless (auto-scaling)  
❌ Don't create multiple indexes on free tier

#### 2. Namespaces

✅ Plan namespace strategy early (`customer-{id}`, `org-{name}`)  
✅ Use consistent naming (lowercase, hyphens)  
✅ Document your namespace schema  
❌ Don't mix data from different tenants
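The naming convention above can be enforced with a small helper so that every part of the application derives namespaces the same way. This is a sketch only; `make_namespace` is a hypothetical function, not something the Pinecone SDK provides.

```python
import re


def make_namespace(prefix: str, tenant: str) -> str:
    """Build a lowercase, hyphen-separated namespace such as 'customer-42'."""
    raw = f"{prefix}-{tenant}".lower()
    # Replace every run of characters other than a-z and 0-9 with a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    if not slug:
        raise ValueError("namespace would be empty")
    return slug


print(make_namespace("customer", "42"))          # customer-42
print(make_namespace("org", "Acme Corp, Inc."))  # org-acme-corp-inc
```

Because normalization can map two different display names to the same string, stable tenant IDs are a safer input than free-form names.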

#### 3. Metadata

✅ Only store filterable fields  
✅ Keep metadata <40KB per vector  
✅ Use consistent field names  
❌ Don't duplicate the text in metadata

#### 4. Queries

✅ Set appropriate `top_k` (3-10 for most cases)  
✅ Use metadata filters when possible  
✅ Cache frequent queries  
❌ Don't query without filters if you have multi-tenant data
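These query guidelines can be collected in one place by assembling the arguments before calling the index. The sketch below builds a keyword-argument dictionary for `index.query()`; `build_query` is a hypothetical helper, and rejecting an empty namespace is a design choice for multi-tenant apps, not a Pinecone requirement.

```python
from typing import Any, Dict, List, Optional


def build_query(
    vector: List[float],
    namespace: str,
    top_k: int = 5,
    metadata_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for index.query(), always scoped to one namespace."""
    if not namespace:
        raise ValueError("namespace is required for multi-tenant data")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    kwargs: Dict[str, Any] = {
        "vector": vector,
        "top_k": top_k,
        "namespace": namespace,
        "include_metadata": True,
    }
    # Only attach a filter when one was provided
    if metadata_filter:
        kwargs["filter"] = metadata_filter
    return kwargs


kwargs = build_query(
    vector=[0.1, 0.2, 0.3],  # toy embedding; a real one must match the index dimension
    namespace="customer-42",
    top_k=3,
    metadata_filter={"category": {"$eq": "ai"}, "year": {"$gte": 2023}},
)
print(kwargs["namespace"], kwargs["top_k"], kwargs["filter"])
```

With a real index the result would be passed as `index.query(**kwargs)`. Keeping the namespace mandatory in the helper makes it harder to run an unscoped query by accident.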

#### 5. Free Tier

✅ Monitor vector count (100K max)  
✅ Use namespaces, not multiple indexes  
✅ Optimize chunk sizes for your use case  
✅ Batch upsert operations (up to 100 vectors)  
❌ Don't exceed QPS limits (5/second)
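A simple client-side throttle is one way to stay under a requests-per-second ceiling. The class below is a minimal sketch, not an official Pinecone feature: `Throttle` is a hypothetical name, and the value 5 is the QPS figure quoted in the list above.

```python
import time


class Throttle:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self.sleep(delay)
        # Schedule the following slot one interval after this call's slot
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


throttle = Throttle(max_per_second=5)
for i in range(3):
    slept = throttle.wait()  # call this right before each query
    print(f"request {i}: waited {slept:.2f}s")
```

The first call passes immediately and later calls are delayed just enough to keep the spacing. Injecting `clock` and `sleep` lets the behavior be checked without real waiting.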

### Common Pitfalls

❌ **Creating multiple indexes** (free tier = 1)  
→ ✅ Use namespaces instead

❌ **Mismatched dimensions** (index vs embedding model)  
→ ✅ Match exactly (e.g. 1536 for OpenAI's `text-embedding-3-small`)

❌ **Forgetting namespace in queries**  
→ ✅ Specify namespace or use default

❌ **Large metadata (>40KB per vector)**  
→ ✅ Store only filterable fields

❌ **Single-vector upserts** (slow)  
→ ✅ Batch up to 100 vectors

❌ **Not monitoring usage** (exceed free tier)  
→ ✅ Check stats regularly with `get_stats()`
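The dimension pitfall is cheap to catch before any data leaves your machine. The sketch below scans `(id, embedding, metadata)` tuples for embeddings of the wrong length; `find_dimension_mismatches` is a hypothetical helper, and 1536 is the dimension quoted above.

```python
from typing import Dict, List, Sequence, Tuple

EXPECTED_DIMENSION = 1536  # dimension quoted above for the OpenAI small embedding model

Record = Tuple[str, Sequence[float], Dict]


def find_dimension_mismatches(vectors: List[Record], expected: int = EXPECTED_DIMENSION) -> List[str]:
    """Return the ids of vectors whose embedding length differs from the index dimension."""
    return [vec_id for vec_id, embedding, _meta in vectors if len(embedding) != expected]


records = [
    ("doc-1", [0.0] * 1536, {"page": 1}),
    ("doc-2", [0.0] * 3072, {"page": 2}),  # e.g. produced by a different, larger model
]
bad_ids = find_dimension_mismatches(records)
print(bad_ids)  # ['doc-2']
```

Running this check before a batch upsert turns a server-side rejection into a clear local error that names the offending ids.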

### Resources

**Official Documentation**:
- [Pinecone Docs](https://docs.pinecone.io/)
- [Serverless Indexes Guide](https://docs.pinecone.io/guides/indexes/understanding-indexes)
- [Metadata Filtering](https://docs.pinecone.io/guides/data/filter-with-metadata)
- [Hybrid Search](https://docs.pinecone.io/guides/data/understanding-hybrid-search)

**Tutorials**:
- [Pinecone Quickstart](https://docs.pinecone.io/guides/get-started/quickstart)
- [Production Best Practices](https://docs.pinecone.io/guides/production/best-practices)

**Community**:
- [Pinecone Community Forum](https://community.pinecone.io/)
- [Pinecone GitHub Examples](https://github.com/pinecone-io/examples)

### Next Steps in This Course

**In upcoming notebooks**:
- **Notebook 04**: LangChain + Pinecone integration
- **Notebook 07**: Advanced RAG with Pinecone
- **LAB1**: Build a RAG chatbot with Pinecone backend

**You're now ready for**:
- Production RAG applications
- Multi-tenant SaaS products
- Scalable semantic search
- Hybrid search architectures

### Cleanup (Optional)

If you want to delete the index to stay within free tier limits:

In [None]:
# Uncomment to delete the index
# pc.delete_index(INDEX_NAME)
# print(f"✅ Deleted index '{INDEX_NAME}'")

print("💡 Keep the index if you want to use it in future notebooks!")
print("💡 Free tier includes 1 index, so you can keep it without charges.")

---

## Congratulations!

You now understand:

✅ When to use Pinecone vs ChromaDB  
✅ How to build production-ready search with Pinecone  
✅ Namespaces for multi-tenancy  
✅ Metadata filtering for precise queries  
✅ Free tier optimization strategies

**You're ready to build scalable, production-grade semantic search applications!**

See you in the next notebook! 🚀