# Workout: Vector Stores

## Setup
```bash
uv add chromadb pinecone openai
```

---
## Drill 1: ChromaDB Basics 🟢
**Task:** Create a collection and add documents

```python
import chromadb

client = chromadb.Client()  # in-memory client; data is lost when the process ends

# Create collection "test_docs"
# Add 3 documents about programming
# Print the collection count
```

---
## Drill 2: ChromaDB Query 🟢
**Task:** Query the collection

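```python
import chromadb

client = chromadb.Client()
# get_or_create_collection lets the cell be re-run without an "already exists" error
collection = client.get_or_create_collection("languages")

# Documents are embedded with Chroma's default embedding function,
# which downloads a small model the first time it runs
collection.add(
    documents=[
        "Python is great for data science",
        "JavaScript runs in the browser",
        "Rust is memory safe",
        "Go is good for servers"
    ],
    ids=["python", "js", "rust", "go"]
)

# Query: "What language is good for web?"
# Print the top 2 results with their distances
```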
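Chroma does the ranking for you; as a dependency-free illustration of what "top 2 with their distances" means, the sketch below ranks stored vectors by squared Euclidean distance (the `"l2"` space, which as far as I know is Chroma's default) and returns a dict shaped like Chroma's query result, with one inner list per query. The 3-dimensional vectors and the `top_k` helper are hypothetical; real embeddings come from the collection's embedding function.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> dict:
    """Return the k nearest ids, shaped like Chroma's result (one list per query)."""
    ranked = sorted(vectors.items(), key=lambda item: squared_l2(query, item[1]))[:k]
    return {
        "ids": [[doc_id for doc_id, _ in ranked]],
        "distances": [[squared_l2(query, vec) for _, vec in ranked]],
    }


# Hypothetical toy embeddings standing in for the four language documents
vectors = {
    "python": [0.9, 0.1, 0.0],
    "js": [0.1, 0.9, 0.1],
    "rust": [0.2, 0.1, 0.9],
    "go": [0.3, 0.6, 0.4],
}
query = [0.0, 1.0, 0.2]  # pretend embedding of "What language is good for web?"

results = top_k(query, vectors, k=2)
for doc_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(f"{doc_id}: {distance:.2f}")
```

With the real collection the equivalent call is `collection.query(query_texts=["What language is good for web?"], n_results=2)`, and the same `zip(...[0], ...[0])` loop over `results["documents"]` and `results["distances"]` reads the first (and only) query's hits.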

---
## Drill 3: Metadata Filtering 🟡
**Task:** Add and filter by metadata

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("products")

collection.add(
    documents=[
        "iPhone 15 Pro",
        "Samsung Galaxy S24",
        "MacBook Pro",
        "Dell XPS"
    ],
    ids=["iphone", "samsung", "macbook", "dell"],
    metadatas=[
        {"category": "phone", "price": 1000},
        {"category": "phone", "price": 800},
        {"category": "laptop", "price": 2000},
        {"category": "laptop", "price": 1500}
    ]
)

# Query for "best device", but only within category "phone"
# Retrieve the items with price > 1000 (a pure metadata filter, no query text)
```
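Chroma's `where` argument takes a dict such as `{"category": "phone"}` (equality) or `{"price": {"$gt": 1000}}` (comparison). The pure-Python sketch below mimics only those single-field forms on the drill's product metadata, to show which items each filter should keep; the `matches` helper is hypothetical and is not how Chroma evaluates filters internally.

```python
import operator

# The comparison operators this sketch understands (a subset of Chroma's)
OPS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def matches(metadata: dict, where: dict) -> bool:
    """Evaluate a single-field filter such as {"price": {"$gt": 1000}}."""
    (field, condition), = where.items()  # unpacking fails unless exactly one field
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # a bare value means equality
    (op_name, value), = condition.items()
    return field in metadata and OPS[op_name](metadata[field], value)


# Metadata copied from the drill
products = {
    "iphone": {"category": "phone", "price": 1000},
    "samsung": {"category": "phone", "price": 800},
    "macbook": {"category": "laptop", "price": 2000},
    "dell": {"category": "laptop", "price": 1500},
}

phones = [pid for pid, meta in products.items() if matches(meta, {"category": "phone"})]
pricey = [pid for pid, meta in products.items() if matches(meta, {"price": {"$gt": 1000}})]
print(phones)  # ['iphone', 'samsung']
print(pricey)  # ['macbook', 'dell']
```

Note that the iPhone (price 1000) is excluded by `$gt`; `$gte` would keep it. Against the real collection the filters become `collection.query(query_texts=["best device"], where={"category": "phone"})` and, since the second task has no query text, `collection.get(where={"price": {"$gt": 1000}})`.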

---
## Drill 4: Persistent Storage 🟢
**Task:** Use persistent ChromaDB

```python
import chromadb

# Create a persistent client that stores data in the "./chroma_db" folder
# Add some documents
# Verify they persist: create a new client on the same path (or restart the kernel) and count again
```

---
## Drill 5: Pre-computed Embeddings 🟡
**Task:** Add documents with your own embeddings

```python
import chromadb
import numpy as np

client = chromadb.Client()
collection = client.get_or_create_collection("custom_embeddings")

# Create fake embeddings (in real use, use OpenAI or Sentence Transformers).
# Every vector in a collection must have the same dimension (384 here).
rng = np.random.default_rng(42)  # seeded so runs are reproducible
fake_embeddings = [rng.random(384).tolist() for _ in range(3)]

# Add with the embeddings parameter instead of documents (ids are still required)
# Then query using query_embeddings (also 384-dimensional)
```

---
## Drill 6: ChromaDB with OpenAI 🟡
**Task:** Use OpenAI embeddings with ChromaDB

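```python
import chromadb
from chromadb.utils import embedding_functions

# Create an OpenAI embedding function (requires the `openai` package and an
# API key, e.g. api_key=os.environ["OPENAI_API_KEY"])
# Create a collection with this function (embedding_function=...)
# Add documents and query
```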

---
## Drill 7: Update and Delete 🟢
**Task:** Update and delete documents

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("mutable")

collection.add(
    documents=["Original text"],
    ids=["doc1"]
)

# Update doc1 with new text
# Verify the update

# Delete doc1
# Verify deletion
```

---
## Drill 8: Complex Metadata Query 🔴
**Task:** Use complex where filters

```python
import chromadb

# Create collection with documents having:
# - category: tech/finance/health
# - year: 2020-2024
# - priority: high/medium/low

# Query:
# 1. Category is "tech" AND year >= 2023
# 2. Priority is "high" OR category is "finance"
# 3. Year is NOT 2020
```
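The three queries map onto Chroma's logical operators: `$and` and `$or` each take a list of conditions, and `$ne` expresses "not equal". As far as I know, Chroma expects each level of a `where` dict to hold a single key, so combining two fields needs an explicit `$and`. The filter dicts below are written in that syntax; the recursive `evaluate` function and the sample records are hypothetical, used only to check which records each filter should keep.

```python
import operator

OPS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def evaluate(metadata: dict, where: dict) -> bool:
    """Recursively evaluate a Chroma-style where dict against one metadata dict."""
    (key, value), = where.items()  # each level holds exactly one key
    if key == "$and":
        return all(evaluate(metadata, clause) for clause in value)
    if key == "$or":
        return any(evaluate(metadata, clause) for clause in value)
    if not isinstance(value, dict):
        value = {"$eq": value}  # a bare value means equality
    (op_name, operand), = value.items()
    return key in metadata and OPS[op_name](metadata[key], operand)


# Filters for the three drill queries, in Chroma's where syntax
tech_recent = {"$and": [{"category": "tech"}, {"year": {"$gte": 2023}}]}
high_or_finance = {"$or": [{"priority": "high"}, {"category": "finance"}]}
not_2020 = {"year": {"$ne": 2020}}

# Hypothetical sample metadata
records = {
    "a": {"category": "tech", "year": 2024, "priority": "low"},
    "b": {"category": "tech", "year": 2021, "priority": "high"},
    "c": {"category": "finance", "year": 2020, "priority": "medium"},
    "d": {"category": "health", "year": 2023, "priority": "medium"},
}

for name, where in [("tech_recent", tech_recent),
                    ("high_or_finance", high_or_finance),
                    ("not_2020", not_2020)]:
    print(name, [rid for rid, meta in records.items() if evaluate(meta, where)])
```

Each dict can be passed unchanged as `where=` to `collection.query(...)` or `collection.get(...)`.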

---
## Drill 9: Batch Operations 🟡
**Task:** Implement efficient batch add

```python
import chromadb


def batch_add(
    collection,
    documents: list[str],
    ids: list[str],
    batch_size: int = 100
):
    """Add documents in batches of at most batch_size."""
    pass  # TODO: implement


# Test with 500 documents
documents = [f"Document {i}" for i in range(500)]
ids = [f"doc_{i}" for i in range(500)]

client = chromadb.Client()
collection = client.get_or_create_collection("batch_test")

batch_add(collection, documents, ids, batch_size=50)
print(f"Total docs: {collection.count()}")  # expect 500 once batch_add is implemented
```
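One way to fill in `batch_add` is to slice both lists in steps of `batch_size` and call `collection.add` once per slice, so no single call carries more than `batch_size` records (as far as I know, Chroma also rejects batches above a client-side maximum). The sketch runs against a hypothetical `FakeCollection` that only records calls, so the slicing can be checked without Chroma installed; the same `batch_add` works unchanged on the drill's real collection.

```python
class FakeCollection:
    """Hypothetical stand-in that records add() calls instead of storing vectors."""

    def __init__(self) -> None:
        self.calls: list[int] = []  # size of each add() batch
        self.ids: list[str] = []

    def add(self, documents: list[str], ids: list[str]) -> None:
        self.calls.append(len(ids))
        self.ids.extend(ids)

    def count(self) -> int:
        return len(self.ids)


def batch_add(collection, documents: list[str], ids: list[str], batch_size: int = 100):
    """Add documents in slices of at most batch_size."""
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")
    for start in range(0, len(documents), batch_size):
        end = start + batch_size  # slicing past the end is safe for the last batch
        collection.add(documents=documents[start:end], ids=ids[start:end])


documents = [f"Document {i}" for i in range(500)]
ids = [f"doc_{i}" for i in range(500)]

fake = FakeCollection()
batch_add(fake, documents, ids, batch_size=50)
print(len(fake.calls), fake.count())  # 10 500
```

Slicing keeps memory flat and lets a failed batch be retried on its own; if the lists also carry metadatas, slice them with the same `start:end` bounds.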

---
## Drill 10: Vector Store Abstraction 🔴
**Task:** Create a unified interface

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SearchResult:
    id: str
    content: str
    score: float  # state whether lower (distance) or higher (similarity) is better
    metadata: dict


class VectorStore(ABC):
    @abstractmethod
    def add(self, documents: list[str], ids: list[str], metadatas: list[dict] | None = None):
        pass

    @abstractmethod
    def search(self, query: str, k: int = 5) -> list[SearchResult]:
        pass

    @abstractmethod
    def delete(self, ids: list[str]):
        pass


class ChromaStore(VectorStore):
    """Implement using ChromaDB."""
    pass


# Test
# store = ChromaStore("test_collection")
# store.add(["Hello", "World"], ["1", "2"])
# results = store.search("greeting", k=1)
```

---
## Self-Check

- [ ] Can create and query ChromaDB collections
- [ ] Can add metadata and filter queries
- [ ] Can use persistent storage
- [ ] Can use custom embedding functions
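- [ ] Can perform batch operations efficiently
- [ ] Can update and delete documents
- [ ] Can combine metadata filters with `$and`, `$or`, and `$ne`
- [ ] Can wrap a vector store behind a common interface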