# Scaling Strategies (ANN)

## Key Scaling Considerations

1. **Speed vs. Accuracy** - Understanding the tradeoffs between query performance and result quality
2. **Resource Limitations** - Managing memory, CPU, and storage constraints
3. **Horizontal Scaling** - Distributing the workload across multiple instances
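
As a rough illustration of the third point, the sketch below routes documents to shards with a stable hash and merges per-shard results into one global top-k list. The helper names `shard_for` and `merge_top_k`, the shard count, and the sample hits are hypothetical and not part of this notebook; a real deployment would query one vector index per shard.

```python
import hashlib
import heapq


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def merge_top_k(per_shard_hits, k):
    """Merge (distance, doc_id) hits from several shards into a global top k."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nsmallest(k, all_hits)  # smaller distance = closer match


# Hypothetical hits returned by two shards for the same query.
shard_hits = [
    [(0.12, "doc_3"), (0.40, "doc_9")],
    [(0.08, "doc_14"), (0.35, "doc_21")],
]
top3 = merge_top_k(shard_hits, k=3)
print(top3)
print(shard_for("doc_0", num_shards=4))
```

Each shard must return at least `k` candidates for the merged list to be a correct global top k.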

## Approximate Nearest Neighbor (ANN) Implementations

ANN algorithms like HNSW (Hierarchical Navigable Small World) allow us to trade some accuracy for significant performance improvements at scale. We'll explore different HNSW configurations and their impact on search performance.
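
One common way to quantify the accuracy side of this tradeoff is recall@k: the fraction of the true top-k neighbors (from an exact search) that the approximate search also returns. The sketch below is a minimal version; the function name `recall_at_k` and the sample ID lists are illustrative assumptions, not part of this notebook.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k IDs that also appear in the approximate top k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must contain at least one ID")
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


# Hypothetical results for one query: exact (brute-force) vs. approximate (ANN).
exact = ["doc_1", "doc_7", "doc_3", "doc_9", "doc_4"]
approx = ["doc_1", "doc_3", "doc_7", "doc_8", "doc_4"]
recall = recall_at_k(exact, approx, k=5)
print(f"recall@5 = {recall:.2f}")
```

Here four of the five exact neighbors are recovered, so recall@5 is 0.80. Order within the top k is ignored.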

## Setup and Initialization

In [1]:
# Install the required packages
!uv pip install accelerate==1.6.0 sentence-transformers==4.0.2

Using Python 3.12.9 environment at: /workspaces/fundamentals-of-ai-engineering-principles-and-practical-applications-6026542/.venv
Audited 2 packages in 9ms


In [2]:
import chromadb
from chromadb.utils import embedding_functions
import time

# Initialize ChromaDB client
client = chromadb.Client()
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
  model_name="all-MiniLM-L6-v2"
)

### Creating Collections with Different HNSW Configurations

We'll create three collections with different index settings:

1. **Default** - Uses ChromaDB's default configuration (the default distance metric is `l2`, whereas the two tuned collections use `cosine`)
2. **High Accuracy** - Prioritizes result quality with higher `ef` and `M` values
3. **Fast Search** - Prioritizes speed with lower `ef` and `M` values

**Parameter Explanation:**
- `hnsw:space`: The distance metric used (`l2`, `ip`, or `cosine`)
- `hnsw:construction_ef`: Controls index build quality (higher = better quality, slower build)
- `hnsw:search_ef`: Controls search quality (higher = better quality, slower search)
- `hnsw:M`: Controls the maximum number of connections per node (higher = better quality, more memory)
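
To make the memory cost of `M` concrete, the sketch below applies a commonly quoted rule of thumb for HNSW indexes: roughly `dim * 4 + M * 2 * 4` bytes per vector (float32 values plus 4-byte neighbor links). This formula is an approximation assumed here for illustration, not a figure reported by ChromaDB, and the function name `estimate_hnsw_bytes` is hypothetical. The dimension 384 is the embedding size of `all-MiniLM-L6-v2`.

```python
def estimate_hnsw_bytes(num_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Rough HNSW index size: raw vectors plus about 2*M neighbor links each."""
    vector_bytes = dim * bytes_per_float
    link_bytes = m * 2 * bytes_per_link
    return num_vectors * (vector_bytes + link_bytes)


# Compare the two M values used in this notebook for 10,000 vectors of 384 dims.
for m in (12, 36):
    size_mb = estimate_hnsw_bytes(10_000, 384, m) / 1_000_000
    print(f"M={m}: ~{size_mb:.2f} MB")
```

Under this estimate the raw vectors dominate at 384 dimensions, so raising `M` from 12 to 36 adds only about 12% to the index size.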

In [3]:
# Create collections with different HNSW configurations
collections = {}

# 1. Default settings
collections["default"] = client.create_collection(
    name="default_index",
    embedding_function=embedding_function
)

# 2. High accuracy configuration
collections["high_accuracy"] = client.create_collection(
    name="high_accuracy_index",
    embedding_function=embedding_function,
    metadata={"hnsw:space": "cosine", "hnsw:construction_ef": 1000, "hnsw:search_ef": 1250, "hnsw:M": 36}
)

# 3. Fast search configuration
collections["fast_search"] = client.create_collection(
    name="fast_search_index",
    embedding_function=embedding_function,
    metadata={"hnsw:space": "cosine", "hnsw:construction_ef": 80, "hnsw:search_ef": 40, "hnsw:M": 12}
)

### Generating Sample Documents

Now let's create some sample documents across different categories to populate our collections.

In [4]:
# Generate sample data
num_docs = 10000
print(f"Generating {num_docs} sample documents...")

# Create documents with some patterns for testing
categories = ["technology", "science", "health", "business", "entertainment"]
documents = []
ids = []

for i in range(num_docs):
    category = categories[i % len(categories)]
    document = f"This is document {i} about {category} with some additional text to make it more unique."
    documents.append(document)
    ids.append(f"doc_{i}")

print(documents[0:5])

Generating 10000 sample documents...
['This is document 0 about technology with some additional text to make it more unique.', 'This is document 1 about science with some additional text to make it more unique.', 'This is document 2 about health with some additional text to make it more unique.', 'This is document 3 about business with some additional text to make it more unique.', 'This is document 4 about entertainment with some additional text to make it more unique.']


### Adding Documents to Collections

Let's add the generated documents to all three collections.

In [5]:
import time

print("Adding documents to collections with different index configurations...")
for name, collection in collections.items():
    start_time = time.time()

    collection.add(
        documents=documents,
        ids=ids
    )

    end_time = time.time()
    elapsed_time = end_time - start_time

    print(
        f"  Added {num_docs} documents to {name} collection in {elapsed_time:.4f} seconds")

Adding documents to collections with different index configurations...
  Added 10000 documents to default collection in 35.3865 seconds
  Added 10000 documents to high_accuracy collection in 36.5990 seconds
  Added 10000 documents to fast_search collection in 34.1713 seconds
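
For larger datasets, a single `add` call with every document may become impractical, so documents are often added in batches. The sketch below shows one way to split the lists; the helper `iter_batches` and the batch size of 4 are hypothetical choices for illustration, and the `collection.add` call is left as a comment so the example runs without a database.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Small stand-ins for the notebook's `documents` and `ids` lists.
documents = [f"document {i}" for i in range(10)]
ids = [f"doc_{i}" for i in range(10)]

for doc_batch, id_batch in zip(iter_batches(documents, 4), iter_batches(ids, 4)):
    # In the notebook this would be:
    # collection.add(documents=doc_batch, ids=id_batch)
    print(f"batch of {len(doc_batch)} starting at {id_batch[0]}")
```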


### Benchmark Query Performance

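Now let's evaluate how each configuration performs with a set of representative queries. Because each call passes `query_texts`, the measured time includes embedding the query text as well as the HNSW search itself.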

In [6]:
# Benchmark query performance
print("\nBenchmarking query performance across different configurations...")

# Prepare queries
query_texts = [
    "Latest technology trends in artificial intelligence",
    "Scientific research on climate change",
    "Health benefits of regular exercise",
    "Business strategies for startups",
    "Entertainment news about recent movie releases"
]

# Set up benchmark parameters
results = {}
num_trials = 5


Benchmarking query performance across different configurations...


In [7]:
# Run benchmark for each collection
for name, collection in collections.items():
    print(f"\nTesting {name} configuration:")
    times = []
    
    for query in query_texts:
        query_times = []
        
        for _ in range(num_trials):
            start_time = time.time()
            collection.query(
                query_texts=[query],
                n_results=10
            )
            query_time = time.time() - start_time
            query_times.append(query_time)
        
        avg_time = sum(query_times) / len(query_times)
        times.append(avg_time)
        print(f"  Query: '{query[:30]}...': {avg_time:.4f} seconds")
    
    results[name] = {
        "mean": sum(times) / len(times),
        "min": min(times),
        "max": max(times),
        "times": times
    }


Testing default configuration:
  Query: 'Latest technology trends in ar...': 0.0144 seconds
  Query: 'Scientific research on climate...': 0.0121 seconds
  Query: 'Health benefits of regular exe...': 0.0119 seconds
  Query: 'Business strategies for startu...': 0.0128 seconds
  Query: 'Entertainment news about recen...': 0.0125 seconds

Testing high_accuracy configuration:
  Query: 'Latest technology trends in ar...': 0.0134 seconds
  Query: 'Scientific research on climate...': 0.0148 seconds
  Query: 'Health benefits of regular exe...': 0.0140 seconds
  Query: 'Business strategies for startu...': 0.0128 seconds
  Query: 'Entertainment news about recen...': 0.0129 seconds

Testing fast_search configuration:
  Query: 'Latest technology trends in ar...': 0.0122 seconds
  Query: 'Scientific research on climate...': 0.0115 seconds
  Query: 'Health benefits of regular exe...': 0.0116 seconds
  Query: 'Business strategies for startu...': 0.0116 seconds
  Query: 'Entertainment news about recen

In [8]:
# Print summary of benchmark results
print("\nPerformance Summary:")
for name, metrics in results.items():
    print(f"  {name}: Mean={metrics['mean']:.4f}s, Min={metrics['min']:.4f}s, Max={metrics['max']:.4f}s")


Performance Summary:
  default: Mean=0.0127s, Min=0.0119s, Max=0.0144s
  high_accuracy: Mean=0.0136s, Min=0.0128s, Max=0.0148s
  fast_search: Mean=0.0117s, Min=0.0115s, Max=0.0122s
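
With only five trials per query and differences of a millisecond or two, these numbers are sensitive to noise. The sketch below shows one way to make such a timing loop a little more robust: a warm-up call, `time.perf_counter()` instead of `time.time()`, and a median alongside the mean. The helper name `time_calls` and the stand-in workload are assumptions for illustration.

```python
import statistics
import time


def time_calls(fn, num_trials=5, warmup=1):
    """Time repeated calls to `fn` and return summary statistics in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not recorded
    samples = []
    for _ in range(num_trials):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


# Stand-in workload; in the notebook this could be a lambda wrapping
# collection.query(query_texts=[query], n_results=10).
stats = time_calls(lambda: sum(range(10_000)), num_trials=5)
print({name: f"{value:.6f}s" for name, value in stats.items()})
```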


## Conclusion and Key Takeaways

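In this notebook, we've explored practical approaches to scaling vector databases for production use with approximate nearest neighbor (ANN) search:

**ANN Implementations**
   - Configuring HNSW parameters allows for customized speed-accuracy tradeoffs
   - On 10,000 documents, mean query time ranged from 0.0117 s (fast search) to 0.0136 s (high accuracy), so the speed gap between configurations is small at this scale
   - The benchmark measured query speed only; result quality was not measured
   - The right configuration depends on your specific application requirements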