# Weaviate Hands-On Guide

> Practical, step-by-step tutorial for mastering Weaviate vector database with 2025 features.

## Prerequisites

```bash
# Install dependencies with uv
uv pip install "weaviate-client>=4.17" numpy grpcio sentence-transformers

# Or with pip (quote the specifier so the shell does not treat > as a redirect)
pip install "weaviate-client>=4.17" numpy grpcio sentence-transformers

# Start local Weaviate instance (v1.27+; the RQ settings in Section 2 need a release that supports RQ)
cd gen-ai/weaviate-setup
docker-compose up -d

# Or connect to remote server (see Section 1)
```

## Embedding Model

This notebook uses **sentence-transformers/all-MiniLM-L6-v2**:
- **Size**: 22M parameters (~80MB model)
- **Dimensions**: 384
- **Speed**: ~1000 sentences/sec on CPU
- **Quality**: Strong semantic-similarity performance on MTEB for a model of this size

## Learning Objectives

By the end of this notebook, you will be able to:
1. Connect to Weaviate with production-ready configuration
2. Create collections with advanced vectorizer and quantization settings
3. Perform CRUD operations with error handling
4. Execute vector similarity searches with dynamic ef
5. Use hybrid search with alpha parameter tuning (2025)
6. Apply filters and aggregations
7. Implement batch operations with monitoring
8. Configure RQ quantization for 4x compression (2025)
9. Use AI Agents for intelligent queries (2025 Preview)
10. Build a production-ready RAG system with real embeddings

---

## Section 1: Setup and Connection

Production-ready connection with error handling and gRPC support.

In [None]:
"""Connect to Weaviate with production-ready configuration."""

from typing import List, Dict, Any, Optional
import sys
import time

import weaviate
from weaviate.classes.config import Configure, Property, DataType, VectorDistances
from weaviate.classes.query import Filter, MetadataQuery
import numpy as np

# Connection parameters
WEAVIATE_HOST = "localhost"
WEAVIATE_PORT = 8080
WEAVIATE_GRPC_PORT = 50051  # Required for v4 client

def connect_with_retry(max_retries: int = 3, retry_delay: int = 2) -> weaviate.WeaviateClient:
    """Connect to Weaviate with retry logic.
    
    Args:
        max_retries: Maximum number of connection attempts
        retry_delay: Seconds to wait between retries
        
    Returns:
        Connected Weaviate client
        
    Raises:
        RuntimeError: If connection fails after all retries
    """
    last_error: Optional[Exception] = None
    
    for attempt in range(1, max_retries + 1):
        try:
            client = weaviate.connect_to_local(
                host=WEAVIATE_HOST,
                port=WEAVIATE_PORT,
                grpc_port=WEAVIATE_GRPC_PORT
            )
            
            if client.is_ready():
                return client
            
            # Connected but not ready: release the connection before retrying
            client.close()
            last_error = RuntimeError("Weaviate is reachable but not ready")
                
        except Exception as e:
            last_error = e
        
        # Wait before the next attempt (skipped after the final one)
        if attempt < max_retries:
            print(f"Connection attempt {attempt} failed: {last_error}")
            print(f"Retrying in {retry_delay} seconds...")
            time.sleep(retry_delay)
    
    raise RuntimeError(f"Failed to connect after {max_retries} attempts") from last_error

# Connect to Weaviate
try:
    client = connect_with_retry()
    print("SUCCESS: Connected to Weaviate\n")
    
    # Get metadata
    meta = client.get_meta()
    print(f"Weaviate version: {meta['version']}")
    print(f"Available modules: {list(meta['modules'].keys())}")
    
except RuntimeError as e:
    print(f"ERROR: {e}")
    print("\nMake sure Weaviate is running:")
    print("  cd gen-ai/weaviate-setup && docker-compose up -d")
    sys.exit(1)

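The retry loop above waits a fixed `retry_delay` between attempts. A common variation is exponential backoff, where each wait doubles up to a cap. The sketch below only computes the delay schedule; `backoff_delays`, `base_delay`, and `max_delay` are hypothetical names for illustration and are not part of the Weaviate client.

```python
from typing import List


def backoff_delays(max_retries: int, base_delay: float = 2.0, max_delay: float = 30.0) -> List[float]:
    """Return the wait (in seconds) after each failed attempt except the last.

    The delay doubles after every failure and is capped at max_delay.
    """
    waits = max(max_retries - 1, 0)  # no wait is needed after the final attempt
    return [min(base_delay * (2 ** i), max_delay) for i in range(waits)]


# Schedule for five attempts with the default 2-second base delay
print(backoff_delays(5))
```

Each value could replace the fixed `retry_delay` passed to `time.sleep` for the corresponding attempt.
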
## Section 2: Create Collection with Advanced Configuration

Configure HNSW indexing and RQ quantization (2025 default in v1.33+).

In [None]:
"""Create collection with advanced HNSW and quantization settings."""

COLLECTION_NAME = "Article"
VECTOR_DIMENSIONS = 384

# Delete collection if it exists
if client.collections.exists(COLLECTION_NAME):
    client.collections.delete(COLLECTION_NAME)
    print(f"Deleted existing '{COLLECTION_NAME}' collection\n")

# Create collection with optimized configuration
articles = client.collections.create(
    name=COLLECTION_NAME,
    
    # Manual vectorization (use text2vec-openai/cohere in production)
    vectorizer_config=Configure.Vectorizer.none(),
    
    # Vector index configuration
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE,
        
        # HNSW parameters (tuned for <100K objects)
        max_connections=32,        # Higher for better recall
        ef_construction=128,       # Higher for better index quality
        
        # Dynamic ef (runtime query tuning)
        ef=-1,                     # Default: auto-tuned
        
        # RQ Quantization (2025 default in v1.33+)
        # Enables 4x compression with minimal recall loss
        # RQ needs no training step; training_limit and segments are PQ options
        quantizer=Configure.VectorIndex.Quantizer.rq()
    ),
    
    # Enable BM25 for hybrid search
    inverted_index_config=Configure.inverted_index(
        bm25_b=0.75,              # Document length normalization
        bm25_k1=1.2               # Term frequency saturation
    ),
    
    # Schema properties
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            description="Article title",
            index_searchable=True   # Enable BM25 search
        ),
        Property(
            name="content",
            data_type=DataType.TEXT,
            description="Article content",
            index_searchable=True
        ),
        Property(
            name="author",
            data_type=DataType.TEXT,
            description="Article author",
            index_filterable=True   # Enable filtering
        ),
        Property(
            name="category",
            data_type=DataType.TEXT,
            description="Article category",
            index_filterable=True
        ),
        Property(
            name="views",
            data_type=DataType.INT,
            description="View count",
            index_range_filters=True  # Enable numeric range filters
        )
    ]
)

print(f"SUCCESS: Created '{COLLECTION_NAME}' collection")
print("\nConfiguration:")
print("  - HNSW Index: maxConnections=32, efConstruction=128")
print("  - RQ Quantization: Enabled (4x compression)")
print("  - BM25: Enabled for hybrid search")
print("  - Distance Metric: Cosine")

# Verify collection
if client.collections.exists(COLLECTION_NAME):
    print(f"\nVERIFIED: Collection '{COLLECTION_NAME}' is ready")

## Section 3: Insert Single Object

Insert a single article with vector embedding and error handling.

In [None]:
"""Insert a single article with real embeddings."""

from sentence_transformers import SentenceTransformer

# Initialize embedding model (cached after first run)
print("Loading embedding model: all-MiniLM-L6-v2 (384 dimensions)...")
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print("Model loaded successfully\n")

def generate_embedding(text: str) -> List[float]:
    """Generate real embedding using sentence-transformers.
    
    Uses all-MiniLM-L6-v2 from MTEB leaderboard:
    - Size: 22M parameters (~80MB)
    - Dimensions: 384
    - Speed: ~1000 sentences/sec on CPU
    - Quality: High performance on semantic similarity tasks
    
    Args:
        text: Input text to embed
        
    Returns:
        384-dimensional embedding vector
    """
    embedding = embedding_model.encode(text, normalize_embeddings=True)
    return embedding.tolist()

# Get collection
articles = client.collections.get(COLLECTION_NAME)

try:
    article_text = "Introduction to Vector Databases. Vector databases are specialized databases designed to store and search high-dimensional vectors efficiently. They are essential for semantic search, recommendation systems, and RAG applications."
    
    # Generate real embedding
    print("Generating embedding for article...")
    vector = generate_embedding(article_text)
    
    # Insert article
    uuid = articles.data.insert(
        properties={
            "title": "Introduction to Vector Databases",
            "content": "Vector databases are specialized databases designed to store and search high-dimensional vectors efficiently. They are essential for semantic search, recommendation systems, and RAG applications.",
            "author": "John Doe",
            "category": "Machine Learning",
            "views": 1250
        },
        vector=vector
    )
    
    print("SUCCESS: Inserted article with real embeddings")
    print(f"UUID: {uuid}")
    print(f"Vector dimensions: {len(vector)}")
    print(f"Vector sample: {vector[:5]}...")
    
except Exception as e:
    print(f"ERROR: Insert failed: {e}")

## Section 4: Batch Insert with Monitoring

Efficiently insert multiple objects with error tracking.

In [None]:
"""Batch insert with comprehensive error handling."""

# Sample dataset
sample_articles = [
    {
        "title": "Understanding HNSW Algorithm",
        "content": "HNSW (Hierarchical Navigable Small World) is a graph-based algorithm used for approximate nearest neighbor search in high-dimensional spaces.",
        "author": "Jane Smith",
        "category": "Algorithms",
        "views": 890
    },
    {
        "title": "RAG Systems Explained",
        "content": "Retrieval-Augmented Generation (RAG) combines vector search with large language models to provide contextually relevant and accurate responses.",
        "author": "Alice Johnson",
        "category": "AI",
        "views": 2100
    },
    {
        "title": "Transformer Architecture Deep Dive",
        "content": "Transformers revolutionized NLP with attention mechanisms. They enable models like GPT and BERT to understand context and relationships in text.",
        "author": "Bob Williams",
        "category": "Deep Learning",
        "views": 3400
    },
    {
        "title": "LLM Fine-tuning with LoRA",
        "content": "Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique that reduces the number of trainable parameters by 10,000x.",
        "author": "John Doe",
        "category": "Machine Learning",
        "views": 1800
    },
    {
        "title": "Embeddings and Vector Space",
        "content": "Embeddings map discrete objects (words, images) into continuous vector spaces where semantic similarity is captured by distance metrics.",
        "author": "Jane Smith",
        "category": "Machine Learning",
        "views": 1500
    },
    {
        "title": "Quantization Techniques for Vector DBs",
        "content": "Quantization reduces memory footprint of vector databases. RQ (Rotational Quantization) offers 4x compression with <1% recall loss, making it ideal for production.",
        "author": "Alice Johnson",
        "category": "Optimization",
        "views": 950
    },
    {
        "title": "Hybrid Search Strategies",
        "content": "Hybrid search combines vector similarity with keyword search (BM25). Tuning the alpha parameter (0=BM25, 1=vector) optimizes for your use case. Start with alpha=0.7 for semantic-heavy tasks.",
        "author": "Bob Williams",
        "category": "Search",
        "views": 1200
    },
    {
        "title": "AI Agents in Vector Databases",
        "content": "AI Agents (2025 Preview) enable intelligent query planning, automatic query transformation, and personalized search results without manual tuning.",
        "author": "Jane Smith",
        "category": "AI",
        "views": 2500
    }
]

# Batch insert with error tracking
articles = client.collections.get(COLLECTION_NAME)
errors = []

try:
    with articles.batch.dynamic() as batch:
        for article in sample_articles:
            vector = generate_embedding(article["title"] + " " + article["content"])
            batch.add_object(
                properties=article,
                vector=vector
            )
    
    # Check for batch errors (read from the collection once the context has exited)
    errors = articles.batch.failed_objects
    if errors:
        print(f"WARNING: {len(errors)} objects failed to insert")
        for error in errors[:5]:  # Show first 5
            print(f"  - {error}")
    
    inserted = len(sample_articles) - len(errors)
    print(f"\nSUCCESS: Batch inserted {inserted} of {len(sample_articles)} articles")
    
    # Verify count
    result = articles.aggregate.over_all(total_count=True)
    print(f"Total articles in collection: {result.total_count}")
    
except Exception as e:
    print(f"ERROR: Batch insert failed: {e}")

## Section 5: Retrieve Objects

Fetch and display stored articles with pagination.

In [None]:
"""Retrieve articles with pagination."""

articles = client.collections.get(COLLECTION_NAME)

# Fetch objects with offset/limit for pagination
response = articles.query.fetch_objects(
    limit=10,
    offset=0  # Use for pagination: page 2 = offset=10, page 3 = offset=20
)

print(f"Retrieved {len(response.objects)} articles:\n")

for i, obj in enumerate(response.objects, 1):
    print(f"{i}. {obj.properties['title']}")
    print(f"   Author: {obj.properties['author']}")
    print(f"   Category: {obj.properties['category']}")
    print(f"   Views: {obj.properties['views']:,}")
    print(f"   UUID: {obj.uuid}")
    print()

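The pagination comment above maps page numbers to offsets. A minimal sketch of that arithmetic, assuming 1-based page numbers; `page_to_offset` is a hypothetical helper, not part of the Weaviate client:

```python
def page_to_offset(page: int, page_size: int = 10) -> int:
    """Convert a 1-based page number into an offset for offset/limit pagination."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    return (page - 1) * page_size


# Offsets for the first three pages of 10 results each
for page in (1, 2, 3):
    print(f"page {page}: offset={page_to_offset(page)}, limit=10")
```

The result can be passed as `offset=` to `fetch_objects`, with `limit=` set to the page size.
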
## Section 6: Vector Similarity Search with Dynamic ef

Perform semantic search. The recall/speed trade-off is governed by the HNSW `ef` setting configured on the collection in Section 2.

In [None]:
"""Vector similarity search with dynamic ef tuning."""

# Generate query vector
query_vector = generate_embedding("vector database semantic search algorithms")

# Search using the collection's HNSW settings (ef=-1 lets Weaviate choose ef dynamically)
response = articles.query.near_vector(
    near_vector=query_vector,
    limit=5,
    
    # Return distance and additional metadata
    return_metadata=MetadataQuery(
        distance=True,
        certainty=True  # Similarity score (0-1)
    ),
    
    # ef is set on the collection's HNSW config (Section 2), not per query.
    # ef=-1 (default) enables dynamic ef; a fixed value trades speed for recall:
    # 64 (fast), 128 (balanced), 256+ (high recall)
    # auto_limit=1  # Uncomment to enable autocut (stop at natural relevance gap)
)

print("Top 5 semantically similar articles:\n")

for i, obj in enumerate(response.objects, 1):
    distance = obj.metadata.distance if obj.metadata.distance else 0.0
    certainty = obj.metadata.certainty if obj.metadata.certainty else 0.0
    
    print(f"{i}. {obj.properties['title']}")
    print(f"   Category: {obj.properties['category']}")
    print(f"   Distance: {distance:.4f} | Certainty: {certainty:.2%}")
    print(f"   Content: {obj.properties['content'][:120]}...")
    print()

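The query above requests both `distance` and `certainty`. For the cosine metric, certainty is a rescaling of distance into the 0-1 range, `certainty = 1 - distance / 2` as described in the Weaviate documentation; treat that formula as an assumption if your version differs. The sketch below reproduces both values in plain Python for tiny example vectors. The helper names are illustrative only.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def certainty_from_distance(distance: float) -> float:
    """Rescale a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


query = [1.0, 0.0]
examples = [("same", [1.0, 0.0]), ("orthogonal", [0.0, 1.0]), ("opposite", [-1.0, 0.0])]
for label, other in examples:
    d = cosine_distance(query, other)
    print(f"{label}: distance={d:.2f}, certainty={certainty_from_distance(d):.2f}")
```
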
## Section 7: Hybrid Search with Alpha Tuning (2025)

Combine semantic (vector) and keyword (BM25) search with configurable weighting.

In [None]:
"""Hybrid search with alpha parameter tuning."""

from weaviate.classes.query import HybridFusion

# Search query
search_query = "machine learning algorithms"

# Embed the query text with the same model used for the documents
query_vector = generate_embedding(search_query)

print(f"Search query: '{search_query}'\n")
print("Testing different alpha values:\n")
print("  alpha=0.0: Pure keyword (BM25)")
print("  alpha=0.5: Balanced hybrid")
print("  alpha=0.7: Semantic-heavy (recommended for most tasks)")
print("  alpha=1.0: Pure vector search\n")
print("=" * 80)

# Test different alpha values
for alpha in [0.0, 0.5, 0.7, 1.0]:
    response = articles.query.hybrid(
        query=search_query,
        vector=query_vector,
        
        # Alpha parameter: 0 (BM25) to 1 (vector)
        alpha=alpha,
        
        # Fusion algorithm
        fusion_type=HybridFusion.RELATIVE_SCORE,  # or RANKED
        
        limit=3,
        return_metadata=MetadataQuery(
            score=True,  # Hybrid score
            explain_score=True  # Show scoring breakdown
        )
    )
    
    print(f"\nAlpha = {alpha:.1f}")
    print("-" * 80)
    
    for i, obj in enumerate(response.objects[:2], 1):  # Show top 2
        score = obj.metadata.score if obj.metadata.score else 0.0
        print(f"{i}. {obj.properties['title']}")
        print(f"   Score: {score:.4f}")
        print(f"   Category: {obj.properties['category']}")

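To see how `alpha` shifts the ranking, the sketch below approximates relative-score fusion in plain Python: each result list is min-max normalized, then blended as `alpha * vector + (1 - alpha) * keyword`. This is a simplified illustration under those assumptions, not Weaviate's implementation; the document IDs and raw scores are made up.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; if all scores are equal, map each to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float,
) -> List[Tuple[str, float]]:
    """Blend normalized scores; alpha=0 is keyword-only, alpha=1 is vector-only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1.0 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    # Highest fused score first; ties broken by document ID for stable output
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}   # hypothetical similarity scores
keyword_scores = {"a": 1.0, "b": 3.0, "c": 2.0}  # hypothetical BM25 scores

for alpha in (0.0, 0.5, 0.7, 1.0):
    ranking = [doc_id for doc_id, _ in fuse(vector_scores, keyword_scores, alpha)]
    print(f"alpha={alpha:.1f}: {ranking}")
```

In this toy data, document `b` wins on keywords and `a` wins on vector similarity, so the top result flips as `alpha` moves from 0.5 to 0.7.
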
## Section 8: Filtered Search

Combine search with attribute filters.

In [None]:
"""Search with attribute filters."""

# Filter by author
response = articles.query.fetch_objects(
    filters=Filter.by_property("author").equal("Jane Smith"),
    limit=10
)

print(f"Articles by Jane Smith: {len(response.objects)}\n")

for obj in response.objects:
    print(f"- {obj.properties['title']}")
    print(f"  Category: {obj.properties['category']}, Views: {obj.properties['views']:,}\n")

In [None]:
"""Complex multi-condition filters."""

# Machine Learning OR AI articles with > 1000 views
response = articles.query.fetch_objects(
    filters=(
        (
            Filter.by_property("category").equal("Machine Learning") |
            Filter.by_property("category").equal("AI")
        ) &
        Filter.by_property("views").greater_than(1000)
    ),
    limit=10
)

print(f"Popular ML/AI articles: {len(response.objects)}\n")

for obj in response.objects:
    print(f"- {obj.properties['title']}")
    print(f"  Category: {obj.properties['category']}")
    print(f"  Views: {obj.properties['views']:,}, Author: {obj.properties['author']}\n")

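To sanity-check the filter logic, the same boolean condition can be evaluated in plain Python over the article metadata inserted in Sections 3 and 4. This is only an in-memory illustration of the expression; it does not call Weaviate, and it assumes the view counts are still the original values.

```python
# (title, category, views) for the articles inserted in Sections 3 and 4
articles_meta = [
    ("Introduction to Vector Databases", "Machine Learning", 1250),
    ("Understanding HNSW Algorithm", "Algorithms", 890),
    ("RAG Systems Explained", "AI", 2100),
    ("Transformer Architecture Deep Dive", "Deep Learning", 3400),
    ("LLM Fine-tuning with LoRA", "Machine Learning", 1800),
    ("Embeddings and Vector Space", "Machine Learning", 1500),
    ("Quantization Techniques for Vector DBs", "Optimization", 950),
    ("Hybrid Search Strategies", "Search", 1200),
    ("AI Agents in Vector Databases", "AI", 2500),
]


def matches(category: str, views: int) -> bool:
    """(category == 'Machine Learning' OR category == 'AI') AND views > 1000."""
    return category in ("Machine Learning", "AI") and views > 1000


popular = [title for title, category, views in articles_meta if matches(category, views)]

print(f"Popular ML/AI articles: {len(popular)}")
for title in popular:
    print(f"- {title}")
```

Note the parentheses in the Weaviate filter: the OR over categories is grouped first, then combined with the views condition.
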
## Section 9: Vector Search with Filters

Combine semantic similarity with filtering.

In [None]:
"""Filtered vector similarity search."""

query_vector = generate_embedding("machine learning deep learning")

# Find similar ML articles
response = articles.query.near_vector(
    near_vector=query_vector,
    filters=Filter.by_property("category").equal("Machine Learning"),
    limit=5,
    return_metadata=MetadataQuery(distance=True)
)

print("Similar Machine Learning articles:\n")

for i, obj in enumerate(response.objects, 1):
    distance = obj.metadata.distance if obj.metadata.distance else 0.0
    print(f"{i}. {obj.properties['title']}")
    print(f"   Distance: {distance:.4f}")
    print(f"   Author: {obj.properties['author']}, Views: {obj.properties['views']:,}")
    print()

## Section 10: Update Operations

Modify existing objects with validation.

In [None]:
"""Update article properties with validation."""

# Get first article
response = articles.query.fetch_objects(limit=1)

if response.objects:
    article = response.objects[0]
    
    print("Before update:")
    print(f"Title: {article.properties['title']}")
    print(f"Views: {article.properties['views']:,}\n")
    
    # Update view count
    new_views = article.properties['views'] + 100
    
    try:
        articles.data.update(
            uuid=article.uuid,
            properties={"views": new_views}
        )
        
        print("SUCCESS: Updated views count\n")
        
        # Verify update
        updated = articles.query.fetch_object_by_id(article.uuid)
        
        print("After update:")
        print(f"Title: {updated.properties['title']}")
        print(f"Views: {updated.properties['views']:,}")
        
        if updated.properties['views'] == new_views:
            print("\nVERIFIED: Update successful")
        
    except Exception as e:
        print(f"ERROR: Update failed: {e}")
        
else:
    print("No articles found to update")

## Section 11: Aggregations and Analytics

Compute statistics and insights.

In [None]:
"""Aggregate data with grouping and metrics."""

from weaviate.classes.aggregate import GroupByAggregate

# Total count
result = articles.aggregate.over_all(total_count=True)
print(f"Total articles: {result.total_count}\n")

# Group by category with counts
result = articles.aggregate.over_all(
    group_by=GroupByAggregate(prop="category")
)

print("Articles by category:")
for group in result.groups:
    print(f"  {group.grouped_by.value}: {group.total_count} articles")

# Group by author (article counts per author)
print("\nAuthor statistics:")

result = articles.aggregate.over_all(
    group_by=GroupByAggregate(prop="author")
)

for group in result.groups:
    print(f"  {group.grouped_by.value}: {group.total_count} articles")

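The group-by calls above return a count per group. As a plain-Python sketch of the same idea, extended with an average view count per author, the snippet below groups the sample metadata from Sections 3 and 4 in memory. It does not use the Weaviate aggregate API, and the numbers assume the original view counts.

```python
from collections import defaultdict
from statistics import mean

# (author, views) for the articles inserted in Sections 3 and 4
author_views = [
    ("John Doe", 1250), ("Jane Smith", 890), ("Alice Johnson", 2100),
    ("Bob Williams", 3400), ("John Doe", 1800), ("Jane Smith", 1500),
    ("Alice Johnson", 950), ("Bob Williams", 1200), ("Jane Smith", 2500),
]

# Collect the view counts for each author
views_by_author = defaultdict(list)
for author, views in author_views:
    views_by_author[author].append(views)

# Per-author article count and mean views
stats = {
    author: {"count": len(v), "mean_views": mean(v)}
    for author, v in views_by_author.items()
}

for author, s in sorted(stats.items()):
    print(f"  {author}: {s['count']} articles, {s['mean_views']:,.0f} mean views")
```
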
## Section 12: Delete Operations

Remove objects with verification.

In [None]:
"""Delete article by UUID with verification."""

# Get article to delete
response = articles.query.fetch_objects(limit=1)

if response.objects:
    article = response.objects[0]
    
    print(f"Deleting: {article.properties['title']}")
    print(f"UUID: {article.uuid}\n")
    
    # Count before deletion
    before = articles.aggregate.over_all(total_count=True)
    
    # Delete
    articles.data.delete_by_id(uuid=article.uuid)
    
    print("SUCCESS: Article deleted\n")
    
    # Verify deletion
    after = articles.aggregate.over_all(total_count=True)
    
    print(f"Articles before: {before.total_count}")
    print(f"Articles after: {after.total_count}")
    
    if after.total_count == before.total_count - 1:
        print("\nVERIFIED: Deletion successful")
        
else:
    print("No articles found to delete")

## Section 13: Quantization Comparison (2025)

Compare different quantization techniques for memory optimization.

In [None]:
"""Compare quantization techniques."""

print("Vector Database Quantization Techniques (2025)\n")
print("=" * 80)

quantization_comparison = {
    "RQ (Rotational Quantization)": {
        "compression": "4x",
        "recall_loss": "<1%",
        "speed_impact": "Minimal",
        "use_case": "Production default (v1.33+)",
        "configuration": "Configure.VectorIndex.Quantizer.rq()"
    },
    "PQ (Product Quantization)": {
        "compression": "8-16x",
        "recall_loss": "2-5%",
        "speed_impact": "Low",
        "use_case": "High compression, large datasets",
        "configuration": "Configure.VectorIndex.Quantizer.pq(segments=96)"
    },
    "BQ (Binary Quantization)": {
        "compression": "32x",
        "recall_loss": "10-15%",
        "speed_impact": "Very Low",
        "use_case": "Maximum compression, memory-constrained",
        "configuration": "Configure.VectorIndex.Quantizer.bq()"
    },
    "SQ (Scalar Quantization)": {
        "compression": "4x",
        "recall_loss": "<2%",
        "speed_impact": "Minimal",
        "use_case": "Fast, simple quantization",
        "configuration": "Configure.VectorIndex.Quantizer.sq()"
    }
}

for method, specs in quantization_comparison.items():
    print(f"\n{method}")
    print("-" * 80)
    print(f"  Compression:    {specs['compression']}")
    print(f"  Recall Loss:    {specs['recall_loss']}")
    print(f"  Speed Impact:   {specs['speed_impact']}")
    print(f"  Use Case:       {specs['use_case']}")
    print(f"  Configuration:  {specs['configuration']}")

print("\n" + "=" * 80)
print("\nRecommendation:")
print("  - Start with RQ (default in v1.33+) for best balance")
print("  - Use PQ for higher compression with acceptable recall loss")
print("  - Use BQ only for extreme memory constraints")
print("  - Always benchmark on your specific dataset")

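The compression ratios in the comparison translate into a rough memory estimate. The sketch below assumes uncompressed vectors are stored as 4-byte float32 values and applies the nominal ratios listed above, using the low end of PQ's 8-16x range. It ignores HNSW graph overhead and any uncompressed copies kept for rescoring, so treat the output as a back-of-the-envelope figure.

```python
BYTES_PER_FLOAT32 = 4

# Nominal compression ratios taken from the comparison above
COMPRESSION_RATIOS = {"none": 1, "RQ": 4, "SQ": 4, "PQ": 8, "BQ": 32}


def vector_memory_bytes(num_vectors: int, dimensions: int, ratio: int = 1) -> float:
    """Estimate the bytes needed to hold the vectors at a given compression ratio."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / ratio


num_vectors, dimensions = 1_000_000, 384  # 384 matches all-MiniLM-L6-v2
for name, ratio in COMPRESSION_RATIOS.items():
    mib = vector_memory_bytes(num_vectors, dimensions, ratio) / (1024 ** 2)
    print(f"{name:>4}: {mib:,.1f} MiB")
```
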
## Section 14: AI Agents for Intelligent Queries (2025 Preview)

Use AI Agents to automatically optimize queries without manual tuning.

In [None]:
"""Demonstrate AI Agents concepts (2025 Preview)."""

print("Weaviate AI Agents (2025 Preview)\n")
print("=" * 80)

ai_agents = {
    "Query Agent": {
        "description": "Automatically determines optimal query strategy",
        "capabilities": [
            "Selects vector vs hybrid vs keyword search",
            "Auto-tunes alpha parameter for hybrid search",
            "Chooses appropriate filters dynamically",
            "Adjusts ef parameter based on query complexity"
        ],
        "example": "User asks 'machine learning papers' -> Agent chooses hybrid with alpha=0.7"
    },
    "Transformation Agent": {
        "description": "Transforms user queries into optimal search terms",
        "capabilities": [
            "Expands acronyms and abbreviations",
            "Adds synonyms for better coverage",
            "Corrects spelling and grammar",
            "Extracts key concepts from natural language"
        ],
        "example": "'ML algos' -> 'machine learning algorithms'"
    },
    "Personalization Agent": {
        "description": "Personalizes search results based on user context",
        "capabilities": [
            "Learns from user click patterns",
            "Adjusts ranking based on user preferences",
            "Filters content by user role/access",
            "Adapts to user expertise level"
        ],
        "example": "Beginner sees introductory articles first, experts see advanced content"
    }
}

for agent_name, info in ai_agents.items():
    print(f"\n{agent_name}")
    print("-" * 80)
    print(f"{info['description']}\n")
    
    print("Capabilities:")
    for capability in info['capabilities']:
        print(f"  - {capability}")
    
    print(f"\nExample: {info['example']}")

print("\n" + "=" * 80)
print("\nNote: AI Agents are in Preview (March 2025).")
print("Check Weaviate documentation for latest availability and API details.")
print("\nBenefits:")
print("  - Reduces manual query tuning")
print("  - Improves search relevance automatically")
print("  - Adapts to user behavior in real-time")
print("  - Lowers barrier to entry for developers")

## Section 15: Production RAG System

Build a production-ready RAG system with error handling and monitoring.

In [None]:
"""Create knowledge base for production RAG system."""

KB_COLLECTION = "KnowledgeBase"

# Comprehensive knowledge base
knowledge_base = [
    {
        "title": "What is a Vector Database?",
        "content": "A vector database is a specialized database designed to store and query high-dimensional vectors efficiently. Unlike traditional databases that store structured data in rows and columns, vector databases are optimized for similarity search operations. They use algorithms like HNSW (Hierarchical Navigable Small World) to perform fast approximate nearest neighbor searches. Vector databases are essential for applications like semantic search, recommendation systems, image similarity search, and RAG (Retrieval-Augmented Generation) systems.",
        "author": "Tech Expert",
        "category": "Database",
        "views": 500
    },
    {
        "title": "How does RAG work?",
        "content": "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation. First, relevant documents are retrieved from a vector database using semantic search. These documents provide context to a language model, which then generates a response. RAG improves accuracy by grounding responses in factual information rather than relying solely on the model's training data. This reduces hallucinations and enables models to access up-to-date information.",
        "author": "AI Researcher",
        "category": "AI",
        "views": 750
    },
    {
        "title": "What is HNSW?",
        "content": "HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It creates a multi-layer graph where upper layers have sparse long-range connections and lower layers have dense short-range connections. During search, the algorithm starts at the top layer and progressively moves down, refining the search at each level. HNSW offers excellent performance with O(log N) search complexity and high recall rates (>95%). It's the default indexing algorithm in many vector databases including Weaviate.",
        "author": "Algorithm Expert",
        "category": "Algorithms",
        "views": 300
    },
    {
        "title": "Quantization in Vector Databases",
        "content": "Quantization reduces memory footprint by compressing vectors. RQ (Rotational Quantization) is the 2025 default, offering 4x compression with <1% recall loss. PQ (Product Quantization) provides 8-16x compression with 2-5% recall loss. Binary Quantization achieves 32x compression but with 10-15% recall loss. Choose quantization based on memory constraints and acceptable accuracy trade-offs.",
        "author": "Optimization Expert",
        "category": "Optimization",
        "views": 450
    },
    {
        "title": "Hybrid Search Best Practices",
        "content": "Hybrid search combines vector (semantic) and BM25 (keyword) search. The alpha parameter controls the balance: 0 for pure keyword, 1 for pure vector. Start with alpha=0.7 for semantic-heavy tasks. Use alpha=0.5 for balanced search. Use alpha=0.3 for keyword-heavy tasks like exact term matching. Always benchmark on your specific use case and queries.",
        "author": "Search Expert",
        "category": "Search",
        "views": 600
    }
]

# Create collection with production configuration
if client.collections.exists(KB_COLLECTION):
    client.collections.delete(KB_COLLECTION)

kb = client.collections.create(
    name=KB_COLLECTION,
    vectorizer_config=Configure.Vectorizer.none(),
    
    # Production HNSW configuration
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE,
        max_connections=64,      # Higher for better recall
        ef_construction=256,     # Higher for better quality
        quantizer=Configure.VectorIndex.Quantizer.rq()
    ),
    
    # Enable hybrid search
    inverted_index_config=Configure.inverted_index(
        bm25_b=0.75,
        bm25_k1=1.2
    ),
    
    properties=[
        Property(name="title", data_type=DataType.TEXT, index_searchable=True),
        Property(name="content", data_type=DataType.TEXT, index_searchable=True),
        Property(name="author", data_type=DataType.TEXT, index_filterable=True),
        Property(name="category", data_type=DataType.TEXT, index_filterable=True),
        Property(name="views", data_type=DataType.INT, index_range_filters=True)
    ]
)

# Insert knowledge base
kb = client.collections.get(KB_COLLECTION)

with kb.batch.dynamic() as batch:
    for article in knowledge_base:
        vector = generate_embedding(article["title"] + " " + article["content"])
        batch.add_object(properties=article, vector=vector)

# Surface any objects the batch could not insert
failed = kb.batch.failed_objects
if failed:
    print(f"WARNING: {len(failed)} objects failed to insert")

print("SUCCESS: Created production knowledge base")
print(f"Articles: {len(knowledge_base)}")
print("Configuration: HNSW (maxConnections=64) + RQ quantization + BM25")

In [None]:
"""Production RAG retrieval with hybrid search."""

def retrieve_context_production(
    question: str,
    top_k: int = 3,
    alpha: float = 0.7,
    category_filter: Optional[str] = None
) -> Dict[str, Any]:
    """Retrieve relevant context using production-ready RAG.
    
    Args:
        question: User question to answer
        top_k: Number of documents to retrieve
        alpha: Hybrid search parameter (0=BM25, 1=vector)
        category_filter: Optional category filter
        
    Returns:
        Dictionary with context, sources, and metadata
    """
    try:
        # Get collection
        kb = client.collections.get(KB_COLLECTION)
        
        # Embed the question with the same model used for the documents
        query_vector = generate_embedding(question)
        
        # Build filter if provided
        filters = None
        if category_filter:
            filters = Filter.by_property("category").equal(category_filter)
        
        # Hybrid search with dynamic parameters
        response = kb.query.hybrid(
            query=question,
            vector=query_vector,
            alpha=alpha,
            fusion_type=HybridFusion.RELATIVE_SCORE,
            filters=filters,
            limit=top_k,
            return_metadata=MetadataQuery(
                score=True,
                distance=True
            )
        )
        
        # Format results
        context_parts = []
        sources = []
        
        for i, obj in enumerate(response.objects, 1):
            score = obj.metadata.score if obj.metadata.score else 0.0
            
            context_parts.append(
                f"[Document {i}]\n"
                f"Title: {obj.properties['title']}\n"
                f"Content: {obj.properties['content']}\n"
            )
            
            sources.append({
                "title": obj.properties['title'],
                "author": obj.properties['author'],
                "category": obj.properties['category'],
                "score": float(score),
                "uuid": str(obj.uuid)
            })
        
        return {
            "success": True,
            "context": "\n".join(context_parts),
            "sources": sources,
            "query": question,
            "num_results": len(response.objects),
            "alpha": alpha
        }
        
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "query": question
        }

# Test production RAG
question = "How does vector database quantization work?"

print(f"Question: {question}\n")
print("=" * 80)

result = retrieve_context_production(
    question=question,
    top_k=3,
    alpha=0.7  # Semantic-heavy
)

if result["success"]:
    print(f"\nRetrieved {result['num_results']} relevant documents:\n")
    
    for i, source in enumerate(result["sources"], 1):
        print(f"{i}. {source['title']}")
        print(f"   Author: {source['author']} | Category: {source['category']}")
        print(f"   Score: {source['score']:.4f}")
        print()
    
    print("=" * 80)
    print("\nNext Step: Pass context to LLM for generation")
    print("Example: OpenAI GPT-4, Anthropic Claude, Google Gemini")
    print("\nPrompt Template:")
    print("  'Answer this question based on the context below.\\n\\n'")
    print("  f'Question: {question}\\n\\n'")
    print("  f'Context:\\n{context}\\n\\n'")
    print("  'Answer:'")
    
else:
    print(f"ERROR: {result['error']}")
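
The prompt template printed above can also be assembled by a small helper. The following is a minimal sketch, not part of the original notebook: `build_rag_prompt` and `example_result` are hypothetical names, and `example_result` is a hand-written stand-in for the dictionary returned by `retrieve_context_production`, so the block runs without a Weaviate instance.

```python
def build_rag_prompt(question: str, context: str) -> str:
    """Assemble the prompt template shown above into a single string."""
    return (
        "Answer this question based on the context below.\n\n"
        f"Question: {question}\n\n"
        f"Context:\n{context}\n\n"
        "Answer:"
    )


# Hypothetical stand-in for the result of retrieve_context_production
example_result = {
    "success": True,
    "context": "[Document 1]\nTitle: Example\nContent: Example content.\n",
    "query": "How does vector database quantization work?",
}

prompt = build_rag_prompt(example_result["query"], example_result["context"])
print(prompt)
```

In the notebook itself, the same call would take `result["query"]` and `result["context"]` once `result["success"]` is true, and the returned string would be sent to whichever LLM client you choose.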

## Section 16: Performance Monitoring and Optimization

Monitor and optimize query performance.

In [None]:
"""Monitor query performance and optimize."""

import time

def benchmark_search(
    search_type: str,
    iterations: int = 5
) -> Dict[str, float]:
    """Benchmark search performance.
    
    Args:
        search_type: 'vector', 'hybrid', or 'keyword'
        iterations: Number of test iterations
        
    Returns:
        Performance metrics
    """
    kb = client.collections.get(KB_COLLECTION)
    # Define the query text before embedding it
    query_text = "vector database search"
    query_vector = generate_embedding(query_text)
    
    timings = []
    
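    for _ in range(iterations):
        # perf_counter is monotonic and better suited to short timings than time.time
        start = time.perf_counter()
        
        if search_type == "vector":
            kb.query.near_vector(
                near_vector=query_vector,
                limit=10
            )
        elif search_type == "hybrid":
            kb.query.hybrid(
                query=query_text,
                vector=query_vector,
                alpha=0.7,
                limit=10
            )
        elif search_type == "keyword":
            kb.query.bm25(
                query=query_text,
                limit=10
            )
        else:
            # Fail loudly instead of timing an empty loop body
            raise ValueError(f"Unknown search_type: {search_type!r}")
        
        elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000)  # Convert to ms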
    
    return {
        "avg_ms": np.mean(timings),
        "min_ms": np.min(timings),
        "max_ms": np.max(timings),
        "std_ms": np.std(timings)
    }

# Benchmark different search types
print("Search Performance Benchmark\n")
print("=" * 80)

for search_type in ["vector", "hybrid", "keyword"]:
    metrics = benchmark_search(search_type, iterations=5)
    
    print(f"\n{search_type.upper()} SEARCH")
    print(f"  Average: {metrics['avg_ms']:.2f} ms")
    print(f"  Min:     {metrics['min_ms']:.2f} ms")
    print(f"  Max:     {metrics['max_ms']:.2f} ms")
    print(f"  Std Dev: {metrics['std_ms']:.2f} ms")

print("\n" + "=" * 80)
print("\nOptimization Tips:")
print("  1. Enable RQ quantization for about 4x lower vector memory use")
print("  2. Tune ef parameter: higher = better recall, lower = faster")
print("  3. Use maxConnections=64 for <1M vectors")
print("  4. Enable connection pooling for high-throughput workloads")
print("  5. Use batch operations for bulk inserts")
print("  6. Monitor query latency with metrics endpoint")
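
Averages over five iterations can hide tail latency and include cold-start effects. As a rough sketch (not part of the original benchmark), the helper below discards a few warm-up calls and reports nearest-rank percentiles. `summarize_latency` and `nearest_rank_percentile` are hypothetical names, and the workload is a stand-in so the block runs without a Weaviate instance.

```python
import math
import time
from typing import Callable, Dict, List


def nearest_rank_percentile(sorted_values: List[float], pct: float) -> float:
    """Return the nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_latency(
    run_query: Callable[[], object],
    iterations: int = 20,
    warmup: int = 3,
) -> Dict[str, float]:
    """Time a zero-argument callable and report latency percentiles in ms."""
    # Warm-up calls are executed but not recorded
    for _ in range(warmup):
        run_query()

    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000)

    timings_ms.sort()
    return {
        "p50_ms": nearest_rank_percentile(timings_ms, 50),
        "p95_ms": nearest_rank_percentile(timings_ms, 95),
        "max_ms": timings_ms[-1],
    }


# Stand-in workload; replace with a real query callable in the notebook
stats = summarize_latency(lambda: sum(range(10_000)), iterations=10)
print(stats)
```

To time a real query, pass a callable such as `lambda: kb.query.bm25(query="vector database search", limit=10)`, where `kb` is the collection handle from the cell above.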

## Section 17: Cleanup

Clean up resources and close connections.

In [None]:
"""Clean up collections and close connection gracefully."""

# Optional: Delete collections (uncomment to clean up)
# collections_to_delete = [COLLECTION_NAME, KB_COLLECTION]
#
# for collection_name in collections_to_delete:
#     if client.collections.exists(collection_name):
#         client.collections.delete(collection_name)
#         print(f"Deleted '{collection_name}' collection")

# Close connection
try:
    client.close()
    print("SUCCESS: Connection closed gracefully")
except Exception as e:
    print(f"WARNING: Error during cleanup: {e}")

## Summary

In this comprehensive hands-on guide, you learned:

### Core Functionality
1. **Connection** - Production-ready connection with retry logic and gRPC
2. **Schema** - Advanced collection configuration with HNSW tuning
3. **CRUD** - Insert, read, update, delete with error handling
4. **Batch Operations** - Efficient bulk operations with monitoring

### Search Capabilities
5. **Vector Search** - Semantic similarity with dynamic ef tuning
6. **Hybrid Search (2025)** - Alpha parameter tuning for optimal results
7. **Filters** - Complex attribute filtering and aggregations

### 2025 Features
8. **RQ Quantization** - 4x compression with minimal recall loss (default in v1.33+)
9. **AI Agents (Preview)** - Intelligent query optimization
10. **Production RAG** - Complete retrieval system with error handling

### Optimization
11. **Performance** - Benchmarking and monitoring
12. **Best Practices** - Production deployment patterns

## Production Checklist

Before deploying to production:

- [ ] Replace mock embeddings with real embedding models (OpenAI, Cohere, HuggingFace)
- [ ] Enable RQ quantization (default in v1.33+)
- [ ] Tune HNSW parameters for your dataset size
- [ ] Configure hybrid search alpha for your use case
- [ ] Implement connection pooling and retry logic
- [ ] Set up monitoring and alerting
- [ ] Enable authentication and authorization
- [ ] Configure backups and disaster recovery
- [ ] Load test with expected traffic
- [ ] Set up multi-node cluster for high availability
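
For the alpha item in the checklist, it can help to see how alpha shifts a ranking. The sketch below approximates relative score fusion as min-max normalization of each score set followed by an alpha-weighted sum. It is an illustration under that assumption, not Weaviate's implementation. The function names and the scores are made up, and vector scores are treated as similarities where higher is better.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float,
) -> Dict[str, float]:
    """Combine normalized scores: alpha weights vector, (1 - alpha) weights keyword."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    doc_ids = set(vec) | set(kw)
    return {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Made-up scores for three documents
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"a": 1.0, "b": 4.0, "c": 2.5}

rankings = {}
for alpha in (0.0, 0.5, 1.0):
    fused = fuse_scores(vector_scores, keyword_scores, alpha)
    rankings[alpha] = sorted(fused, key=fused.get, reverse=True)
    print(f"alpha={alpha}: {rankings[alpha]}")
```

With these toy numbers the top document changes as alpha moves from 0 to 1, which is why alpha is worth evaluating against a set of representative queries rather than left at a fixed value.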

## Next Steps

1. **Integrate Real Embeddings** - Use OpenAI text-embedding-3, Cohere embed-v3, or HuggingFace models
2. **Add LLM Integration** - Connect GPT-4, Claude, or Gemini for answer generation
3. **Implement Reranking** - Use Cohere Rerank or custom reranking models
4. **Enable Multi-tenancy** - Isolate data per customer/team
5. **Scale Horizontally** - Deploy multi-node Weaviate cluster
6. **Monitor Performance** - Set up Prometheus + Grafana dashboards
7. **Try AI Agents** - Experiment with Query/Transformation/Personalization Agents (when available)

## Resources

- [Weaviate Complete Guide](weaviate-complete-guide.md) - Comprehensive documentation with architecture diagrams
- [Official Documentation](https://weaviate.io/developers/weaviate) - Latest features and API reference
- [Python Client Docs](https://weaviate.io/developers/weaviate/client-libraries/python) - v4 client documentation
- [HNSW Tuning Guide](https://weaviate.io/developers/weaviate/config-refs/schema/vector-index#hnsw-index-parameters) - Parameter optimization
- [Quantization Guide](https://weaviate.io/developers/weaviate/concepts/vector-quantization) - RQ, PQ, BQ, SQ comparison
- [Production Best Practices](https://weaviate.io/developers/weaviate/installation/cluster) - Deployment patterns
- [Example Applications](https://github.com/weaviate/weaviate-examples) - Real-world use cases

---

**Weaviate Hands-On Guide - 2025 Edition**

*Updated with latest features: RQ quantization, hybrid search alpha tuning, AI Agents preview, and production best practices.*