# Weaviate Vector Database Demo

This notebook demonstrates using **Weaviate** with 100 sample articles.

## Weaviate Key Features
- **GraphQL API** - Query with GraphQL syntax
- **Hybrid Search** - Combine vector + BM25 keyword search
- **Flexible Schema** - Supports lists, dates, and complex types
- **Free Tier** - 14-day sandbox or self-hosted
- **Auto-schema** - Can infer schema from data

In [1]:
# Reload
%reload_ext autoreload
%autoreload 2

## 1. Setup and Imports

In [2]:
import os
import sys
from pathlib import Path
import time

# Add parent directory to path
parent_dir = Path().resolve().parent
sys.path.insert(0, str(parent_dir))

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Import utilities
from utils.embeddings import EmbeddingGenerator
from utils.data_loader import load_articles, get_article_metadata

print("✓ All imports successful")

✓ All imports successful


## 2. Load Embedding Model

Using `sentence-transformers/all-MiniLM-L6-v2` (384 dimensions)

In [3]:
# Initialize embedding model
embedding_model = EmbeddingGenerator()

# Test the model
test_text = "This is a test sentence for embedding generation."
test_embedding = embedding_model.embed_text(test_text)

print(f"  - Embedding dimension: {len(test_embedding)}")
print(f"  - Sample values: {test_embedding[:5]}")

Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
✓ Model loaded successfully. Embedding dimension: 384
  - Embedding dimension: 384
  - Sample values: [0.00306019 0.00200206 0.05544939 0.07702641 0.00857853]


## 3. Load Sample Articles

In [4]:
import json

# Load articles
articles = load_articles("../sample_articles.json")

print(f"\nLoaded {len(articles)} articles")

# preview the first article
print("\nSample article:")
print(json.dumps(articles[0], indent=2))

Loaded 100 articles from ../sample_articles.json

Loaded 100 articles

Sample article:
{
  "id": 1212,
  "item_source": "MY_GRAND_CANYON",
  "item_title": "Make This Your First Stop Grand Canyon Stop",
  "item_subtitle": "Lunch, souvenirs, tour info and an immersive big screen experience.",
  "body_content": "Nothing can properly prepare you for the heart-pumping magnificence of the Grand Canyon\u2014except maybe a visit to the Grand Canyon Visitor Center and IMAX in Tusayan (Too-Say-Ann), located just 1.5 miles from the South Entrance to the national park. This is where your journey into one of the world\u2019s most awe-inspiring landscapes begins.\nSee the IMAX movie &#8220;The Grand Canyon: Rivers of Time&#8221; (Photo courtesy Grand Canyon Visitors Center/IMAX)\nOn the giant, six-story IMAX screen, catch the movie Grand Canyon: Rivers of Time, which won a 2023 award for best visual effects from the Giant Screen Cinema Association. In 37 breathtaking minutes, you\u2019ll be transpor

## 4. Connect to Weaviate

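This notebook connects to a Weaviate Cloud cluster (formerly Weaviate Cloud Services, WCS) using `WEAVIATE_URL` and `WEAVIATE_API_KEY` from the environment. For a local instance, `weaviate.connect_to_local()` would be used instead.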

In [10]:
import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.config import Configure, VectorDistances, Property, DataType
from weaviate.classes.query import MetadataQuery


WEAVIATE_URL = os.getenv("WEAVIATE_URL")
WEAVIATE_API_KEY = os.getenv("WEAVIATE_API_KEY")

# Connect to Weaviate
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WEAVIATE_URL,
    auth_credentials=Auth.api_key(WEAVIATE_API_KEY)
)

print(client.is_ready())
#client.close()

True


## 5. Create or Get Collection (Class)

Weaviate stores objects in collections (called "classes" in older versions of the API). We'll define a schema with explicit property types.

**Note on Weaviate's DATE type:**  
Weaviate's `DATE` datatype supports full RFC 3339 datetime with timezone (e.g., `"2025-10-16T14:09:00Z"`), not just dates. This is similar to Qdrant's native datetime support.
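
To make the RFC 3339 format concrete, here is a minimal sketch that builds a timezone-aware `datetime` and formats it with a trailing `Z`. The helper name `to_rfc3339` is hypothetical (it is not part of the Weaviate client or this notebook's `utils`), and treating naive datetimes as UTC is an assumption made for illustration.

```python
from datetime import datetime, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 string in UTC (hypothetical helper)."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    # Normalize to UTC, then emit the 'Z' suffix form
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = datetime(2025, 10, 16, 14, 9, 0, tzinfo=timezone.utc)
print(to_rfc3339(created))  # 2025-10-16T14:09:00Z
```

In this notebook the conversion is not needed for inserts or filters, since timezone-aware `datetime` objects are passed to the client directly.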

In [11]:
COLLECTION_NAME = "Article"

# Reuse the collection if it already exists, unless the user asks to recreate it
create_collection = True
if client.collections.exists(COLLECTION_NAME):
    print(f"Collection '{COLLECTION_NAME}' already exists")

    # Get existing collection
    articles_collection = client.collections.get(COLLECTION_NAME)

    # Get count
    response = articles_collection.aggregate.over_all(total_count=True)
    count = response.total_count if response.total_count else 0

    print(f"✓ Using existing collection: {COLLECTION_NAME}")
    print(f"  - Current count: {count} articles")

    # Ask user if they want to delete and recreate
    recreate = input("\nDo you want to delete and recreate? (y/n): ").lower().strip()
    if recreate == 'y':
        client.collections.delete(COLLECTION_NAME)
        print(f"✓ Deleted collection: {COLLECTION_NAME}")
    else:
        create_collection = False

# An explicit flag replaces exceptions as control flow, so real errors
# (e.g. connection failures) are no longer swallowed by a broad `except`
if create_collection:
    # Create new collection with schema
    articles_collection = client.collections.create(
        name=COLLECTION_NAME,
        description="Outside articles with embeddings",
        vectorizer_config=None,  # We'll provide our own vectors
        properties=[
            Property(
                name="article_id",
                data_type=DataType.INT,
                description="Unique article identifier"
            ),
            Property(
                name="title",
                data_type=DataType.TEXT,
                description="Article title",
                index_searchable=True  # Enable BM25 search on title
            ),
            Property(
                name="subtitle",
                data_type=DataType.TEXT,
                description="Article subtitle",
                index_searchable=True  # Enable BM25 search on subtitle
            ),
            Property(
                name="body_content",
                data_type=DataType.TEXT,
                description="Full article content",
                index_searchable=True  # Enable BM25 search on content
            ),
            Property(
                name="source",
                data_type=DataType.TEXT,
                description="Article source",
                index_filterable=True  # Enable filtering on source
            ),
            Property(
                name="category",
                data_type=DataType.TEXT,
                description="Article category",
                index_filterable=True  # Enable exact match filtering
            ),
            Property(
                name="tags",
                data_type=DataType.TEXT_ARRAY,  # Weaviate supports arrays!
                description="Article tags",
                index_filterable=True  # Enable filtering on tags array
            ),
            Property(
                name="evergreen",
                data_type=DataType.BOOL,
                description="Whether article is evergreen content",
                index_filterable=True  # Enable boolean filtering
            ),
            Property(
                name="url",
                data_type=DataType.TEXT,
                description="Article URL"
            ),
            Property(
                name="created_at",
                data_type=DataType.DATE,  # Weaviate supports native datetime (RFC 3339)
                description="Article creation timestamp",
                index_range_filterable=True  # Enable range filtering (>=, <=, etc.)
            ),
        ],
        # Configure vector index
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE
        )
    )
    
    print(f"✓ Created new collection: {COLLECTION_NAME}")
    print(f"  - Vector dimensions: 384")
    print(f"  - Distance metric: cosine")
    print(f"  - Schema: 10 properties (including text arrays and datetime)")
    print(f"  - Indexed fields: category, source, tags (filterable), created_at (range filterable)")

✓ Created new collection: Article
  - Vector dimensions: 384
  - Distance metric: cosine
  - Schema: 10 properties (including text arrays and datetime)
  - Indexed fields: category, source, tags (filterable), created_at (range filterable)


## 6. Generate Embeddings and Insert Data

Process articles in batches, matching the approach used in the Chroma notebook.
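
The batching and text-preparation steps can be sketched in isolation, without a Weaviate connection. The helper names `iter_batches` and `build_embedding_text` are hypothetical and exist only for this illustration; the field names and the 500-character truncation mirror the insert cell in this section.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_embedding_text(article: dict, max_chars: int = 500) -> str:
    """Combine title, subtitle, and a truncated body into one string to embed."""
    return (
        f"Title: {article['item_title']}\n"
        f"Subtitle: {article.get('item_subtitle', '')}\n"
        f"Content: {article['body_content'][:max_chars]}"
    )


# Hypothetical sample data: 45 articles with a 1000-character body each
sample = [{"item_title": f"Article {n}", "body_content": "x" * 1000} for n in range(45)]
print([len(batch) for batch in iter_batches(sample, 20)])  # [20, 20, 5]
```

Truncating the body keeps the embedded text short; whether 500 characters is the right cutoff for a given model is a judgment call rather than a fixed rule.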

In [12]:
from tqdm.auto import tqdm

# Check current count
response = articles_collection.aggregate.over_all(total_count=True)
current_count = response.total_count if response.total_count else 0


# Process in batches - same approach as other notebooks
BATCH_SIZE = 20
total_articles = len(articles)

print(f"Processing {total_articles} articles in batches of {BATCH_SIZE}...\n")

start_time = time.time()

for i in tqdm(range(0, total_articles, BATCH_SIZE), desc="Inserting batches"):
    batch = articles[i:i + BATCH_SIZE]

    # Generate embeddings for batch - same as other notebooks
    texts = [
        f"Title: {a['item_title']}\nSubtitle: {a.get('item_subtitle', '')}\nContent: {a['body_content'][:500]}"
        for a in batch
    ]
    embeddings = embedding_model.embed_batch(texts, show_progress=False)

    # Prepare data for Weaviate
    with articles_collection.batch.dynamic() as batch_insert:
        for article, embedding in zip(batch, embeddings):
            # Get metadata - use "weaviate" to get native datetime
            metadata = get_article_metadata(article, db_type="weaviate")

            # Prepare properties for Weaviate
            properties = {
                "article_id": metadata["id"],
                "title": metadata["title"],
                "subtitle": metadata["subtitle"],
                "body_content": article.get("body_content", ""),
                "source": metadata["source"],
                "category": metadata["category"],
                "tags": metadata["tags"],  # Keep as list - Weaviate supports it!
                "evergreen": metadata["evergreen"],
                "url": metadata["url"],
                "created_at": metadata["created_at"],  # Native datetime object
            }

            # Add object with vector
            batch_insert.add_object(
                properties=properties,
                vector=embedding.tolist()
            )

elapsed_time = time.time() - start_time

# Verify count
response = articles_collection.aggregate.over_all(total_count=True)
final_count = response.total_count if response.total_count else 0

print(f"\n✓ Successfully inserted {total_articles} articles")
print(f"  - Time taken: {elapsed_time:.2f} seconds")
print(f"  - Average: {elapsed_time/total_articles:.2f} seconds per article")
print(f"  - Collection count: {final_count}")

Processing 100 articles in batches of 20...



Inserting batches:   0%|          | 0/5 [00:00<?, ?it/s]


✓ Successfully inserted 100 articles
  - Time taken: 1.80 seconds
  - Average: 0.02 seconds per article
  - Collection count: 100


## 7. Semantic Search - Basic Query

In [14]:
# Test query
query_text = "Most haunted hikes in the US"

print(f"Query: '{query_text}'\n")

# Generate query embedding
query_embedding = embedding_model.embed_text(query_text)

# Search with vector
response = articles_collection.query.near_vector(
    near_vector=query_embedding.tolist(),
    limit=5,
    return_metadata=MetadataQuery(distance=True)
)

print(f"Top {len(response.objects)} results:\n")
for i, obj in enumerate(response.objects):
    props = obj.properties
    distance = obj.metadata.distance if obj.metadata.distance else 0.0
    
    print(f"{i+1}. {props['title'][:70]}...")
    print(f"   Category: {props['category']} | Source: {props['source']}")
    print(f"   Distance: {distance:.4f}")

Query: 'Most haunted hikes in the US'

Top 5 results:

1. 13 of the Most Haunted Hikes in the U.S....
   Category: Destinations | Source: OUTSIDE
   Distance: 0.1942
2. A Missing Dog Helped a Stranded Hiker Return to Shadow Mountain Trail....
   Category: Hiking | Source: OUTSIDE
   Distance: 0.5422
3. An Inside Look at Outside’s 2025 Winter Editors’ Choice Testing Trip...
   Category: Gear | Source: OUTSIDE
   Distance: 0.6340
4. Two Hikers in British Columbia Were Hospitalized After a Grizzly Sow A...
   Category: Hiking | Source: OUTSIDE
   Distance: 0.6636
5. Why Does Washington Have So Many Climbing Accidents? A Mountain Rescue...
   Category: Skills | Source: CLIMBING
   Distance: 0.6637
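
The `Distance` values above are cosine distances, since the collection was created with `VectorDistances.COSINE`. As a rough sketch, assuming cosine distance is computed as one minus cosine similarity, the plain-Python function below shows why smaller values mean closer matches. The function name `cosine_distance` is illustrative and not part of the Weaviate client.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 = same direction, 2 = opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0
```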


## 8. Metadata Filtering - Category

Filter by category field

In [15]:
from weaviate.classes.query import Filter

query_text = "Women's Ironman World Championship"
target_category = "News"

print(f"Query: '{query_text}'")
print(f"Filter: category = '{target_category}'\n")

query_embedding = embedding_model.embed_text(query_text)

# Search with category filter
response = articles_collection.query.near_vector(
    near_vector=query_embedding.tolist(),
    limit=5,
    filters=Filter.by_property("category").equal(target_category),
    return_metadata=MetadataQuery(distance=True)
)

print(f"Top 5 Results (Category: {target_category}):\n")
for i, obj in enumerate(response.objects):
    props = obj.properties
    distance = obj.metadata.distance if obj.metadata.distance else 0.0
    
    print(f"{i+1}. {props['title'][:70]}...")
    print(f"   Category: {props['category']} | Source: {props['source']}")
    print(f"   Distance: {distance:.4f}")

Query: 'Women's Ironman World Championship'
Filter: category = 'News'

Top 5 Results (Category: News):

1. After Joy of Women's-Only Ironman World Championship, Grief Sets In...
   Category: News | Source: TRIATHLETE
   Distance: 0.2230
2. What a Race! Here's Where the Ironman Pro Series Stands After the Iron...
   Category: News | Source: TRIATHLETE
   Distance: 0.3536
3. The Fastest Shoes at 2025 Ironman World Championship Kona...
   Category: News | Source: TRIATHLETE
   Distance: 0.3818
4. The DNF Files: 2025 Ironman World Championship Kona...
   Category: News | Source: TRIATHLETE
   Distance: 0.4083
5. In Sweltering Conditions, Norway’s Solveig Løvseth Takes 2025 Ironman ...
   Category: News | Source: TRIATHLETE
   Distance: 0.4381


## 9. Metadata Filtering - Date Range

Weaviate supports native date comparisons!

In [18]:
from datetime import datetime, timezone
query_text = "cycling deals"
# Use a timezone-aware datetime (otherwise the client will emit a warning)
cutoff_dt = datetime(2025, 10, 8, tzinfo=timezone.utc)

print(f"Query: '{query_text}'")
print(f"Filter: created_at >= {cutoff_dt}\n")

query_embedding = embedding_model.embed_text(query_text)


# Search with date filter using native date comparison
response = articles_collection.query.near_vector(
    near_vector=query_embedding.tolist(),
    limit=5,
    filters=Filter.by_property("created_at").greater_or_equal(cutoff_dt),
    return_metadata=MetadataQuery(distance=True)
)

print(f"Found {len(response.objects)} results created after {cutoff_dt}:\n")
for i, obj in enumerate(response.objects):
    props = obj.properties
    distance = obj.metadata.distance if obj.metadata.distance else 0.0
    created = props['created_at']
    
    print(f"{i+1}. {props['title'][:70]}...")
    print(f"   Category: {props['category']} | Created: {created}")
    print(f"   Distance: {distance:.4f}")

Query: 'cycling deals'
Filter: created_at >= 2025-10-08 00:00:00+00:00

Found 5 results created after 2025-10-08 00:00:00+00:00:

1. Opinion: Cycling's Soccer-Inspired Relegation System Is a Hot Mess Tha...
   Category: Road Racing | Created: 2025-10-16 05:42:10+00:00
   Distance: 0.4996
2. Deal: Tailwind Endurance Fuel Is the Cycling Nutrition I Actually Use...
   Category: Road Gear | Created: 2025-10-13 11:30:52+00:00
   Distance: 0.5120
3. Pogačar's Bonuses and Brand Deals Revealed: Inside His $14 Million Pay...
   Category: Road Racing | Created: 2025-10-14 03:39:12+00:00
   Distance: 0.5273
4. Shop Evo's Anniversary Sale and Save up to 50% on Ski, Snowboard, and ...
   Category: Gear News | Created: 2025-10-14 10:53:27+00:00
   Distance: 0.5604
5. Deal: One of the Best Headphones for Cycling Is 50% Off...
   Category: Road Gear | Created: 2025-10-15 12:12:34+00:00
   Distance: 0.5671


## 10. Combined Filters - Evergreen + Date

Combine multiple filters with AND logic
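
To illustrate the AND semantics, this plain-Python sketch applies the same two conditions to a few hypothetical records. It does not call Weaviate; the records and the `matches_all` helper are invented for illustration only.

```python
from datetime import datetime, timezone

# Hypothetical records standing in for stored articles
records = [
    {"title": "A", "evergreen": True, "created_at": datetime(2025, 10, 16, tzinfo=timezone.utc)},
    {"title": "B", "evergreen": False, "created_at": datetime(2025, 10, 15, tzinfo=timezone.utc)},
    {"title": "C", "evergreen": True, "created_at": datetime(2025, 10, 1, tzinfo=timezone.utc)},
]
cutoff = datetime(2025, 10, 9, tzinfo=timezone.utc)


def matches_all(record: dict) -> bool:
    """A record passes only if every condition holds (AND logic)."""
    return record["evergreen"] is True and record["created_at"] >= cutoff


print([r["title"] for r in records if matches_all(r)])  # ['A']
```

Record B fails the `evergreen` condition and record C fails the date condition, so only A survives. With OR logic, all three would match.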

In [21]:
query_text = "Halloween outdoor activities"
cutoff_date = "2025-10-09"

print(f"Query: '{query_text}'")
print(f"Filters:")
print(f"  - evergreen = True (timeless content)")
print(f"  - created_at >= '{cutoff_date}'\n")

# Generate query embedding
query_embedding = embedding_model.embed_text(query_text)

# Parse the cutoff date and make it timezone-aware (UTC)
cutoff_dt = datetime.fromisoformat(cutoff_date).replace(tzinfo=timezone.utc)

# Combine filters with AND
combined_filter = Filter.all_of([
    Filter.by_property("evergreen").equal(True),
    Filter.by_property("created_at").greater_or_equal(cutoff_dt)
])

# Search with combined filters
response = articles_collection.query.near_vector(
    near_vector=query_embedding.tolist(),
    limit=10,
    filters=combined_filter,
    return_metadata=MetadataQuery(distance=True)
)

if response.objects:
    print(f"Found {len(response.objects)} evergreen articles created after {cutoff_date}:\n")
    for i, obj in enumerate(response.objects):
        props = obj.properties
        distance = obj.metadata.distance if obj.metadata.distance else 0.0
        tags = props.get('tags', [])
        created_at = props['created_at']

        print(f"{i+1}. {props['title'][:70]}...")
        print(f"   Category: {props['category']} | Evergreen: {props['evergreen']}")
        print(f"   Tags: {', '.join(tags) if tags else 'No tags'}")
        print(f"   Distance: {distance:.4f}")
        print(f"   Created at: {created_at}")
    print(f"\nTotal results: {len(response.objects)}")
else:
    print("No evergreen articles found after this date.")


Query: 'Halloween outdoor activities'
Filters:
  - evergreen = True (timeless content)
  - created_at >= '2025-10-09'

Found 10 evergreen articles created after 2025-10-09:

1. 13 of the Most Haunted Hikes in the U.S....
   Category: Destinations | Evergreen: True
   Tags: evergreen, Halloween, Hiking
   Distance: 0.5509
   Created at: 2025-10-16 11:22:41+00:00
2. The Thule Outset Hitch-Mounted Tent Turns Your Car Into a Campsite on ...
   Category: Camping | Evergreen: True
   Tags: 2025 Gear Reviews, Car Camping, Car Racks, Commerce, evergreen
   Distance: 0.7477
   Created at: 2025-10-14 10:30:11+00:00
3. The Best Daypacks for Every Kind of Hiker (2025)...
   Category: Daypacks | Evergreen: True
   Tags: 2025 Gear Reviews, 2025 Summer Gear Guide, backpack, Commerce, Day Packs
   Distance: 0.7666
   Created at: 2025-10-16 11:31:44+00:00
4. Everything You Need To Know Before Skiing Telluride For The First Time...
   Category: Resort Skiing | Evergreen: True
   Tags: evergreen, Telluri

## 11. Hybrid Search (Vector + BM25)

**A Weaviate highlight**: Combine semantic vector search with BM25 keyword search in a single query
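
Ranked fusion merges the two result lists by position rather than by raw score. The sketch below shows the general idea using reciprocal rank fusion weighted by `alpha`. The function name `ranked_fusion`, the offset `k=60`, and the exact weighting are assumptions taken from common descriptions of the technique; Weaviate's internal implementation may differ.

```python
from collections import defaultdict
from typing import Dict, List


def ranked_fusion(
    vector_ranking: List[str],
    keyword_ranking: List[str],
    alpha: float = 0.5,
    k: int = 60,
) -> Dict[str, float]:
    """Fuse two ranked ID lists; alpha weights vector ranks, (1 - alpha) keyword ranks."""
    scores: Dict[str, float] = defaultdict(float)
    for rank, doc_id in enumerate(vector_ranking):
        scores[doc_id] += alpha / (k + rank)
    for rank, doc_id in enumerate(keyword_ranking):
        scores[doc_id] += (1 - alpha) / (k + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rankings: "a" tops both lists, "b" and "d" appear in only one
fused = ranked_fusion(["a", "b", "c"], ["a", "c", "d"])
for doc_id, score in fused.items():
    print(doc_id, round(score, 4))
```

Under these assumptions a document ranked first in both lists scores about 0.0167, so fused scores are small in absolute terms and only their ordering is meaningful.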

In [22]:
from weaviate.classes.query import HybridFusion

query_text = "Grand Canyon hiking"
print(f"Hybrid Query: '{query_text}'")
print(f"Combining: Vector search (semantic) + BM25 search (keyword)\n")

query_embedding = embedding_model.embed_text(query_text)

# Hybrid search: combines vector similarity + keyword matching
response = articles_collection.query.hybrid(
    query=query_text,  # Used for BM25 keyword search
    vector=query_embedding.tolist(),  # Used for vector search
    alpha=0.5,  # Balance: 0=pure BM25, 1=pure vector, 0.5=balanced
    fusion_type=HybridFusion.RANKED,  # Ranked fusion algorithm
    limit=5,
    return_metadata=MetadataQuery(score=True)
)

print(f"Top {len(response.objects)} hybrid results (semantic + keyword):\n")
for i, obj in enumerate(response.objects):
    props = obj.properties
    score = obj.metadata.score if obj.metadata.score else 0.0
    
    print(f"{i+1}. {props['title'][:70]}...")
    print(f"   Category: {props['category']} | Source: {props['source']}")
    print(f"   Hybrid Score: {score:.4f}")
    print(f"   Preview: {props['body_content'][:150]}...")

Hybrid Query: 'Grand Canyon hiking'
Combining: Vector search (semantic) + BM25 search (keyword)

Top 5 hybrid results (semantic + keyword):

1. Make This Your First Stop Grand Canyon Stop...
   Category: Attractions | Source: MY_GRAND_CANYON
   Hybrid Score: 0.0165
   Preview: Nothing can properly prepare you for the heart-pumping magnificence of the Grand Canyon—except maybe a visit to the Grand Canyon Visitor Center and IM...
2. Tusayan is Your Launchpad to Grand Adventures...
   Category: Gateway Towns | Source: MY_GRAND_CANYON
   Hybrid Score: 0.0165
   Preview: The town of Tusayan (pronounced Too-Say-An) is like the welcome committee for the Grand Canyon as the closest incorporated town to the South Rim; it i...
3. She Became the First Woman to Complete This 3,600-Mile Thru-Hike—and B...
   Category: Hiking | Source: OUTSIDE
   Hybrid Score: 0.0156
   Preview: Jessica “Stitches” Guo began her 30th birthday alone, in the woods, walking north towards the Canadian border.
It was the 

## 12. Performance Summary

In [23]:
from utils.benchmark import benchmark_queries

# Define query function for Weaviate
def weaviate_query_fn(query_text: str):
    """Query function for Weaviate benchmarking."""
    query_embedding = embedding_model.embed_text(query_text)
    return articles_collection.query.near_vector(
        near_vector=query_embedding.tolist(),
        limit=10
    )

# Run standardized benchmark
results = benchmark_queries(weaviate_query_fn)

# Additional benchmarks for Weaviate-specific features
print("\n" + "="*50)
print("Weaviate-Specific Performance:")
print("="*50 + "\n")

# Hybrid search benchmark
test_query = "outdoor camping gear"
query_embedding = embedding_model.embed_text(test_query)

times = []
for _ in range(5):
    start = time.time()
    response = articles_collection.query.hybrid(
        query=test_query,
        vector=query_embedding.tolist(),
        alpha=0.5,
        limit=10
    )
    times.append(time.time() - start)
avg_time = sum(times) / len(times)
print(f"Hybrid Search (Vector + BM25): {avg_time*1000:.1f}ms")

# Get collection stats
response = articles_collection.aggregate.over_all(total_count=True)
total_count = response.total_count if response.total_count else 0

print(f"\nCollection Statistics:")
print(f"  - Total articles: {total_count}")
print(f"  - Vector dimensions: 384")
print(f"  - Distance metric: cosine")

Running performance benchmark...

'outdoor hiking adventures' -> 61.5ms
'cycling race performance' -> 48.0ms
'travel destinations and tips' -> 45.2ms
'fitness training techniques' -> 55.5ms
'gear reviews and recommendations' -> 48.7ms

Performance Summary:
  - Average query time: 51.7ms
  - Min query time: 45.2ms
  - Max query time: 61.5ms

Weaviate-Specific Performance:

Hybrid Search (Vector + BM25): 44.0ms

Collection Statistics:
  - Total articles: 100
  - Vector dimensions: 384
  - Distance metric: cosine
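
When comparing these timings, note that `weaviate_query_fn` embeds the query inside the function, while the hybrid loop embeds once before timing starts, so the two figures may not be directly comparable. As a minimal sketch of a more uniform measurement, the hypothetical helper `time_call` below uses `time.perf_counter()` (a monotonic, high-resolution clock) with a warm-up call and reports several statistics. It is not part of this notebook's `utils.benchmark` module.

```python
import statistics
import time
from typing import Callable, Dict


def time_call(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time fn over several runs and return summary statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not measured
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


# Stand-in workload; in the notebook this would be a lambda wrapping the query call
stats = time_call(lambda: sum(range(10_000)))
print({name: round(value, 3) for name, value in stats.items()})
```

With only five samples against a remote cluster, network latency is likely to dominate, so the median is usually a steadier figure than the mean.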


## 13. Cleanup (Optional)

In [24]:
# Close connection
client.close()
print("✓ Disconnected from Weaviate")

✓ Disconnected from Weaviate


## Key Takeaways - Weaviate

### ✅ Strengths
1. **GraphQL API** - Powerful, flexible query language
2. **Hybrid Search** - Built-in combination of vector + BM25 keyword search in one query
3. **Rich Schema Support** - Native datetime (RFC 3339), arrays, complex types
4. **Flexible Filtering** - Easy to combine multiple filters with AND/OR logic
5. **Index Configuration** - Granular control over searchable, filterable, and range-filterable fields
6. **Good Performance** - Fast queries even with complex filters
7. **BM25 Integration** - Built-in keyword search alongside semantic search

### ⚠️ Considerations
1. **Free Tier Limited** - 14-day sandbox, then requires self-hosting or paid plan
2. **Schema Planning** - Works best with a schema defined upfront (auto-schema can infer one from data, with less control over types)
3. **Learning Curve** - GraphQL and schema concepts to learn
4. **Client Complexity** - More complex than schema-less alternatives
5. **Index Planning** - Index settings such as `index_range_filterable` are best decided when the collection is created

### 🎯 Best For
- **Hybrid search needs** - When you want both semantic and keyword matching
- **Complex schemas** - Rich metadata with datetime, arrays, relationships
- **GraphQL users** - If you're familiar with GraphQL
- **Enterprise applications** - Production-grade features and performance

### 📊 Comparison Notes
- **vs Chroma**: More features (hybrid search, GraphQL) but more complex
- **vs Qdrant**: Similar flexibility and native datetime support, but different query paradigm (GraphQL vs REST)
- **vs Pinecone**: More flexible schema, hybrid search, native datetime, but free tier time-limited
- **vs Milvus**: Easier to use, better for smaller-medium scale

### 💡 Notable Weaviate Features
1. **Hybrid Search** - Seamless vector + BM25 keyword search combination
2. **GraphQL API** - Query using GraphQL syntax
3. **Native Datetime** - RFC 3339 datetime with timezone support (similar to Qdrant)
4. **Text Arrays** - First-class support for array properties
5. **Index Control** - Fine-grained control over which fields are searchable, filterable, or range-filterable

### 📝 Schema Indexing Notes
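Weaviate lets you configure indexing per property when the collection is created:
- **`index_filterable=True`** - Index for exact match filtering (for TEXT, BOOL, TEXT_ARRAY)
- **`index_range_filterable=True`** - Dedicated index for range filters (>=, <=, etc.) on DATE, INT, NUMBER
- **`index_searchable=True`** - Index for BM25 keyword search on TEXT fields

In current Weaviate versions, `index_filterable` and `index_searchable` are enabled by default, so setting them explicitly mainly documents intent. `index_range_filterable` is off by default and has to be set explicitly.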

Examples:
```python
Property(name="category", data_type=DataType.TEXT, index_filterable=True)
Property(name="tags", data_type=DataType.TEXT_ARRAY, index_filterable=True)
Property(name="created_at", data_type=DataType.DATE, index_range_filterable=True)
Property(name="title", data_type=DataType.TEXT, index_searchable=True)
```

### 🗓️ Date/Datetime Comparison Across DBs
- **Chroma/Milvus/Pinecone**: Timestamps only (INT64)
- **Weaviate**: Native RFC 3339 datetime with timezone (DATE datatype)
- **Qdrant**: Native datetime objects (similar to Weaviate)
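
For the databases that store timestamps as integers, a datetime has to be converted before insert and before filtering. A minimal sketch of that round trip is shown below. The helper names `to_unix_seconds` and `from_unix_seconds` are hypothetical, and using whole seconds rather than milliseconds is an assumption.

```python
from datetime import datetime, timezone


def to_unix_seconds(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


def from_unix_seconds(ts: int) -> datetime:
    """Convert epoch seconds back to a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


cutoff = datetime(2025, 10, 8, tzinfo=timezone.utc)
cutoff_ts = to_unix_seconds(cutoff)
print(cutoff_ts)                     # 1759881600
print(from_unix_seconds(cutoff_ts))  # 2025-10-08 00:00:00+00:00
```

A range filter such as `created_at >= cutoff` then becomes an integer comparison against `cutoff_ts` in those databases, while Weaviate and Qdrant can take the datetime as-is.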