# Metadata Filtering in Vector Search: Qdrant Demo
#
This notebook demonstrates metadata filtering in Qdrant and concludes
our series based on the guide "Metadata Filtering in Vector Search: A Comprehensive Guide for Engineering Leaders".
We'll use the familiar synthetic product dataset and highlight Qdrant's strengths with flexible JSON payloads.
#
**Key Qdrant Concepts Covered:**
- Starting a Qdrant instance (e.g., via Docker).
- Creating a collection with vector configuration.
- Qdrant's schemaless approach to JSON metadata payloads.
- Upserting points (vectors + payloads), including nested structures.
- Creating payload indexes (keyword, text, float, integer) for performance.
- Using Qdrant's rich filtering syntax (`query_filter`) with `must`, `should`, `must_not`.
- Demonstrating filtering on deeply nested payload keys.
#
*For an in-depth analysis of Qdrant's features, its query planner, and how it compares for different organization sizes, please consult our main guide.*

In [3]:
# 1. Setup
# !pip install qdrant-client numpy pandas

from qdrant_client import QdrantClient, models
from qdrant_client.http.models import PointStruct, Distance, VectorParams
from qdrant_client.http.models import PayloadSchemaType # For payload indexing

import numpy as np
import pandas as pd
import uuid # For generating unique IDs if needed
import time

## IMPORTANT: Starting Qdrant

Before running this notebook, ensure you have a Qdrant instance running.
The recommended way for local development is Docker:

```bash
docker run -d --name qdrant-demo \
    -p 6333:6333 \
    -p 6334:6334 \
    qdrant/qdrant:latest
```
This command pulls the image tagged `latest` and starts Qdrant in the background. It exposes the REST API on port 6333 and the gRPC API on port 6334.

If you have an existing Qdrant instance, ensure it's accessible.
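
The container can take a moment to start accepting requests. The following is a minimal sketch of a readiness check, assuming a local instance without an API key. The helper name `wait_for_qdrant` and its timeout values are hypothetical; it polls the REST endpoint that lists collections using only the standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_qdrant(base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll the REST API until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # GET /collections lists collections; a 200 response means the API is up.
            with urllib.request.urlopen(f"{base_url}/collections", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Not reachable yet (or not authorized); retry after a short pause.
        time.sleep(interval_s)
    return False
```

Calling `wait_for_qdrant("http://localhost:6333")` before creating the client returns `True` once the server responds and `False` if it never does within the timeout.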

In [7]:
# Initialize Qdrant Client
QDRANT_URL = "http://localhost:6333"

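try:
    client = QdrantClient(url=QDRANT_URL)
    # Constructing the client does not raise if the server is unreachable,
    # so issue a lightweight request to verify connectivity.
    client.get_collections()
    print(f"Qdrant client connected successfully to {QDRANT_URL}.")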
    # For more details: print(client.get_collections())
except Exception as e:
    print(f"Error connecting to Qdrant at {QDRANT_URL}: {e}")
    print("Please ensure Qdrant is running and accessible.")

Qdrant client connected successfully to http://localhost:6333.


In [8]:
# 2. Data Preparation
# Enhanced synthetic product data with a nested 'specs' field
data = [
    {"id_str": "P001", "product_name": "Smartwatch Series X", "category": "electronics", "brand": "AlphaTech", "price": 299.99, "rating": 4.5, "in_stock": True, "release_year": 2023, "specs": {"display": {"type": "AMOLED", "size_inches": 1.4}, "material": "aluminum", "water_resistant_atm": 5}},
    {"id_str": "P002", "product_name": "Organic Green Tea", "category": "groceries", "brand": "NaturePure", "price": 15.50, "rating": 4.8, "in_stock": True, "release_year": 2022, "specs": {"origin": "Japan", "type": "Sencha", "packaging_grams": 100}},
    {"id_str": "P003", "product_name": "Running Shoes Pro", "category": "apparel", "brand": "FitStride", "price": 120.00, "rating": 4.3, "in_stock": False, "release_year": 2023, "specs": {"type": "neutral", "terrain": "road", "drop_mm": 8}},
    {"id_str": "P004", "product_name": "Wireless Noise-Cancelling Headphones", "category": "electronics", "brand": "AudioMax", "price": 199.50, "rating": 4.7, "in_stock": True, "release_year": 2022, "specs": {"connection": "Bluetooth 5.2", "driver_mm": 40, "battery_hours": 30}},
    {"id_str": "P005", "product_name": "Advanced Yoga Mat", "category": "sports", "brand": "ZenFlow", "price": 45.00, "rating": 4.9, "in_stock": True, "release_year": 2024, "specs": {"material": "TPE", "thickness_mm": 6, "eco_friendly": True}},
    {"id_str": "P006", "product_name": "Smartphone Model Z", "category": "electronics", "brand": "AlphaTech", "price": 799.00, "rating": 4.2, "in_stock": True, "release_year": 2023, "specs": {"display": {"type": "OLED", "size_inches": 6.7}, "storage_gb": 256, "camera_mp": 108}},
]
df = pd.DataFrame(data)

# Qdrant point IDs: using integers mapped from original string IDs for simplicity
df['qdrant_id'] = range(1, len(df) + 1)

vector_dim = 128
df['vector'] = [np.random.rand(vector_dim).tolist() for _ in range(len(df))]

print(f"Prepared {len(df)} items with mock embeddings of dimension {vector_dim}.")
df[['qdrant_id', 'id_str', 'product_name', 'specs']].head(2)

Prepared 6 items with mock embeddings of dimension 128.


|   | qdrant_id | id_str | product_name | specs |
|---|-----------|--------|--------------|-------|
| 0 | 1 | P001 | Smartwatch Series X | {'display': {'type': 'AMOLED', 'size_inches': ... |
| 1 | 2 | P002 | Organic Green Tea | {'origin': 'Japan', 'type': 'Sencha', 'packagi... |
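
Qdrant point IDs must be unsigned integers or UUIDs, which is why the string IDs above are mapped to sequential integers. An alternative, sketched below under assumptions, derives a deterministic UUID from each string ID so that the mapping stays stable across runs. The namespace value and the helper name `point_id_for` are hypothetical and not part of the original notebook.

```python
import uuid

# Hypothetical namespace for this demo; any fixed UUID works as long as it never changes.
PRODUCT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "products.example.com")


def point_id_for(product_id: str) -> str:
    """Derive a stable UUID string from an application-level ID such as 'P001'."""
    return str(uuid.uuid5(PRODUCT_NAMESPACE, product_id))


# The same input always yields the same UUID, so re-running an upsert
# overwrites the existing point instead of creating a duplicate.
for pid in ["P001", "P002", "P003"]:
    print(pid, "->", point_id_for(pid))
```

The resulting strings could be passed as the `id` of a point in place of the integer `qdrant_id`.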


In [9]:
# 3. Collection Creation
collection_name = "qdrant_product_catalog_v2"

try:
    if client.collection_exists(collection_name=collection_name):
        print(f"Collection '{collection_name}' already exists. Deleting it.")
        client.delete_collection(collection_name=collection_name)
        time.sleep(1)
except Exception as e:
    print(f"Note: Could not check/delete collection (may not exist): {e}")

print(f"Creating collection '{collection_name}'...")
client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(size=vector_dim, distance=models.Distance.COSINE)
)
print(f"Collection '{collection_name}' created successfully.")

Creating collection 'qdrant_product_catalog_v2'...
Collection 'qdrant_product_catalog_v2' created successfully.
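
The collection is configured with cosine distance, so the scores reported later are cosine similarities between the query vector and each stored vector. The sketch below shows that computation in plain Python; it is illustrative only and says nothing about how Qdrant implements it internally. Because the mock embeddings have only non-negative components, their pairwise similarities tend to cluster well above zero, which is consistent with the scores of roughly 0.75 to 0.80 in the outputs further down.

```python
import math
import random


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Two mock embeddings drawn the same way as in this notebook: uniform in [0, 1).
random.seed(0)
dim = 128
v1 = [random.random() for _ in range(dim)]
v2 = [random.random() for _ in range(dim)]
sample_score = cosine_similarity(v1, v2)
print(f"Similarity of two random non-negative vectors: {sample_score:.4f}")
```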


In [11]:
# 4. Upserting Data (Points with Payloads, including Nested JSON)
# Qdrant excels with flexible JSON payloads.
points_to_upsert = []
for _, row in df.iterrows():
    payload = {
        "original_id": row["id_str"],
        "product_name": row["product_name"],
        "category": row["category"],
        "brand": row["brand"],
        "price": float(row["price"]),
        "rating": float(row["rating"]),
        "in_stock": bool(row["in_stock"]),
        "release_year": int(row["release_year"]),
        "specs": row["specs"] # Directly pass the nested dictionary
    }
    points_to_upsert.append(
        models.PointStruct(
            id=int(row["qdrant_id"]),
            vector=row["vector"],
            payload=payload
        )
    )

operation_info = client.upsert(collection_name=collection_name, wait=True, points=points_to_upsert)
print(f"Upsert operation status: {operation_info.status}")
time.sleep(1)
print(f"Collection point count: {client.get_collection(collection_name=collection_name).points_count}")

Upsert operation status: completed
Collection point count: 6


In [13]:
# 5. Creating Payload Indexes for Filtering Performance
# As our main guide emphasizes, payload indexing is crucial for filter performance in Qdrant.
print("Creating payload indexes...")
try:
    client.create_payload_index(collection_name=collection_name, field_name="category", field_schema=models.PayloadSchemaType.KEYWORD)
    print("Payload index created for 'category' (keyword for exact matches).")
    
    # Example for a text field with more advanced text search capabilities
    # Use TextIndexParams for text-specific configurations
    client.create_payload_index(
        collection_name=collection_name, 
        field_name="product_name", 
        field_schema=models.TextIndexParams(
            type="text",
            tokenizer=models.TokenizerType.WORD,
            lowercase=True,
            min_token_len=2,
            max_token_len=15
        )
    )
    print("Payload index created for 'product_name' (text for full-text search).")

    # Numeric indexes support range filters on these fields
    client.create_payload_index(collection_name=collection_name, field_name="price", field_schema=models.PayloadSchemaType.FLOAT)
    print("Payload index created for 'price'.")
    
    client.create_payload_index(collection_name=collection_name, field_name="release_year", field_schema=models.PayloadSchemaType.INTEGER)
    print("Payload index created for 'release_year'.")

    # Indexing a nested field. Use dot notation for the path.
    client.create_payload_index(collection_name=collection_name, field_name="specs.display.type", field_schema=models.PayloadSchemaType.KEYWORD)
    print("Payload index created for nested field 'specs.display.type'.")
    
    # Display available fields for filtering after creating indexes
    collection_info = client.get_collection(collection_name=collection_name)
    print("\nAvailable indexed fields for filtering:")
    for field_name, schema in collection_info.payload_schema.items():
        print(f"- {field_name} - {schema.data_type}")
    
    time.sleep(1)
except Exception as e:
    print(f"Error creating payload index: {e}")

Creating payload indexes...
Payload index created for 'category' (keyword for exact matches).
Payload index created for 'product_name' (text for full-text search).
Payload index created for 'price'.
Payload index created for 'release_year'.
Payload index created for nested field 'specs.display.type'.

Available indexed fields for filtering:
- specs.display.type - keyword
- category - keyword
- release_year - integer
- product_name - text
- price - float


## 6. Metadata Filtering Examples (`query_filter`)
#
Qdrant's `query_filter` combines `must` (AND), `should` (OR), and `must_not` (NOT) clauses, each holding a list of `FieldCondition`s or nested filters.
Its strength in handling nested JSON is a key differentiator, as highlighted in our comprehensive guide.
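
The boolean semantics of the three clauses can be sketched in plain Python. This is an illustration of the logic only, not Qdrant code: the helper names `field_equals` and `passes` are hypothetical, and the sample payloads are a trimmed copy of the product data used in this notebook.

```python
from typing import Any, Callable, Dict, Sequence

Payload = Dict[str, Any]
Condition = Callable[[Payload], bool]


def field_equals(key: str, value: Any) -> Condition:
    """Build a condition that checks a top-level payload key for an exact value."""
    return lambda payload: payload.get(key) == value


def passes(payload: Payload,
           must: Sequence[Condition] = (),
           should: Sequence[Condition] = (),
           must_not: Sequence[Condition] = ()) -> bool:
    """must: all hold; should: at least one holds (if any given); must_not: none hold."""
    if not all(cond(payload) for cond in must):
        return False
    if should and not any(cond(payload) for cond in should):
        return False
    return not any(cond(payload) for cond in must_not)


products = [
    {"original_id": "P001", "category": "electronics", "brand": "AlphaTech"},
    {"original_id": "P004", "category": "electronics", "brand": "AudioMax"},
    {"original_id": "P005", "category": "sports", "brand": "ZenFlow"},
    {"original_id": "P006", "category": "electronics", "brand": "AlphaTech"},
]

# Electronics that are NOT made by AlphaTech.
non_alpha = [p["original_id"] for p in products
             if passes(p,
                       must=[field_equals("category", "electronics")],
                       must_not=[field_equals("brand", "AlphaTech")])]
print(non_alpha)

# Sports products OR anything by AudioMax.
either = [p["original_id"] for p in products
          if passes(p, should=[field_equals("category", "sports"),
                               field_equals("brand", "AudioMax")])]
print(either)
```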

In [22]:
query_vector = np.random.rand(vector_dim).tolist()
TOP_K = 3

def print_qdrant_results(points, query_desc=""):
    print(f"\n--- {query_desc} ---")
    if not points:
        print("No results found.")
        return
    
    for point in points:
        # Each scored point exposes its ID and similarity score
        print(f"  Point ID: {point.id}, Score: {point.score:.4f}")
        
        # Access payload safely
        if hasattr(point, 'payload') and point.payload:
            name = point.payload.get('product_name', 'N/A')
            category = point.payload.get('category', 'N/A') 
            price = point.payload.get('price', 'N/A')
            
            # Handle nested dictionary access safely
            specs = point.payload.get('specs', {})
            display_type = None
            if specs and 'display' in specs and isinstance(specs['display'], dict):
                display_type = specs['display'].get('type', 'N/A')
            
            print(f"  Name: {name}")
            print(f"  Category: {category}, Price: {price}")
            if display_type:
                print(f"  Display Type: {display_type}")
# Example:
filter_cat = models.Filter(must=[models.FieldCondition(key="category", match=models.MatchValue(value="electronics"))])
results_response = client.query_points(
    collection_name=collection_name,
    query=query_vector, 
    query_filter=filter_cat,
    limit=TOP_K, 
    with_payload=True
)

# query_points returns a response object; the scored points are in `.points`
results_points = results_response.points

print_qdrant_results(results_points, "Filtering for category 'electronics'")


--- Filtering for category 'electronics' ---
  Point ID: 1, Score: 0.7993
  Name: Smartwatch Series X
  Category: electronics, Price: 299.99
  Display Type: AMOLED
  Point ID: 6, Score: 0.7775
  Name: Smartphone Model Z
  Category: electronics, Price: 799.0
  Display Type: OLED
  Point ID: 4, Score: 0.7536
  Name: Wireless Noise-Cancelling Headphones
  Category: electronics, Price: 199.5


In [23]:
# ### Example 6.7: Filtering on Nested JSON Fields
# As our guide emphasizes, Qdrant excels at filtering on nested JSON structures.
# Let's find electronics that have an "OLED" display type.
# The path to the nested key is `specs.display.type`.

filter_nested = models.Filter(
    must=[
        models.FieldCondition(key="category", match=models.MatchValue(value="electronics")), # Main category
        models.FieldCondition(
            key="specs.display.type",  # Dot notation for nested field
            match=models.MatchValue(value="OLED")
        )
    ]
)
# Use query_points (as in the first example); client.search is deprecated
results_nested = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    query_filter=filter_nested,
    limit=TOP_K,
    with_payload=True # Crucial to see the payload
).points
print_qdrant_results(results_nested, "Filtering for 'electronics' with 'OLED' display (nested filter)")


--- Filtering for 'electronics' with 'OLED' display (nested filter) ---
  Point ID: 6, Score: 0.7775
  Name: Smartphone Model Z
  Category: electronics, Price: 799.0
  Display Type: OLED
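
Conceptually, a dotted key such as `specs.display.type` is resolved by walking the nested payload one level per segment. The sketch below illustrates that lookup in plain Python under simplifying assumptions: it handles nested objects only, not array paths, and the helper name `resolve_path` is hypothetical.

```python
def resolve_path(payload, dotted_key):
    """Follow a dot-separated key through nested dicts; return None if any step is missing."""
    current = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


smartphone = {"category": "electronics",
              "specs": {"display": {"type": "OLED", "size_inches": 6.7}, "storage_gb": 256}}
green_tea = {"category": "groceries",
             "specs": {"origin": "Japan", "type": "Sencha", "packaging_grams": 100}}

print(resolve_path(smartphone, "specs.display.type"))
print(resolve_path(green_tea, "specs.display.type"))  # no 'display' object here
print(resolve_path(green_tea, "specs.type"))
```

Note that `specs.type` and `specs.display.type` are different keys: the tea has the former but not the latter, so a condition on `specs.display.type` cannot match it.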




In [33]:
# ### Example 6.8: More Complex Nested Filter with Our Dataset
# Find products that match at least one of:
# 1. Electronics with a large display (> 6.0 inches), e.g. the smartphone
# 2. A specific material (aluminum or TPE), e.g. the smartwatch and the yoga mat

filter_complex_nested = models.Filter(
    should=[
        # Electronic products with large displays (matches Smartphone Model Z)
        models.Filter(
            must=[
                models.FieldCondition(key="category", match=models.MatchValue(value="electronics")),
                models.FieldCondition(key="specs.display.size_inches", range=models.Range(gt=6.0))
            ]
        ),
        # Products with specific materials (matches Smartwatch and Yoga Mat)
        models.FieldCondition(key="specs.material", match=models.MatchValue(value="aluminum")),
        models.FieldCondition(key="specs.material", match=models.MatchValue(value="TPE"))
    ],
)

# Use query_points (as in the first example); client.search is deprecated
results_complex_nested = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    query_filter=filter_complex_nested,
    limit=TOP_K,
    with_payload=True
).points
print_qdrant_results(results_complex_nested, "Filtering for electronics with large displays OR products made of aluminum/TPE")


--- Filtering for electronics with large displays OR products made of aluminum/TPE ---
  Point ID: 1, Score: 0.7993
  Name: Smartwatch Series X
  Category: electronics, Price: 299.99
  Display Type: AMOLED
  Point ID: 5, Score: 0.7892
  Name: Advanced Yoga Mat
  Category: sports, Price: 45.0
  Point ID: 6, Score: 0.7775
  Name: Smartphone Model Z
  Category: electronics, Price: 799.0
  Display Type: OLED
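
For readers who work with the REST API rather than the Python client, the same filter can be written as plain JSON. The sketch below builds the structure as a Python dictionary, as it would appear under the `filter` key of a request body; it is meant to mirror the `models.Filter` object above and has not been run against a server here.

```python
import json

# A nested filter inside `should`: the first entry is itself a filter with `must`.
filter_body = {
    "should": [
        {
            "must": [
                {"key": "category", "match": {"value": "electronics"}},
                {"key": "specs.display.size_inches", "range": {"gt": 6.0}},
            ]
        },
        {"key": "specs.material", "match": {"value": "aluminum"}},
        {"key": "specs.material", "match": {"value": "TPE"}},
    ]
}

print(json.dumps(filter_body, indent=2))
```

Seeing the JSON form makes the nesting explicit: the outer `should` has three alternatives, and only the first one carries its own `must` list.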




In [25]:
# ### Example 6.9: Text Search on Product Names
# Qdrant's text indexing allows for flexible text searches on payload fields
filter_text = models.Filter(
    must=[
        models.FieldCondition(
            key="product_name",
            match=models.MatchText(text="wireless")  # Case-insensitive because the index was created with lowercase=True
        )
    ]
)
# Use query_points (as in the first example); client.search is deprecated
results_text = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    query_filter=filter_text,
    limit=TOP_K,
    with_payload=True
).points
print_qdrant_results(results_text, "Text search for products with 'wireless' in the name")


--- Text search for products with 'wireless' in the name ---
  Point ID: 4, Score: 0.7536
  Name: Wireless Noise-Cancelling Headphones
  Category: electronics, Price: 199.5




## 7. Discussion
#
- **Flexible Payloads & Nested Filtering:** Qdrant's ability to handle schemaless JSON payloads, including deeply nested structures, and filter on them using dot notation is a significant advantage for complex metadata scenarios. This aligns with the flexibility highlighted in our main guide.
- **Rich Filter Syntax:** The `must`/`should`/`must_not` structure allows for expressive and complex logical combinations.
- **Payload Indexing:** Crucial for performance. Qdrant supports indexing various payload types, including `keyword` for exact matches, `text` for more general text search on metadata (with configurable tokenizers), `integer`, `float`, and `geo`.
- **Developer Experience:** The Python client and JSON-like filter objects are generally intuitive.
#
*Our comprehensive guide further details Qdrant's query planning, suitability for different organizational sizes, and its pros and cons.*

In [34]:
# 8. Cleanup
user_confirmation = input(f"Do you want to delete the collection '{collection_name}'? (yes/no): ")
if user_confirmation.strip().lower() == 'yes':
    try:
        if client.collection_exists(collection_name=collection_name):
            client.delete_collection(collection_name=collection_name)
            print(f"Collection '{collection_name}' deleted.")
    except Exception as e:
        print(f"Error deleting collection: {e}")
else:
    print(f"Collection '{collection_name}' was not deleted.")

client.close()
print("Qdrant client closed.")
print("\nRemember to stop the Qdrant Docker container if you started it for this demo:")
print("docker stop qdrant-demo && docker rm qdrant-demo")

Collection 'qdrant_product_catalog_v2' was not deleted.
Qdrant client closed.

Remember to stop the Qdrant Docker container if you started it for this demo:
docker stop qdrant-demo && docker rm qdrant-demo


## Conclusion
#
This notebook demonstrated Qdrant's flexible metadata filtering,
particularly its handling of nested JSON payloads and the role of payload indexes in filter performance.
The `must`/`should`/`must_not` filter syntax adapts to a wide range of complex querying needs.
#
*To make an informed decision for your AI infrastructure, refer to our main guide for a complete comparison and strategic considerations.*