# OpenSearch Hybrid Search Use Cases Demonstration

This notebook demonstrates comprehensive hybrid search capabilities in OpenSearch 3.0+, including:
- Basic hybrid search combining keyword and semantic search
- Sorting and pagination strategies
- Advanced filtering techniques (collapse, post-filtering)
- Aggregations on hybrid results
- Inner hits for nested document retrieval
- Query explanation for debugging and optimization

**Documentation References:**
- [Hybrid Search Index](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/index/)
- [Sorting with Hybrid Queries](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/sorting/)
- [Pagination in Hybrid Search](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/pagination/)
- [Search After for Hybrid Queries](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/search-after/)
- [Collapse in Hybrid Search](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/collapse/)
- [Post-Filtering in Hybrid Search](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/post-filtering/)
- [Aggregations with Hybrid Search](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/aggregations/)
- [Inner Hits in Hybrid Search](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/inner-hits/)
- [Explain for Hybrid Queries](https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/explain/)


## üéØ Hybrid Search Architecture Overview

```mermaid
graph TB
    subgraph "üì• INPUT LAYER"
        A["üë§ User Query<br/>Text Input"]
    end
    
    subgraph "üîÑ PROCESSING LAYER"
        B["‚öôÔ∏è Split Query"]
        C["üìù Keyword<br/>Processing"]
        D["üß† Vector<br/>Embedding"]
        E["Model: all-MiniLM-L6-v2<br/>384 dimensions"]
    end
    
    subgraph "üîç SEARCH LAYER"
        F["üîé BM25 Search<br/>Keyword Matching"]
        G["üßÆ KNN Search<br/>Vector Similarity"]
    end
    
    subgraph "üìä RESULT PROCESSING"
        H["üìà Min-Max<br/>Normalization"]
        I["‚öñÔ∏è Weighted<br/>Combination<br/>30% Keyword + 70% Vector"]
        J["üè∑Ô∏è Sorting, Filtering,<br/>Pagination, Collapse"]
        K["üìã Aggregations &<br/>Analytics"]
    end
    
    subgraph "üéÅ OUTPUT LAYER"
        L["‚ú® Ranked Results<br/>Semantic + Keyword Matched"]
    end
    
    A --> B
    B --> C
    B --> D
    D --> E
    C --> F
    E --> G
    F --> H
    G --> H
    H --> I
    I --> J
    J --> K
    K --> L
    
    style A fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style B fill:#4ECDC4,stroke:#0B7285,color:#fff
    style C fill:#FFD93D,stroke:#D4A500,color:#000
    style D fill:#6C5CE7,stroke:#5F3DC4,color:#fff
    style E fill:#A29BFE,stroke:#7950F2,color:#fff
    style F fill:#FFA502,stroke:#D9480F,color:#fff
    style G fill:#00B894,stroke:#00704A,color:#fff
    style H fill:#74B9FF,stroke:#0984E3,color:#fff
    style I fill:#FD79A8,stroke:#E84393,color:#fff
    style J fill:#9B59B6,stroke:#6C3483,color:#fff
    style K fill:#16A085,stroke:#0D5D52,color:#fff
    style L fill:#27AE60,stroke:#1E8449,color:#fff
```

### üéì What You'll Learn in This Notebook:

| # | Use Case | Description | Query Type |
|---|----------|-------------|-----------|
| 1Ô∏è‚É£ | **Basic Hybrid Search** | Combine BM25 keyword + vector semantic search | Match + Neural |
| 2Ô∏è‚É£ | **Sorting** | Override relevance ranking with field values | Sort by price, rating, etc. |
| 3Ô∏è‚É£ | **Pagination** | Navigate large result sets in pages | From/Size with pagination_depth |
| 4Ô∏è‚É£ | **Search After** | Cursor-based pagination for deep results | Efficient scrolling |
| 5Ô∏è‚É£ | **Collapse** | Group by field and show 1 top result per group | Deduplication |
| 6Ô∏è‚É£ | **Post-Filtering** | Filter AFTER scoring (doesn't impact relevance) | Narrow after ranking |
| 7Ô∏è‚É£ | **Aggregations** | Compute statistics on search results | Facets, counts, ranges |
| 8Ô∏è‚É£ | **Inner Hits** | Get related documents within result groups | Context retrieval |
| 9Ô∏è‚É£ | **Explain** | Debug scoring to understand result ranking | Score analysis |

---


## üê≥ Docker Setup
- **If docker compose up fails , start it manually from shell**

In [None]:
%%bash
cd ../
echo "üöÄ Starting fully optimized OpenSearch cluster..."

# Start the optimized cluster
docker compose -f docker-compose-fully-optimized.yml down -v
docker compose -f docker-compose-fully-optimized.yml up -d

# Wait for startup
echo "‚è≥ Waiting for cluster to initialize..."
sleep 45

# Check cluster health
echo "üè• Checking cluster health..."
curl -k -u admin:Developer@123 https://localhost:9200/_cluster/health?pretty

üöÄ Starting fully optimized OpenSearch cluster...


 Container opensearch-optimized-node1  Stopping
 Container opensearch-optimized-node2  Stopping
 Container opensearch-optimized-dashboards  Stopping
 Container opensearch-optimized-dashboards  Stopped
 Container opensearch-optimized-dashboards  Removing
 Container opensearch-optimized-node1  Stopped
 Container opensearch-optimized-node1  Removing
 Container opensearch-optimized-node2  Stopped
 Container opensearch-optimized-node2  Removing
 Container opensearch-optimized-dashboards  Removed
 Container opensearch-optimized-node1  Removed
 Container opensearch-optimized-node2  Removed
 Volume ai_search_opensearch-optimized-data2  Removing
 Network ai_search_opensearch-net  Removing
 Volume ai_search_opensearch-optimized-data1  Removing
 Volume ai_search_opensearch-optimized-data2  Removed
 Volume ai_search_opensearch-optimized-data1  Removed
 Network ai_search_opensearch-net  Removed
 Network ai_search_opensearch-net  Creating
 Network ai_search_opensearch-net  Created
 Volume "ai_search

‚è≥ Waiting for cluster to initialize...
üè• Checking cluster health...


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   550  100   550    0     0   2118      0 --:--:-- --:--:-- --:--:--  2123


{
  "cluster_name" : "opensearch-optimized-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 4,
  "active_shards" : 8,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}


## Section 1: Import Required Libraries and Initialize OpenSearch Client

In [1]:
# Import Required Libraries
from opensearchpy import OpenSearch
import sys, os
from opensearchpy.helpers import bulk
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
import json
import time
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Configuration
IS_AUTH = True
HOST = 'localhost'

# Get the current working directory of the notebook
current_dir = os.getcwd()
DATA_DIR = os.path.abspath(os.path.join(current_dir, '../../0. DATA'))

# Construct the path to the directory levels up
module_paths = [os.path.abspath(os.path.join(current_dir, '../../')),] 

# Add the module path to sys.path if it's not already there
for module_path in module_paths:
    if module_path not in sys.path:
        sys.path.append(module_path)

try:
    import helpers as hp
except ImportError as e:
    print(f"‚ö†Ô∏è  Warning: Could not import helpers module: {e}")
    hp = None

# Initialize the OpenSearch client
if IS_AUTH:
    client = OpenSearch(
        hosts=[{'host': HOST, 'port': 9200}],
        http_auth=('admin', 'Developer@123'),  # Replace with your credentials
        use_ssl=True,
        verify_certs=False,
        ssl_show_warn=False
    )
else:
    client = OpenSearch(
        hosts=[{'host': HOST, 'port': 9200}],
        use_ssl=False,
        verify_certs=False,
        ssl_assert_hostname=False,
        ssl_show_warn=False
    )

# Verify connection
try:
    info = client.info()
    print(f"‚úÖ Connected to {info['version']['distribution']} v{info['version']['number']}")
    print(f"üìä Cluster Status: {client.cluster.health()['status']}")
except Exception as e:
    print(f"‚ùå Connection failed: {e}")
    raise


‚úÖ Connected to opensearch v3.3.0
üìä Cluster Status: green


## Section 2: Set Up ML Model for Embeddings

Register and deploy a sentence-transformers embedding model for hybrid search.


In [2]:
# Update cluster settings for ML Commons
print("‚öôÔ∏è  Updating cluster settings for ML Commons...")
client.cluster.put_settings(body={
    "persistent": {
        "plugins": {
            "ml_commons": {
                "allow_registering_model_via_url": "true",
                "only_run_on_ml_node": "false",
                "model_access_control_enabled": "true",
                "native_memory_threshold": "99"
            }
        }
    }
})
print("‚úÖ Cluster settings updated")

# Register a model group
model_group_name = f"hybrid_search_group_{int(time.time())}"
print(f"\nüì¶ Registering model group: {model_group_name}")

model_group_response = client.transport.perform_request(
    method='POST',
    url='/_plugins/_ml/model_groups/_register',
    body={
        "name": model_group_name,
        "description": "Model group for hybrid search demonstrations"
    }
)

model_group_id = model_group_response['model_group_id']
print(f"‚úÖ Model group ID: {model_group_id}")

# Register the sentence-transformers model
print("\nüì• Registering sentence-transformers model...")
register_response = client.transport.perform_request(
    method='POST',
    url='/_plugins/_ml/models/_register',
    body={
        "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
        "version": "1.0.2",
        "model_group_id": model_group_id,
        "model_format": "TORCH_SCRIPT",
        "function_name": "TEXT_EMBEDDING",
    }
)

register_task_id = register_response['task_id']
print(f"Registration task ID: {register_task_id}")

# Wait for model registration to complete
print("‚è≥ Waiting for model registration to complete...")
max_wait = 60  # Max wait time in seconds
start_time = time.time()

while True:
    task_status = client.transport.perform_request(
        method='GET',
        url=f'/_plugins/_ml/tasks/{register_task_id}'
    )
    
    if task_status['state'] == 'COMPLETED':
        model_id = task_status['model_id']
        print(f"‚úÖ Model registration completed. Model ID: {model_id}")
        break
    elif task_status['state'] == 'FAILED':
        print(f"‚ùå Model registration failed: {task_status.get('error', 'Unknown error')}")
        break
    elif time.time() - start_time > max_wait:
        print(f"‚è∞ Registration timeout after {max_wait} seconds. Current state: {task_status['state']}")
        break
    else:
        print(f"  State: {task_status['state']}")
        time.sleep(5)

# Deploy the model
print(f"\nüöÄ Deploying model {model_id}...")
deploy_response = client.transport.perform_request(
    method='POST',
    url=f'/_plugins/_ml/models/{model_id}/_deploy'
)

deploy_task_id = deploy_response['task_id']
print(f"Deployment task ID: {deploy_task_id}")

# Wait for deployment to complete
print("‚è≥ Waiting for model deployment to complete...")
start_time = time.time()

while True:
    deployment_status = client.transport.perform_request(
        method='GET',
        url=f'/_plugins/_ml/tasks/{deploy_task_id}'
    )
    
    if deployment_status['state'] == 'COMPLETED':
        print(f"‚úÖ Model deployment completed")
        break
    elif deployment_status['state'] == 'FAILED':
        print(f"‚ùå Deployment failed: {deployment_status.get('error', 'Unknown error')}")
        break
    elif time.time() - start_time > max_wait:
        print(f"‚è∞ Deployment timeout after {max_wait} seconds. Current state: {deployment_status['state']}")
        break
    else:
        print(f"  State: {deployment_status['state']}")
        time.sleep(5)

print(f"\n‚ú® Model setup complete! Model ID: {model_id}")


‚öôÔ∏è  Updating cluster settings for ML Commons...
‚úÖ Cluster settings updated

üì¶ Registering model group: hybrid_search_group_1762695879
‚úÖ Model group ID: aPvcaJoBbVynCTYZmvrv

üì• Registering sentence-transformers model...
Registration task ID: afvcaJoBbVynCTYZmvr_
‚è≥ Waiting for model registration to complete...
  State: CREATED
  State: CREATED
  State: CREATED
‚úÖ Model registration completed. Model ID: vrPcaJoBzyzo5mmPncRH

üöÄ Deploying model vrPcaJoBzyzo5mmPncRH...
Deployment task ID: avvcaJoBbVynCTYZ1fqz
‚è≥ Waiting for model deployment to complete...
  State: CREATED
‚úÖ Model deployment completed

‚ú® Model setup complete! Model ID: vrPcaJoBzyzo5mmPncRH


## Section 3: Create Index with Hybrid Search Configuration

Create an OpenSearch index supporting both keyword search (BM25) and vector search (ANN).


In [4]:
# Index configuration with ingest pipeline
index_name = "hybrid_search_products"
ingest_pipeline_name = "nlp-ingest-pipeline"

print(f"üìã Creating ingest pipeline: {ingest_pipeline_name}")

# Create ingest pipeline for text embedding
ingest_pipeline_body = {
    "description": "Text embedding pipeline for hybrid search",
    "processors": [
        {
            "text_embedding": {
                "model_id": model_id,
                "field_map": {
                    "product_name": "product_embedding",
                    "description": "description_embedding"
                }
            }
        }
    ]
}

client.transport.perform_request(
    method='PUT',
    url=f'/_ingest/pipeline/{ingest_pipeline_name}',
    body=ingest_pipeline_body
)
print(f"‚úÖ Ingest pipeline created")

# Create search pipeline for hybrid search normalization
search_pipeline_name = "hybrid-search-pipeline"
print(f"\nüìã Creating search pipeline: {search_pipeline_name}")

search_pipeline_body = {
    "description": "Search pipeline for hybrid search with normalization",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {
                    "technique": "min_max"
                },
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {
                        "weights": [0.3, 0.7]
                    }
                }
            }
        }
    ]
}

client.transport.perform_request(
    method='PUT',
    url=f'/_search/pipeline/{search_pipeline_name}',
    body=search_pipeline_body
)
print(f"‚úÖ Search pipeline created")

# Delete index if it already exists
if client.indices.exists(index=index_name):
    print(f"\nüóëÔ∏è  Deleting existing index: {index_name}")
    client.indices.delete(index=index_name)

# Create index with hybrid search configuration
print(f"\nüìÅ Creating index: {index_name}")

index_body = {
    "settings": {
        "index.knn": True,
        "default_pipeline": ingest_pipeline_name,
        "number_of_shards": 2,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "product_id": {
                "type": "keyword"
            },
            "product_name": {
                "type": "text",
                "analyzer": "standard"
            },
            "description": {
                "type": "text",
                "analyzer": "standard"
            },
            "product_embedding": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "engine": "lucene",
                    "space_type": "l2",
                    "name": "hnsw",
                    "parameters": {}
                }
            },
            "description_embedding": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "engine": "lucene",
                    "space_type": "l2",
                    "name": "hnsw",
                    "parameters": {}
                }
            },
            "category": {
                "type": "keyword"
            },
            "price": {
                "type": "float"
            },
            "rating": {
                "type": "float"
            },
            "in_stock": {
                "type": "boolean"
            },
            "supplier": {
                "type": "keyword"
            },
            "tags": {
                "type": "keyword"
            }
        }
    }
}

client.indices.create(index=index_name, body=index_body)
print(f"‚úÖ Index created with hybrid search configuration")

üìã Creating ingest pipeline: nlp-ingest-pipeline
‚úÖ Ingest pipeline created

üìã Creating search pipeline: hybrid-search-pipeline
‚úÖ Search pipeline created

üìÅ Creating index: hybrid_search_products
‚úÖ Index created with hybrid search configuration
‚úÖ Index created with hybrid search configuration


## Section 4: Index Sample Data with Vector Embeddings

Load sample product data and index documents with vector embeddings.


In [7]:
# Sample product data for hybrid search demonstration
sample_products = [
    {
        "product_id": "P001",
        "product_name": "Wireless Bluetooth Headphones",
        "description": "Premium noise-cancelling wireless headphones with 30-hour battery life and superior sound quality",
        "category": "Electronics",
        "price": 199.99,
        "rating": 4.5,
        "in_stock": True,
        "supplier": "TechBrand",
        "tags": ["audio", "wireless", "headphones", "electronics"]
    },
    {
        "product_id": "P002",
        "product_name": "USB-C Fast Charging Cable",
        "description": "Durable USB-C charging cable with fast charging capability for smartphones and tablets",
        "category": "Accessories",
        "price": 29.99,
        "rating": 4.8,
        "in_stock": True,
        "supplier": "TechBrand",
        "tags": ["cable", "charging", "usb-c", "accessories"]
    },
    {
        "product_id": "P003",
        "product_name": "Portable Phone Charger",
        "description": "20000mAh portable power bank with dual USB ports and LED display for extended battery life",
        "category": "Electronics",
        "price": 49.99,
        "rating": 4.6,
        "in_stock": True,
        "supplier": "PowerTech",
        "tags": ["battery", "charger", "portable", "power"]
    },
    {
        "product_id": "P004",
        "product_name": "Mechanical Gaming Keyboard",
        "description": "RGB backlit mechanical keyboard with Cherry MX switches for professional gaming and typing",
        "category": "Computer Peripherals",
        "price": 149.99,
        "rating": 4.7,
        "in_stock": True,
        "supplier": "GamingGear",
        "tags": ["keyboard", "gaming", "mechanical", "rgb"]
    },
    {
        "product_id": "P005",
        "product_name": "4K Webcam",
        "description": "Professional 4K resolution webcam with auto-focus and built-in microphone for streaming and conferencing",
        "category": "Computer Peripherals",
        "price": 129.99,
        "rating": 4.4,
        "in_stock": True,
        "supplier": "StreamTech",
        "tags": ["webcam", "4k", "video", "streaming"]
    },
    {
        "product_id": "P006",
        "product_name": "Ergonomic Wireless Mouse",
        "description": "Ergonomic design wireless mouse with precision tracking and 18-month battery life",
        "category": "Computer Peripherals",
        "price": 39.99,
        "rating": 4.5,
        "in_stock": True,
        "supplier": "TechBrand",
        "tags": ["mouse", "wireless", "ergonomic", "peripherals"]
    },
    {
        "product_id": "P007",
        "product_name": "Monitor Stand Riser",
        "description": "Adjustable monitor stand with storage space for improving desk ergonomics and workspace organization",
        "category": "Furniture",
        "price": 69.99,
        "rating": 4.3,
        "in_stock": True,
        "supplier": "DeskSetup",
        "tags": ["furniture", "desk", "stand", "organization"]
    },
    {
        "product_id": "P008",
        "product_name": "LED Desk Lamp",
        "description": "Smart LED desk lamp with adjustable brightness and color temperature for comfortable workspace lighting",
        "category": "Lighting",
        "price": 59.99,
        "rating": 4.6,
        "in_stock": True,
        "supplier": "LightTech",
        "tags": ["lighting", "lamp", "led", "smart"]
    }
]

print(f"üìä Preparing {len(sample_products)} products for indexing...")

# Index documents
actions = []
for i, product in enumerate(sample_products):
    actions.append({
        "_index": index_name,
        "_id": product["product_id"],
        "_source": product
    })

print(f"üì§ Indexing documents using bulk API...")
success_count, errors = bulk(client, actions, chunk_size=100)
print(f"‚úÖ Indexed {success_count} documents successfully")
if len(errors) > 0:
    print(f"‚ö†Ô∏è  {len(errors)} documents failed to index")
    for error in errors:
        print(f"   Error: {error}")

# Wait a moment for indexing to complete
time.sleep(2)

# Verify indexing
doc_count = client.cat.count(index=index_name, format='json')[0]['count']
print(f"\n‚ú® Index Statistics:")
print(f"   Index Name: {index_name}")
print(f"   Total Documents: {doc_count}")
print(f"   Search Pipeline: {search_pipeline_name}")
print(f"   Ingest Pipeline: {ingest_pipeline_name}")

üìä Preparing 8 products for indexing...
üì§ Indexing documents using bulk API...
‚úÖ Indexed 8 documents successfully

‚ú® Index Statistics:
   Index Name: hybrid_search_products
   Total Documents: 8
   Search Pipeline: hybrid-search-pipeline
   Ingest Pipeline: nlp-ingest-pipeline

‚ú® Index Statistics:
   Index Name: hybrid_search_products
   Total Documents: 8
   Search Pipeline: hybrid-search-pipeline
   Ingest Pipeline: nlp-ingest-pipeline


## Section 5: Basic Hybrid Search Query

Execute a basic hybrid search combining keyword (match) and vector (neural) queries.

Hybrid search combines:
- **Keyword Search**: Traditional BM25 relevance scoring for text matching
- **Vector Search**: Semantic similarity using neural networks
- **Normalization**: Scores are normalized to 0-1 range for fair comparison
- **Combination**: Weighted average combines both scores into final ranking


In [10]:
def print_search_results(response, title="Search Results"):
    """Helper function to pretty-print search results"""
    print(f"\n{'='*80}")
    print(f"{title}")
    print(f"{'='*80}")
    print(f"Total Hits: {response['hits']['total']['value']}")
    print(f"Time: {response['took']}ms\n")
    
    for i, hit in enumerate(response['hits']['hits'], 1):
        # Handle None score (when sorting by fields other than _score)
        score_str = f"{hit['_score']:.4f}" if hit['_score'] is not None else "N/A (sorted by field)"
        print(f"[{i}] Score: {score_str} | ID: {hit['_id']}")
        source = hit['_source']
        print(f"    Product: {source.get('product_name', 'N/A')}")
        print(f"    Category: {source.get('category', 'N/A')} | Price: ${source.get('price', 'N/A')}")
        print(f"    Description: {source.get('description', 'N/A')[:80]}...")
        if 'sort' in hit:
            print(f"    Sort Value: {hit['sort']}")
        print()

# Basic hybrid search combining keyword and vector search
query_text = "wireless headphones"

print(f"üîç Searching for: '{query_text}'")
print(f"Using search pipeline: {search_pipeline_name}\n")

hybrid_query = {
    "_source": {
        "exclude": ["product_embedding", "description_embedding"]
    },
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": {
                            "query": query_text
                        }
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": query_text,
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    }
}

# Execute the hybrid search
response = client.search(
    index=index_name,
    body=hybrid_query,
    params={"search_pipeline": search_pipeline_name}
)

print_search_results(response, "üìä Basic Hybrid Search Results")

# Display the query structure
print("\nüìã Query Structure:")
print(json.dumps(hybrid_query, indent=2))

üîç Searching for: 'wireless headphones'
Using search pipeline: hybrid-search-pipeline


üìä Basic Hybrid Search Results
Total Hits: 8
Time: 14ms

[1] Score: 1.0000 | ID: P001
    Product: Wireless Bluetooth Headphones
    Category: Electronics | Price: $199.99
    Description: Premium noise-cancelling wireless headphones with 30-hour battery life and super...

[2] Score: 0.1995 | ID: P006
    Product: Ergonomic Wireless Mouse
    Category: Computer Peripherals | Price: $39.99
    Description: Ergonomic design wireless mouse with precision tracking and 18-month battery lif...

[3] Score: 0.0741 | ID: P003
    Product: Portable Phone Charger
    Category: Electronics | Price: $49.99
    Description: 20000mAh portable power bank with dual USB ports and LED display for extended ba...

[4] Score: 0.0546 | ID: P008
    Product: LED Desk Lamp
    Category: Lighting | Price: $59.99
    Description: Smart LED desk lamp with adjustable brightness and color temperature for comfort...

[5] Scor

## Section 6: Hybrid Search with Sorting

Hybrid search results can be sorted by field values, overriding relevance scoring.
When custom sorting is applied, document scores become `null` (only present when sorting by `_score`).


In [11]:
# Example 1: Sort by price (ascending)
print("üìä USE CASE 1: Sort by Price (Ascending)")
print("-" * 80)

sorted_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": "electronics"
                    }
                },
                {
                    "match": {
                        "description": "electronics"
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "price": {
                "order": "asc"
            }
        }
    ]
}

response = client.search(
    index=index_name,
    body=sorted_query,
    params={"search_pipeline": search_pipeline_name}
)

print_search_results(response, "Results sorted by price (lowest first)")

# Example 2: Sort by rating (descending)
print("\n\nüìä USE CASE 2: Sort by Rating (Descending)")
print("-" * 80)

sorted_query_rating = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": "wireless"
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": "wireless connectivity",
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "rating": {
                "order": "desc"
            }
        }
    ]
}

response = client.search(
    index=index_name,
    body=sorted_query_rating,
    params={"search_pipeline": search_pipeline_name}
)

print_search_results(response, "Results sorted by rating (highest first)")

# Example 3: Multi-field sorting
print("\n\nüìä USE CASE 3: Multi-Field Sorting (Category, then Price)")
print("-" * 80)

multi_sort_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match_all": {}
                },
                {
                    "match_all": {}
                }
            ]
        }
    },
    "sort": [
        {"category": {"order": "asc"}},
        {"price": {"order": "asc"}}
    ]
}

response = client.search(
    index=index_name,
    body=multi_sort_query,
    params={"search_pipeline": search_pipeline_name}
)

print_search_results(response, "Results sorted by category, then price")


üìä USE CASE 1: Sort by Price (Ascending)
--------------------------------------------------------------------------------

Results sorted by price (lowest first)
Total Hits: 0
Time: 6ms



üìä USE CASE 2: Sort by Rating (Descending)
--------------------------------------------------------------------------------

Results sorted by rating (highest first)
Total Hits: 8
Time: 19ms

[1] Score: N/A (sorted by field) | ID: P002
    Product: USB-C Fast Charging Cable
    Category: Accessories | Price: $29.99
    Description: Durable USB-C charging cable with fast charging capability for smartphones and t...
    Sort Value: [4.8]

[2] Score: N/A (sorted by field) | ID: P004
    Product: Mechanical Gaming Keyboard
    Category: Computer Peripherals | Price: $149.99
    Description: RGB backlit mechanical keyboard with Cherry MX switches for professional gaming ...
    Sort Value: [4.7]

[3] Score: N/A (sorted by field) | ID: P003
    Product: Portable Phone Charger
    Category: Electronics 

## Section 7: Hybrid Search with Pagination

Pagination allows retrieval of large result sets in manageable chunks using `from` and `size` parameters.
For hybrid queries, use `pagination_depth` to control how many results per subquery are retrieved from each shard.


In [12]:
# Pagination example
print("üìñ USE CASE: Pagination with Hybrid Search")
print("-" * 80)

page_size = 2
page_number = 1
pagination_depth = 10  # Retrieve up to 10 results per subquery from each shard

def fetch_page(page_num, page_size, pagination_depth=10):
    """Fetch a specific page of results"""
    from_value = (page_num - 1) * page_size
    
    pagination_query = {
        "query": {
            "hybrid": {
                "pagination_depth": pagination_depth,
                "queries": [
                    {
                        "match": {
                            "product_name": "electronics"
                        }
                    },
                    {
                        "bool": {
                            "should": [
                                {
                                    "match": {
                                        "category": "Electronics"
                                    }
                                },
                                {
                                    "match": {
                                        "category": "Accessories"
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        },
        "from": from_value,
        "size": page_size
    }
    
    response = client.search(
        index=index_name,
        body=pagination_query,
        params={"search_pipeline": search_pipeline_name}
    )
    
    return response

# Fetch first page
print(f"üîÑ Fetching page {page_number} (size: {page_size}, pagination_depth: {pagination_depth})")
response_page1 = fetch_page(page_number, page_size, pagination_depth)
print_search_results(response_page1, f"Page {page_number} of Hybrid Search Results")

# Fetch second page
page_number = 2
print(f"\nüîÑ Fetching page {page_number} (size: {page_size}, pagination_depth: {pagination_depth})")
response_page2 = fetch_page(page_number, page_size, pagination_depth)
print_search_results(response_page2, f"Page {page_number} of Hybrid Search Results")

# Display pagination info
total_hits = response_page1['hits']['total']['value']
total_pages = (total_hits + page_size - 1) // page_size
print(f"\nüìä Pagination Statistics:")
print(f"   Total Hits: {total_hits}")
print(f"   Page Size: {page_size}")
print(f"   Total Pages: {total_pages}")


üìñ USE CASE: Pagination with Hybrid Search
--------------------------------------------------------------------------------
üîÑ Fetching page 1 (size: 2, pagination_depth: 10)

Page 1 of Hybrid Search Results
Total Hits: 3
Time: 12ms

[1] Score: 0.7000 | ID: P002
    Product: USB-C Fast Charging Cable
    Category: Accessories | Price: $29.99
    Description: Durable USB-C charging cable with fast charging capability for smartphones and t...

[2] Score: 0.7000 | ID: P001
    Product: Wireless Bluetooth Headphones
    Category: Electronics | Price: $199.99
    Description: Premium noise-cancelling wireless headphones with 30-hour battery life and super...


üîÑ Fetching page 2 (size: 2, pagination_depth: 10)

Page 2 of Hybrid Search Results
Total Hits: 3
Time: 8ms

[1] Score: 0.7000 | ID: P003
    Product: Portable Phone Charger
    Category: Electronics | Price: $49.99
    Description: 20000mAh portable power bank with dual USB ports and LED display for extended ba...


üìä Pagina

## Section 8: Hybrid Search with Search After

The `search_after` parameter provides cursor-based pagination for efficient deep pagination.
This is more performant than `from` + `size` for large offsets.


In [13]:
# Search After example for cursor-based pagination
print("üîñ USE CASE: Search After (Cursor-Based Pagination)")
print("-" * 80)

# First query to get initial results
first_search_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match_all": {}
                },
                {
                    "match_all": {}
                }
            ]
        }
    },
    "sort": [
        {"price": {"order": "desc"}}
    ],
    "size": 2
}

print("üìå First Query (Price DESC):")
response = client.search(
    index=index_name,
    body=first_search_query,
    params={"search_pipeline": search_pipeline_name}
)
print_search_results(response, "Initial Results (Top 2 by price)")

# Extract sort values from the last document for cursor navigation
if response['hits']['hits']:
    last_hit = response['hits']['hits'][-1]
    search_after_value = last_hit['sort']
    
    print(f"üìç Last sort value (cursor): {search_after_value}")
    print(f"   Will fetch results AFTER this price: ${search_after_value[0]}")
    
    # Second query using search_after
    next_search_query = {
        "query": {
            "hybrid": {
                "queries": [
                    {
                        "match_all": {}
                    },
                    {
                        "match_all": {}
                    }
                ]
            }
        },
        "sort": [
            {"price": {"order": "desc"}}
        ],
        "size": 2,
        "search_after": search_after_value
    }
    
    print("\nüìå Second Query (Using search_after cursor):")
    response2 = client.search(
        index=index_name,
        body=next_search_query,
        params={"search_pipeline": search_pipeline_name}
    )
    print_search_results(response2, "Next Results (Next 2 after cursor)")


üîñ USE CASE: Search After (Cursor-Based Pagination)
--------------------------------------------------------------------------------
üìå First Query (Price DESC):

Initial Results (Top 2 by price)
Total Hits: 8
Time: 8ms

[1] Score: N/A (sorted by field) | ID: P001
    Product: Wireless Bluetooth Headphones
    Category: Electronics | Price: $199.99
    Description: Premium noise-cancelling wireless headphones with 30-hour battery life and super...
    Sort Value: [199.99]

[2] Score: N/A (sorted by field) | ID: P004
    Product: Mechanical Gaming Keyboard
    Category: Computer Peripherals | Price: $149.99
    Description: RGB backlit mechanical keyboard with Cherry MX switches for professional gaming ...
    Sort Value: [149.99]

üìç Last sort value (cursor): [149.99]
   Will fetch results AFTER this price: $149.99

üìå Second Query (Using search_after cursor):

Next Results (Next 2 after cursor)
Total Hits: 8
Time: 7ms

[1] Score: N/A (sorted by field) | ID: P005
    Product: 4

## Section 9: Hybrid Search with Collapse

The `collapse` feature groups results by a field value, returning only the highest-scoring document per group.
Useful for deduplicating results and avoiding multiple results from the same category/supplier.


In [14]:
# Collapse example
print("üîó USE CASE: Collapse Results by Supplier")
print("-" * 80)

collapse_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": "wireless electronics"
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": "wireless products",
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    },
    "collapse": {
        "field": "supplier"
    },
    "size": 10
}

print("üìå Searching for 'wireless electronics' with collapse by supplier")
response = client.search(
    index=index_name,
    body=collapse_query,
    params={"search_pipeline": search_pipeline_name}
)

print(f"\n‚ú® Results (one per supplier):")
print(f"Total Matches: {response['hits']['total']['value']}")
print(f"Unique Suppliers: {len(response['hits']['hits'])}\n")

for i, hit in enumerate(response['hits']['hits'], 1):
    source = hit['_source']
    print(f"[{i}] Score: {hit['_score']:.4f}")
    print(f"    Supplier: {source.get('supplier')}")
    print(f"    Product: {source.get('product_name')}")
    print(f"    Price: ${source.get('price')}")
    print()

# Collapse with sorting
print("\n" + "="*80)
print("üîó Collapse with Sorting by Price")
print("-" * 80)

collapse_sort_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match_all": {}
                },
                {
                    "match_all": {}
                }
            ]
        }
    },
    "collapse": {
        "field": "category"
    },
    "sort": [
        {"price": {"order": "asc"}}
    ],
    "size": 10
}

print("üìå Collapse by category, sorted by lowest price")
response = client.search(
    index=index_name,
    body=collapse_sort_query,
    params={"search_pipeline": search_pipeline_name}
)

print(f"\n‚ú® Results (one per category, sorted by price):\n")

for i, hit in enumerate(response['hits']['hits'], 1):
    source = hit['_source']
    print(f"[{i}] Category: {source.get('category')} | Price: ${source.get('price'):.2f}")
    print(f"    Product: {source.get('product_name')}")
    print()


üîó USE CASE: Collapse Results by Supplier
--------------------------------------------------------------------------------
üìå Searching for 'wireless electronics' with collapse by supplier

‚ú® Results (one per supplier):
Total Matches: 8
Unique Suppliers: 6

[1] Score: 1.0000
    Supplier: TechBrand
    Product: Wireless Bluetooth Headphones
    Price: $199.99

[2] Score: 0.2287
    Supplier: PowerTech
    Product: Portable Phone Charger
    Price: $49.99

[3] Score: 0.1967
    Supplier: StreamTech
    Product: 4K Webcam
    Price: $129.99

[4] Score: 0.1564
    Supplier: GamingGear
    Product: Mechanical Gaming Keyboard
    Price: $149.99

[5] Score: 0.0267
    Supplier: LightTech
    Product: LED Desk Lamp
    Price: $59.99

[6] Score: 0.0157
    Supplier: DeskSetup
    Product: Monitor Stand Riser
    Price: $69.99


üîó Collapse with Sorting by Price
--------------------------------------------------------------------------------
üìå Collapse by category, sorted by lowest p

## Section 10: Hybrid Search with Post-Filtering

Post-filtering applies additional filters AFTER relevance scoring and normalization.
This allows filtering without impacting scores or result order, useful for refining hybrid search results.


In [15]:
# Post-filtering example
print("üîç USE CASE: Post-Filtering on Hybrid Results")
print("-" * 80)

# Example 1: Filter by price range after ranking
post_filter_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": "electronics"
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": "tech gadgets",
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    },
    "post_filter": {
        "range": {
            "price": {
                "lte": 100
            }
        }
    }
}

print("üìå Hybrid search + post-filter: price <= $100")
response = client.search(
    index=index_name,
    body=post_filter_query,
    params={"search_pipeline": search_pipeline_name}
)
print_search_results(response, "Results filtered by price <= $100")

# Example 2: Filter by category and in_stock status
print("\n" + "="*80)
print("üîç Post-Filter: Category + Stock Status")
print("-" * 80)

complex_post_filter_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match_all": {}
                },
                {
                    "match_all": {}
                }
            ]
        }
    },
    "post_filter": {
        "bool": {
            "must": [
                {
                    "term": {
                        "in_stock": True
                    }
                },
                {
                    "terms": {
                        "category": ["Electronics", "Computer Peripherals"]
                    }
                }
            ]
        }
    }
}

print("üìå Post-filter: in_stock=true AND category in (Electronics, Computer Peripherals)")
response = client.search(
    index=index_name,
    body=complex_post_filter_query,
    params={"search_pipeline": search_pipeline_name}
)
print_search_results(response, "Filtered Results")

# Show the difference
print("\nüìä Post-Filter Impact:")
print(f"   Total hybrid matches: Query would match more documents")
print(f"   After post-filter: {response['hits']['total']['value']} documents")
print("   ‚ÑπÔ∏è  Post-filters narrow results without changing relevance scores")


üîç USE CASE: Post-Filtering on Hybrid Results
--------------------------------------------------------------------------------
üìå Hybrid search + post-filter: price <= $100

Results filtered by price <= $100
Total Hits: 5
Time: 18ms

[1] Score: 0.7000 | ID: P006
    Product: Ergonomic Wireless Mouse
    Category: Computer Peripherals | Price: $39.99
    Description: Ergonomic design wireless mouse with precision tracking and 18-month battery lif...

[2] Score: 0.5093 | ID: P003
    Product: Portable Phone Charger
    Category: Electronics | Price: $49.99
    Description: 20000mAh portable power bank with dual USB ports and LED display for extended ba...

[3] Score: 0.4849 | ID: P007
    Product: Monitor Stand Riser
    Category: Furniture | Price: $69.99
    Description: Adjustable monitor stand with storage space for improving desk ergonomics and wo...

[4] Score: 0.2772 | ID: P008
    Product: LED Desk Lamp
    Category: Lighting | Price: $59.99
    Description: Smart LED desk la

## Section 11: Hybrid Search with Aggregations

Aggregations compute statistics and analytics on the documents matching the hybrid query.
They operate on the subset of documents returned by the hybrid query.


In [16]:
# Aggregations example
print("üìä USE CASE: Aggregations on Hybrid Search Results")
print("-" * 80)

aggregation_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "description": "electronics wireless"
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": "tech electronics",
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "categories": {
            "terms": {
                "field": "category",
                "size": 10
            }
        },
        "price_stats": {
            "stats": {
                "field": "price"
            }
        },
        "avg_rating": {
            "avg": {
                "field": "rating"
            }
        },
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    {"to": 50},
                    {"from": 50, "to": 100},
                    {"from": 100, "to": 150},
                    {"from": 150}
                ]
            }
        },
        "by_category": {
            "terms": {
                "field": "category",
                "size": 10
            },
            "aggs": {
                "avg_price": {
                    "avg": {
                        "field": "price"
                    }
                },
                "max_rating": {
                    "max": {
                        "field": "rating"
                    }
                }
            }
        }
    },
    "size": 5
}

print("üìå Hybrid search with aggregations:")
print("  - Terms aggregation by category")
print("  - Price statistics")
print("  - Average rating")
print("  - Price range distribution")
print("  - Nested: Average price per category\n")

response = client.search(
    index=index_name,
    body=aggregation_query,
    params={"search_pipeline": search_pipeline_name}
)

# Display search results
print_search_results(response, "Hybrid Search Results (with aggregations)")

# Display aggregation results
print("\n" + "="*80)
print("üìà AGGREGATION RESULTS")
print("="*80)

aggs = response.get('aggregations', {})

# Categories
print("\nüìÅ Product Categories:")
if 'categories' in aggs:
    for bucket in aggs['categories']['buckets']:
        print(f"  ‚Ä¢ {bucket['key']}: {bucket['doc_count']} products")

# Price statistics
print("\nüí∞ Price Statistics:")
if 'price_stats' in aggs:
    stats = aggs['price_stats']
    print(f"  Count: {stats['count']}")
    print(f"  Min: ${stats['min']:.2f}")
    print(f"  Max: ${stats['max']:.2f}")
    print(f"  Avg: ${stats['avg']:.2f}")
    print(f"  Sum: ${stats['sum']:.2f}")

# Average rating
print("\n‚≠ê Average Rating:")
if 'avg_rating' in aggs:
    print(f"  {aggs['avg_rating']['value']:.2f}")

# Price ranges
print("\nüíµ Price Range Distribution:")
if 'price_ranges' in aggs:
    ranges_labels = ["< $50", "$50-100", "$100-150", "> $150"]
    for label, bucket in zip(ranges_labels, aggs['price_ranges']['buckets']):
        print(f"  {label}: {bucket['doc_count']} products")

# Category breakdown
print("\nüìä Detailed Category Breakdown:")
if 'by_category' in aggs:
    for bucket in aggs['by_category']['buckets']:
        print(f"  {bucket['key']}:")
        print(f"    Count: {bucket['doc_count']}")
        print(f"    Avg Price: ${bucket['avg_price']['value']:.2f}")
        print(f"    Max Rating: {bucket['max_rating']['value']:.2f}")


üìä USE CASE: Aggregations on Hybrid Search Results
--------------------------------------------------------------------------------
üìå Hybrid search with aggregations:
  - Terms aggregation by category
  - Price statistics
  - Average rating
  - Price range distribution
  - Nested: Average price per category


Hybrid Search Results (with aggregations)
Total Hits: 8
Time: 45ms

[1] Score: 0.7000 | ID: P004
    Product: Mechanical Gaming Keyboard
    Category: Computer Peripherals | Price: $149.99
    Description: RGB backlit mechanical keyboard with Cherry MX switches for professional gaming ...

[2] Score: 0.5422 | ID: P006
    Product: Ergonomic Wireless Mouse
    Category: Computer Peripherals | Price: $39.99
    Description: Ergonomic design wireless mouse with precision tracking and 18-month battery lif...

[3] Score: 0.4319 | ID: P007
    Product: Monitor Stand Riser
    Category: Furniture | Price: $69.99
    Description: Adjustable monitor stand with storage space for improv

## Section 12: Hybrid Search with Inner Hits

Inner hits retrieve nested or related documents within hybrid search results.
Useful for understanding which specific parts of a document matched the query.


In [17]:
# Inner hits example with collapse
print("üéØ USE CASE: Inner Hits with Collapse")
print("-" * 80)

inner_hits_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": "wireless"
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": "wireless connectivity",
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    },
    "collapse": {
        "field": "category",
        "inner_hits": [
            {
                "name": "top_products",
                "size": 2,
                "sort": ["_score"]
            }
        ]
    }
}

print("üìå Hybrid search with collapse and inner_hits")
print("  - Collapse by category")
print("  - Show top 2 products per category\n")

response = client.search(
    index=index_name,
    body=inner_hits_query,
    params={"search_pipeline": search_pipeline_name}
)

print(f"‚ú® Collapsed Results (by Category) with Inner Hits:\n")
print(f"Total Hits: {response['hits']['total']['value']}\n")

for i, hit in enumerate(response['hits']['hits'], 1):
    source = hit['_source']
    print(f"[{i}] Category: {source.get('category')}")
    print(f"    Main Result: {source.get('product_name')} (${source.get('price')})")
    
    # Display inner hits (other top products in same category)
    if 'inner_hits' in hit and 'top_products' in hit['inner_hits']:
        print(f"    Other Top Products in {source.get('category')}:")
        for j, inner_hit in enumerate(hit['inner_hits']['top_products']['hits']['hits'], 1):
            inner_source = inner_hit['_source']
            print(f"      {j}. {inner_source.get('product_name')} (${inner_source.get('price')})")
    
    print()


üéØ USE CASE: Inner Hits with Collapse
--------------------------------------------------------------------------------
üìå Hybrid search with collapse and inner_hits
  - Collapse by category
  - Show top 2 products per category

‚ú® Collapsed Results (by Category) with Inner Hits:

Total Hits: 8

[1] Category: Electronics
    Main Result: Wireless Bluetooth Headphones ($199.99)
    Other Top Products in Electronics:
      1. Wireless Bluetooth Headphones ($199.99)
      2. Portable Phone Charger ($49.99)

[2] Category: Computer Peripherals
    Main Result: Ergonomic Wireless Mouse ($39.99)
    Other Top Products in Computer Peripherals:
      1. Ergonomic Wireless Mouse ($39.99)
      2. 4K Webcam ($129.99)

[3] Category: Lighting
    Main Result: LED Desk Lamp ($59.99)
    Other Top Products in Lighting:
      1. LED Desk Lamp ($59.99)

[4] Category: Accessories
    Main Result: USB-C Fast Charging Cable ($29.99)
    Other Top Products in Accessories:
      1. USB-C Fast Charging C

## Section 13: Hybrid Search with Explain

The `explain` parameter shows detailed scoring information about how hybrid search calculates final scores.
Reveals how keyword and vector scores are normalized and combined.

Note: Explain is resource-intensive and recommended only for debugging/troubleshooting.


In [18]:
# Explain example
print("üî¨ USE CASE: Hybrid Search Explain (Score Analysis)")
print("-" * 80)

# First, check if we need to update the search pipeline with explanation processor
explain_pipeline_name = "hybrid-search-pipeline-explain"

try:
    # Try to create a search pipeline with explanation processor
    explain_pipeline_body = {
        "description": "Search pipeline with hybrid score explanation",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {
                        "technique": "min_max"
                    },
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {
                            "weights": [0.3, 0.7]
                        }
                    }
                }
            }
        ],
        "response_processors": [
            {
                "hybrid_score_explanation": {}
            }
        ]
    }
    
    client.transport.perform_request(
        method='PUT',
        url=f'/_search/pipeline/{explain_pipeline_name}',
        body=explain_pipeline_body
    )
    print(f"‚úÖ Created search pipeline with explanation processor: {explain_pipeline_name}")
except Exception as e:
    print(f"‚ö†Ô∏è  Note: Explanation processor may not be available: {str(e)[:100]}")
    explain_pipeline_name = search_pipeline_name

# Execute query with explain
explain_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "product_name": "headphones"
                    }
                },
                {
                    "neural": {
                        "product_embedding": {
                            "query_text": "audio equipment",
                            "model_id": model_id,
                            "k": 5
                        }
                    }
                }
            ]
        }
    },
    "size": 1
}

print("\nüìå Executing hybrid query with explain=true")
print("   Query: match 'headphones' + neural 'audio equipment'\n")

response = client.search(
    index=index_name,
    body=explain_query,
    params={"search_pipeline": search_pipeline_name, "explain": "true"}
)

# Display results with explanation
if response['hits']['hits']:
    hit = response['hits']['hits'][0]
    source = hit['_source']
    
    print(f"üèÜ Top Result:")
    print(f"   Product: {source.get('product_name')}")
    print(f"   Final Score: {hit['_score']:.6f}")
    print(f"   Category: {source.get('category')}")
    print(f"   Price: ${source.get('price')}")
    
    # Show explanation if available
    if '_explanation' in hit:
        print(f"\nüìä Scoring Explanation:")
        print(f"   Value: {hit['_explanation']['value']:.6f}")
        print(f"   Description: {hit['_explanation']['description']}")
        
        # Display details of normalization and combination
        if 'details' in hit['_explanation']:
            print(f"\n   Details of Score Calculation:")
            for i, detail in enumerate(hit['_explanation']['details'], 1):
                print(f"     [{i}] {detail['description']}: {detail['value']:.6f}")
                if 'details' in detail:
                    for j, subdetail in enumerate(detail['details'][:3], 1):  # Show first 3
                        print(f"         [{j}] {subdetail['description']}: {subdetail['value']:.6f}")
    else:
        print("\nüìå Explanation details not available (may require explicit processor configuration)")

# Show how to interpret hybrid scores
print("\n" + "="*80)
print("üìö Understanding Hybrid Search Scores:")
print("-" * 80)
print("""
1. KEYWORD SCORE (BM25):
   - Based on term frequency and document length normalization
   - Uses Okapi BM25 algorithm
   - Range: 0 to unlimited (typically 0-50)

2. NEURAL SCORE (Vector):
   - Based on similarity between query and document vectors
   - Uses chosen distance metric (L2, cosine, etc.)
   - Range: 0 to 1 (after normalization)

3. NORMALIZATION:
   - min_max: Maps all scores to 0-1 range
   - Formula: (score - min) / (max - min)
   - Ensures fair comparison between different scoring methods

4. COMBINATION:
   - arithmetic_mean: (keyword_score + neural_score) / 2
   - With weights: (w1 * keyword + w2 * neural) / (w1 + w2)
   - In this example: 0.3 * keyword + 0.7 * neural

5. FINAL SCORE:
   - Result after normalization and combination
   - Used for ranking results in descending order
   - Only null when custom sorting is applied
""")


üî¨ USE CASE: Hybrid Search Explain (Score Analysis)
--------------------------------------------------------------------------------
‚úÖ Created search pipeline with explanation processor: hybrid-search-pipeline-explain

üìå Executing hybrid query with explain=true
   Query: match 'headphones' + neural 'audio equipment'

üèÜ Top Result:
   Product: Wireless Bluetooth Headphones
   Final Score: 1.000000
   Category: Electronics
   Price: $199.99

üìä Scoring Explanation:
   Value: 0.690830
   Description: combined score of:

   Details of Score Calculation:
     [1] weight(product_name:headphones in 5) [PerFieldSimilarity], result of:: 0.690830
         [1] score(freq=1.0), computed as boost * idf * tf from:: 0.690830
     [2] within top 5 docs: 0.432413

üìö Understanding Hybrid Search Scores:
--------------------------------------------------------------------------------

1. KEYWORD SCORE (BM25):
   - Based on term frequency and document length normalization
   - Uses Okapi BM2

## Summary: Hybrid Search Use Cases and Best Practices

### Key Takeaways

**Hybrid Search combines:**
- Keyword search (BM25) for exact term matching
- Vector search (neural) for semantic understanding
- Normalization to make scores comparable
- Combination for weighted final ranking

### When to Use Each Feature

| Feature | Use Case | Example |
|---------|----------|---------|
| **Basic Hybrid** | Default multi-modal search | Search for products with semantic + keyword matching |
| **Sorting** | Override relevance ranking | Sort by price, date, or popularity |
| **Pagination** | Browse large result sets | Display results in pages of 10-20 items |
| **Search After** | Efficient deep pagination | Cursor-based navigation for APIs |
| **Collapse** | Deduplicate results | Show one product per supplier/category |
| **Post-Filter** | Narrow results after ranking | Filter by price range, availability |
| **Aggregations** | Analytics on results | Count by category, price statistics |
| **Inner Hits** | Context with results | Show similar items in same category |
| **Explain** | Debug scoring | Understand why documents ranked in certain order |

### Performance Considerations

1. **pagination_depth**: Controls memory usage - higher values can impact performance
2. **size**: Limit results returned to reduce network overhead
3. **collapse**: Reduces result set but may impact performance
4. **aggregations**: Computed on all matching documents, can be expensive
5. **explain**: Resource-intensive, use only for debugging

### Configuration Best Practices

1. **Weights**: Adjust normalization weights based on importance (keyword vs semantic)
2. **Normalization**: Use `min_max` for fair score comparison
3. **Combination**: Use `arithmetic_mean` for balanced blending
4. **Model**: Choose appropriate embedding model for your use case
5. **Pipeline**: Create dedicated search pipelines for different scenarios

### Common Patterns

```python
# Pattern 1: Product search with filtering
hybrid_query + post_filter + sort by relevance

# Pattern 2: Browse with facets
hybrid_query + aggregations + collapse by category

# Pattern 3: Deep pagination with sorting
hybrid_query + search_after + sort by field

# Pattern 4: Analytics dashboard
hybrid_query + multiple aggregations + pagination
```


In [20]:
# Cleanup and Summary
print("\n" + "="*80)
print("üéì HYBRID SEARCH DEMONSTRATION COMPLETE")
print("="*80)

# Gather statistics
try:
    index_stats = client.indices.stats(index=index_name)
    doc_count = index_stats['indices'][index_name]['primaries']['docs']['count']
    store_size = index_stats['indices'][index_name]['primaries']['store']['size_in_bytes']
    
    print(f"""
‚ú® Summary Statistics:
   - Index Name: {index_name}
   - Documents Indexed: {doc_count}
   - Storage Size: {store_size / 1024:.2f} KB
   - Search Pipeline: {search_pipeline_name}
   - Ingest Pipeline: {ingest_pipeline_name}
   - Model ID: {model_id}
   - Model Group ID: {model_group_id}

üìö Topics Covered:
   1. ‚úÖ Basic Hybrid Search (keyword + vector)
   2. ‚úÖ Sorting (by price, rating, custom fields)
   3. ‚úÖ Pagination (from/size with pagination_depth)
   4. ‚úÖ Search After (cursor-based navigation)
   5. ‚úÖ Collapse (deduplication by field)
   6. ‚úÖ Post-Filtering (narrow after ranking)
   7. ‚úÖ Aggregations (facets, statistics)
   8. ‚úÖ Inner Hits (nested results)
   9. ‚úÖ Explain (score analysis)

üîó Documentation:
   - Hybrid Search: https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/
   - Search Pipelines: https://docs.opensearch.org/latest/search-plugins/search-pipelines/
   - Vector Search: https://docs.opensearch.org/latest/vector-search/

üìñ Next Steps:
   - Experiment with different search queries
   - Adjust weights and normalization techniques
   - Try combining multiple features (e.g., sort + collapse + post_filter)
   - Monitor performance and adjust pagination_depth as needed
    """)
except Exception as e:
    print(f"Note: Could not gather statistics: {e}")

print("\n‚úÖ All examples completed successfully!")
print("="*80)


üéì HYBRID SEARCH DEMONSTRATION COMPLETE

‚ú® Summary Statistics:
   - Index Name: hybrid_search_products
   - Documents Indexed: 8
   - Storage Size: 124.34 KB
   - Search Pipeline: hybrid-search-pipeline
   - Ingest Pipeline: nlp-ingest-pipeline
   - Model ID: vrPcaJoBzyzo5mmPncRH
   - Model Group ID: aPvcaJoBbVynCTYZmvrv

üìö Topics Covered:
   1. ‚úÖ Basic Hybrid Search (keyword + vector)
   2. ‚úÖ Sorting (by price, rating, custom fields)
   3. ‚úÖ Pagination (from/size with pagination_depth)
   4. ‚úÖ Search After (cursor-based navigation)
   5. ‚úÖ Collapse (deduplication by field)
   6. ‚úÖ Post-Filtering (narrow after ranking)
   7. ‚úÖ Aggregations (facets, statistics)
   8. ‚úÖ Inner Hits (nested results)
   9. ‚úÖ Explain (score analysis)

üîó Documentation:
   - Hybrid Search: https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/
   - Search Pipelines: https://docs.opensearch.org/latest/search-plugins/search-pipelines/
   - Vector Search: https://do