# 🔍 SearchIndexTool - Execute DSL Queries on Indices

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#2980B9', 'primaryTextColor':'#fff', 'primaryBorderColor':'#21618C', 'lineColor':'#F39C12', 'secondaryColor':'#3498DB', 'tertiaryColor':'#27AE60', 'fontSize':'16px'}}}%%
graph TB
    A[üë§ User Query<br/>Find products > $100] --> B[ü§ñ Flow Agent]
    B --> C{üîç SearchIndexTool}
    C --> D[üìã DSL Query]
    D --> E[üîé Execute Search]
    E --> F[üìö OpenSearch Index]
    F --> G[üìä Matching Documents]
    G --> H[üì§ Results]
    
    style A fill:#3498DB,stroke:#2980B9,color:#fff
    style C fill:#2980B9,stroke:#21618C,color:#fff
    style D fill:#E67E22,stroke:#D35400,color:#fff
    style F fill:#9B59B6,stroke:#8E44AD,color:#fff
    style H fill:#27AE60,stroke:#229954,color:#fff
```

## 📚 Learning Objectives

In this notebook, you'll learn:
1. ✅ How to use **SearchIndexTool** to execute DSL queries
2. ✅ Building **OpenSearch DSL** (Domain Specific Language) queries
3. ✅ Filtering, sorting, and pagination techniques
4. ✅ Aggregations and analytics queries
5. ✅ Best practices for search performance

---

## 🎯 What is SearchIndexTool?

**SearchIndexTool** executes **OpenSearch DSL queries** directly on indices. It's the most powerful search tool for:
- 🔍 **Complex Searches**: Boolean queries, filters, ranges
- 📊 **Aggregations**: Calculate statistics, group data
- 🎯 **Precise Control**: Full access to OpenSearch query capabilities
- ⚡ **Performance**: Direct query execution without LLM overhead

**Key Features**:
- Accepts full DSL query JSON as input
- Supports all OpenSearch query types (match, term, bool, range, etc.)
- Enables aggregations, sorting, and field filtering
- No LLM required (simple tool)

---

## Step 1: Import Required Libraries

In [1]:
import sys
import json
from datetime import datetime, timedelta

# Add parent directory to path to import helper functions
sys.path.append('..')
from agent_helpers import (
    get_os_client,
    create_flow_agent,
    execute_agent,
    cleanup_resources
)

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!
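
The helper functions come from a local `agent_helpers` module that this notebook does not show. As a rough sketch of what `execute_agent` presumably does, the ML Commons agent execute endpoint can be called through the client's transport. The names `build_execute_request` and `execute_agent_sketch` are hypothetical, and serializing non-string parameter values to JSON strings is an assumption about how the real helper prepares its request.

```python
import json


def build_execute_request(agent_id, parameters):
    """Return (method, path, body) for an ML Commons agent execute call."""
    # Assumption: non-string values (such as a nested search body) are sent as JSON strings.
    serialized = {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in parameters.items()
    }
    path = f"/_plugins/_ml/agents/{agent_id}/_execute"
    return "POST", path, {"parameters": serialized}


def execute_agent_sketch(client, agent_id, parameters):
    """Hypothetical stand-in for agent_helpers.execute_agent."""
    method, path, body = build_execute_request(agent_id, parameters)
    # opensearch-py exposes raw REST calls through client.transport.perform_request
    return client.transport.perform_request(method, path, body=body)
```

Keeping the request construction separate from the network call makes it possible to inspect the exact payload before anything is sent to the cluster.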


## Step 2: Initialize OpenSearch Client

In [2]:
# Initialize OpenSearch client
client = get_os_client()

# Verify connection
info = client.info()
print(f"✅ Connected to OpenSearch cluster: {info['cluster_name']}")
print(f"📊 Version: {info['version']['number']}")

✅ Connected to OpenSearch cluster: docker-cluster
📊 Version: 3.3.0


## Step 3: Create Sample E-Commerce Index

Let's create a realistic e-commerce product index with various data types for comprehensive testing.

In [3]:
# Create ecommerce products index
index_name = "ecommerce_products"

# Delete if exists
if client.indices.exists(index=index_name):
    client.indices.delete(index=index_name)

# Create index with mappings
index_body = {
    "mappings": {
        "properties": {
            "product_name": {"type": "text"},
            "product_id": {"type": "keyword"},
            "category": {"type": "keyword"},
            "brand": {"type": "keyword"},
            "price": {"type": "float"},
            "rating": {"type": "float"},
            "reviews_count": {"type": "integer"},
            "in_stock": {"type": "boolean"},
            "discount_percent": {"type": "integer"},
            "created_date": {"type": "date"},
            "tags": {"type": "keyword"}
        }
    }
}

client.indices.create(index=index_name, body=index_body)
print(f"✅ Created index: {index_name}")

# Sample product data
products = [
    {
        "product_name": "Laptop Pro 15 inch",
        "product_id": "ELEC-001",
        "category": "Electronics",
        "brand": "TechBrand",
        "price": 1299.99,
        "rating": 4.5,
        "reviews_count": 245,
        "in_stock": True,
        "discount_percent": 10,
        "created_date": "2025-01-15",
        "tags": ["laptop", "computers", "work"]
    },
    {
        "product_name": "Wireless Mouse",
        "product_id": "ELEC-002",
        "category": "Electronics",
        "brand": "TechBrand",
        "price": 29.99,
        "rating": 4.2,
        "reviews_count": 512,
        "in_stock": True,
        "discount_percent": 15,
        "created_date": "2025-01-10",
        "tags": ["accessories", "wireless", "office"]
    },
    {
        "product_name": "Running Shoes",
        "product_id": "SPORT-001",
        "category": "Sports",
        "brand": "SportyFit",
        "price": 89.99,
        "rating": 4.7,
        "reviews_count": 1024,
        "in_stock": True,
        "discount_percent": 20,
        "created_date": "2025-01-20",
        "tags": ["running", "shoes", "fitness"]
    },
    {
        "product_name": "Yoga Mat Premium",
        "product_id": "SPORT-002",
        "category": "Sports",
        "brand": "YogaPro",
        "price": 45.00,
        "rating": 4.8,
        "reviews_count": 768,
        "in_stock": False,
        "discount_percent": 0,
        "created_date": "2025-01-05",
        "tags": ["yoga", "fitness", "mat"]
    },
    {
        "product_name": "Coffee Maker Deluxe",
        "product_id": "HOME-001",
        "category": "Home",
        "brand": "BrewMaster",
        "price": 159.99,
        "rating": 4.4,
        "reviews_count": 342,
        "in_stock": True,
        "discount_percent": 5,
        "created_date": "2025-01-12",
        "tags": ["coffee", "kitchen", "appliances"]
    },
    {
        "product_name": "Smart Watch Series 5",
        "product_id": "ELEC-003",
        "category": "Electronics",
        "brand": "TechBrand",
        "price": 399.00,
        "rating": 4.6,
        "reviews_count": 1876,
        "in_stock": True,
        "discount_percent": 12,
        "created_date": "2025-01-18",
        "tags": ["wearable", "fitness", "smart"]
    },
    {
        "product_name": "Office Desk Ergonomic",
        "product_id": "HOME-002",
        "category": "Home",
        "brand": "OfficePro",
        "price": 299.99,
        "rating": 4.3,
        "reviews_count": 156,
        "in_stock": True,
        "discount_percent": 0,
        "created_date": "2025-01-08",
        "tags": ["furniture", "office", "desk"]
    }
]

# Index documents one at a time; refresh=True makes each one searchable immediately
for product in products:
    client.index(index=index_name, body=product, refresh=True)

print(f"✅ Indexed {len(products)} products")

✅ Created index: ecommerce_products
✅ Indexed 7 products
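
The loop above sends one request per document and refreshes the index after each one, which is fine for seven products but slow for larger datasets. A minimal sketch of a true bulk load with `opensearchpy.helpers.bulk` follows. The function names are hypothetical, and using `product_id` as the document `_id` is an illustrative choice that differs from the auto-generated IDs seen in the outputs of this notebook.

```python
def build_bulk_actions(index, docs, id_field="product_id"):
    """Turn documents into action dicts in the shape opensearchpy.helpers.bulk accepts."""
    return [
        {"_index": index, "_id": doc[id_field], "_source": doc}
        for doc in docs
    ]


def bulk_index(client, index, docs):
    """Send all documents in one bulk request, then refresh once."""
    from opensearchpy import helpers  # imported lazily; requires opensearch-py

    success_count, errors = helpers.bulk(client, build_bulk_actions(index, docs))
    client.indices.refresh(index=index)  # make the new documents searchable
    return success_count, errors
```

Calling `bulk_index(client, index_name, products)` would replace the loop; with explicit IDs, re-running the cell overwrites documents instead of duplicating them.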


## Step 4: Create Flow Agent with SearchIndexTool

In [4]:
# Define the tool configuration with template matching RAG examples
tools = [
    {
        "type": "SearchIndexTool",
        "parameters": {
            "input": '{"index": "${parameters.index}", "query": ${parameters.query} }'
        }
    }
]

# Create the flow agent
agent_id = create_flow_agent(
    client=client,
    agent_name="Search_Index_Agent",
    description="An agent that executes DSL queries on OpenSearch indices",
    tools=tools
)

print(f"✅ Flow agent created with ID: {agent_id}")
print(f"🔧 Tool configured: SearchIndexTool")

   Registering flow agent: Search_Index_Agent...
   ✓ Agent registered: _ltoiZsBLQ1mV2UNRSg5
✅ Flow agent created with ID: _ltoiZsBLQ1mV2UNRSg5
🔧 Tool configured: SearchIndexTool
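
The `input` template is worth a closer look: `${parameters.index}` sits inside quotes because it becomes a JSON string, while `${parameters.query}` is unquoted because it is replaced by a whole JSON object. The sketch below imitates that substitution locally to show the resulting tool input. `render_tool_input` is a hypothetical helper written for illustration; the real substitution happens inside the ML Commons plugin and may differ in detail.

```python
import json

TEMPLATE = '{"index": "${parameters.index}", "query": ${parameters.query} }'


def render_tool_input(template, parameters):
    """Replace ${parameters.<name>} placeholders with the given values."""
    rendered = template
    for key, value in parameters.items():
        # Strings are inserted verbatim; other values are inserted as JSON text.
        text = value if isinstance(value, str) else json.dumps(value)
        rendered = rendered.replace("${parameters." + key + "}", text)
    return rendered


example = render_tool_input(
    TEMPLATE,
    {"index": "ecommerce_products", "query": {"query": {"match_all": {}}, "size": 10}},
)
tool_input = json.loads(example)  # parses cleanly only if the quoting is right
```

After rendering, `tool_input["query"]` holds the complete search body (query, size, sort, and so on), which is why the test cases below nest a `query` key inside the `query` parameter.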


## Step 5: Test Case 1 - Simple Match Query

In [5]:
# Search for products with "laptop" in the name
# Based on RAG examples, pass index and query as separate parameters
query_dsl = {
    "match": {
        "product_name": "laptop"
    }
}

parameters = {
    "index": index_name,
    "query": {
        "query": query_dsl,
        "size": 10,
        "_source": ["product_name", "brand", "price", "rating"]
    }
}

print("Query: Find products with 'laptop' in the name")
print("="*60)

response = execute_agent(client, agent_id, parameters)

print("\nSearch Results:")
print(json.dumps(response, indent=2))

Query: Find products with 'laptop' in the name

Search Results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":1299.99,\"rating\":4.5,\"product_name\":\"Laptop Pro 15 inch\",\"brand\":\"TechBrand\"},\"_id\":\"91toiZsBLQ1mV2UNICiR\",\"_score\":0.6695906}\n"
        }
      ]
    }
  ]
}
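
The `result` field is a single string in which each line is one JSON-encoded hit. A small helper can turn it back into Python dictionaries. This is a sketch based only on the response shape printed above; `parse_hits` is a hypothetical name, not part of `agent_helpers`.

```python
import json


def parse_hits(response):
    """Extract hit dicts from a flow agent response shaped like the output above."""
    hits = []
    for inference in response.get("inference_results", []):
        for output in inference.get("output", []):
            result = output.get("result", "")
            # Each non-empty line of the result string is one JSON-encoded hit.
            for line in result.split("\n"):
                if line.strip():
                    hits.append(json.loads(line))
    return hits
```

For the response above, `parse_hits(response)[0]["_source"]["product_name"]` would give the laptop's name. Python's `json` module accepts the bare `NaN` token by default, so hits whose `_score` is reported as `NaN` still parse.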


## Step 6: Test Case 2 - Range Query with Filters

In [6]:
# Find products with price between $50 and $200, in stock
query_dsl = {
    "bool": {
        "must": [
            {
                "range": {
                    "price": {
                        "gte": 50,
                        "lte": 200
                    }
                }
            }
        ],
        "filter": [
            {
                "term": {
                    "in_stock": True
                }
            }
        ]
    }
}

parameters = {
    "index": index_name,
    "query": {
        "query": query_dsl,
        "sort": [{"price": "asc"}],
        "size": 10
    }
}

print("❓ Query: Find in-stock products priced between $50-$200")
print("="*60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Filtered Results:")
print(json.dumps(response, indent=2))

❓ Query: Find in-stock products priced between $50-$200

📊 Filtered Results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":89.99,\"product_id\":\"SPORT-001\",\"rating\":4.7,\"created_date\":\"2025-01-20\",\"category\":\"Sports\",\"in_stock\":true,\"discount_percent\":20,\"product_name\":\"Running Shoes\",\"brand\":\"SportyFit\",\"reviews_count\":1024,\"tags\":[\"running\",\"shoes\",\"fitness\"]},\"_id\":\"-VtoiZsBLQ1mV2UNICih\",\"_score\":NaN}\n{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":159.99,\"product_id\":\"HOME-001\",\"rating\":4.4,\"created_date\":\"2025-01-12\",\"category\":\"Home\",\"in_stock\":true,\"discount_percent\":5,\"product_name\":\"Coffee Maker Deluxe\",\"brand\":\"BrewMaster\",\"reviews_count\":342,\"tags\":[\"coffee\",\"kitchen\",\"appliances\"]},\"_id\":\"-1toiZsBLQ1mV2UNICix\",\"_score\":NaN}\n"
        }
      ]
    }
  ]
}
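
The `_score` values are `NaN` because the request sorts by `price`: when results are sorted by a field, OpenSearch does not compute relevance scores unless `track_scores` is set in the search body. Since scores are not used here, the range clause can move into filter context alongside the `in_stock` term. The sketch below shows that variant; the function name is hypothetical, and passing `track_scores` through the tool is assumed to work because it is part of the regular search body.

```python
def in_stock_price_range_body(min_price, max_price, size=10, track_scores=False):
    """Build a search body that filters by price range and stock, sorted by price."""
    body = {
        "query": {
            "bool": {
                # Both clauses are yes/no conditions, so neither needs scoring.
                "filter": [
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                    {"term": {"in_stock": True}},
                ]
            }
        },
        "sort": [{"price": "asc"}],
        "size": size,
    }
    if track_scores:
        # Ask OpenSearch to compute _score even though results are sorted by a field.
        body["track_scores"] = True
    return body


parameters = {"index": "ecommerce_products", "query": in_stock_price_range_body(50, 200)}
```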

## Step 7: Test Case 3 - Boolean Query with Multiple Conditions

In [7]:
# Find Electronics category products from TechBrand with rating > 4.0
query_dsl = {
    "bool": {
        "must": [
            {"term": {"category": "Electronics"}},
            {"term": {"brand": "TechBrand"}}
        ],
        "filter": [
            {"range": {"rating": {"gt": 4.0}}}
        ]
    }
}

parameters = {
    "index": index_name,
    "query": {
        "query": query_dsl,
        "sort": [{"rating": "desc"}],
        "_source": ["product_name", "brand", "price", "rating", "reviews_count"]
    }
}

print("❓ Query: Find TechBrand electronics with rating > 4.0")
print("="*60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Boolean Query Results:")
print(json.dumps(response, indent=2))

❓ Query: Find TechBrand electronics with rating > 4.0

📊 Boolean Query Results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":399.0,\"rating\":4.6,\"product_name\":\"Smart Watch Series 5\",\"brand\":\"TechBrand\",\"reviews_count\":1876},\"_id\":\"_FtoiZsBLQ1mV2UNICi4\",\"_score\":NaN}\n{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":1299.99,\"rating\":4.5,\"product_name\":\"Laptop Pro 15 inch\",\"brand\":\"TechBrand\",\"reviews_count\":245},\"_id\":\"91toiZsBLQ1mV2UNICiR\",\"_score\":NaN}\n{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":29.99,\"rating\":4.2,\"product_name\":\"Wireless Mouse\",\"brand\":\"TechBrand\",\"reviews_count\":512},\"_id\":\"-FtoiZsBLQ1mV2UNICia\",\"_score\":NaN}\n"
        }
      ]
    }
  ]
}


## Step 8: Test Case 4 - Aggregations Query

In [8]:
# Calculate average price per category
parameters = {
    "index": index_name,
    "query": {
        "size": 0,
        "aggs": {
            "categories": {
                "terms": {
                    "field": "category",
                    "size": 10
                },
                "aggs": {
                    "avg_price": {
                        "avg": {"field": "price"}
                    },
                    "avg_rating": {
                        "avg": {"field": "rating"}
                    },
                    "total_products": {
                        "value_count": {"field": "product_id"}
                    }
                }
            }
        }
    }
}

print("❓ Query: Calculate average price and rating per category")
print("="*60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Aggregation Results:")
print(json.dumps(response, indent=2))

❓ Query: Calculate average price and rating per category

📊 Aggregation Results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": ""
        }
      ]
    }
  ]
}
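
Note that the `result` string is empty in this run. With `"size": 0` the search returns no hits, and the tool output seen in the earlier test cases contains only hits, so the aggregation buckets do not appear. One way to read them is to send the same body through the client directly. The sketch below does that without the agent; `category_stats` is a hypothetical helper name.

```python
def category_stats(client, index):
    """Run a per-category aggregation with client.search and flatten the buckets."""
    body = {
        "size": 0,  # no hits needed, only aggregation results
        "aggs": {
            "categories": {
                "terms": {"field": "category", "size": 10},
                "aggs": {
                    "avg_price": {"avg": {"field": "price"}},
                    "avg_rating": {"avg": {"field": "rating"}},
                },
            }
        },
    }
    response = client.search(index=index, body=body)
    buckets = response["aggregations"]["categories"]["buckets"]
    return [
        {
            "category": bucket["key"],
            "products": bucket["doc_count"],
            "avg_price": bucket["avg_price"]["value"],
            "avg_rating": bucket["avg_rating"]["value"],
        }
        for bucket in buckets
    ]
```

Calling `category_stats(client, index_name)` would return one row per category with its document count and averages.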


## Step 9: Test Case 5 - Multi-Match Query with Boosting

In [9]:
# Search across multiple fields with different weights
query_dsl = {
    "multi_match": {
        "query": "fitness smart",
        "fields": ["product_name^3", "tags^2", "category"],
        "type": "best_fields"
    }
}

parameters = {
    "index": index_name,
    "query": {
        "query": query_dsl,
        "size": 5,
        "_source": ["product_name", "category", "tags", "price"]
    }
}

print("❓ Query: Search for 'fitness smart' across multiple fields")
print("="*60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Multi-Match Results:")
print(json.dumps(response, indent=2))

❓ Query: Search for 'fitness smart' across multiple fields

📊 Multi-Match Results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":399.0,\"category\":\"Electronics\",\"product_name\":\"Smart Watch Series 5\",\"tags\":[\"wearable\",\"fitness\",\"smart\"]},\"_id\":\"_FtoiZsBLQ1mV2UNICi4\",\"_score\":2.008772}\n"
        }
      ]
    }
  ]
}
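
Only one product comes back, even though two others carry the `fitness` tag. A likely reason is that `tags` and `category` are `keyword` fields, so the whole string "fitness smart" is compared against them as a single term, and only the analyzed `product_name` field matches on the word "smart". The sketch below is one possible alternative that splits the text into words for the keyword field. The function name is hypothetical and the boosts simply mirror the weights used above.

```python
def name_or_tags_query(text):
    """Match analyzed text on product_name, or any individual word on the tags field."""
    words = text.lower().split()
    return {
        "bool": {
            "should": [
                {"match": {"product_name": {"query": text, "boost": 3}}},
                # terms matches a document if any listed value equals one of its tags
                {"terms": {"tags": words, "boost": 2}},
            ],
            "minimum_should_match": 1,
        }
    }
```

Passing `name_or_tags_query("fitness smart")` as `query_dsl` should also match products tagged `fitness`, since each word is now compared against the tags individually.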


## Step 10: Test Case 6 - Terms Query for Multiple Values

In [10]:
# Find products with specific tags
query_dsl = {
    "terms": {
        "tags": ["fitness", "office", "kitchen"]
    }
}

parameters = {
    "index": index_name,
    "query": {
        "query": query_dsl,
        "size": 10,
        "_source": ["product_name", "tags", "category", "price"]
    }
}

print("❓ Query: Find products with tags: fitness, office, or kitchen")
print("="*60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Terms Query Results:")
print(json.dumps(response, indent=2))

❓ Query: Find products with tags: fitness, office, or kitchen

📊 Terms Query Results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":29.99,\"category\":\"Electronics\",\"product_name\":\"Wireless Mouse\",\"tags\":[\"accessories\",\"wireless\",\"office\"]},\"_id\":\"-FtoiZsBLQ1mV2UNICia\",\"_score\":1.0}\n{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":89.99,\"category\":\"Sports\",\"product_name\":\"Running Shoes\",\"tags\":[\"running\",\"shoes\",\"fitness\"]},\"_id\":\"-VtoiZsBLQ1mV2UNICih\",\"_score\":1.0}\n{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":45.0,\"category\":\"Sports\",\"product_name\":\"Yoga Mat Premium\",\"tags\":[\"yoga\",\"fitness\",\"mat\"]},\"_id\":\"-ltoiZsBLQ1mV2UNICip\",\"_score\":1.0}\n{\"_index\":\"ecommerce_products\",\"_source\":{\"price\":159.99,\"category\":\"Home\",\"product_name\":\"Coffee Maker Deluxe\",\"t

## 🎓 Key Takeaways

### What We Learned:

1. **SearchIndexTool Capabilities**:
   - ✅ Execute full DSL queries with complete control
   - ✅ Support for all query types (match, term, bool, range, etc.)
   - ✅ Aggregations for analytics and statistics
   - ✅ Sorting, filtering, and field selection
   - ✅ No LLM required (direct query execution)

2. **Common Query Types**:
   - **match**: Full-text search with analysis
   - **term**: Exact match on keyword fields
   - **range**: Numeric or date ranges
   - **bool**: Combine multiple conditions (must, should, must_not, filter)
   - **multi_match**: Search across multiple fields
   - **terms**: Match any value from a list

3. **Query Structure**:
   ```json
   {
     "index": "index_name",
     "query": {
       "query": { /* query DSL */ },
       "size": 10,
       "sort": [{ "field": "asc" }],
       "_source": ["field1", "field2"],
       "aggs": { /* aggregations */ }
     }
   }
   ```

4. **Practical Use Cases**:
   - 🔍 **E-commerce Search**: Product filtering, price ranges, categories
   - 📊 **Analytics**: Aggregations for business insights
   - 🎯 **Precise Filtering**: Complex boolean logic
   - ⚡ **High Performance**: Direct query without LLM overhead

### Best Practices:

- ✅ Use **filter** context for non-scoring queries (better performance)
- ✅ Use **must** context when relevance scoring matters
- ✅ Limit **size** to avoid large result sets
- ✅ Use **_source** filtering to return only needed fields
- ✅ Map fields as **keyword** for exact matching and aggregations
- ✅ Map fields as **text** for full-text search

### Performance Tips:

- ⚡ **Filter context** doesn't calculate scores (faster)
- ⚡ Use **term** queries on keyword fields, not text fields
- ⚡ Avoid wildcards at the beginning of terms
- ⚡ Use aggregations instead of large result sets for analytics
- ⚡ Consider **pagination** for large result sets
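
Pagination is not exercised in the test cases above, so here is a minimal sketch using the standard `from` and `size` search parameters. `page_body` is a hypothetical helper; the body it returns would be passed as the `query` parameter in the same way as in the earlier test cases.

```python
def page_body(query, page, page_size=10, sort=None):
    """Build a search body for a zero-based page using from/size."""
    body = {
        "query": query,
        "from": page * page_size,  # number of hits to skip
        "size": page_size,
    }
    if sort:
        body["sort"] = sort  # a stable sort keeps pages consistent between requests
    return body


second_page = page_body({"match_all": {}}, page=1, page_size=3, sort=[{"price": "asc"}])
```

By default `from + size` cannot exceed the index's `index.max_result_window` setting (10,000), so deep pagination is usually done with `search_after` instead.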

### Query Optimization:

```python
# ✅ GOOD: Filter context (no scoring)
{"bool": {"filter": [{"term": {"status": "active"}}]}}

# ❌ LESS EFFICIENT: Must context (calculates scores)
{"bool": {"must": [{"term": {"status": "active"}}]}}

# ✅ GOOD: Specific fields
{"_source": ["name", "price"]}

# ❌ LESS EFFICIENT: All fields
{"_source": True}
```

---

## 🧹 Cleanup (Optional)

Uncomment and run this cell to clean up resources created in this notebook.

In [11]:
# # Delete the flow agent
# cleanup_resources(
#     client=client,
#     agent_ids=[agent_id]
# )

# # Delete test index
# client.indices.delete(index=index_name, ignore=[404])

# print("✅ Cleanup complete!")

## 🚀 Next Steps

Now that you understand SearchIndexTool, explore:
- **QueryPlanningTool**: Convert natural language to DSL queries automatically
- **PPLTool**: Use Piped Processing Language for analytics queries
- **VectorDBTool**: Semantic search with embeddings
- **RAGTool**: Combine search with LLM generation

---

📚 **Resources**:
- [OpenSearch Query DSL](https://opensearch.org/docs/latest/query-dsl/)
- [Boolean Queries](https://opensearch.org/docs/latest/query-dsl/compound/bool/)
- [Aggregations](https://opensearch.org/docs/latest/aggregations/)
- [Search API Reference](https://opensearch.org/docs/latest/api-reference/search/)