# 🗺️ IndexMappingTool - Retrieve Index Mappings and Settings

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#2ECC71', 'primaryTextColor':'#fff', 'primaryBorderColor':'#27AE60', 'lineColor':'#F39C12', 'secondaryColor':'#3498DB', 'tertiaryColor':'#E74C3C', 'fontSize':'16px'}}}%%
graph TB
    A[üë§ User Query<br/>What fields exist?] --> B[ü§ñ Flow Agent]
    B --> C{üó∫Ô∏è IndexMappingTool}
    C --> D[üìã Get Mappings API]
    C --> E[‚öôÔ∏è Get Settings API]
    D --> F[üìä Field Info]
    E --> F
    F --> G[üì§ Formatted Response]
    
    style A fill:#3498DB,stroke:#2980B9,color:#fff
    style C fill:#2ECC71,stroke:#27AE60,color:#fff
    style D fill:#E67E22,stroke:#D35400,color:#fff
    style E fill:#E67E22,stroke:#D35400,color:#fff
    style G fill:#27AE60,stroke:#229954,color:#fff
```

## 📚 Learning Objectives

In this notebook, you'll learn:
1. ✅ How to use **IndexMappingTool** to retrieve index structure information
2. ✅ How to read index **mappings** (field types, analyzers)
3. ✅ How to read index **settings** (shards, replicas, refresh interval)
4. ✅ How to query multiple indices at once
5. ✅ Practical use cases for schema discovery

---

## 🎯 What is IndexMappingTool?

**IndexMappingTool** retrieves the **mappings** (field definitions) and **settings** (configuration) for one or more OpenSearch indices. This is essential for:
- 🔍 **Schema Discovery**: Understanding what fields exist in an index
- 🏗️ **Query Planning**: Knowing field types before building queries
- 🔧 **Troubleshooting**: Checking index configuration issues
- 📊 **Data Analysis**: Understanding data structure before analysis

**Key Features**:
- Retrieves field names, types, and analyzer configurations
- Returns index settings (shards, replicas, etc.)
- Supports wildcards to query multiple indices
- Works without an LLM

---

## Step 1: Import Required Libraries

```python
import sys
import json

# Add the parent directory to the path so the helper module can be imported
sys.path.append('..')
from agent_helpers import (
    get_os_client,
    create_flow_agent,
    execute_agent,
    cleanup_resources
)

print("✅ Libraries imported successfully!")
```

```text
✅ Libraries imported successfully!
```


## Step 2: Initialize OpenSearch Client

```python
# Initialize the OpenSearch client
client = get_os_client()

# Verify the connection
info = client.info()
print(f"✅ Connected to OpenSearch cluster: {info['cluster_name']}")
print(f"📊 Version: {info['version']['number']}")
```

```text
✅ Connected to OpenSearch cluster: docker-cluster
📊 Version: 3.3.0
```


## Step 3: Create Sample Indices for Testing

Let's create a few sample indices with different field types to demonstrate the tool's capabilities.

```python
# Create a product catalog index
product_index = "product_catalog"

product_mapping = {
    "mappings": {
        "properties": {
            "product_name": {"type": "text", "analyzer": "standard"},
            "product_id": {"type": "keyword"},
            "price": {"type": "float"},
            "category": {"type": "keyword"},
            "description": {"type": "text"},
            "in_stock": {"type": "boolean"},
            "created_date": {"type": "date"},
            "tags": {"type": "keyword"},
            "location": {"type": "geo_point"}
        }
    },
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
        "refresh_interval": "5s"
    }
}

# Delete the index if it already exists, then create it fresh
if client.indices.exists(index=product_index):
    client.indices.delete(index=product_index)

client.indices.create(index=product_index, body=product_mapping)
print(f"✅ Created index: {product_index}")

# Create a customer reviews index
review_index = "customer_reviews"

review_mapping = {
    "mappings": {
        "properties": {
            "review_id": {"type": "keyword"},
            "product_id": {"type": "keyword"},
            "customer_name": {"type": "text"},
            "rating": {"type": "integer"},
            "review_text": {"type": "text", "analyzer": "english"},
            "verified_purchase": {"type": "boolean"},
            "review_date": {"type": "date"},
            "helpful_votes": {"type": "integer"}
        }
    },
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}

if client.indices.exists(index=review_index):
    client.indices.delete(index=review_index)

client.indices.create(index=review_index, body=review_mapping)
print(f"✅ Created index: {review_index}")

# Add one sample document to each index
client.index(index=product_index, body={
    "product_name": "Laptop Pro 15",
    "product_id": "PROD-001",
    "price": 1299.99,
    "category": "Electronics",
    "description": "High-performance laptop with 16GB RAM",
    "in_stock": True,
    "created_date": "2025-01-15",
    "tags": ["laptop", "electronics", "computers"],
    "location": {"lat": 37.7749, "lon": -122.4194}
})

client.index(index=review_index, body={
    "review_id": "REV-001",
    "product_id": "PROD-001",
    "customer_name": "John Smith",
    "rating": 5,
    "review_text": "Excellent laptop! Very fast and reliable.",
    "verified_purchase": True,
    "review_date": "2025-01-20",
    "helpful_votes": 15
})

# Refresh so the documents are immediately searchable
client.indices.refresh(index=[product_index, review_index])
print("✅ Sample documents indexed")
```

```text
✅ Created index: product_catalog
✅ Created index: customer_reviews
✅ Sample documents indexed
```


## Step 4: Create Flow Agent with IndexMappingTool

IndexMappingTool is a **simple tool** that doesn't require an LLM; it retrieves index structure information directly.

```python
# Define the tool configuration
tools = [
    {
        "type": "IndexMappingTool",
        "parameters": {
            "index": ["${parameters.index}"],
            "input": "${parameters.question}"
        }
    }
]

# Create the flow agent
agent_id = create_flow_agent(
    client,
    "Index_Mapping_Agent",
    "An agent that retrieves index mappings and settings to help understand index structure",
    tools
)

print(f"✅ Flow agent created with ID: {agent_id}")
print("🔧 Tool configured: IndexMappingTool")
```

```text
   Registering flow agent: Index_Mapping_Agent...
   ✓ Agent registered: 8oFVa5oBAjDPEnaCHJaX
✅ Flow agent created with ID: 8oFVa5oBAjDPEnaCHJaX
🔧 Tool configured: IndexMappingTool
```
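The `create_flow_agent` and `execute_agent` helpers come from `agent_helpers`, and their code isn't shown here. The sketch below shows what they most likely send. It assumes the ML Commons agent endpoints `POST /_plugins/_ml/agents/_register` and `POST /_plugins/_ml/agents/<agent_id>/_execute`. The functions `build_register_request` and `build_execute_request` are hypothetical names used only for illustration.

```python
import json


def build_register_request(name, description, tools):
    """Hypothetical helper: the REST call that registers a flow agent.

    Assumes the ML Commons endpoint POST /_plugins/_ml/agents/_register;
    the notebook's create_flow_agent helper may build its body differently.
    """
    body = {
        "name": name,
        "type": "flow",  # a flow agent runs its tools in sequence
        "description": description,
        "tools": tools,
    }
    return "POST", "/_plugins/_ml/agents/_register", body


def build_execute_request(agent_id, parameters):
    """Hypothetical helper: the REST call that runs a registered agent."""
    return "POST", f"/_plugins/_ml/agents/{agent_id}/_execute", {"parameters": parameters}


# Same tool configuration as in the cell above
example_tools = [{
    "type": "IndexMappingTool",
    "parameters": {"index": ["${parameters.index}"], "input": "${parameters.question}"},
}]

method, path, body = build_register_request(
    "Index_Mapping_Agent", "Retrieves index mappings and settings", example_tools)
print(method, path)
print(json.dumps(body, indent=2))

method, path, body = build_execute_request(
    "8oFVa5oBAjDPEnaCHJaX",
    {"index": "product_catalog", "question": "What fields are in this index?"})
print(method, path)
print(json.dumps(body, indent=2))
```

opensearch-py can send such a raw call with `client.transport.perform_request(method, path, body=body)`. Check the helper module before assuming its bodies match these exactly.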


## Step 5: Test Case 1 - Get Mappings for Product Index

```python
# Query the mappings for the product_catalog index
parameters = {
    "index": "product_catalog",
    "question": "What fields are in this index?"
}

print("❓ Question: What fields are in the product_catalog index?")
print("=" * 60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Agent Response:")
print(json.dumps(response, indent=2))
```

```text
❓ Question: What fields are in the product_catalog index?
============================================================

📊 Agent Response:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "index: product_catalog\n\nmappings:\nproperties={category={type=keyword}, created_date={type=date}, description={type=text}, in_stock={type=boolean}, location={type=geo_point}, price={type=float}, product_id={type=keyword}, product_name={type=text, analyzer=standard}, tags={type=keyword}}\n\n\nsettings:\nindex.creation_date=1762737261562\nindex.number_of_replicas=1\nindex.number_of_shards=2\nindex.provided_name=product_catalog\nindex.refresh_interval=5s\nindex.replication.type=DOCUMENT\nindex.uuid=1Qd3TCc5TMeDtXjtsJP5pw\nindex.version.created=137247827\n\n\n"
        }
      ]
    }
  ]
}
```
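The tool's `result` is a single string, not structured JSON, and the mapping is printed in a `{key=value, ...}` form. To work with it programmatically, the sketch below parses that text into dictionaries. It makes these assumptions:

- The layout is exactly the one shown above: an `index:` line, a `mappings:` section with one `properties=` line, then a `settings:` section with one `key=value` per line.
- Names and values contain no braces, commas, or equals signs.

`parse_map_string` and `parse_tool_result` are hypothetical helpers, not part of ML Commons.

```python
def parse_map_string(text):
    """Parse a Java Map.toString()-style string such as
    '{a={type=keyword}, b={type=text, analyzer=standard}}' into nested dicts.

    Assumes keys and values never contain '{', '}', ',' or '='.
    """
    pos = 0

    def parse_value():
        nonlocal pos
        if text[pos] == "{":
            return parse_object()
        start = pos
        while pos < len(text) and text[pos] not in ",}":
            pos += 1
        return text[start:pos]

    def parse_object():
        nonlocal pos
        pos += 1  # skip the opening '{'
        result = {}
        while text[pos] != "}":
            eq = text.index("=", pos)
            key = text[pos:eq].strip()  # strip removes the space after a separator
            pos = eq + 1
            result[key] = parse_value()
            if text[pos] == ",":
                pos += 1
        pos += 1  # skip the closing '}'
        return result

    return parse_object()


def parse_tool_result(result):
    """Split IndexMappingTool text output into {index: {"properties": ..., "settings": ...}}."""
    indices, current, section = {}, None, None
    for line in (raw.strip() for raw in result.splitlines()):
        if not line:
            continue
        if line.startswith("index: "):
            current = {"properties": {}, "settings": {}}
            indices[line[len("index: "):]] = current
            section = None
        elif line in ("mappings:", "settings:"):
            section = line[:-1]
        elif section == "mappings" and line.startswith("properties="):
            current["properties"] = parse_map_string(line[len("properties="):])
        elif section == "settings" and "=" in line:
            key, _, value = line.partition("=")
            current["settings"][key] = value
    return indices


# The result string from the output above
sample = (
    "index: product_catalog\n\nmappings:\n"
    "properties={category={type=keyword}, created_date={type=date}, description={type=text}, "
    "in_stock={type=boolean}, location={type=geo_point}, price={type=float}, "
    "product_id={type=keyword}, product_name={type=text, analyzer=standard}, tags={type=keyword}}"
    "\n\n\nsettings:\nindex.creation_date=1762737261562\nindex.number_of_replicas=1\n"
    "index.number_of_shards=2\nindex.provided_name=product_catalog\nindex.refresh_interval=5s\n"
    "index.replication.type=DOCUMENT\nindex.uuid=1Qd3TCc5TMeDtXjtsJP5pw\n"
    "index.version.created=137247827\n\n\n"
)

parsed = parse_tool_result(sample)
for field, attrs in parsed["product_catalog"]["properties"].items():
    print(f"{field:14} {attrs}")
print("shards:", parsed["product_catalog"]["settings"]["index.number_of_shards"])
```

In the notebook, the input would be `response['inference_results'][0]['output'][0]['result']`. All parsed values are strings, for example `'2'` for the shard count.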


## Step 6: Test Case 2 - Get Settings for Review Index

```python
# Query the settings for the customer_reviews index
parameters = {
    "index": "customer_reviews",
    "question": "What are the settings for this index?"
}

print("❓ Question: What are the settings for customer_reviews index?")
print("=" * 60)

response = execute_agent(client, agent_id, parameters)

print("\n⚙️ Index Settings:")
print(json.dumps(response, indent=2))
```

```text
❓ Question: What are the settings for customer_reviews index?
============================================================

⚙️ Index Settings:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "index: customer_reviews\n\nmappings:\nproperties={customer_name={type=text}, helpful_votes={type=integer}, product_id={type=keyword}, rating={type=integer}, review_date={type=date}, review_id={type=keyword}, review_text={type=text, analyzer=english}, verified_purchase={type=boolean}}\n\n\nsettings:\nindex.creation_date=1762737261717\nindex.number_of_replicas=0\nindex.number_of_shards=1\nindex.provided_name=customer_reviews\nindex.replication.type=DOCUMENT\nindex.uuid=gIyJWZp3TzGiTTU4ZrkYKg\nindex.version.created=137247827\n\n\n"
        }
      ]
    }
  ]
}
```
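Comparing these settings with what Step 3 requested is a quick sanity check. `index.refresh_interval` is missing from this output because `customer_reviews` was created without it; OpenSearch then applies its default of `1s`.

The sketch below performs the comparison. It uses a hypothetical `parse_settings` helper, which assumes a single-index result in the layout shown above.

```python
def parse_settings(result):
    """Collect the 'key=value' lines from the settings section of a single-index result."""
    settings, in_settings = {}, False
    for line in (raw.strip() for raw in result.splitlines()):
        if line == "settings:":
            in_settings = True
        elif line.startswith("index: ") or line == "mappings:":
            in_settings = False  # a new index block or section starts
        elif in_settings and "=" in line:
            key, _, value = line.partition("=")
            settings[key] = value
    return settings


# The result string from the output above
result = (
    "index: customer_reviews\n\nmappings:\n"
    "properties={customer_name={type=text}, helpful_votes={type=integer}, product_id={type=keyword}, "
    "rating={type=integer}, review_date={type=date}, review_id={type=keyword}, "
    "review_text={type=text, analyzer=english}, verified_purchase={type=boolean}}"
    "\n\n\nsettings:\nindex.creation_date=1762737261717\nindex.number_of_replicas=0\n"
    "index.number_of_shards=1\nindex.provided_name=customer_reviews\n"
    "index.replication.type=DOCUMENT\nindex.uuid=gIyJWZp3TzGiTTU4ZrkYKg\n"
    "index.version.created=137247827\n\n\n"
)

settings = parse_settings(result)
requested = {"number_of_shards": 1, "number_of_replicas": 0}  # from review_mapping in Step 3
for name, value in requested.items():
    reported = settings.get(f"index.{name}")
    print(f"{name}: requested={value}, reported={reported}, match={str(value) == reported}")

# refresh_interval was not set at creation time, so the tool does not report it
print("refresh_interval:", settings.get("index.refresh_interval", "not set (default of 1s applies)"))
```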


## Step 7: Test Case 3 - Query Multiple Indices with Wildcard

```python
# Use wildcard patterns to get info from both indices
parameters = {
    "index": "product_*,customer_*",
    "question": "What fields are common across product and customer indices?"
}

print("❓ Question: What fields are common across product_* and customer_* indices?")
print("=" * 60)

response = execute_agent(client, agent_id, parameters)

print("\n📊 Multi-Index Mappings:")
print(json.dumps(response, indent=2))
```

```text
❓ Question: What fields are common across product_* and customer_* indices?
============================================================

📊 Multi-Index Mappings:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "There were no results searching the index parameter [product_*,customer_*]."
        }
      ]
    }
  ]
}
```
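The wildcard call returned no results, even though both indices exist. One plausible cause, not verified here, is the tool configuration: it wraps `${parameters.index}` in a one-element list. The whole string `product_*,customer_*` may therefore have reached the tool as a single index name rather than as two patterns. Running each pattern separately would help narrow this down.

The question itself can still be answered from the Step 5 and Step 6 outputs. The sketch below transcribes those field types by hand and intersects them. `common_fields` is a hypothetical helper.

```python
# Field -> type maps transcribed from the Step 5 and Step 6 outputs above
product_catalog = {
    "category": "keyword", "created_date": "date", "description": "text",
    "in_stock": "boolean", "location": "geo_point", "price": "float",
    "product_id": "keyword", "product_name": "text", "tags": "keyword",
}
customer_reviews = {
    "customer_name": "text", "helpful_votes": "integer", "product_id": "keyword",
    "rating": "integer", "review_date": "date", "review_id": "keyword",
    "review_text": "text", "verified_purchase": "boolean",
}


def common_fields(*mappings):
    """Return {field: [type in each mapping]} for fields present in every mapping."""
    shared = set(mappings[0]).intersection(*mappings[1:])
    return {field: [m[field] for m in mappings] for field in sorted(shared)}


common = common_fields(product_catalog, customer_reviews)
for field, types in common.items():
    status = "same type" if len(set(types)) == 1 else "TYPE CONFLICT"
    print(f"{field}: {types} ({status})")
```

Only `product_id` is shared, and it is a `keyword` in both indices. That makes it the natural field for linking reviews to products.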


## Step 8: Test Case 4 - Analyze Field Types

```python
# Get specific field type information
parameters = {
    "index": "product_catalog",
    "question": "What type is the price field and is it searchable?"
}

print("❓ Question: What type is the price field in product_catalog?")
print("=" * 60)

response = execute_agent(client, agent_id, parameters)

print("\n🔍 Field Analysis:")
print(json.dumps(response, indent=2))

from IPython.display import Markdown, display

# The tool returns plain text, not an LLM chat completion, so there is no JSON
# to decode; show the text in a Markdown code block so its line breaks survive
result_text = response['inference_results'][0]['output'][0]['result']
display(Markdown("```text\n" + result_text + "\n```"))
```

```text
❓ Question: What type is the price field in product_catalog?
============================================================

🔍 Field Analysis:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "index: product_catalog\n\nmappings:\nproperties={category={type=keyword}, created_date={type=date}, description={type=text}, in_stock={type=boolean}, location={type=geo_point}, price={type=float}, product_id={type=keyword}, product_name={type=text, analyzer=standard}, tags={type=keyword}}\n\n\nsettings:\nindex.creation_date=1762737261562\nindex.number_of_replicas=1\nindex.number_of_shards=2\nindex.provided_name=product_catalog\nindex.refresh_interval=5s\nindex.replication.type=DOCUMENT\nindex.uuid=1Qd3TCc5TMeDtXjtsJP5pw\nindex.version.created=137247827\n\n\n"
        }
      ]
    }
  ]
}
```
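The tool returned exactly the same text as in Step 5. It reports the full mapping and settings, and leaves the specific question about `price` unanswered, so answering it means reading the mapping yourself.

The sketch below does that for one field. It assumes the mapping has already been parsed into a `{field: {attribute: value}}` dict with string values, as the tool's text output would yield. `field_report` is a hypothetical helper.

A field is searchable unless its mapping sets `index` to `false`. `price` has no such setting, so as a `float` it supports term and range queries.

```python
def field_report(properties, field):
    """Summarize one field from a parsed mapping ({field: {attribute: value}})."""
    attrs = properties.get(field)
    if attrs is None:
        return {"field": field, "present": False}
    return {
        "field": field,
        "present": True,
        "type": attrs.get("type", "object"),  # object fields with sub-properties omit "type"
        # "index" defaults to true; only an explicit index=false makes a field unsearchable
        "searchable": attrs.get("index", "true") != "false",
        "analyzer": attrs.get("analyzer"),  # None: no analyzer set explicitly
    }


# The entries for price and product_name as reported in the output above
product_properties = {
    "price": {"type": "float"},
    "product_name": {"type": "text", "analyzer": "standard"},
}
print(field_report(product_properties, "price"))
print(field_report(product_properties, "discount"))
```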


## Step 9: Test Case 5 - Check Analyzer Configuration

```python
# Check text field analyzers
parameters = {
    "index": "customer_reviews",
    "question": "What analyzer is used for the review_text field?"
}

print("❓ Question: What analyzer is used for review_text in customer_reviews?")
print("=" * 60)

response = execute_agent(client, agent_id, parameters)

print("\n🔧 Analyzer Configuration:")
print(json.dumps(response, indent=2))

from IPython.display import Markdown, display

# The tool returns plain text, not an LLM chat completion, so there is no JSON
# to decode; show the text in a Markdown code block so its line breaks survive
result_text = response['inference_results'][0]['output'][0]['result']
display(Markdown("```text\n" + result_text + "\n```"))
```

```text
❓ Question: What analyzer is used for review_text in customer_reviews?
============================================================

🔧 Analyzer Configuration:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "index: customer_reviews\n\nmappings:\nproperties={customer_name={type=text}, helpful_votes={type=integer}, product_id={type=keyword}, rating={type=integer}, review_date={type=date}, review_id={type=keyword}, review_text={type=text, analyzer=english}, verified_purchase={type=boolean}}\n\n\nsettings:\nindex.creation_date=1762737261717\nindex.number_of_replicas=0\nindex.number_of_shards=1\nindex.provided_name=customer_reviews\nindex.replication.type=DOCUMENT\nindex.uuid=gIyJWZp3TzGiTTU4ZrkYKg\nindex.version.created=137247827\n\n\n"
        }
      ]
    }
  ]
}
```
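Again the tool returns the whole mapping:

- `review_text` shows `analyzer=english`.
- `customer_name` lists no analyzer, so it uses the index's default analyzer. That default is `standard` unless the index settings define a different one, and the settings above define none.

The hypothetical helper `text_field_analyzers` below applies that rule to a hand-transcribed copy of the `customer_reviews` properties.

```python
def text_field_analyzers(properties, default="standard"):
    """Map each text field to its analyzer, falling back to `default` when none is set.

    `default` should be the index's default analyzer; it is `standard` unless the
    index settings configure a different one.
    """
    return {
        field: attrs.get("analyzer", default)
        for field, attrs in sorted(properties.items())
        if attrs.get("type") == "text"
    }


# customer_reviews properties, transcribed from the output above
review_properties = {
    "customer_name": {"type": "text"},
    "helpful_votes": {"type": "integer"},
    "product_id": {"type": "keyword"},
    "rating": {"type": "integer"},
    "review_date": {"type": "date"},
    "review_id": {"type": "keyword"},
    "review_text": {"type": "text", "analyzer": "english"},
    "verified_purchase": {"type": "boolean"},
}

for field, analyzer in text_field_analyzers(review_properties).items():
    print(f"{field}: {analyzer}")
```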


## 🎓 Key Takeaways

### What We Learned:

1. **IndexMappingTool Capabilities**:
   - ✅ Retrieves complete field mappings (names, types, analyzers)
   - ✅ Returns index settings (shards, replicas, refresh interval)
   - ⚠️ Is meant to cover multiple indices via wildcards, but the comma-separated pattern in Step 7 returned no results in this run
   - ✅ No LLM required (simple tool)

2. **Understanding Mappings**:
   - **Field Types**: text, keyword, integer, float, boolean, date, geo_point, etc.
   - **Analyzers**: standard, english, custom analyzers for text processing
   - **Indexing Options**: `store`, `doc_values`, and `norms` parameters

3. **Understanding Settings**:
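   - **Shards**: Number of primary shards. This is a static setting: it cannot be updated on an existing index, only changed by creating a resized copy with the split or shrink APIs.
   - **Replicas**: Number of replica shards (a dynamic setting that can be updated at any time)
   - **Refresh Interval**: How often the index is refreshed to make newly indexed documents visible to search (`1s` by default)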

4. **Practical Use Cases**:
   - 🔍 **Query Planning**: Know field types before building queries
   - 🏗️ **Schema Discovery**: Explore unknown index structures
   - 🔧 **Troubleshooting**: Diagnose mapping or settings issues
   - 📊 **Data Modeling**: Understand existing data structures
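The static/dynamic distinction matters when you act on what the tool reports. The sketch below builds a request body for the update-settings API and refuses to change `number_of_shards`. Its allow-lists are hypothetical: they cover only the three settings discussed above, not OpenSearch's full list of dynamic settings.

```python
# Hypothetical allow-lists covering only the settings discussed in this notebook
STATIC_SETTINGS = {"number_of_shards"}
DYNAMIC_SETTINGS = {"number_of_replicas", "refresh_interval"}


def build_settings_update(**changes):
    """Return a body for the update-settings API, rejecting settings it cannot change."""
    static = sorted(changes.keys() & STATIC_SETTINGS)
    if static:
        raise ValueError(f"{static} cannot be updated on an existing index; "
                         "use the split or shrink API to create a resized copy")
    unknown = sorted(changes.keys() - DYNAMIC_SETTINGS)
    if unknown:
        raise ValueError(f"not in this sketch's allow-list: {unknown}")
    return {"index": dict(changes)}


body = build_settings_update(number_of_replicas=2, refresh_interval="30s")
print(body)

try:
    build_settings_update(number_of_shards=4)
except ValueError as err:
    print("rejected:", err)
```

With opensearch-py, the returned body could be sent with `client.indices.put_settings(index="product_catalog", body=body)`.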

### Best Practices:

- ✅ Use **keyword** type for exact match searches (IDs, categories)
- ✅ Use **text** type for full-text search (descriptions, content)
- ✅ Choose appropriate **analyzers** based on language and use case
- ✅ Set **number_of_replicas** based on availability needs
- ✅ Adjust **refresh_interval** based on real-time requirements
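The keyword-versus-text advice translates directly into query planning: once you know a field's type, you can pick the clause that suits it. The sketch below uses a simplified rule of thumb:

- `match` for `text`
- `term` for `keyword` and `boolean`
- `range` for numbers and dates

`build_clause` is a hypothetical helper. The field types are transcribed from the Step 5 output.

```python
import json

NUMERIC_OR_DATE = {"integer", "long", "short", "byte", "float", "double",
                   "half_float", "scaled_float", "date"}


def build_clause(field, field_type, value):
    """Choose a query clause from the field's mapped type (a simplified rule of thumb)."""
    if field_type == "text":
        return {"match": {field: value}}  # analyzed full-text match
    if field_type in ("keyword", "boolean"):
        return {"term": {field: value}}  # exact, unanalyzed match
    if field_type in NUMERIC_OR_DATE:
        low, high = value  # expects a (low, high) pair
        return {"range": {field: {"gte": low, "lte": high}}}
    raise ValueError(f"no rule for field type {field_type!r}")


# Field types as reported for product_catalog in Step 5
product_types = {"description": "text", "category": "keyword", "price": "float", "in_stock": "boolean"}

query = {
    "query": {
        "bool": {
            "must": [build_clause("description", product_types["description"], "laptop RAM")],
            "filter": [
                build_clause("category", product_types["category"], "Electronics"),
                build_clause("price", product_types["price"], (1000, 1500)),
                build_clause("in_stock", product_types["in_stock"], True),
            ],
        }
    }
}
print(json.dumps(query, indent=2))
```

The resulting body could be passed to `client.search(index="product_catalog", body=query)`. Clauses under `filter` don't contribute to relevance scoring, which suits exact and range conditions.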

### Performance Tips:

- ⚡ Use wildcards carefully: a broad pattern can match many indices and return a large response
- ⚡ Cache mapping information if you query it repeatedly
- ⚡ Consider index templates to keep mappings consistent across indices
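Mappings change rarely, but they can grow when dynamic mapping adds new fields, so a cache should expire its entries. The sketch below is a minimal time-to-live cache. `MappingCache` is hypothetical. `fetch` is any callable that returns a mapping for an index name, for example a wrapper around `client.indices.get_mapping(index=...)` or around `execute_agent`.

```python
import time


class MappingCache:
    """Hypothetical TTL cache for mapping lookups keyed by index name."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch  # callable: index name -> mapping
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}  # index -> (fetched_at, mapping)

    def get(self, index):
        now = self._clock()
        entry = self._entries.get(index)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the round trip
        mapping = self._fetch(index)
        self._entries[index] = (now, mapping)
        return mapping

    def invalidate(self, index=None):
        """Drop one index, or everything, e.g. after a mapping update."""
        if index is None:
            self._entries.clear()
        else:
            self._entries.pop(index, None)


# Usage with a stand-in fetch function and a controllable clock
calls = []


def fake_fetch(index):
    calls.append(index)
    return {"properties": {"price": {"type": "float"}}}


fake_now = [0.0]
cache = MappingCache(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("product_catalog")
cache.get("product_catalog")  # served from the cache
fake_now[0] = 61.0
cache.get("product_catalog")  # entry expired, fetched again
print("fetches:", len(calls))
```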

---

## 🧹 Cleanup (Optional)

Uncomment and run this cell to clean up resources created in this notebook.

```python
# # Delete the flow agent
# cleanup_resources(
#     client=client,
#     agent_ids=[agent_id]
# )

# # Delete test indices
# client.indices.delete(index="product_catalog", ignore=[404])
# client.indices.delete(index="customer_reviews", ignore=[404])

# print("✅ Cleanup complete!")
```

## 🚀 Next Steps

Now that you understand IndexMappingTool, explore:
- **SearchIndexTool**: Execute DSL queries on indices
- **QueryPlanningTool**: Convert natural language to DSL queries
- **VectorDBTool**: Perform semantic search with embeddings
- **RAGTool**: Build retrieval-augmented generation pipelines

---

📚 **Resources**:
- [OpenSearch Mapping Documentation](https://opensearch.org/docs/latest/field-types/)
- [Index Settings Reference](https://opensearch.org/docs/latest/api-reference/index-apis/create-index/)
- [ML Commons Agent Tools](https://opensearch.org/docs/latest/ml-commons-plugin/agents-tools/)