# 📊 LogPatternTool - Extract Patterns from Log Data

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#D35400', 'primaryTextColor':'#fff', 'primaryBorderColor':'#BA4A00', 'lineColor':'#F39C12', 'secondaryColor':'#3498DB', 'tertiaryColor':'#27AE60', 'fontSize':'16px'}}}%%
graph TB
    A[üìã Log Data] --> B[ü§ñ Flow Agent]
    B --> C{üîç LogPatternTool}
    C --> D[üìä Sample Logs]
    D --> E[üß† Pattern Detection]
    E --> F[üéØ Pattern Extraction]
    F --> G[üìà Pattern Frequency]
    G --> H[üìã Top N Patterns]
    
    style A fill:#3498DB,stroke:#2980B9,color:#fff
    style C fill:#D35400,stroke:#BA4A00,color:#fff
    style E fill:#9B59B6,stroke:#8E44AD,color:#fff
    style G fill:#E67E22,stroke:#D35400,color:#fff
    style H fill:#27AE60,stroke:#229954,color:#fff
```

## 📚 Learning Objectives

1. ✅ Discover **recurring patterns** in log data automatically
2. ✅ Use **DSL or PPL queries** to extract logs
3. ✅ Configure **sample size** and **top N patterns**
4. ✅ Analyze **log templates** and **frequencies**
5. ✅ Identify **common issues** and **anomalies**

---

## 🎯 What is LogPatternTool?

**LogPatternTool** automatically discovers patterns in unstructured log data:
- 🔍 **Pattern Detection**: Finds recurring log templates
- 📊 **Frequency Analysis**: Counts how often each pattern occurs
- 🎯 **Anomaly Detection**: Rare patterns in the output point to unusual events
- 📈 **Trend Analysis**: Re-running the tool over time shows how pattern counts change

**Example**:
```
Raw Logs:
  "User john logged in from 192.168.1.1"
  "User mary logged in from 192.168.1.5"
  "User bob logged in from 192.168.1.8"

Pattern Detected:
  "User <*> logged in from <*>" (count: 3)
```
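
To make the idea concrete, here is a toy sketch that builds a template by replacing every token position that differs across messages with `<*>`. It only illustrates the concept: `naive_template` is a hypothetical helper, it assumes all messages have the same number of tokens, and it is not the algorithm LogPatternTool uses internally.

```python
def naive_template(messages):
    """Replace token positions that vary across messages with <*>.

    Toy illustration only: assumes every message has the same number of
    whitespace-separated tokens.
    """
    token_rows = [m.split() for m in messages]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("messages must have the same token count")
    template = []
    for column in zip(*token_rows):
        # Keep the token if every message agrees on it, else use a wildcard.
        template.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(template)


login_logs = [
    "User john logged in from 192.168.1.1",
    "User mary logged in from 192.168.1.5",
    "User bob logged in from 192.168.1.8",
]
print(naive_template(login_logs))  # User <*> logged in from <*>
```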

---

## Step 1: Import Libraries

In [1]:
import sys
import json
from datetime import datetime

sys.path.append('..')
from agent_helpers import (
    get_os_client,
    create_flow_agent,
    execute_agent,
    cleanup_resources
)

print("✅ Libraries imported!")

✅ Libraries imported!


## Step 2: Initialize Client

In [2]:
client = get_os_client()
print("✅ Client ready")

✅ Client ready


## Step 3: Create Sample Log Index

In [3]:
index_name = "system_logs"

if client.indices.exists(index=index_name):
    client.indices.delete(index=index_name)

client.indices.create(index=index_name)

# Sample log messages with patterns
logs = [
    {"timestamp": "2025-11-09T10:00:00Z", "message": "User john logged in from 192.168.1.1"},
    {"timestamp": "2025-11-09T10:01:00Z", "message": "User mary logged in from 192.168.1.5"},
    {"timestamp": "2025-11-09T10:02:00Z", "message": "User bob logged in from 192.168.1.8"},
    {"timestamp": "2025-11-09T10:03:00Z", "message": "Database connection failed for host db01"},
    {"timestamp": "2025-11-09T10:04:00Z", "message": "Database connection failed for host db02"},
    {"timestamp": "2025-11-09T10:05:00Z", "message": "API request timeout after 30 seconds"},
    {"timestamp": "2025-11-09T10:06:00Z", "message": "API request timeout after 45 seconds"},
    {"timestamp": "2025-11-09T10:07:00Z", "message": "API request timeout after 60 seconds"},
    {"timestamp": "2025-11-09T10:08:00Z", "message": "Memory usage at 85% on server web01"},
    {"timestamp": "2025-11-09T10:09:00Z", "message": "Memory usage at 92% on server web02"},
]

for log in logs:
    client.index(index=index_name, body=log, refresh=True)

print(f"✅ Created {len(logs)} log entries")

✅ Created 10 log entries
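
Indexing one document per request with `refresh=True` is fine for ten logs, but larger samples are usually sent in batches. The sketch below builds bulk actions in plain Python. `build_bulk_actions` is a hypothetical helper, and the commented-out call assumes that `get_os_client()` returns an opensearch-py client, which this notebook does not state explicitly.

```python
def build_bulk_actions(index_name, docs):
    """Yield one bulk action per document (hypothetical helper)."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


sample_docs = [
    {"timestamp": "2025-11-09T10:00:00Z", "message": "User john logged in from 192.168.1.1"},
    {"timestamp": "2025-11-09T10:03:00Z", "message": "Database connection failed for host db01"},
]
actions = list(build_bulk_actions("system_logs", sample_docs))
print(len(actions), actions[0]["_index"])

# With a live cluster (assumption: `client` is an opensearch-py client):
# from opensearchpy import helpers
# helpers.bulk(client, build_bulk_actions(index_name, logs), refresh=True)
```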


## Step 4: Create Agent with LogPatternTool (DSL Query)

In [4]:
# DSL query to extract all logs
dsl_query = {
    "query": {"match_all": {}},
    "_source": ["message"]
}

tools = [{
    "type": "LogPatternTool",
    "parameters": {
        "sample_log_size": 100,
        "top_n_pattern": 5,
        "input": json.dumps(dsl_query)
    }
}]

agent_id = create_flow_agent(
    client, "Log_Pattern_Agent",
    "Discovers patterns in log data",
    tools
)
print(f"✅ Log pattern agent created: {agent_id}")

   Registering flow agent: Log_Pattern_Agent...
   ✓ Agent registered: e7UFb5oBFJiTVjgy4ZSF
✅ Log pattern agent created: e7UFb5oBFJiTVjgy4ZSF


## Step 5: Test Case 1 - Discover All Patterns

In [5]:
parameters = {"index": index_name}

print("🔍 Analyzing log patterns...")
print("="*60)
response = execute_agent(client, agent_id, parameters)
print("\n📊 Discovered Patterns:")
print(json.dumps(response, indent=2))

🔍 Analyzing log patterns...

📊 Discovered Patterns:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "[{\"pattern\":\"Database connection failed for host <*>\",\"total count\":2,\"sample logs\":[\"Database connection failed for host db01\",\"Database connection failed for host db02\"]},{\"pattern\":\"User john logged in from <*IP*>\",\"total count\":1,\"sample logs\":[\"User john logged in from 192.168.1.1\"]},{\"pattern\":\"Memory usage at 85% on server <*>\",\"total count\":1,\"sample logs\":[\"Memory usage at 85% on server web01\"]},{\"pattern\":\"User bob logged in from <*IP*>\",\"total count\":1,\"sample logs\":[\"User bob logged in from 192.168.1.8\"]},{\"pattern\":\"API request timeout after 60 seconds\",\"total count\":1,\"sample logs\":[\"API request timeout after 60 seconds\"]}]"
        }
      ]
    }
  ]
}
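
Note that `result` is a JSON string nested inside the response, so it needs a second decode before the patterns can be used. A minimal sketch, assuming the response shape printed above (`parse_patterns` is a hypothetical helper, and the sample response is trimmed to two patterns):

```python
import json


def parse_patterns(response):
    """Return the decoded pattern list from an agent execution response."""
    outputs = response["inference_results"][0]["output"]
    for item in outputs:
        if item.get("name") == "response":
            # The tool output is a JSON-encoded string, not a list.
            return json.loads(item["result"])
    return []


# Trimmed copy of the response shape printed above.
sample_response = {
    "inference_results": [{
        "output": [{
            "name": "response",
            "result": json.dumps([
                {"pattern": "Database connection failed for host <*>",
                 "total count": 2,
                 "sample logs": ["Database connection failed for host db01",
                                 "Database connection failed for host db02"]},
                {"pattern": "User john logged in from <*IP*>",
                 "total count": 1,
                 "sample logs": ["User john logged in from 192.168.1.1"]},
            ]),
        }]
    }]
}

patterns = parse_patterns(sample_response)
for p in patterns:
    print(f'{p["total count"]:>3}  {p["pattern"]}')
```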


## Step 6: Create Agent Using Direct API Call (Error-Focused DSL Query)

In [15]:
# Register the agent through the transport API, using a DSL query
# that matches only "failed" and "timeout" messages
dsl_query_errors = {
    "query": {
        "bool": {
            "should": [
                {"match": {"message": "failed"}},
                {"match": {"message": "timeout"}}
            ]
        }
    },
    "_source": ["message"]
}

agent_response = client.transport.perform_request(
    "POST",
    "/_plugins/_ml/agents/_register",
    body={
        "name": "Test_Agent_For_Log_Pattern_Tool",
        "type": "flow",
        "description": "this is a test agent for the LogPatternTool",
        "memory": {
            "type": "demo"
        },
        "tools": [
            {
                "type": "LogPatternTool",
                "parameters": {
                    "sample_log_size": 50,
                    "top_n_pattern": 3,
                    "input": json.dumps(dsl_query_errors)
                }
            }
        ]
    }
)

agent_id_direct = agent_response["agent_id"]
print(f"✅ Direct API agent created: {agent_id_direct}")
print(f"📋 Full response: {json.dumps(agent_response, indent=2)}")

✅ Direct API agent created: f7UMb5oBFJiTVjgyWpQE
📋 Full response: {
  "agent_id": "f7UMb5oBFJiTVjgyWpQE"
}
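
If several agents differ only in their query, the request body can be built by a small function. The sketch below mirrors the field names used in the request above. `build_log_pattern_agent_body` and the agent name are hypothetical, and the `memory` block is left out here, so add it back if your setup relies on it.

```python
import json


def build_log_pattern_agent_body(name, description, dsl_query,
                                 sample_log_size=50, top_n_pattern=3):
    """Build a flow-agent registration body with one LogPatternTool."""
    tool = {
        "type": "LogPatternTool",
        "parameters": {
            "sample_log_size": sample_log_size,
            "top_n_pattern": top_n_pattern,
            # As in the notebook, the query is passed as a JSON string.
            "input": json.dumps(dsl_query),
        },
    }
    return {"name": name, "type": "flow", "description": description, "tools": [tool]}


body = build_log_pattern_agent_body(
    "Timeout_Pattern_Agent",
    "Finds patterns in timeout logs",
    {"query": {"match": {"message": "timeout"}}, "_source": ["message"]},
)
print(json.dumps(body, indent=2))

# With a live cluster:
# client.transport.perform_request("POST", "/_plugins/_ml/agents/_register", body=body)
```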


## Step 7: Test Case 2 - Error Patterns Only

In [16]:
parameters = {"index": index_name}

print("🔍 Analyzing error patterns...")
print("="*60)
response = execute_agent(client, agent_id_direct, parameters)
print("\n📊 Error Patterns:")
print(json.dumps(response, indent=2))

🔍 Analyzing error patterns...

📊 Error Patterns:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "[{\"pattern\":\"Database connection failed for host <*>\",\"total count\":2,\"sample logs\":[\"Database connection failed for host db01\",\"Database connection failed for host db02\"]},{\"pattern\":\"API request timeout after 60 seconds\",\"total count\":1,\"sample logs\":[\"API request timeout after 60 seconds\"]},{\"pattern\":\"API request timeout after 45 seconds\",\"total count\":1,\"sample logs\":[\"API request timeout after 45 seconds\"]}]"
        }
      ]
    }
  ]
}


## Step 8: Add More Diverse Logs

In [17]:
# Add more varied logs
additional_logs = [
    {"timestamp": "2025-11-09T10:10:00Z", "message": "Disk space low on volume /dev/sda1"},
    {"timestamp": "2025-11-09T10:11:00Z", "message": "Disk space low on volume /dev/sdb1"},
    {"timestamp": "2025-11-09T10:12:00Z", "message": "Backup completed successfully in 120 seconds"},
    {"timestamp": "2025-11-09T10:13:00Z", "message": "Backup completed successfully in 135 seconds"},
    {"timestamp": "2025-11-09T10:14:00Z", "message": "User alice logged in from 10.0.0.15"},
]

for log in additional_logs:
    client.index(index=index_name, body=log, refresh=True)

print(f"✅ Added {len(additional_logs)} more logs")

✅ Added 5 more logs


## Step 9: Test Case 3 - Updated Pattern Analysis

In [18]:
parameters = {"index": index_name}

print("🔍 Re-analyzing patterns with updated data...")
print("="*60)
response = execute_agent(client, agent_id, parameters)
print("\n📊 Updated Pattern Analysis:")
print(json.dumps(response, indent=2))

🔍 Re-analyzing patterns with updated data...

📊 Updated Pattern Analysis:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "[{\"pattern\":\"Disk space low on volume /dev/<*>\",\"total count\":2,\"sample logs\":[\"Disk space low on volume /dev/sda1\",\"Disk space low on volume /dev/sdb1\"]},{\"pattern\":\"Database connection failed for host <*>\",\"total count\":2,\"sample logs\":[\"Database connection failed for host db01\",\"Database connection failed for host db02\"]},{\"pattern\":\"Backup completed successfully in 135 seconds\",\"total count\":1,\"sample logs\":[\"Backup completed successfully in 135 seconds\"]},{\"pattern\":\"User john logged in from <*IP*>\",\"total count\":1,\"sample logs\":[\"User john logged in from 192.168.1.1\"]},{\"pattern\":\"User alice logged in from <*IP*>\",\"total count\":1,\"sample logs\":[\"User alice logged in from 10.0.0.15\"]}]"
        }
      ]
    }
  ]
}
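
Comparing two runs by eye gets tedious as the output grows. The following sketch diffs the pattern lists from Step 5 and Step 9. `count_by_pattern` and `diff_patterns` are hypothetical helpers, and the inputs are the pattern/count pairs copied from the outputs above with the sample logs omitted.

```python
def count_by_pattern(patterns):
    """Map each pattern string to its reported count."""
    return {p["pattern"]: p["total count"] for p in patterns}


def diff_patterns(before, after):
    """Return (new patterns, {pattern: (old_count, new_count)} for changed counts)."""
    old, new = count_by_pattern(before), count_by_pattern(after)
    added = sorted(set(new) - set(old))
    changed = {k: (old[k], new[k]) for k in new if k in old and old[k] != new[k]}
    return added, changed


# Pattern/count pairs copied from the Step 5 and Step 9 outputs.
step5 = [
    {"pattern": "Database connection failed for host <*>", "total count": 2},
    {"pattern": "User john logged in from <*IP*>", "total count": 1},
    {"pattern": "Memory usage at 85% on server <*>", "total count": 1},
    {"pattern": "User bob logged in from <*IP*>", "total count": 1},
    {"pattern": "API request timeout after 60 seconds", "total count": 1},
]
step9 = [
    {"pattern": "Disk space low on volume /dev/<*>", "total count": 2},
    {"pattern": "Database connection failed for host <*>", "total count": 2},
    {"pattern": "Backup completed successfully in 135 seconds", "total count": 1},
    {"pattern": "User john logged in from <*IP*>", "total count": 1},
    {"pattern": "User alice logged in from <*IP*>", "total count": 1},
]

added, changed = diff_patterns(step5, step9)
print("New patterns:", added)
print("Changed counts:", changed)
```

Because each run returns only the top N patterns, a pattern that is missing from the second list may simply have been pushed out of the top N rather than disappeared from the logs.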


## 🎓 Key Takeaways

### What We Learned:

1. **LogPatternTool Capabilities**:
   - ✅ Automatically discovers recurring patterns
   - ✅ Groups similar log messages
   - ✅ Counts pattern frequencies
   - ✅ Supports DSL and PPL queries

2. **Configuration Parameters**:
   ```python
   {
       "type": "LogPatternTool",
       "parameters": {
           "sample_log_size": 100,    # Max sample logs returned per pattern
           "top_n_pattern": 5,         # Return the 5 most frequent patterns
           "input": query              # DSL or PPL query
       }
   }
   ```

3. **Query Types**:
   ```python
   # DSL Query
   dsl_query = {
       "query": {"match_all": {}},
       "_source": ["message"]
   }
   
   # PPL Query (illustrative: assumes a `logs` index with a `level` field)
   ppl_query = "source=logs | where level='ERROR' | fields message"
   ```

4. **Pattern Examples**:
   ```
   Original Logs:
     "User john logged in from 192.168.1.1"
     "User mary logged in from 192.168.1.5"
   
   Pattern:
     "User <*> logged in from <*>"
     Count: 2
   ```

5. **Use Cases**:
   - 🔍 **Troubleshooting**: Find recurring errors
   - 📊 **Monitoring**: Track common events
   - 🎯 **Anomaly Detection**: Identify unusual patterns
   - 📈 **Trend Analysis**: Monitor pattern changes

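### Common Patterns Discovered:

The table lists idealized templates together with the number of sample messages each one covers. The runs above returned more specific patterns (for example, `User john logged in from <*IP*>` with a count of 1), in which user names and numeric values stayed as literal text.
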
| Pattern Template | Example Count | Meaning |
|-----------------|---------------|----------|
| User <*> logged in from <*> | 4 | Login events |
| Database connection failed for host <*> | 2 | DB errors |
| API request timeout after <*> seconds | 3 | Timeout issues |
| Memory usage at <*>% on server <*> | 2 | Resource alerts |
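
The counts in the table can be checked against the 15 sample messages with a small matcher. This is a sketch under one assumption: each `<*>` stands for a single token without whitespace. `template_to_regex` and `count_matches` are hypothetical helpers and are not part of LogPatternTool.

```python
import re


def template_to_regex(template):
    """Compile a template where each <*> matches one whitespace-free token."""
    parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile(r"\S+".join(parts))


def count_matches(template, messages):
    """Count the messages that match the template in full."""
    regex = template_to_regex(template)
    return sum(1 for m in messages if regex.fullmatch(m))


# The 15 messages indexed in Steps 3 and 8.
messages = [
    "User john logged in from 192.168.1.1",
    "User mary logged in from 192.168.1.5",
    "User bob logged in from 192.168.1.8",
    "Database connection failed for host db01",
    "Database connection failed for host db02",
    "API request timeout after 30 seconds",
    "API request timeout after 45 seconds",
    "API request timeout after 60 seconds",
    "Memory usage at 85% on server web01",
    "Memory usage at 92% on server web02",
    "Disk space low on volume /dev/sda1",
    "Disk space low on volume /dev/sdb1",
    "Backup completed successfully in 120 seconds",
    "Backup completed successfully in 135 seconds",
    "User alice logged in from 10.0.0.15",
]
templates = [
    "User <*> logged in from <*>",
    "Database connection failed for host <*>",
    "API request timeout after <*> seconds",
    "Memory usage at <*>% on server <*>",
]
counts = {t: count_matches(t, messages) for t in templates}
for template, count in counts.items():
    print(count, template)
```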

### Best Practices:

- ✅ **Sample Size**: `sample_log_size` caps the sample logs returned with each pattern; keep it small enough to read (this notebook uses 50-100)
- ✅ **Time Windows**: Analyze recent logs for current issues
- ✅ **Top N**: Start with 5-10 patterns, adjust as needed
- ✅ **Field Selection**: Use `_source` to specify message fields
- ✅ **Filters**: Use PPL/DSL to focus on specific log types
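
The time-window, field-selection, and filter advice can be combined in a single DSL query. A minimal sketch, assuming the `timestamp` and `message` fields from this notebook's sample index (`recent_logs_query` is a hypothetical helper):

```python
import json


def recent_logs_query(keywords, window="now-15m", time_field="timestamp",
                      message_field="message"):
    """Build a DSL query for recent logs that mention any of the keywords."""
    should = [{"match": {message_field: kw}} for kw in keywords]
    return {
        "query": {
            "bool": {
                # Restrict to the time window without affecting scoring.
                "filter": [{"range": {time_field: {"gte": window}}}],
                "should": should,
                # Require at least one keyword match alongside the filter.
                "minimum_should_match": 1,
            }
        },
        "_source": [message_field],
    }


query = recent_logs_query(["failed", "timeout"])
print(json.dumps(query, indent=2))
```

The sample logs in this notebook carry fixed 2025-11-09 timestamps, so a relative window such as `now-15m` will likely match nothing against them; pass an absolute date or a wider window when experimenting.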

### Combining with Other Tools:

```python
# Complete log analysis pipeline
tools = [
    {"type": "LogPatternTool", ...},           # 1. Find patterns
    {"type": "LogPatternAnalysisTool", ...},  # 2. Deep analysis
    {"type": "MLModelTool", ...}              # 3. Explain findings
]
```

---

## 🧹 Cleanup

In [None]:
# Uncomment to delete the agents and the sample index created above
# cleanup_resources(
#     client=client,
#     agent_ids=[agent_id, agent_id_direct]
# )
# client.indices.delete(index=index_name)
# print("✅ Cleanup complete!")

## 🚀 Next Steps

- **LogPatternAnalysisTool**: Advanced pattern analysis
- **PPLTool**: Custom log queries
- **MLModelTool**: Explain pattern insights

📚 [LogPatternTool Documentation](https://opensearch.org/docs/latest/ml-commons-plugin/agents-tools/tools/log-pattern-tool/)