# 🔬 LogPatternAnalysisTool - Advanced Log Pattern Analysis

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#C0392B', 'primaryTextColor':'#fff', 'primaryBorderColor':'#A93226', 'lineColor':'#F39C12', 'secondaryColor':'#3498DB', 'tertiaryColor':'#27AE60', 'fontSize':'16px'}}}%%
graph TB
    A[üìã Time-Series Logs] --> B[ü§ñ Flow Agent]
    B --> C{üî¨ LogPatternAnalysisTool}
    C --> D{Analysis Mode}
    D --> E[üìä Sequence Analysis]
    D --> F[üîÑ Pattern Difference]
    D --> G[üí° Pattern Insights]
    E --> H[üìà Event Sequences]
    F --> I[‚ö° Changed Patterns]
    G --> J[üéØ Anomalies]
    
    style A fill:#3498DB,stroke:#2980B9,color:#fff
    style C fill:#C0392B,stroke:#A93226,color:#fff
    style D fill:#9B59B6,stroke:#8E44AD,color:#fff
    style E fill:#16A085,stroke:#138D75,color:#fff
    style F fill:#E67E22,stroke:#D35400,color:#fff
    style G fill:#27AE60,stroke:#229954,color:#fff
```

## 📚 Learning Objectives

1. ✅ Perform **sequence analysis** to find event chains
2. ✅ Compare **pattern differences** between time periods
3. ✅ Generate **insights** from log patterns
4. ✅ Use **trace fields** to correlate events
5. ✅ Detect **anomalies** and **changes** in log behavior

---

## 🎯 What is LogPatternAnalysisTool?

**LogPatternAnalysisTool** provides advanced analysis beyond simple pattern discovery:
- 🔗 **Sequence Analysis**: Find event chains using trace IDs
- ⚡ **Difference Detection**: Compare baseline vs current patterns
- 💡 **Insights Generation**: Identify anomalies and trends
- 📊 **Time-Based Analysis**: Analyze patterns over time windows

**Three Analysis Modes**:
1. **Sequence**: Find event sequences (requires a trace field plus baseline and selection time ranges)
2. **Difference**: Compare a baseline period against a selection period
3. **Insights**: Anomaly detection on a single selection period

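In this notebook the mode is chosen purely by which parameters are supplied. The sketch below restates that rule as code. It is inferred from the examples in Steps 4 to 9, not taken from the tool's source, and `infer_mode` is a hypothetical helper that is not part of the tool or of `agent_helpers`.

```python
def infer_mode(params: dict) -> str:
    """Guess which analysis mode a parameter set would trigger.

    Hypothetical helper: the rule is inferred from this notebook's examples,
    where every mode supplies a selection time range.
    """
    has_base = bool(params.get("baseTimeRangeStart") and params.get("baseTimeRangeEnd"))
    has_selection = bool(
        params.get("selectionTimeRangeStart") and params.get("selectionTimeRangeEnd")
    )
    has_trace = bool(params.get("traceFieldName"))

    if not has_selection:
        raise ValueError("every example in this notebook supplies a selection time range")
    if has_base and has_trace:
        return "sequence"
    if has_base:
        return "difference"
    return "insights"


selection = {
    "selectionTimeRangeStart": "2025-11-09T09:00:00Z",
    "selectionTimeRangeEnd": "2025-11-09T10:00:00Z",
}
base = {
    "baseTimeRangeStart": "2025-11-09T08:00:00Z",
    "baseTimeRangeEnd": "2025-11-09T09:00:00Z",
}

print(infer_mode({**selection, **base, "traceFieldName": "trace_id"}))  # sequence
print(infer_mode({**selection, **base}))                                # difference
print(infer_mode(selection))                                            # insights
```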
---

## Step 1: Import Libraries

In [1]:
import sys
import json
from datetime import datetime, timedelta

sys.path.append('..')
from agent_helpers import (
    get_os_client,
    create_flow_agent,
    execute_agent,
    cleanup_resources
)

print("✅ Libraries imported!")

✅ Libraries imported!


## Step 2: Initialize Client

In [2]:
client = get_os_client()
print("✅ Client ready")

✅ Client ready


## Step 3: Create Log Index with Trace IDs

In [3]:
index_name = "transaction_logs"

if client.indices.exists(index=index_name):
    client.indices.delete(index=index_name)

client.indices.create(index=index_name)

# Logs with trace IDs for sequence analysis
now = datetime.utcnow()
baseline_time = now - timedelta(hours=2)
selection_time = now - timedelta(hours=1)

logs = [
    # Baseline period (2 hours ago) - Normal patterns
    {"timestamp": (baseline_time).isoformat() + "Z", "trace_id": "tx001", "message": "Request received"},
    {"timestamp": (baseline_time + timedelta(seconds=1)).isoformat() + "Z", "trace_id": "tx001", "message": "Authentication successful"},
    {"timestamp": (baseline_time + timedelta(seconds=2)).isoformat() + "Z", "trace_id": "tx001", "message": "Database query executed"},
    {"timestamp": (baseline_time + timedelta(seconds=3)).isoformat() + "Z", "trace_id": "tx001", "message": "Response sent"},
    
    {"timestamp": (baseline_time + timedelta(minutes=5)).isoformat() + "Z", "trace_id": "tx002", "message": "Request received"},
    {"timestamp": (baseline_time + timedelta(minutes=5, seconds=1)).isoformat() + "Z", "trace_id": "tx002", "message": "Authentication successful"},
    {"timestamp": (baseline_time + timedelta(minutes=5, seconds=2)).isoformat() + "Z", "trace_id": "tx002", "message": "Database query executed"},
    {"timestamp": (baseline_time + timedelta(minutes=5, seconds=3)).isoformat() + "Z", "trace_id": "tx002", "message": "Response sent"},
    
    # Selection period (1 hour ago) - Anomalous patterns
    {"timestamp": (selection_time).isoformat() + "Z", "trace_id": "tx003", "message": "Request received"},
    {"timestamp": (selection_time + timedelta(seconds=1)).isoformat() + "Z", "trace_id": "tx003", "message": "Authentication failed"},
    {"timestamp": (selection_time + timedelta(seconds=2)).isoformat() + "Z", "trace_id": "tx003", "message": "Request rejected"},
    
    {"timestamp": (selection_time + timedelta(minutes=5)).isoformat() + "Z", "trace_id": "tx004", "message": "Request received"},
    {"timestamp": (selection_time + timedelta(minutes=5, seconds=1)).isoformat() + "Z", "trace_id": "tx004", "message": "Authentication successful"},
    {"timestamp": (selection_time + timedelta(minutes=5, seconds=2)).isoformat() + "Z", "trace_id": "tx004", "message": "Database timeout"},
    {"timestamp": (selection_time + timedelta(minutes=5, seconds=3)).isoformat() + "Z", "trace_id": "tx004", "message": "Error response sent"},
]

for log in logs:
    client.index(index=index_name, body=log, refresh=True)

print(f"✅ Created {len(logs)} transaction logs with trace IDs")
print(f"📅 Baseline period: {baseline_time.isoformat()}")
print(f"📅 Selection period: {selection_time.isoformat()}")

✅ Created 15 transaction logs with trace IDs
📅 Baseline period: 2026-01-04T12:48:06.296129
📅 Selection period: 2026-01-04T13:48:06.296129

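Step 3 builds its timestamps with `datetime.utcnow()`, which is deprecated as of Python 3.12. A minimal sketch of an alternative that produces the same `...Z` strings from timezone-aware values follows. `iso_z` is a hypothetical helper name introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone


def iso_z(dt: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


now = datetime.now(timezone.utc)
baseline_time = now - timedelta(hours=2)
selection_time = now - timedelta(hours=1)

# Same shape as the "timestamp" values indexed in Step 3
print(iso_z(baseline_time))
print(iso_z(selection_time + timedelta(minutes=5, seconds=1)))
```

With this helper, each `(...).isoformat() + "Z"` expression in the log list could become `iso_z(...)`.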

## Step 3.5: Extract Time Ranges from Index Data

In [4]:

# Query index to get actual min/max timestamps for baseline and selection periods
agg_query = {
    "aggs": {
        "baseline_period": {
            "filter": {
                "range": {
                    "timestamp": {
                        "gte": (baseline_time).isoformat() + "Z",
                        "lte": (baseline_time + timedelta(minutes=10)).isoformat() + "Z"
                    }
                }
            },
            "aggs": {
                "min_time": {"min": {"field": "timestamp"}},
                "max_time": {"max": {"field": "timestamp"}}
            }
        },
        "selection_period": {
            "filter": {
                "range": {
                    "timestamp": {
                        "gte": (selection_time).isoformat() + "Z",
                        "lte": (selection_time + timedelta(minutes=10)).isoformat() + "Z"
                    }
                }
            },
            "aggs": {
                "min_time": {"min": {"field": "timestamp"}},
                "max_time": {"max": {"field": "timestamp"}}
            }
        }
    }
}

result = client.search(index=index_name, body=agg_query)

# Extract timestamps from aggregation results
baseline_agg = result['aggregations']['baseline_period']
selection_agg = result['aggregations']['selection_period']

baseline_start = baseline_agg['min_time']['value_as_string']
baseline_end = baseline_agg['max_time']['value_as_string']
selection_start = selection_agg['min_time']['value_as_string']
selection_end = selection_agg['max_time']['value_as_string']

print("✅ Time ranges extracted from index data:")
print(f"📅 Baseline period: {baseline_start} to {baseline_end}")
print(f"📅 Selection period: {selection_start} to {selection_end}")

✅ Time ranges extracted from index data:
📅 Baseline period: 2026-01-04T12:48:06.296Z to 2026-01-04T12:53:09.296Z
📅 Selection period: 2026-01-04T13:48:06.296Z to 2026-01-04T13:53:09.296Z

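The two period aggregations in this step differ only in their time bounds, so the query could be assembled from a small builder. This is a sketch only: `period_agg` is a hypothetical helper, the timestamps are illustrative, and `"size": 0` is an optional addition that skips returning hits because only the aggregations are read.

```python
def period_agg(start: str, end: str, field: str = "timestamp") -> dict:
    """Build a filter aggregation that reports the min/max of `field` in [start, end]."""
    return {
        "filter": {"range": {field: {"gte": start, "lte": end}}},
        "aggs": {
            "min_time": {"min": {"field": field}},
            "max_time": {"max": {"field": field}},
        },
    }


agg_query = {
    "size": 0,  # only the aggregation results are needed
    "aggs": {
        "baseline_period": period_agg("2026-01-04T12:48:06Z", "2026-01-04T12:58:06Z"),
        "selection_period": period_agg("2026-01-04T13:48:06Z", "2026-01-04T13:58:06Z"),
    },
}

print(sorted(agg_query["aggs"]))
```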

## Step 4: Mode 1 - Sequence Analysis

In [5]:
# Sequence analysis requires trace field and time ranges
tools_sequence = [{
    "type": "LogPatternAnalysisTool",
    "parameters": {
        "index": index_name,
        "timeField": "timestamp",
        "logFieldName": "message",
        "traceFieldName": "trace_id",
        "baseTimeRangeStart": baseline_start,
        "baseTimeRangeEnd": baseline_end,
        "selectionTimeRangeStart": selection_start,
        "selectionTimeRangeEnd": selection_end
    }
}]

agent_sequence = client.transport.perform_request(
    "POST",
    "/_plugins/_ml/agents/_register",
    body={
        "name": "Sequence_Analysis_Agent",
        "type": "flow",
        "description": "Analyzes event sequences using trace IDs with baseline and selection time ranges",
        "memory": {
            "type": "demo"
        },
        "tools": tools_sequence
    }
)
print(f"✅ Sequence agent created: {agent_sequence['agent_id']}")
agent_sequence = agent_sequence['agent_id']

✅ Sequence agent created: YFt7iZsBLQ1mV2UNqClb

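The registration body in this step is written inline, and Step 8 repeats the same shape. One possible way to factor it out is sketched below. `flow_agent_body` is a hypothetical helper, and its relation to `create_flow_agent` from `agent_helpers` is not known from this notebook.

```python
def flow_agent_body(name: str, description: str, tools: list) -> dict:
    """Return a registration body shaped like the one used in this step."""
    return {
        "name": name,
        "type": "flow",
        "description": description,
        "memory": {"type": "demo"},
        "tools": tools,
    }


body = flow_agent_body(
    "Sequence_Analysis_Agent",
    "Analyzes event sequences using trace IDs",
    tools=[{"type": "LogPatternAnalysisTool", "parameters": {"traceFieldName": "trace_id"}}],
)

# With a live cluster, the body could be sent the same way as in the cell above:
# client.transport.perform_request("POST", "/_plugins/_ml/agents/_register", body=body)
print(body["type"], len(body["tools"]))
```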

## Step 5: Test Case 1 - Event Sequences

In [6]:
parameters = {
    "index": index_name,
    "timeField": "timestamp",
    "logFieldName": "message",
    "traceFieldName": "trace_id",
    "baseTimeRangeStart": baseline_start,
    "baseTimeRangeEnd": baseline_end,
    "selectionTimeRangeStart": selection_start,
    "selectionTimeRangeEnd": selection_end
}

print("🔗 Analyzing event sequences...")
print("="*60)
response = client.transport.perform_request(
    "POST",
    f"/_plugins/_ml/agents/{agent_sequence}/_execute",
    body={"parameters": parameters}
)
print("\n📊 Event Sequences by Trace ID:")
print(json.dumps(response, indent=2))

🔗 Analyzing event sequences...

📊 Event Sequences by Trace ID:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"EXCEPTIONAL\":{\"tx003\":\"Authentication <*> -> Request <*>\"},\"BASE\":{\"tx001\":\"Authentication successful -> Database query executed -> Response sent\"}}"
        }
      ]
    }
  ]
}

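The `result` field in the output above is itself a JSON-encoded string, so it needs a second decoding step before the sequences can be used. A minimal sketch, using the response printed above as sample data; `parse_tool_result` is a hypothetical helper name.

```python
import json


def parse_tool_result(response: dict, output_name: str = "response") -> dict:
    """Decode the JSON string stored in the first matching output's `result`."""
    for inference in response.get("inference_results", []):
        for output in inference.get("output", []):
            if output.get("name") == output_name:
                return json.loads(output["result"])
    raise KeyError(f"no output named {output_name!r} in response")


# Response copied from the output above
response = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "result": (
                        '{"EXCEPTIONAL":{"tx003":"Authentication <*> -> Request <*>"},'
                        '"BASE":{"tx001":"Authentication successful -> '
                        'Database query executed -> Response sent"}}'
                    ),
                }
            ]
        }
    ]
}

sequences = parse_tool_result(response)
for group, traces in sequences.items():
    for trace_id, sequence in traces.items():
        # Each sequence is a single string with steps joined by " -> "
        print(f"{group:<12} {trace_id}: {sequence.split(' -> ')}")
```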

## Step 6: Mode 2 - Pattern Difference (Baseline vs Selection)

In [7]:
# Rebuild the ranges as fixed 10-minute windows (this overrides the values from Step 3.5)
baseline_start = baseline_time.isoformat() + "Z"
baseline_end = (baseline_time + timedelta(minutes=10)).isoformat() + "Z"
selection_start = selection_time.isoformat() + "Z"
selection_end = (selection_time + timedelta(minutes=10)).isoformat() + "Z"

tools_difference = [{
    "type": "LogPatternAnalysisTool",
    "parameters": {
        "baseTimeRangeStart": baseline_start,
        "baseTimeRangeEnd": baseline_end,
        "selectionTimeRangeStart": selection_start,
        "selectionTimeRangeEnd": selection_end,
        "timeField": "timestamp",
        "input": json.dumps({
            "query": {"match_all": {}},
            "_source": ["message", "timestamp"]
        })
    }
}]

agent_difference = create_flow_agent(
    client, "Pattern_Difference_Agent",
    "Compares patterns between time periods",
    tools_difference
)
print(f"✅ Difference agent created: {agent_difference}")

   Registering flow agent: Pattern_Difference_Agent...
   ✓ Agent registered: Y1t8iZsBLQ1mV2UNAikE
✅ Difference agent created: Y1t8iZsBLQ1mV2UNAikE


## Step 7: Test Case 2 - Pattern Changes

In [8]:
parameters = {"index": index_name}

print("⚡ Comparing baseline vs selection patterns...")
print("="*60)
response = execute_agent(client, agent_difference, parameters)
print("\n📊 Pattern Differences:")
print(json.dumps(response, indent=2))

⚡ Comparing baseline vs selection patterns...

📊 Pattern Differences:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"patternMapDifference\":[{\"pattern\":\"Request <*>\",\"base\":0.0,\"selection\":0.3333333333333333,\"lift\":null},{\"pattern\":\"Authentication <*>\",\"base\":0.0,\"selection\":0.3333333333333333,\"lift\":null},{\"pattern\":\"Error response sent\",\"base\":0.0,\"selection\":0.16666666666666666,\"lift\":null},{\"pattern\":\"Database timeout\",\"base\":0.0,\"selection\":0.16666666666666666,\"lift\":null}]}"
        }
      ]
    }
  ]
}

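To read the difference output more easily, the entries can be decoded and ranked by how much each pattern's value changed between the two periods. The sketch below uses the `result` string printed above. Treating `base` and `selection` as per-period shares of each pattern is an interpretation, not something this notebook confirms, and `lift` is simply shown as `n/a` when it is `null`.

```python
import json

# `result` string copied from the output above
result = (
    '{"patternMapDifference":['
    '{"pattern":"Request <*>","base":0.0,"selection":0.3333333333333333,"lift":null},'
    '{"pattern":"Authentication <*>","base":0.0,"selection":0.3333333333333333,"lift":null},'
    '{"pattern":"Error response sent","base":0.0,"selection":0.16666666666666666,"lift":null},'
    '{"pattern":"Database timeout","base":0.0,"selection":0.16666666666666666,"lift":null}'
    ']}'
)

rows = json.loads(result)["patternMapDifference"]

# Largest increase from base to selection first (sorted() keeps ties in input order)
ranked = sorted(rows, key=lambda r: r["selection"] - r["base"], reverse=True)

for row in ranked:
    delta = row["selection"] - row["base"]
    lift = "n/a" if row["lift"] is None else f"{row['lift']:.2f}"
    print(
        f"{row['pattern']:<22} base={row['base']:.2f} "
        f"selection={row['selection']:.2f} delta={delta:+.2f} lift={lift}"
    )
```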

## Step 8: Mode 3 - Pattern Insights (Anomaly Detection)

In [9]:
tools_insights = [{
    "type": "LogPatternAnalysisTool",
    "parameters": {
        "selectionTimeRangeStart": selection_start,
        "selectionTimeRangeEnd": selection_end,
        "timeField": "timestamp",
        "input": json.dumps({
            "query": {"match_all": {}},
            "_source": ["message", "timestamp"]
        })
    }
}]

agent_insights = client.transport.perform_request(
    "POST",
    "/_plugins/_ml/agents/_register",
    body={
        "name": "Pattern_Insights_Agent",
        "type": "flow",
        "description": "Generates insights from log patterns",
        "memory": {
            "type": "demo"
        },
        "tools": tools_insights
    }
)
print(f"✅ Insights agent created: {agent_insights['agent_id']}")
agent_insights = agent_insights['agent_id']

✅ Insights agent created: Zlt8iZsBLQ1mV2UNtikc


## Step 9: Test Case 3 - Anomaly Detection

In [10]:
parameters = {"index": index_name}

print("💡 Generating pattern insights...")
print("="*60)
response = execute_agent(client, agent_insights, parameters)
print("\n📊 Pattern Insights & Anomalies:")
print(json.dumps(response, indent=2))

💡 Generating pattern insights...

📊 Pattern Insights & Anomalies:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "result": "{\"logInsights\":[{\"pattern\":\"Error response sent\",\"count\":1.0,\"sampleLogs\":[\"Error response sent\"]},{\"pattern\":\"Authentication failed\",\"count\":1.0,\"sampleLogs\":[\"Authentication failed\"]},{\"pattern\":\"Database timeout\",\"count\":1.0,\"sampleLogs\":[\"Database timeout\"]}]}"
        }
      ]
    }
  ]
}


## 🎓 Key Takeaways

### What We Learned:

1. **Three Analysis Modes**:
   ```python
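   # Mode 1: Sequence Analysis (trace field plus both time ranges, as in Step 4)
   {
       "traceFieldName": "trace_id",
       "logFieldName": "message",
       "timeField": "timestamp",
       "baseTimeRangeStart": "2025-11-09T08:00:00Z",
       "baseTimeRangeEnd": "2025-11-09T09:00:00Z",
       "selectionTimeRangeStart": "2025-11-09T09:00:00Z",
       "selectionTimeRangeEnd": "2025-11-09T10:00:00Z"
   }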
   
   # Mode 2: Pattern Difference (baseline vs selection)
   {
       "baseTimeRangeStart": "2025-11-09T08:00:00Z",
       "baseTimeRangeEnd": "2025-11-09T09:00:00Z",
       "selectionTimeRangeStart": "2025-11-09T09:00:00Z",
       "selectionTimeRangeEnd": "2025-11-09T10:00:00Z",
       "timeField": "timestamp"
   }
   
   # Mode 3: Insights (anomaly detection on single period)
   {
       "selectionTimeRangeStart": "2025-11-09T09:00:00Z",
       "selectionTimeRangeEnd": "2025-11-09T10:00:00Z",
       "timeField": "timestamp"
   }
   ```

2. **Mode Selection Guide**:
   | Mode | Use When | Requires |
   |------|----------|----------|
   | Sequence | Track event flows | Trace field, baseline and selection time ranges |
   | Difference | Compare time periods | Baseline and selection time ranges |
   | Insights | Find anomalies | Selection time range only |

3. **Sequence Analysis Results** (the decoded `result` from Step 5):
   ```json
   {
     "EXCEPTIONAL": {
       "tx003": "Authentication <*> -> Request <*>"
     },
     "BASE": {
       "tx001": "Authentication successful -> Database query executed -> Response sent"
     }
   }
   ```

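4. **Difference Analysis Results** (the decoded `result` from Step 7):
   ```json
   {
     "patternMapDifference": [
       {"pattern": "Request <*>", "base": 0.0, "selection": 0.3333333333333333, "lift": null},
       {"pattern": "Authentication <*>", "base": 0.0, "selection": 0.3333333333333333, "lift": null},
       {"pattern": "Error response sent", "base": 0.0, "selection": 0.16666666666666666, "lift": null},
       {"pattern": "Database timeout", "base": 0.0, "selection": 0.16666666666666666, "lift": null}
     ]
   }
   ```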

5. **Use Cases**:
   - 🔗 **Request Tracing**: Follow transaction flows
   - 📊 **Performance Regression**: Compare before/after deployments
   - 🎯 **Anomaly Detection**: Find unusual patterns
   - 🔍 **Root Cause Analysis**: Identify failure sequences
   - 📈 **Trend Analysis**: Track pattern evolution

### Best Practices:

- ✅ **Trace IDs**: Use consistent trace field for sequence analysis
- ✅ **Time Windows**: Choose comparable time ranges (same duration)
- ✅ **Field Selection**: Include timestamp and message fields
- ✅ **Baseline Period**: Use stable/normal period for comparison
- ✅ **Sample Size**: Ensure sufficient logs in both periods

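The "comparable time ranges" practice can be checked before an agent is registered. The sketch below is one way to do that, using the ranges printed in Step 3.5 as sample input; `parse_z` and `window_seconds` are hypothetical helper names.

```python
from datetime import datetime


def parse_z(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def window_seconds(start: str, end: str) -> float:
    """Length of the [start, end] window in seconds."""
    return (parse_z(end) - parse_z(start)).total_seconds()


# Ranges printed in Step 3.5
base = ("2026-01-04T12:48:06.296Z", "2026-01-04T12:53:09.296Z")
selection = ("2026-01-04T13:48:06.296Z", "2026-01-04T13:53:09.296Z")

base_len = window_seconds(*base)
selection_len = window_seconds(*selection)
print(base_len, selection_len)  # 303.0 303.0

same_duration = base_len == selection_len
no_overlap = parse_z(base[1]) <= parse_z(selection[0])
print(same_duration, no_overlap)  # True True
```

Both windows here span 5 minutes 3 seconds and do not overlap, so they are directly comparable.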
### Real-World Scenarios:

**1. Deployment Impact Analysis**:
```python
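# Compare pre-deployment vs post-deployment (placeholder values, not real timestamps)
parameters = {
    "baseTimeRangeStart": "<before deployment: start>",
    "baseTimeRangeEnd": "<before deployment: end>",
    "selectionTimeRangeStart": "<after deployment: start>",
    "selectionTimeRangeEnd": "<after deployment: end>",
}
# Result: New error patterns, changed frequencies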
```

**2. Incident Investigation**:
```python
# Sequence analysis on failed transactions (add both time ranges as in Step 4)
parameters = {"traceFieldName": "request_id"}
# Result: Common failure sequences
```

**3. Anomaly Monitoring**:
```python
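# Insights on the most recent hour (placeholder values, not real timestamps)
parameters = {
    "selectionTimeRangeStart": "<one hour ago>",
    "selectionTimeRangeEnd": "<now>",
}
# Result: Unusual patterns, spikes, anomalies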
```

### Combining with Other Tools:

```python
# Complete log analysis workflow
tools = [
    {"type": "LogPatternTool"},          # 1. Discover patterns (parameters omitted)
    {"type": "LogPatternAnalysisTool"},  # 2. Analyze sequences/changes (parameters omitted)
    {"type": "MLModelTool"},             # 3. Generate recommendations (parameters omitted)
]
```

---

## 🧹 Cleanup

In [None]:
# Uncomment to delete the agents and the index created above
# cleanup_resources(
#     client=client,
#     agent_ids=[agent_sequence, agent_difference, agent_insights]
# )
# client.indices.delete(index=index_name)
# print("✅ Cleanup complete!")

## 🚀 Next Steps

- **LogPatternTool**: Basic pattern discovery
- **PPLTool**: Custom log queries
- **DataDistributionTool**: Data distribution analysis

📚 [LogPatternAnalysisTool Documentation](https://opensearch.org/docs/latest/ml-commons-plugin/agents-tools/tools/log-pattern-analysis-tool/)