# 📊 PPLTool - Piped Processing Language Query Generation

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#16A085', 'primaryTextColor':'#fff', 'primaryBorderColor':'#138D75', 'lineColor':'#F39C12', 'secondaryColor':'#3498DB', 'tertiaryColor':'#27AE60', 'fontSize':'16px'}}}%%
graph TB
    A[üë§ Natural Language<br/>Show error logs] --> B[ü§ñ Flow Agent]
    B --> C{üìä PPLTool}
    C --> D[üéØ LLM Model]
    D --> E[üìù Generate PPL Query]
    E --> F[‚úÖ Valid PPL]
    F --> G[üîç Execute Query]
    G --> H[üìä Results]
    
    style A fill:#3498DB,stroke:#2980B9,color:#fff
    style C fill:#16A085,stroke:#138D75,color:#fff
    style D fill:#9B59B6,stroke:#8E44AD,color:#fff
    style E fill:#E67E22,stroke:#D35400,color:#fff
    style H fill:#27AE60,stroke:#229954,color:#fff
```

## 📚 Learning Objectives

1. ✅ Convert natural language to **PPL (Piped Processing Language)** queries
2. ✅ Use **PPL syntax** for log and event analysis
3. ✅ Build **analytics pipelines** with pipes
4. ✅ **Execute PPL queries** automatically
5. ✅ Simplify **complex log analysis** workflows

---

## 🎯 What is PPLTool?

**PPLTool** generates **Piped Processing Language (PPL)** queries from natural language. PPL is ideal for:
- 📊 **Log Analysis**: Filter, aggregate, transform log data
- 🔄 **Data Pipelines**: Chain operations with pipes
- 📈 **Analytics**: Time-series analysis, statistics
- 🎯 **Simplicity**: More intuitive than DSL for many use cases

**PPL Example**:
```sql
source=logs | where level='ERROR' | stats count() by service
```
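
Because each stage is plain text joined by `|`, a query like the one above can be assembled programmatically. The following is a minimal sketch; `build_ppl` is a hypothetical helper written for illustration and is not part of OpenSearch or this notebook's helpers.

```python
def build_ppl(index, *commands):
    """Join a source clause and any number of PPL commands with pipes."""
    # The first stage always names the index; later stages are passed through as-is.
    stages = [f"source={index}", *commands]
    return " | ".join(stage.strip() for stage in stages)


query = build_ppl("logs", "where level='ERROR'", "stats count() by service")
print(query)
```

Each extra argument becomes one more stage in the pipeline, which mirrors how PPL itself reads from left to right.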

---

## Prerequisites

To create a PPL tool, you need either a **fine-tuned model** that translates natural language into PPL queries or a general-purpose **large language model** used with prompt-based translation. PPLTool supports:

- ✅ **Anthropic Claude** models (model_type: `CLAUDE`)
- ✅ **OpenAI** models such as GPT-3.5 and GPT-4 (model_type: `OPENAI`)
- ✅ **Fine-tuned** custom models (model_type: `FINETUNE`)

In this notebook, we'll use **OpenAI** models for natural language to PPL translation.

---

## Step 1: Import Libraries

In [1]:
import sys
import json
from datetime import datetime, timedelta

sys.path.append('..')
from agent_helpers import (
    get_os_client,
    configure_cluster_for_openai,
    create_openai_connector,
    register_and_deploy_openai_model,
    create_flow_agent,
    execute_agent,
    cleanup_resources,
    OPENAI_API_KEY  # Import the API key constant
)

print("✅ Libraries imported!")

✅ Libraries imported!


## Step 2: Create a connector and deploy the model

This step corresponds to the documentation's Step 1 and Step 2, where we:
1. Create a connector for OpenAI
2. Register and deploy the model

In [2]:
# Step 2: Create OpenAI connector for PPL Tool
# Using standard chat completions with gpt-4o-mini for model_type="OPENAI"

import time

client = get_os_client()
configure_cluster_for_openai(client)

# Use the standard create_openai_connector helper which sets up chat completions correctly
connector_id = create_openai_connector(client, model_name="gpt-4o-mini")
print(f"✅ Connector created: {connector_id}")

# Register and deploy using the helper
model_id = register_and_deploy_openai_model(client, connector_id, model_name="gpt-4o-mini")
print(f"✅ Model deployed: {model_id}")

   Configuring cluster settings for OpenAI connector...
   ✓ Cluster settings configured successfully
   Creating OpenAI connector for gpt-4o-mini...
   ✓ Connector created: aFt_iZsBLQ1mV2UNgymq
✅ Connector created: aFt_iZsBLQ1mV2UNgymq
   Creating model group...
   ✓ Model group created: aVt_iZsBLQ1mV2UNgym0
   Registering gpt-4o-mini model...
   ✓ Model registered: a1t_iZsBLQ1mV2UNgynL
   Deploying model...
   ⏳ Waiting for model deployment...
      Model status: DEPLOYING
      Model status: DEPLOYED
      ✓ Model deployed successfully!
✅ Model deployed: a1t_iZsBLQ1mV2UNgynL
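
The `create_openai_connector` helper comes from this notebook's `agent_helpers` module, and its source is not shown here. As a rough idea of what such a helper might send, the sketch below builds a connector definition modeled on the standard ML Commons blueprint for OpenAI chat completions. The function name `openai_chat_connector_body`, the connector name, and the placeholder key are assumptions made for illustration; the actual helper may differ.

```python
import json


def openai_chat_connector_body(api_key, model_name="gpt-4o-mini"):
    """Build a connector definition for the OpenAI chat completions endpoint."""
    return {
        "name": f"OpenAI {model_name} connector",  # illustrative name
        "description": "Connector for OpenAI chat completions",
        "version": 1,
        "protocol": "http",
        "parameters": {"endpoint": "api.openai.com", "model": model_name},
        "credential": {"openAI_key": api_key},
        "actions": [{
            "action_type": "predict",
            "method": "POST",
            "url": "https://${parameters.endpoint}/v1/chat/completions",
            "headers": {"Authorization": "Bearer ${credential.openAI_key}"},
            # ${parameters.messages} is substituted at predict time with the
            # "messages" list passed in the _predict request.
            "request_body": '{ "model": "${parameters.model}", "messages": ${parameters.messages} }',
        }],
    }


body = openai_chat_connector_body("sk-placeholder")  # placeholder, not a real key
print(json.dumps(body, indent=2))
# To create the connector on a cluster (assuming `client` from get_os_client()):
# client.transport.perform_request("POST", "/_plugins/_ml/connectors/_create", body=body)
```

The `messages` placeholder in `request_body` is what lets the later cells pass `{"parameters": {"messages": [...]}}` to the `_predict` endpoint.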


## Step 3: Create Sample Logs Index

Before running the agent, we need an index with log data to query.

In [3]:
index_name = "application_logs"

if client.indices.exists(index=index_name):
    client.indices.delete(index=index_name)

client.indices.create(index=index_name)

# Sample log data
logs = [
    {"timestamp": "2025-11-09T10:00:00Z", "level": "INFO", "service": "api", "message": "Request processed"},
    {"timestamp": "2025-11-09T10:01:00Z", "level": "ERROR", "service": "api", "message": "Connection timeout"},
    {"timestamp": "2025-11-09T10:02:00Z", "level": "ERROR", "service": "database", "message": "Query failed"},
    {"timestamp": "2025-11-09T10:03:00Z", "level": "WARN", "service": "api", "message": "Slow response"},
    {"timestamp": "2025-11-09T10:04:00Z", "level": "INFO", "service": "worker", "message": "Job completed"},
    {"timestamp": "2025-11-09T10:05:00Z", "level": "ERROR", "service": "api", "message": "Authentication failed"},
]

for log in logs:
    client.index(index=index_name, body=log, refresh=True)

print(f"✅ Created {len(logs)} log entries")

✅ Created 6 log entries
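
Indexing one document per request with `refresh=True` is fine for six rows, but larger sample sets are usually loaded through the `_bulk` API. The sketch below only builds the newline-delimited payload; `build_bulk_body` is a hypothetical helper, and the commented `client.bulk` call assumes the same `client` object used above.

```python
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON bulk payload: one action line followed by one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


sample_docs = [
    {"timestamp": "2025-11-09T10:00:00Z", "level": "INFO", "service": "api", "message": "Request processed"},
    {"timestamp": "2025-11-09T10:01:00Z", "level": "ERROR", "service": "api", "message": "Connection timeout"},
]
payload = build_bulk_body("application_logs", sample_docs)
print(payload)
# To send it in a single request with one refresh at the end:
# client.bulk(body=payload, refresh=True)
```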


## Step 4: Register a flow agent that will run the PPLTool

A flow agent runs a sequence of tools in order and returns the last tool's output. To create a flow agent, we provide the model ID in the `model_id` parameter. To run the generated query, we set `execute` to `true`.

In [4]:
# Register PPL agent using model_type="OPENAI"
# This tells the PPL tool to use built-in OpenAI formatting

tools = [{
    "type": "PPLTool",
    "name": "TransferQuestionToPPLAndExecuteTool",
    "description": "Use this tool to transfer natural language to generate PPL and execute PPL to query inside.",
    "parameters": {
        "model_id": model_id,
        "model_type": "OPENAI",  # Use built-in OPENAI support
        "execute": True  # Execute the generated PPL query
    }
}]

agent_id = create_flow_agent(
    client, 
    "Test_Agent_For_PPL",
    "this is a test agent",
    tools
)
print(f"✅ Agent registered: {agent_id}")

   Registering flow agent: Test_Agent_For_PPL...
   ✓ Agent registered: eVt_iZsBLQ1mV2UN0inH
✅ Agent registered: eVt_iZsBLQ1mV2UN0inH
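
The `create_flow_agent` helper is also part of `agent_helpers`, so its request is not visible in this notebook. A plausible shape for the registration body, based on the ML Commons agent registration API, is sketched below. `flow_agent_body` and the `<model_id>` placeholder are illustrative assumptions, not the helper's actual code.

```python
def flow_agent_body(name, description, tools):
    """Build the request body for registering a flow agent."""
    # A flow agent runs its tools in order and returns the last tool's output.
    return {"name": name, "type": "flow", "description": description, "tools": tools}


ppl_tool = {
    "type": "PPLTool",
    "name": "TransferQuestionToPPLAndExecuteTool",
    "parameters": {
        "model_id": "<model_id>",  # placeholder for the deployed model's ID
        "model_type": "OPENAI",
        "execute": True,           # run the generated query, not just return it
    },
}

body = flow_agent_body("Test_Agent_For_PPL", "this is a test agent", [ppl_tool])
print(body)
# Register: client.transport.perform_request("POST", "/_plugins/_ml/agents/_register", body=body)
# Execute:  POST /_plugins/_ml/agents/<agent_id>/_execute
#           with {"parameters": {"question": "...", "index": "application_logs"}}
```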


## Step 5: Execute the Agent

Test the PPL tool by sending natural language questions.

### ⚠️ Known Limitations

The PPL tool has compatibility challenges with OpenAI models in the current OpenSearch version:

1. **`model_type="OPENAI"`**: Requires a specific connector message format that does not match standard OpenAI chat completions requests
2. **`model_type="FINETUNE"`**: Requires the `/v1/completions` endpoint, but newer OpenAI models such as gpt-4o-mini support only `/v1/chat/completions`
3. **gpt-3.5-turbo-instruct**: Supports the completions endpoint but fails at inference time when called through the PPL tool

### ✅ Working Alternative

Instead of using PPLTool, you can:
- Call the model directly to generate PPL queries
- Execute the queries manually through the OpenSearch PPL API
- Use other verified tools such as LogPatternTool or CatIndexTool

In [5]:
# Alternative Approach: Use OpenAI model directly to generate PPL
# Then execute the PPL query manually

print("🔄 Alternative: Generate PPL using model directly")
print("="*60)

# Step 1: Generate PPL query using the LLM
question = "Show me all ERROR level logs"
ppl_prompt = f"""Convert this natural language question to a PPL (Piped Processing Language) query for OpenSearch.

Index: {index_name}
Question: {question}

PPL syntax guide:
- Start with: source=index_name
- Filter with: where field='value' or where field=value
- Aggregate with: stats count() by field
- Sort with: sort field desc/asc
- Limit with: head N

Output ONLY the PPL query, nothing else."""

try:
    # Call model to generate PPL
    model_request = {
        "parameters": {
            "messages": [{"role": "user", "content": ppl_prompt}]
        }
    }
    
    llm_response = client.transport.perform_request(
        'POST',
        f'/_plugins/_ml/models/{model_id}/_predict',
        body=model_request
    )
    
    # Extract generated PPL query
    ppl_query = llm_response['inference_results'][0]['output'][0]['dataAsMap']['choices'][0]['message']['content'].strip()
    # Remove markdown code blocks if present
    ppl_query = ppl_query.replace('```sql', '').replace('```', '').strip()
    
    print(f"\n✅ Generated PPL Query:")
    print(f"   {ppl_query}")
    
    # Step 2: Execute the PPL query
    print(f"\n🔍 Executing PPL query...")
    ppl_result = client.transport.perform_request(
        'POST',
        '/_plugins/_ppl',
        body={"query": ppl_query}
    )
    
    print(f"\n📊 Query Results:")
    print(json.dumps(ppl_result, indent=2))
    
except Exception as e:
    print(f"\n❌ Error: {e}")
    if hasattr(e, 'info'):
        print(f"\n📝 Details:")
        print(json.dumps(e.info, indent=2))

🔄 Alternative: Generate PPL using model directly

✅ Generated PPL Query:
   source=application_logs | where level='ERROR'

🔍 Executing PPL query...

📊 Query Results:
{
  "schema": [
    {
      "name": "message",
      "type": "string"
    },
    {
      "name": "level",
      "type": "string"
    },
    {
      "name": "service",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": "timestamp"
    }
  ],
  "datarows": [
    [
      "Connection timeout",
      "ERROR",
      "api",
      "2025-11-09 10:01:00"
    ],
    [
      "Query failed",
      "ERROR",
      "database",
      "2025-11-09 10:02:00"
    ],
    [
      "Authentication failed",
      "ERROR",
      "api",
      "2025-11-09 10:05:00"
    ]
  ],
  "total": 3,
  "size": 3
}
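
The PPL response keeps column names in `schema` and values in `datarows`, so each row is a bare list. For downstream processing it is often easier to pair them up. The sketch below uses a hypothetical `rows_to_dicts` helper and a shortened copy of the response shown above.

```python
def rows_to_dicts(ppl_result):
    """Pair each data row with the column names from the response schema."""
    columns = [col["name"] for col in ppl_result["schema"]]
    return [dict(zip(columns, row)) for row in ppl_result["datarows"]]


# Shortened copy of the response above (two of the three rows).
sample_result = {
    "schema": [
        {"name": "message", "type": "string"},
        {"name": "level", "type": "string"},
        {"name": "service", "type": "string"},
        {"name": "timestamp", "type": "timestamp"},
    ],
    "datarows": [
        ["Connection timeout", "ERROR", "api", "2025-11-09 10:01:00"],
        ["Query failed", "ERROR", "database", "2025-11-09 10:02:00"],
    ],
}

for record in rows_to_dicts(sample_result):
    print(record["service"], "-", record["message"])
```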


## Test Case 2 - Count Errors by Service

In [6]:
# Test Case 2: Count errors grouped by service
question = "Count the number of ERROR level logs grouped by service (use uppercase ERROR)"

ppl_prompt = f"""Convert to PPL query:
Index: {index_name}
Question: {question}

PPL format: source=index | where field='value' | stats function() by field
Output only the query."""

model_request = {"parameters": {"messages": [{"role": "user", "content": ppl_prompt}]}}
llm_response = client.transport.perform_request('POST', f'/_plugins/_ml/models/{model_id}/_predict', body=model_request)
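# Strip Markdown fences only; a bare replace('sql', '') would also corrupt names such as "mysql_logs"
ppl_query = llm_response['inference_results'][0]['output'][0]['dataAsMap']['choices'][0]['message']['content'].strip().replace('```sql', '').replace('```', '').strip()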

print(f"Question: {question}")
print(f"PPL Query: {ppl_query}")

ppl_result = client.transport.perform_request('POST', '/_plugins/_ppl', body={"query": ppl_query})
print(f"\nResults:")
print(json.dumps(ppl_result, indent=2))

Question: Count the number of ERROR level logs grouped by service (use uppercase ERROR)
PPL Query: source=application_logs | where level='ERROR' | stats count() by service

Results:
{
  "schema": [
    {
      "name": "count()",
      "type": "bigint"
    },
    {
      "name": "service",
      "type": "string"
    }
  ],
  "datarows": [
    [
      2,
      "api"
    ],
    [
      1,
      "database"
    ]
  ],
  "total": 2,
  "size": 2
}


## Test Case 3 - Filter by Service and Level (fails through PPLTool; see Known Limitations above)

In [7]:
parameters = {
    # The sample data stores this level as "WARN", not "WARNING"
    "question": "Show API service logs with WARN or ERROR level",
    "index": index_name
}

print("❓ Question: Show API service WARN/ERROR logs")
print("="*60)
response = execute_agent(client, agent_id, parameters)
print("\n📊 Results:")
print(json.dumps(response, indent=2))

❓ Question: Show API service WARN/ERROR logs


RequestError: RequestError(400, 'IllegalArgumentException')

## Test Case 4 - Recent Errors (fails through PPLTool with the same error)

In [7]:
parameters = {
    "question": "What were the most recent 5 error messages?",
    "index": index_name
}

print("❓ Question: Most recent 5 error messages")
print("="*60)
response = execute_agent(client, agent_id, parameters)
print("\n📊 Results:")
print(json.dumps(response, indent=2))

❓ Question: Most recent 5 error messages


RequestError: RequestError(400, 'IllegalArgumentException')

## Key Takeaways

### PPLTool Challenges:

The **PPLTool** has compatibility issues with OpenAI models in the current OpenSearch version:
- `model_type="OPENAI"` requires a specific connector format that is not fully documented
- `model_type="FINETUNE"` works with `/v1/completions`, but newer OpenAI models support only `/v1/chat/completions`
- The tool encounters inference failures when calling models through the agent framework

### ✅ Working Alternative - Direct PPL Generation:

Instead of using PPLTool, use this two-step approach:

1. **Generate PPL Query**: Call OpenAI model directly with a PPL-focused prompt
2. **Execute Query**: Use OpenSearch PPL API `/_plugins/_ppl` endpoint

```python
# Step 1: Generate PPL
model_request = {
    "parameters": {
        "messages": [{"role": "user", "content": ppl_prompt}]
    }
}
response = client.transport.perform_request(
    'POST',
    f'/_plugins/_ml/models/{model_id}/_predict',
    body=model_request
)
ppl_query = response['inference_results'][0]['output'][0]['dataAsMap']['choices'][0]['message']['content']

# Step 2: Execute PPL
result = client.transport.perform_request(
    'POST',
    '/_plugins/_ppl',
    body={"query": ppl_query}
)
```
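
Chat models often wrap their answer in a Markdown code fence, so the raw `content` string usually needs cleaning before it is sent to `/_plugins/_ppl`. The sketch below shows one way to do that; `extract_ppl` is a hypothetical helper. It removes only fence lines and stray backticks, so text inside the query such as an index named `mysql_logs` is left alone.

```python
def extract_ppl(text):
    """Return the query from a model reply, dropping any Markdown code fence."""
    lines = text.strip().splitlines()
    # Drop an opening fence line such as ``` or ```sql when the reply spans several lines.
    if len(lines) > 1 and lines[0].startswith("```"):
        lines = lines[1:]
    # Drop a closing fence that sits on its own line.
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    # Collapse whitespace and remove backticks left around a single-line reply.
    return " ".join(" ".join(lines).strip("`").split())


reply = "```sql\nsource=mysql_logs | where level='ERROR'\n```"
print(extract_ppl(reply))
```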

### PPL Syntax Reference:

| Command | Purpose | Example |
|---------|---------|---------|
| `source=index` | Specify index | `source=logs` |
| `where field='value'` | Filter | `where level='ERROR'` |
| `stats count() by field` | Aggregate | `stats count() by service` |
| `sort field desc` | Order results | `sort timestamp desc` |
| `head N` | Limit results | `head 10` |

### Best Practices:

✅ **Be specific**: Include field names and values in your natural language question  
✅ **Case sensitivity**: Specify uppercase/lowercase when important  
✅ **Prompt engineering**: Provide PPL examples in your prompt for better results  
✅ **Validation**: Check generated queries before execution  
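
As one possible reading of the validation practice, the sketch below runs a few cheap checks on a generated query before it is executed. `check_ppl` and the `ALLOWED_COMMANDS` set are assumptions chosen for this notebook's examples, not an OpenSearch feature. The check splits on `|`, so it would misjudge a query whose string literal contains a pipe character.

```python
ALLOWED_COMMANDS = {"where", "stats", "sort", "head", "fields"}


def check_ppl(query, expected_index, allowed=ALLOWED_COMMANDS):
    """Return a list of problems found; an empty list means the query passed."""
    problems = []
    stages = [stage.strip() for stage in query.split("|")]

    # The first stage must read from the expected index (optionally via "search").
    first = stages[0]
    if first.startswith("search "):
        first = first[len("search "):].strip()
    if first.replace(" ", "") != f"source={expected_index}":
        problems.append(f"unexpected source clause: {stages[0]!r}")

    # Every later stage must begin with an allowed command.
    for stage in stages[1:]:
        command = stage.split(maxsplit=1)[0] if stage else ""
        if command not in allowed:
            problems.append(f"command not allowed: {command!r}")
    return problems


good = "source=application_logs | where level='ERROR' | stats count() by service"
print(check_ppl(good, "application_logs"))
print(check_ppl("source=other_index | delete", "application_logs"))
```

An allowlist keeps the check simple and errs on the side of rejecting anything unexpected; extend the set as your prompts start using more PPL commands.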

---

## 🧹 Cleanup

In [None]:
# Uncomment to delete the agent, model, connector, and sample index
# cleanup_resources(
#     client=client,
#     agent_ids=[agent_id],
#     model_ids=[model_id],
#     connector_ids=[connector_id]
# )
# client.indices.delete(index=index_name)
# print("✅ Cleanup complete!")

## 🚀 Next Steps

- **LogPatternTool**: Extract patterns from logs
- **LogPatternAnalysisTool**: Advanced log analysis
- **QueryPlanningTool**: DSL query generation

📚 [PPLTool Documentation](https://opensearch.org/docs/latest/ml-commons-plugin/agents-tools/tools/ppl-tool/)