# Log Monitor Agent with MLflow Tracing & Evaluation

An event-driven agent, built with LangGraph, that monitors server logs.

**Source:** The agent is from https://github.com/jwm4/agents/tree/001-log-monitor-agent/examples/log-monitor-agent

**What we added:** MLflow tracing and Agent-as-a-Judge evaluation to demonstrate observability and automated evaluation of agent execution.

**Prerequisites:**
- Llama Stack server running at `http://localhost:8321`
- `OPENAI_API_KEY` in environment (for Agent-as-a-Judge)
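
A quick pre-flight check can catch a missing prerequisite before any cell runs. The sketch below is not part of the source notebook: `missing_prerequisites` is a hypothetical helper, and it only confirms that the API key is set and that something accepts TCP connections on the Llama Stack port. It does not verify that the listener is actually a Llama Stack server.

```python
import os
import socket
from urllib.parse import urlparse


def missing_prerequisites(llama_stack_url="http://localhost:8321", env=None, timeout=2.0):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    # The judge model is called through OpenAI, so the key must be present.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # Only checks that the port accepts TCP connections, nothing more.
    parsed = urlparse(llama_stack_url)
    host = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        problems.append(f"nothing is listening at {host}:{port}")

    return problems
```

Calling `missing_prerequisites()` with no arguments checks the default URL listed above against the current environment; printing the returned list before the first cell is one way to fail early.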

In [1]:
import os
import sys
from dotenv import load_dotenv

# Add parent directory to path for imports
sys.path.insert(0, os.path.dirname(os.getcwd()))

# Load .env from parent directory (agents_tracing-eval_mlflow/.env)
env_path = os.path.join(os.path.dirname(os.getcwd()), ".env")
load_dotenv(env_path)

import mlflow
from mlflow.entities import SpanType, AssessmentSource, AssessmentSourceType
from mlflow.genai.judges import make_judge
from typing import Literal



## Configuration

In [2]:
# MLflow setup
db_path = os.path.join(os.getcwd(), "mlflow.db")
mlflow.set_tracking_uri(f"sqlite:///{db_path}")
mlflow.set_experiment("log-monitor-agent")
print(f"MLflow database: {db_path}")

2026/01/29 16:59:12 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.schemas
2026/01/29 16:59:12 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.tables
2026/01/29 16:59:12 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.types
2026/01/29 16:59:12 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.constraints
2026/01/29 16:59:12 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.defaults
2026/01/29 16:59:12 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.comments
2026/01/29 16:59:12 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/01/29 16:59:12 INFO mlflow.store.db.utils: Updating database tables
2026/01/29 16:59:12 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2026/01/29 16:59:12 INFO alembic.runtime.migration: Will assume non-transactional DDL.
2026/01/29 16:59:12 INFO alembic.runtime.migration: Running upgrade  -> 451aebb31d03, add metric step
2026/01/29 16:5

MLflow database: /Users/nnarendr/Documents/Repos/agents/agents_tracing-eval_mlflow/log_monitor/mlflow.db


In [3]:
JUDGE_MODEL = "openai:/gpt-4o"

## Import the Agent

The log monitor agent (from the source repo) implements a LangGraph workflow:
1. **Classify** - error/warning/normal
2. **Diagnose** - root cause analysis (uses MCP tools for documentation lookup)
3. **Assess Severity** - high/low
4. **Route** - Slack alert (high) or GitHub ticket (low)

We added the `@mlflow.trace` decorator and `mlflow.start_span` calls to the agent so that each execution is captured as a trace.

In [4]:
from log_monitor_agent.agent import process_log_message

## Agent-as-a-Judge (Added by us)

We added this evaluation layer using MLflow's `make_judge`. The `{{ trace }}` variable in the instructions hands the judge the agent's trace, which it evaluates using MCP tools to inspect execution details.

In [5]:
log_monitor_judge = make_judge(
    name="log_monitor_evaluator",
    instructions=(
        "Evaluate the log monitor agent's performance in {{ trace }}.\n\n"
        "Check for:\n"
        "1. Classification Accuracy: Was the log correctly classified as error/warning/normal?\n"
        "2. Diagnosis Quality: Was the root cause analysis accurate and helpful?\n"
        "3. Severity Assessment: Was the severity (high/low) appropriate?\n"
        "4. Action Routing: Was the correct action taken (Slack for high, GitHub for low)?\n\n"
        "Rate as: 'good', 'acceptable', or 'poor'"
    ),
    feedback_value_type=Literal["good", "acceptable", "poor"],
    model=JUDGE_MODEL,
)

In [7]:
def evaluate_trace(trace):
    """Run Agent-as-a-Judge evaluation and log to MLflow."""
    feedback = log_monitor_judge(trace=trace)
    
    trace_id = trace.info.trace_id
    mlflow.log_feedback(
        trace_id=trace_id,
        name="log_monitor_evaluation",
        value=feedback.value,
        rationale=feedback.rationale,
        source=AssessmentSource(
            source_type=AssessmentSourceType.LLM_JUDGE,
            source_id=f"agent-as-a-judge/{JUDGE_MODEL}",
        ),
    )
    
    print(f"\nEvaluation: {feedback.value}")
    print(f"Rationale: {feedback.rationale}")
    return feedback

## Sample Log Messages

Real-world examples from common libraries. These logs benefit from the agent's MCP tools (DeepWiki, Context7) to look up documentation for accurate diagnosis.

Categories:
- **Kubernetes** - RBAC, resource not found, conflicts
- **Redis** - Connection, watch, timeout errors
- **Kafka** - Metadata, partition, producer errors
- **SQLAlchemy** - Connection, integrity, pool errors
- **LangChain** - Parser, tool call errors
- **AWS Boto3** - Access denied, throttling
- **Warnings** - Memory, certificate expiration
- **Info** - Normal operational logs (no action needed)

In [8]:
# Sample log messages - real-world examples that benefit from MCP tool research
EXAMPLES = [
    # === KUBERNETES PYTHON CLIENT ERRORS ===
    "ERROR: kubernetes.client.rest.ApiException: (403) Forbidden: pods is forbidden: User 'system:serviceaccount:default:myapp' cannot list resource 'pods' in API group '' in namespace 'production'",
    "ERROR: kubernetes.client.rest.ApiException: (404) Not Found: deployments.apps 'nginx-deployment' not found in namespace 'staging'",
    "ERROR: kubernetes.client.rest.ApiException: (409) Conflict: Operation cannot be fulfilled on configmaps 'app-config': the object has been modified; please apply your changes to the latest version",
    
    # === REDIS-PY ERRORS ===
    "ERROR: redis.exceptions.ConnectionError: Error 111 connecting to redis-master:6379. Connection refused.",
    "ERROR: redis.exceptions.WatchError: Watched variable changed during transaction - key 'inventory:item:12345' was modified by another client",
    "ERROR: redis.exceptions.TimeoutError: Timeout reading from redis-cluster:6379 after 30.0 seconds",
    
    # === KAFKA ERRORS ===
    "ERROR: kafka.errors.KafkaTimeoutError: Failed to update metadata after 60.0 secs - broker may be unreachable",
    "ERROR: kafka.errors.NotLeaderForPartitionError: This server is not the leader for topic-partition orders-events-3",
    "ERROR: org.apache.kafka.common.errors.ProducerFencedException: Producer with transactionalId 'order-processor' has been fenced by a newer producer instance",
    
    # === SQLALCHEMY / DATABASE ERRORS ===
    "ERROR: sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at 'db.example.com' (10.0.1.50), port 5432 failed: Connection timed out",
    "ERROR: sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint 'users_email_key' - DETAIL: Key (email)=(user@example.com) already exists",
    "ERROR: sqlalchemy.pool.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00",
    
    # === LANGCHAIN ERRORS ===
    "ERROR: langchain_core.exceptions.OutputParserException: Failed to parse LLM output - expected JSON object but received malformed response",
    "ERROR: langchain.schema.InvalidToolCall: Tool 'search_database' received invalid arguments: missing required parameter 'query'",
    
    # === AWS BOTO3 ERRORS ===
    "ERROR: botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied for s3://my-bucket/private/data.json",
    "ERROR: botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeInstances operation: Rate exceeded",
    
    # === HIGH SEVERITY WARNINGS ===
    "WARNING: Memory usage at 94% on pod ml-inference-worker-7b9c4 - OOMKilled likely imminent",
    "WARNING: Certificate for *.api.example.com expires in 12 hours (NotAfter: 2024-01-15T23:59:59Z)",
    
    # === NORMAL/INFO LOGS (no action expected) ===
    "INFO: Successfully connected to PostgreSQL database at db.example.com:5432",
    "INFO: Kafka consumer group 'order-processors' rebalanced - now consuming from partitions [0, 1, 2]",
]
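
Because every sample starts with its log level, the list can be grouped by that prefix to pick a test case of a particular kind. The following is a small sketch: `group_by_level` is a hypothetical helper, and `SAMPLES` holds three entries copied from `EXAMPLES` so the block runs on its own.

```python
from collections import defaultdict


def group_by_level(messages):
    """Group log lines by the level prefix before the first colon."""
    groups = defaultdict(list)
    for message in messages:
        level, _, _ = message.partition(":")
        groups[level.strip().upper()].append(message)
    return dict(groups)


# Three entries copied from EXAMPLES, one per level.
SAMPLES = [
    "ERROR: redis.exceptions.ConnectionError: Error 111 connecting to redis-master:6379. Connection refused.",
    "WARNING: Memory usage at 94% on pod ml-inference-worker-7b9c4 - OOMKilled likely imminent",
    "INFO: Successfully connected to PostgreSQL database at db.example.com:5432",
]

by_level = group_by_level(SAMPLES)
print({level: len(lines) for level, lines in by_level.items()})
```

Passing `EXAMPLES` instead of `SAMPLES` groups the full list the same way.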

In [9]:
# Process a log message (pick any index from EXAMPLES)
log_message = EXAMPLES[0]  # Kubernetes RBAC error
print(f"Processing: {log_message}\n")

result = process_log_message(log_message)
print(f"\nResult: {result}")

# Evaluate the trace
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
evaluate_trace(trace)

Processing: ERROR: kubernetes.client.rest.ApiException: (403) Forbidden: pods is forbidden: User 'system:serviceaccount:default:myapp' cannot list resource 'pods' in API group '' in namespace 'production'

[LLM] Using Llama Stack at http://localhost:8321
[LLM] Model: openai/gpt-4o
[Classify] Classification: error (confidence: 0.95)
[Classify] Indicators: ['ERROR', 'kubernetes.client.rest.ApiException', 'Forbidden']
[Diagnose] Analyzing root cause...
[LLM] Using Llama Stack at http://localhost:8321
[LLM] Model: openai/gpt-4o
[MCP] Connecting to research tools...
[MCP]   - DeepWiki: https://mcp.deepwiki.com/mcp
[MCP]   - Context7: https://mcp.context7.com/mcp
[MCP] Available tools: ['read_wiki_structure', 'read_wiki_contents', 'ask_question', 'resolve-library-id', 'query-docs']
[Diagnose] MCP research tools available
[Diagnose] Diagnosis: 1. **What went wrong**: The error log indicates that the service account `system:serviceaccount:default:myapp` does not have permission to list the res

Feedback(name='log_monitor_evaluator', source=AssessmentSource(source_type='LLM_JUDGE', source_id='openai:/gpt-4o'), trace_id='tr-a391faf96e2b09c55e3691880236a3dc', run_id=None, rationale='1. **Classification Accuracy:** The log was correctly classified as an "error" based on the input log message indicating a 403 Forbidden error related to Kubernetes.\n\n2. **Diagnosis Quality:** The diagnosis provided accurately identifies the root cause of the error: missing necessary permissions for the service account to list pods in the specified namespace. The analysis is clear, detailed, and helps identify the misconfiguration or missing role/role binding.\n\n3. **Severity Assessment:** The severity was appropriately assessed as "high". Given the error\'s potential impact of preventing the service or application from accessing critical resources, this assessment is justified.\n\n4. **Action Routing:** The correct action was taken by routing the alert to Slack, appropriate for a high-severity er
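
The cell above handles one message. To cover several samples in one go, the same process-then-evaluate step can be looped and the ratings tallied. The sketch below is an assumption about how that might look: `run_batch` is a hypothetical helper that takes the processing and judging steps as callables, so it can be exercised with stubs and without an LLM.

```python
from collections import Counter


def run_batch(messages, process, evaluate):
    """Process each message, judge the resulting trace, and tally the ratings.

    process(message) runs the agent; evaluate() returns the rating for the
    most recent trace (for example 'good', 'acceptable', or 'poor').
    """
    ratings = Counter()
    for message in messages:
        process(message)
        ratings[evaluate()] += 1
    return ratings


# In the notebook this might be wired up as follows (not run here):
#   def judge_last_trace():
#       trace = mlflow.get_trace(mlflow.get_last_active_trace_id())
#       return evaluate_trace(trace).value
#   run_batch(EXAMPLES[:3], process_log_message, judge_last_trace)
```

Each iteration calls the judge model once, so starting with a small slice of `EXAMPLES` keeps the cost and runtime of a first run low.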

## View Traces in MLflow UI

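Start the MLflow UI to view traces and assessments. Run it from the directory that contains `mlflow.db`; if the `log-monitor-agent` experiment does not appear, pass `--backend-store-uri sqlite:///mlflow.db` so the UI reads the same SQLite database the notebook wrote to: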

```bash
mlflow ui --port 5001
```

Then open http://localhost:5001 in your browser.

### How to Navigate

1. **Select the Experiment** - Click on `log-monitor-agent` in the left sidebar
2. **Go to the Traces tab** - Click the "Traces" tab to see all agent executions
3. **View Trace Details** - Click on any Trace ID to open the trace detail view
   - You'll see the span hierarchy showing each step (classify → diagnose → assess severity → route)
   - Click on individual spans to see inputs/outputs for each step
4. **View Assessments** - In the trace detail view, look for the assessments side-panel on the right
   - This shows the Agent-as-a-Judge evaluation results (rating + rationale)
   - In MLflow 3.2+, you can also see assessment columns directly in the traces list
