# SLO Exploration Notebook

An interactive walkthrough of **Service Level Objectives** for AI agent systems
using `agent-sre`.

We will:
1. Define SLOs for latency, accuracy, and cost
2. Simulate 200 agent calls with varying performance
3. Compute SLI values and inspect error budgets
4. Visualize compliance, burn rate, and latency distribution
5. Check alert thresholds
6. Run a what-if analysis

## 1 - Setup

In [None]:
import random
import math

import matplotlib.pyplot as plt

from agent_sre import SLO, ErrorBudget
from agent_sre.slo.indicators import (
    CostPerTask,
    ResponseLatency,
    TaskSuccessRate,
    ToolCallAccuracy,
)
from agent_sre.slo.objectives import ExhaustionAction, SLOStatus
from agent_sre.slo.dashboard import SLODashboard

random.seed(42)
print("Setup complete ✅")

## 2 - Define SLOs

We create three SLIs and combine them into a single SLO with an error budget.

| Indicator | Target | Window | Description |
|---|---|---|---|
| Response Latency (p95) | ≤ 3 000 ms | 1 h | 95th-percentile latency |
| Tool-Call Accuracy | ≥ 99 % | 24 h | Fraction of correct tool selections |
| Cost per Task | ≤ $0.50 | 24 h | Average USD cost per task |
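
The targets in the table point in different directions: latency and cost must stay at or below their thresholds, while accuracy must stay at or above its threshold. The following is a minimal standard-library sketch of that comparison logic. The `Target` class and the observed values are hypothetical illustrations and are not part of `agent-sre`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical helper: a threshold plus the direction that counts as good."""
    name: str
    threshold: float
    higher_is_better: bool

    def is_met(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


targets = [
    Target("p95 latency (ms)", 3000.0, higher_is_better=False),
    Target("tool-call accuracy", 0.99, higher_is_better=True),
    Target("cost per task (USD)", 0.50, higher_is_better=False),
]

# Made-up observations, for illustration only
observed = {
    "p95 latency (ms)": 3400.0,
    "tool-call accuracy": 0.992,
    "cost per task (USD)": 0.36,
}

for t in targets:
    status = "met" if t.is_met(observed[t.name]) else "missed"
    print(f"{t.name}: {status}")
```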

In [None]:
# --- SLI definitions ---
latency_sli = ResponseLatency(target_ms=3000.0, percentile=0.95, window="1h")
accuracy_sli = ToolCallAccuracy(target=0.99, window="24h")
cost_sli = CostPerTask(target_usd=0.50, window="24h")

# --- Error budget (5 %) ---
budget = ErrorBudget(
    total=0.05,
    burn_rate_alert=2.0,
    burn_rate_critical=10.0,
    exhaustion_action=ExhaustionAction.FREEZE_DEPLOYMENTS,
)

# --- SLO ---
slo = SLO(
    name="code-review-agent",
    description="Reliability targets for an AI code-review agent",
    indicators=[latency_sli, accuracy_sli, cost_sli],
    error_budget=budget,
    agent_id="code-review-agent",
)

print(slo)

## 3 - Simulate Traffic

Generate **200 mock agent calls** with realistic performance characteristics.
Success rate is intentionally set *below* the target to watch the error budget drain.
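
As a rough sanity check on what the simulation should produce, the arithmetic below estimates the expected number of bad events. It assumes, as the simulation cell does, that task success (92 %) and tool accuracy (98.5 %) are drawn independently and that an event is good only when both succeed. It also assumes the 5 % budget is read as a fraction of all events, which may not match how `agent-sre` accounts for it internally.

```python
# Probabilities and sizes used by the simulation in this notebook
P_TASK_OK = 0.92
P_TOOL_OK = 0.985
NUM_CALLS = 200
BUDGET_FRACTION = 0.05  # 5 % error budget

# An event is good only if both independent draws succeed
p_good = P_TASK_OK * P_TOOL_OK
expected_bad = (1 - p_good) * NUM_CALLS
allowed_bad = BUDGET_FRACTION * NUM_CALLS  # assumption: budget is a share of events

print(f"Expected good fraction: {p_good:.4f}")
print(f"Expected bad events:    {expected_bad:.1f} of {NUM_CALLS}")
print(f"Bad events allowed:     {allowed_bad:.0f} of {NUM_CALLS}")
```

Under these assumptions roughly 19 bad events are expected against an allowance of 10, so the budget should run out well before the 200 calls complete.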

In [None]:
NUM_CALLS = 200

# We track per-call metrics for later visualization
latencies = []
accuracies_running = []
costs = []
good_events = []
budget_remaining = []

for i in range(NUM_CALLS):
    # Simulate outcomes
    task_ok = random.random() < 0.92   # 92 % success (below the 95 % that a 5 % budget allows)
    tool_ok = random.random() < 0.985  # 98.5 % accuracy (below 99 % target)
    latency_ms = max(100, random.gauss(2400, 700))
    cost_usd = max(0.01, random.gauss(0.35, 0.15))

    # Record into SLIs
    accuracy_sli.record_call(tool_ok)
    latency_sli.record_latency(latency_ms)
    cost_sli.record_cost(cost_usd)

    # Record event against error budget
    is_good = task_ok and tool_ok
    slo.record_event(good=is_good)

    # Store for visualization
    latencies.append(latency_ms)
    accuracies_running.append(accuracy_sli.current_value())
    costs.append(cost_usd)
    good_events.append(is_good)
    budget_remaining.append(slo.error_budget.remaining_percent)

print(f"Simulated {NUM_CALLS} agent calls")
print(f"  Good events:  {sum(good_events)} / {NUM_CALLS}")
print(f"  Bad events:   {NUM_CALLS - sum(good_events)} / {NUM_CALLS}")

## 4 - Compute SLIs

Read the current indicator values computed by `agent-sre`.
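
To make the latency indicator less of a black box, here is a minimal sketch of a nearest-rank percentile over raw samples. It is not taken from `agent-sre`, whose internal percentile method may differ; `nearest_rank_percentile` and the demo samples are hypothetical. The visualization cell later in this notebook uses a slightly different index, `int(n * 0.95)`, which can land one rank higher.

```python
import math


def nearest_rank_percentile(samples, q):
    """Return the smallest sample with at least a fraction q of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only
demo_latencies = [1200, 1800, 2100, 2400, 2600, 2900, 3100, 3300, 3600, 4200]
p95 = nearest_rank_percentile(demo_latencies, 0.95)
print(f"p95 = {p95} ms, target met: {p95 <= 3000}")
```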

In [None]:
print("Indicator Summary")
print("=" * 55)
for ind in slo.indicators:
    val = ind.current_value()
    comp = ind.compliance()
    if val is not None and comp is not None:
        met = "✅" if comp >= 0.95 else "❌"
        print(f"  {met} {ind.name}")
        print(f"       Value:      {val:.4f}")
        print(f"       Target:     {ind.target}")
        print(f"       Compliance: {comp:.1%}")
        print()

## 5 - Error Budget

The error budget tracks how many "bad" events we can tolerate before the SLO
is breached. A burn rate > 1× means we are consuming budget faster than planned.
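
One common way to express these two quantities is sketched below: burn rate as the observed error rate divided by the error rate the budget allows, and remaining budget as the unspent share of allowed bad events. This is an illustrative definition only. `agent-sre`'s `burn_rate(window_seconds)` takes a time window and may compute its value differently, and both function names here are hypothetical.

```python
def burn_rate(bad_events, total_events, budget_fraction):
    """Observed error rate divided by the error rate the budget allows."""
    if total_events == 0:
        return 0.0
    return (bad_events / total_events) / budget_fraction


def budget_remaining_percent(bad_events, total_events, budget_fraction):
    """Unspent share of the budget as a percentage (negative once overspent)."""
    if total_events == 0:
        return 100.0
    allowed = budget_fraction * total_events
    return (1 - bad_events / allowed) * 100


# Hypothetical figures: 15 bad events out of 200 with a 5 % budget
rate = burn_rate(15, 200, 0.05)
left = budget_remaining_percent(15, 200, 0.05)
print(f"burn rate: {rate:.2f}x, remaining: {left:.0f}%")
```

With these figures the burn rate is above 1 and the remaining budget is negative, which is the situation the simulation is designed to provoke.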

In [None]:
status = slo.evaluate()

print("Error Budget Report")
print("=" * 45)
print(f"  SLO Status:          {status.value}")
print(f"  Budget Total:        {slo.error_budget.total:.2%}")
print(f"  Budget Consumed:     {slo.error_budget.consumed}")
print(f"  Budget Remaining:    {slo.error_budget.remaining_percent:.1f}%")
print(f"  Exhausted?           {slo.error_budget.is_exhausted}")
print(f"  Burn Rate (1 h):     {slo.error_budget.burn_rate(3600):.1f}×")
print()
print("Interpretation:")
if status == SLOStatus.EXHAUSTED:
    print("  🚨 Budget exhausted - action: "
          f"{slo.error_budget.exhaustion_action.value}")
elif status in (SLOStatus.CRITICAL, SLOStatus.WARNING):
    print("  ⚠️  SLO at risk - consider slowing deployments")
else:
    print("  ✅ Budget healthy - keep shipping")

## 6 - Visualization

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
calls = range(1, NUM_CALLS + 1)

# --- 6a: Error budget remaining over time ---
ax = axes[0, 0]
ax.plot(calls, budget_remaining, linewidth=1.5)
ax.axhline(y=0, color="red", linestyle="--", label="Budget exhausted")
ax.set_title("Error Budget Remaining")
ax.set_xlabel("Agent Call #")
ax.set_ylabel("Remaining (%)")
ax.legend()
ax.grid(True, alpha=0.3)

# --- 6b: Latency distribution ---
ax = axes[0, 1]
ax.hist(latencies, bins=30, edgecolor="black", alpha=0.7)
ax.axvline(x=3000, color="red", linestyle="--", label="Target (3 000 ms)")
p95_val = sorted(latencies)[int(len(latencies) * 0.95)]
ax.axvline(x=p95_val, color="orange", linestyle="--",
           label=f"p95 ({p95_val:.0f} ms)")
ax.set_title("Latency Distribution")
ax.set_xlabel("Latency (ms)")
ax.set_ylabel("Count")
ax.legend()
ax.grid(True, alpha=0.3)

# --- 6c: Running accuracy ---
ax = axes[1, 0]
ax.plot(calls, accuracies_running, linewidth=1.5, color="green")
ax.axhline(y=0.99, color="red", linestyle="--", label="Target (99 %)")
ax.set_title("Tool-Call Accuracy (running)")
ax.set_xlabel("Agent Call #")
ax.set_ylabel("Accuracy")
ax.set_ylim(0.9, 1.01)
ax.legend()
ax.grid(True, alpha=0.3)

# --- 6d: Good vs bad events ---
ax = axes[1, 1]
cumulative_bad = []
running_total = 0
for g in good_events:
    if not g:
        running_total += 1
    cumulative_bad.append(running_total)
ax.plot(calls, cumulative_bad, linewidth=1.5, color="crimson")
ax.set_title("Cumulative Bad Events (budget burn)")
ax.set_xlabel("Agent Call #")
ax.set_ylabel("Bad Events")
ax.grid(True, alpha=0.3)

fig.suptitle(f"SLO Dashboard - {slo.name}", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

## 7 - Alert Thresholds

Burn-rate alerts fire when the budget is being consumed faster than expected.

| Alert | Threshold | Severity |
|---|---|---|
| Fast burn | 2× normal rate | ⚠️ Warning |
| Critical burn | 10× normal rate | 🚨 Critical |
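
The two thresholds in the table amount to a simple classification of the burn-rate multiple. The sketch below shows that mapping with a hypothetical helper, `classify_burn_rate`. Treating a rate exactly equal to a threshold as firing is an assumption here and may not match how `agent-sre` evaluates its alerts.

```python
def classify_burn_rate(rate, warning=2.0, critical=10.0):
    """Map a burn-rate multiple to a severity label using two thresholds."""
    if rate >= critical:
        return "critical"
    if rate >= warning:
        return "warning"
    return "ok"


# Made-up burn rates, for illustration only
for sample in (0.8, 2.5, 12.0):
    print(f"{sample:>5.1f}x -> {classify_burn_rate(sample)}")
```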

In [None]:
print("Alert Status")
print("=" * 50)

current_burn = slo.error_budget.burn_rate(3600)
print(f"  Current burn rate (1 h): {current_burn:.1f}×")
print()

for alert in slo.error_budget.alerts():
    firing = alert.is_firing(current_burn)
    icon = "🔔 FIRING" if firing else "🔇 ok"
    print(f"  {icon}  {alert.name}  "
          f"(threshold: {alert.rate:.0f}×, severity: {alert.severity})")

print()
firing_alerts = slo.error_budget.firing_alerts()
if firing_alerts:
    print(f"  → {len(firing_alerts)} alert(s) currently firing")
else:
    print("  → No alerts firing")

## 8 - What-If Analysis

**Scenario:** What if latency increases by 20 %?

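We replay the same calls (same random seed) with every latency scaled by 1.2 and
compare p95 latency and remaining error budget against the baseline. Events are
still marked good or bad from task and tool outcomes alone, so any difference
between the two runs should show up mainly in the latency figures.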
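
Before running the replay, the expected effect of the 20 % increase can be estimated analytically. The sketch below uses `statistics.NormalDist` from the standard library and the simulation's latency model (mean 2400 ms, standard deviation 700 ms). It ignores the `max(100, ...)` floor applied in the simulation, which is an assumption that should matter very little since 100 ms is more than three standard deviations below the mean.

```python
from statistics import NormalDist

TARGET_MS = 3000.0
LATENCY_INCREASE = 1.20

# Baseline latency model from the simulation: mean 2400 ms, std dev 700 ms
baseline = NormalDist(mu=2400, sigma=700)
# Scaling a normal variable by a constant scales both its mean and std dev
inflated = NormalDist(mu=2400 * LATENCY_INCREASE, sigma=700 * LATENCY_INCREASE)

base_over = 1 - baseline.cdf(TARGET_MS)
wif_over = 1 - inflated.cdf(TARGET_MS)

print(f"Calls over target (baseline): {base_over:.1%}")
print(f"Calls over target (+20 %):    {wif_over:.1%}")
print(f"Theoretical p95 (baseline):   {baseline.inv_cdf(0.95):.0f} ms")
print(f"Theoretical p95 (+20 %):      {inflated.inv_cdf(0.95):.0f} ms")
```

Under this model the theoretical p95 is already above the 3 000 ms target at baseline, so the what-if run should widen an existing gap rather than create a new one.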

In [None]:
LATENCY_INCREASE = 1.20  # +20 %

# Create a fresh SLO for the what-if scenario
wif_latency = ResponseLatency(target_ms=3000.0, percentile=0.95, window="1h")
wif_accuracy = ToolCallAccuracy(target=0.99, window="24h")
wif_cost = CostPerTask(target_usd=0.50, window="24h")

wif_budget = ErrorBudget(
    total=0.05,
    burn_rate_alert=2.0,
    burn_rate_critical=10.0,
    exhaustion_action=ExhaustionAction.FREEZE_DEPLOYMENTS,
)

wif_slo = SLO(
    name="code-review-agent-whatif",
    indicators=[wif_latency, wif_accuracy, wif_cost],
    error_budget=wif_budget,
    agent_id="code-review-agent",
)

# Replay with inflated latency
random.seed(42)
wif_budget_remaining = []
wif_latencies = []

for i in range(NUM_CALLS):
    task_ok = random.random() < 0.92
    tool_ok = random.random() < 0.985
    latency_ms = max(100, random.gauss(2400, 700)) * LATENCY_INCREASE
    cost_usd = max(0.01, random.gauss(0.35, 0.15))

    wif_accuracy.record_call(tool_ok)
    wif_latency.record_latency(latency_ms)
    wif_cost.record_cost(cost_usd)
    wif_slo.record_event(good=task_ok and tool_ok)

    wif_latencies.append(latency_ms)
    wif_budget_remaining.append(wif_slo.error_budget.remaining_percent)

# Compare
orig_p95 = sorted(latencies)[int(len(latencies) * 0.95)]
wif_p95 = sorted(wif_latencies)[int(len(wif_latencies) * 0.95)]

print("What-If: Latency +20 %")
print("=" * 50)
print(f"  Original p95 latency:  {orig_p95:.0f} ms")
print(f"  What-if p95 latency:   {wif_p95:.0f} ms")
print(f"  Original budget left:  {budget_remaining[-1]:.1f}%")
print(f"  What-if budget left:   {wif_budget_remaining[-1]:.1f}%")
print()

# Side-by-side chart
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(calls, budget_remaining, label="Baseline", linewidth=1.5)
ax1.plot(calls, wif_budget_remaining, label="+20 % latency",
         linewidth=1.5, linestyle="--")
ax1.axhline(y=0, color="red", linestyle=":", alpha=0.5)
ax1.set_title("Error Budget Comparison")
ax1.set_xlabel("Agent Call #")
ax1.set_ylabel("Remaining (%)")
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.hist(latencies, bins=30, alpha=0.5, label="Baseline", edgecolor="black")
ax2.hist(wif_latencies, bins=30, alpha=0.5, label="+20 %", edgecolor="black")
ax2.axvline(x=3000, color="red", linestyle="--", label="Target")
ax2.set_title("Latency Distribution Comparison")
ax2.set_xlabel("Latency (ms)")
ax2.set_ylabel("Count")
ax2.legend()
ax2.grid(True, alpha=0.3)

fig.suptitle("What-If Analysis: +20 % Latency", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

---

## Next Steps

- Adjust the SLO targets and re-run to see how different thresholds change the
  error budget dynamics.
- Try `ExhaustionAction.CIRCUIT_BREAK` or `ExhaustionAction.THROTTLE` to
  explore different exhaustion policies.
- Integrate with a real agent using `AgentSRECallback` from
  `agent_sre.integrations.langchain.callback`.
- See `examples/slo_alerting.py` for a CLI version of this workflow.