# 04 - Alert Prioritization: From Detection to Decision

**Objective**: Transform raw anomaly scores into actionable SOC workflows.

This notebook extends BSAD beyond detection into **post-detection triage**:
- Risk scoring with uncertainty and entity context
- Alert budget calibration
- Operational metrics (alerts per 1,000 windows, FPR at fixed recall)
- Entity-enriched alert tickets

---

## The Problem: Alert Fatigue

Detection is only half the battle. A SOC analyst faces:
- Thousands of alerts per day
- Limited time to investigate each one
- The need to prioritize: which alerts matter most?

**This notebook answers**: Given a fixed analyst capacity, how do we maximize attack detection while minimizing false positives?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import sys

# Add src to path
sys.path.insert(0, str(Path.cwd().parent / "src"))

from triage import (
    compute_risk_score,
    RiskScorer,
    calibrate_threshold,
    AlertBudget,
    build_alert_budget_curve,
    precision_at_k,
    recall_at_k,
    fpr_at_fixed_recall,
    alerts_per_k_windows,
    workload_reduction,
    ranking_report,
    build_entity_history,
    enrich_alerts,
)

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

## 1. Load Detection Results

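We use two inputs: the multi-regime comparison results from CSE-CIC-IDS2018 (for the BSAD vs Random Forest comparison in Section 5), and the scored UNSW-NB15 rare-attack data (2% attack rate) for the per-observation analysis in Sections 2-4 and 6.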

In [None]:
# Load multi-regime results
results_path = Path.cwd().parent / "outputs" / "datasets" / "cse-cic-ids2018" / "multi-regime" / "multi_regime_results.json"

with open(results_path) as f:
    multi_regime = json.load(f)

print("Regimes analyzed:")
for regime in multi_regime['regimes']:
    print(f"  - {regime['target_rate']*100:.0f}% attack rate")

In [None]:
# Also load UNSW scored data for detailed analysis
unsw_path = Path.cwd().parent / "outputs" / "datasets" / "unsw-nb15" / "rare-attack" / "scored_df_2pct.parquet"

if unsw_path.exists():
    df = pd.read_parquet(unsw_path)
    print("UNSW-NB15 Rare Attack (2%)")
    print(f"  Observations: {len(df):,}")
    print(f"  Attack rate: {df['has_attack'].mean():.2%}")
    print(f"  Columns: {df.columns.tolist()}")
else:
    print("UNSW data not found. Run train_rare_attack_model.py first.")

## 2. The Risk Score Formula

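A raw anomaly score says how unusual an observation is, but not how far to trust that score or whether the entity has any track record. We therefore combine three signals into a **composite risk score**: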

```
Risk = w1 * normalize(anomaly_score) 
     + w2 * confidence(1/uncertainty)
     + w3 * novelty(entity_history)
```

Where:
- `anomaly_score`: How unusual is this observation?
- `confidence`: How certain are we about the score?
- `novelty`: Is this entity new (less history = higher risk)?

The weights `w1`, `w2`, `w3` are configurable; the defaults used below are 0.5, 0.3 and 0.2.
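The internals of `triage.compute_risk_score` are not shown in this notebook, so the following is only a minimal sketch of how the formula above could be computed, under assumptions of our own: min-max normalization for the anomaly term, confidence as min-max-rescaled inverse score uncertainty, and novelty as `1 / (1 + prior observations of the entity)`, which requires rows in chronological order. The names `minmax` and `sketch_risk_score` are hypothetical, and terms whose column is missing are dropped with the remaining weights renormalized.

```python
import numpy as np
import pandas as pd


def minmax(x):
    """Rescale to [0, 1]; a constant array maps to all zeros (no spread, no signal)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def sketch_risk_score(df, score_col="anomaly_score", std_col=None, entity_col=None,
                      weights=(0.5, 0.3, 0.2)):
    """Hypothetical composite risk: weighted sum of anomaly, confidence, novelty.

    Assumes rows are in chronological order (novelty counts *prior* rows per entity).
    Missing components are dropped and the remaining weights renormalized.
    """
    w_anom, w_conf, w_nov = weights
    parts = [(w_anom, minmax(df[score_col].to_numpy()))]
    if std_col is not None:
        # Confidence: inverse score uncertainty, rescaled so the most certain row gets 1.
        parts.append((w_conf, minmax(1.0 / (df[std_col].to_numpy() + 1e-9))))
    if entity_col is not None:
        # Novelty: 1 for an entity's first row, 1/2 for its second, 1/3 for its third, ...
        prior = df.groupby(entity_col).cumcount().to_numpy()
        parts.append((w_nov, 1.0 / (1.0 + prior)))
    total_w = sum(w for w, _ in parts)
    return sum(w * c for w, c in parts) / total_w
```

On a frame that has the relevant columns, usage would look like `sketch_risk_score(df, std_col="score_std", entity_col="entity")`. Min-max scaling is sensitive to a single extreme score; a rank-based normalization is a common, more robust alternative.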

In [None]:
# Demonstrate risk score computation
if 'df' in dir():
    # Default weights: (anomaly=0.5, confidence=0.3, novelty=0.2)
    risk_scores = compute_risk_score(
        df,
        score_col="anomaly_score",
        std_col="score_std" if "score_std" in df.columns else None,
        entity_col="entity" if "entity" in df.columns else None,
    )
    
    df["risk_score"] = risk_scores
    
    print("Risk Score Statistics:")
    print(df["risk_score"].describe())

In [None]:
# Visualize risk score vs anomaly score
if 'df' in dir():
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Scatter: anomaly vs risk
    ax1 = axes[0]
    colors = ['crimson' if x else 'steelblue' for x in df['has_attack']]
    ax1.scatter(df['anomaly_score'], df['risk_score'], c=colors, alpha=0.5, s=20)
    ax1.set_xlabel('Anomaly Score (raw)')
    ax1.set_ylabel('Risk Score (composite)')
    ax1.set_title('Anomaly Score vs Risk Score\n(Red = attack, Blue = normal)')
    
    # Distribution comparison
    ax2 = axes[1]
    attacks = df[df['has_attack'] == 1]['risk_score']
    benign = df[df['has_attack'] == 0]['risk_score']
    ax2.hist(benign, bins=30, alpha=0.6, label=f'Normal (n={len(benign)})', color='steelblue', density=True)
    ax2.hist(attacks, bins=30, alpha=0.6, label=f'Attack (n={len(attacks)})', color='crimson', density=True)
    ax2.set_xlabel('Risk Score')
    ax2.set_ylabel('Density')
    ax2.set_title('Risk Score Distribution by Class')
    ax2.legend()
    
    plt.tight_layout()
    plt.show()

## 3. Alert Budget Calibration

SOCs have limited capacity. Instead of asking "what's the best threshold?", we ask:

> **"If I can only review X alerts per day, what recall can I achieve?"**

This is the **alert budget** approach.
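A minimal sketch of how an alert budget curve can be built, assuming (our assumption, not the documented behaviour of `triage.build_alert_budget_curve`) that alerts are raised on the top-k scores and that, for each recall target, we keep the smallest k whose recall reaches it. The function name `sketch_budget_curve` is hypothetical; its column names mirror the ones printed in the next cell. Tied scores are broken by row order here, whereas a real score threshold would alert on every tied row.

```python
import numpy as np
import pandas as pd


def sketch_budget_curve(scores, y_true, recall_targets=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Hypothetical alert budget curve: alerts and FPR needed to reach each recall target."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    y_sorted = y_true[order]
    n_pos = y_sorted.sum()
    n_neg = len(y_sorted) - n_pos
    tp = np.cumsum(y_sorted)      # true positives when alerting on the top k, k = 1..n
    fp = np.cumsum(1 - y_sorted)  # false positives for the same k
    recall = tp / n_pos
    rows = []
    for target in recall_targets:
        k = int(np.argmax(recall >= target))  # first position reaching the target
        rows.append({
            "recall_target": target,
            "actual_recall": recall[k],
            "fpr": fp[k] / n_neg,
            "alerts": k + 1,
        })
    return pd.DataFrame(rows)
```

Reading the curve the other way answers the budget question directly: if the SOC can review only `B` alerts, the achievable recall is the recall at `k = B`.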

In [None]:
if 'df' in dir():
    y_true = df['has_attack'].astype(int).values
    scores = df['anomaly_score'].values
    
    # Build alert budget curve
    budget_curve = build_alert_budget_curve(scores, y_true)
    
    print("Alert Budget Curve:")
    print(budget_curve[['recall_target', 'actual_recall', 'fpr', 'alerts']].to_string(index=False))

In [None]:
if 'df' in dir():
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Alert budget curve
    ax1 = axes[0]
    ax1.plot(budget_curve['actual_recall'] * 100, budget_curve['alerts'], 'o-', 
             color='#2ecc71', linewidth=2, markersize=10)
    ax1.set_xlabel('Recall (%)', fontsize=12)
    ax1.set_ylabel('Total Alerts', fontsize=12)
    ax1.set_title('Alert Budget Curve\n"How many alerts to catch X% of attacks?"', fontsize=14)
    ax1.grid(True, alpha=0.3)
    
    # Annotate key points (np.isclose avoids exact float comparison on recall targets)
    for _, row in budget_curve.iterrows():
        if np.isclose(row['recall_target'], [0.3, 0.5, 0.7]).any():
            ax1.annotate(f"{int(row['alerts'])} alerts\n@{row['actual_recall']*100:.0f}% recall",
                        xy=(row['actual_recall']*100, row['alerts']),
                        xytext=(10, 10), textcoords='offset points',
                        fontsize=10, ha='left',
                        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
    
    # FPR at fixed recall
    ax2 = axes[1]
    ax2.bar(budget_curve['recall_target'] * 100, budget_curve['fpr'], 
            width=8, color='#9b59b6', alpha=0.7)
    ax2.set_xlabel('Target Recall (%)', fontsize=12)
    ax2.set_ylabel('False Positive Rate', fontsize=12)
    ax2.set_title('Cost of Detection\n"FPR required to achieve X% recall"', fontsize=14)
    
    plt.tight_layout()
    plt.show()

## 4. Ranking Metrics: Precision@k and Recall@k

For SOC analysts, what matters is:
- **Precision@k**: "Of my top k alerts, how many are real attacks?"
- **Recall@k**: "What fraction of all attacks are in my top k?"

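These are more actionable than ROC-AUC: they measure the quality of the top of the ranked list, which is the part an analyst actually works through, whereas ROC-AUC averages over every possible threshold, including ones no SOC would operate at.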
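For reference, a minimal sketch of the two definitions above. The names `sketch_precision_at_k` and `sketch_recall_at_k` are hypothetical and are not claimed to match `triage.precision_at_k` / `triage.recall_at_k` exactly; in particular, this version divides precision by the number of alerts actually returned when `k` exceeds the list length, and breaks score ties by row order.

```python
import numpy as np


def _top_k(scores, k):
    """Indices of the k highest scores (ties broken by original row order)."""
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]


def sketch_precision_at_k(y_true, scores, k):
    """Fraction of the top-k alerts that are real attacks."""
    top = _top_k(scores, k)
    return float(np.asarray(y_true)[top].sum()) / len(top)


def sketch_recall_at_k(y_true, scores, k):
    """Fraction of all attacks that appear in the top-k alerts."""
    y = np.asarray(y_true)
    top = _top_k(scores, k)
    return float(y[top].sum()) / y.sum()
```

With six windows labelled `[1, 0, 1, 0, 0, 1]` and scores already in descending order, the top 3 contain two of the three attacks, so precision@3 and recall@3 are both 2/3.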

In [None]:
if 'df' in dir():
    # Generate ranking report
    report = ranking_report(y_true, scores)
    
    print("Ranking Metrics Report:")
    print("=" * 40)
    print(report.to_string(index=False))

In [None]:
if 'df' in dir():
    # Visualize precision@k and recall@k
    ks = [5, 10, 25, 50, 100]
    prec = [precision_at_k(y_true, scores, k) for k in ks]
    rec = [recall_at_k(y_true, scores, k) for k in ks]
    
    fig, ax = plt.subplots(figsize=(10, 6))
    
    x = np.arange(len(ks))
    width = 0.35
    
    bars1 = ax.bar(x - width/2, prec, width, label='Precision@k', color='#3498db')
    bars2 = ax.bar(x + width/2, rec, width, label='Recall@k', color='#e74c3c')
    
    ax.set_xlabel('k (number of top alerts)', fontsize=12)
    ax.set_ylabel('Score', fontsize=12)
    ax.set_title('Precision and Recall at Top-k\n"Quality of the ranked alert list"', fontsize=14)
    ax.set_xticks(x)
    ax.set_xticklabels(ks)
    ax.legend()
    ax.set_ylim(0, 1)
    
    # Add value labels
    for bar, val in zip(bars1, prec):
        ax.annotate(f'{val:.2f}', xy=(bar.get_x() + bar.get_width()/2, val),
                   xytext=(0, 3), textcoords='offset points', ha='center', fontsize=9)
    for bar, val in zip(bars2, rec):
        ax.annotate(f'{val:.2f}', xy=(bar.get_x() + bar.get_width()/2, val),
                   xytext=(0, 3), textcoords='offset points', ha='center', fontsize=9)
    
    plt.tight_layout()
    plt.show()

## 5. Multi-Regime Comparison: Operational Metrics

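We compare BSAD and Random Forest across attack-rate regimes using **operational metrics**: alerts per 1,000 windows and false-positive rate at a fixed recall of 30%, with ROC-AUC for reference.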
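Why FPR matters so much when attacks are rare: at a fixed recall, every window is either a caught attack or a false alarm, so alerts = recall × (attack windows) + FPR × (benign windows). The helper below (hypothetical name `expected_alerts_per_1k`) just evaluates that identity, assuming alerts per 1,000 windows is defined as the fraction of windows alerted on at the chosen operating point. With illustrative, not measured, values of recall 0.3, FPR 0.05 and a 2% attack rate, it gives about 55 alerts per 1,000 windows, of which 49 are false positives.

```python
def expected_alerts_per_1k(recall, fpr, attack_rate):
    """Alerts per 1,000 windows implied by a (recall, FPR) operating point.

    alerts = true positives + false positives
           = recall * (# attack windows) + fpr * (# benign windows)
    """
    return 1000.0 * (recall * attack_rate + fpr * (1.0 - attack_rate))


# Illustrative only: at a 2% attack rate and 30% recall, the true-positive term is
# fixed at 6 alerts per 1,000 windows for any model; the rest is false positives.
tp_term = expected_alerts_per_1k(0.3, 0.0, 0.02)
total = expected_alerts_per_1k(0.3, 0.05, 0.02)
```

Because the true-positive term is the same for both models at equal recall, the RF/BSAD reduction factor plotted below is driven almost entirely by the ratio of their false-positive rates once attacks are rare.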

In [None]:
# Build comparison table from multi-regime results
comparison_data = []

for regime in multi_regime['regimes']:
    rate = regime['target_rate']
    
    for model in ['BSAD', 'RandomForest']:
        metrics = regime['metrics'][model]
        comparison_data.append({
            'Attack Rate': f"{rate*100:.0f}%",
            'Model': model,
            'ROC-AUC': metrics['roc_auc'],
            'FPR@R=0.3': metrics['fpr_at_recall_30'],
            'Alerts/1k': metrics['alerts_per_1k_windows'],
        })

comparison_df = pd.DataFrame(comparison_data)

print("Multi-Regime Comparison: BSAD vs Random Forest")
print("=" * 60)
print(comparison_df.to_string(index=False))

In [None]:
# Visualize the key insight: alert volume reduction
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

rates = [r['target_rate'] * 100 for r in multi_regime['regimes']]
bsad_alerts = [r['metrics']['BSAD']['alerts_per_1k_windows'] for r in multi_regime['regimes']]
rf_alerts = [r['metrics']['RandomForest']['alerts_per_1k_windows'] for r in multi_regime['regimes']]

# Alert volume comparison
ax1 = axes[0]
ax1.plot(rates, bsad_alerts, 'o-', label='BSAD', color='#2ecc71', linewidth=2, markersize=10)
ax1.plot(rates, rf_alerts, 's-', label='Random Forest', color='#e74c3c', linewidth=2, markersize=10)
ax1.set_xlabel('Attack Rate (%)', fontsize=12)
ax1.set_ylabel('Alerts per 1,000 Windows', fontsize=12)
ax1.set_title('Alert Volume at Fixed Recall (30%)\n"Lower is better for SOC"', fontsize=14)
ax1.legend(fontsize=11)
ax1.invert_xaxis()  # Rare events on right
ax1.grid(True, alpha=0.3)

# Reduction factor
ax2 = axes[1]
reduction = [rf/bsad if bsad > 0 else 0 for rf, bsad in zip(rf_alerts, bsad_alerts)]
colors = ['#2ecc71' if r > 1 else '#e74c3c' for r in reduction]
bars = ax2.bar([f"{r:.0f}%" for r in rates], reduction, color=colors, alpha=0.7)
ax2.axhline(y=1, color='black', linestyle='--', linewidth=1)
ax2.set_xlabel('Attack Rate', fontsize=12)
ax2.set_ylabel('Alert Reduction Factor (RF/BSAD)', fontsize=12)
ax2.set_title('BSAD Alert Reduction vs Random Forest\n"Higher = BSAD generates fewer alerts"', fontsize=14)

# Add value labels
for bar, val in zip(bars, reduction):
    ax2.annotate(f'{val:.1f}×', xy=(bar.get_x() + bar.get_width()/2, val),
                xytext=(0, 5), textcoords='offset points', ha='center', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

## 6. Entity Context: Enriched Alert Tickets

Analysts need **context**, not just scores. For each alert, we provide:
- Entity baseline behavior
- Deviation from baseline (σ)
- Historical alert count
- Confidence level
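A minimal sketch of the baseline and σ-deviation fields, assuming (our assumption; `triage.build_entity_history` may differ) that an entity's baseline is the mean and sample standard deviation of `event_count` over all of its windows. The function names are hypothetical. Including the current window in its own baseline slightly understates deviation; a stricter version would use only earlier windows. Entities with zero or undefined spread get no σ value rather than an infinite one.

```python
import numpy as np
import pandas as pd


def sketch_entity_baseline(df, entity_col="entity", value_col="event_count"):
    """Hypothetical per-entity baseline: mean, sample std and window count of value_col."""
    return (
        df.groupby(entity_col)[value_col]
        .agg(baseline_mean="mean", baseline_std="std", n_obs="count")
        .reset_index()
    )


def sketch_sigma_deviation(df, baseline, entity_col="entity", value_col="event_count"):
    """Attach baseline columns and the deviation of each window in units of sigma."""
    out = df.merge(baseline, on=entity_col, how="left")
    std = out["baseline_std"].replace(0, np.nan)  # zero spread -> deviation undefined
    out["sigma_deviation"] = (out[value_col] - out["baseline_mean"]) / std
    return out
```

For an entity whose event counts are 10, 12 and 14, the baseline is 12 ± 2, so the window with 14 events sits 1.0σ above baseline.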

In [None]:
if 'df' in dir() and 'entity' in df.columns and 'event_count' in df.columns:
    # Build entity history
    history = build_entity_history(df, entity_col='entity', value_col='event_count')
    
    # Enrich top alerts
    enriched = enrich_alerts(df, history, top_k=10)
    
    print("Top 10 Enriched Alert Tickets:")
    print("=" * 70)
    
    for i, alert in enumerate(enriched, 1):
        print(f"\n[Ticket #{i}]")
        print(f"  Entity: {alert['entity_id']}")
        print(f"  Anomaly Score: {alert['anomaly_score']:.2f}")
        print(f"  Deviation: {alert['sigma_deviation']:.1f}σ from baseline")
        print(f"  Baseline: {alert['baseline_mean']:.1f} ± {alert['baseline_std']:.1f}")
        print(f"  Current Value: {alert['current_value']:.1f}")
        print(f"  Confidence: {alert['confidence']}")
        print(f"  Prior Alerts: {alert['historical_alerts']}")

## 7. Key Takeaways

### The Operational Value of BSAD

| Metric | Meaning | BSAD vs Random Forest |
|--------|---------|-----------------------|
| **Alerts/1k** | Analyst workload per 1,000 windows | 8-14× fewer alerts |
| **FPR@R=0.3** | False-positive rate needed to reach 30% recall | Up to 92% lower |
| **Precision@k** | Quality of the top-ranked alerts | Comparable |
| **Entity context** | Analyst decision support | Built-in |

### When to Use This Approach

✅ **Use alert prioritization when**:
- The SOC has limited analyst capacity
- False positives cause alert fatigue
- You need to justify detection thresholds
- Attacks are rare (<5%)

❌ **Don't use when**:
- Every alert must be reviewed (e.g., for compliance)
- Real-time blocking is required
- The attack rate is high (use supervised classification instead)

### The Bottom Line

> **Detection systems should be evaluated not only by how well they separate classes, but by how well they manage human attention under uncertainty.**

---

## Reproduce with One Command

All results in this notebook can be reproduced with:

```bash
python scripts/alert_prioritization.py
```

Outputs are saved to `outputs/triage/`.