# Stage 14 — Deployment & Monitoring (Conceptual Overview)

This notebook demonstrates **conceptual** patterns for metric logging and simple threshold checks. It does not use real infrastructure.

**Layers to monitor:** Data · Model · System · Business


## 0. Imports & Setup
- We use only the standard library to keep the focus on concepts.
- Output files are written to a local scratch folder (`./tmp/stage14_demo`), which is created if it does not exist.

In [None]:
from pathlib import Path
from datetime import datetime
import csv
import json
import random

BASE = Path('./tmp/stage14_demo')
BASE.mkdir(parents=True, exist_ok=True)
METRICS_CSV = BASE / 'metrics.csv'
if not METRICS_CSV.exists():
    with METRICS_CSV.open('w', newline='') as f:
        w = csv.writer(f)
        w.writerow(['ts','name','value','layer','model_version','window','context'])
print(f'Writing metrics to: {METRICS_CSV}')

## 1. Risk → Metric Mapping (conceptual)

| Risk | Metric | Layer |
|---|---|---|
| Schema change | Schema hash mismatch | Data |
| Null spike | % nulls by column | Data |
| Concept drift | Rolling MAE/AUC | Model |
| Latency spike | p95 latency (ms) | System |
| Silent degradation | Calibration / business KPI shift | Business |
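
As a rough illustration of the two data-layer metrics in the table, the sketch below computes a schema hash and per-column null percentages for a small list of dict records. The helper names (`schema_hash`, `null_pct_by_column`) and the sample records are hypothetical and not part of this notebook's pipeline. Treating `None` and empty strings as nulls is also an assumption.

```python
import hashlib
import json

def schema_hash(columns_to_types):
    """Hash a {column: type_name} mapping; any rename, add, drop or type change alters the hash."""
    # sort so that column order does not affect the hash
    canonical = json.dumps(sorted(columns_to_types.items()))
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()

def null_pct_by_column(records, columns):
    """Percentage of records in which each column is missing, None or an empty string."""
    n = len(records)
    result = {}
    for col in columns:
        nulls = sum(1 for r in records if r.get(col) in (None, ''))
        result[col] = 100.0 * nulls / n if n else 0.0
    return result

# hypothetical sample data
expected_schema = {'age': 'int', 'income': 'float'}
observed_schema = {'age': 'int', 'income': 'str'}  # type changed upstream
records = [
    {'age': 31, 'income': 52000.0},
    {'age': None, 'income': 48000.0},
    {'age': 45, 'income': None},
    {'age': 29, 'income': 61000.0},
]

schema_mismatch = schema_hash(expected_schema) != schema_hash(observed_schema)
null_pcts = null_pct_by_column(records, ['age', 'income'])
print(schema_mismatch, null_pcts)
```

Either value could then be logged as a data-layer metric, for example 1.0 or 0.0 for the mismatch flag.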

## 2. Minimal Metric Logger
Each call appends one row to `metrics.csv` with a UTC ISO 8601 timestamp; any extra keyword arguments are stored as a small JSON context.

In [None]:
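from datetime import timezone  # needed for a timezone-aware UTC timestamp

def log_metric(name, value, *, layer, model_version='v1', window='1m', **ctx):
    """Append one metric row to METRICS_CSV and return it; extra kwargs become the JSON context."""
    row = [
        # datetime.utcnow() is deprecated since Python 3.12, so build an aware UTC datetime
        datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ'),
        name,
        float(value),
        layer,
        model_version,
        window,
        json.dumps(ctx)
    ]
    with METRICS_CSV.open('a', newline='') as f:
        csv.writer(f).writerow(row)
    return row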

# quick smoke test
log_metric('job_success', 1.0, layer='system', window='1d', job='nightly_batch')
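
To check what the logger wrote, it can help to read the file back. The sketch below is one possible reader, assuming the same seven-column header as `metrics.csv`. The function name `read_metrics`, the throwaway demo file, and its timestamps are hypothetical; in this notebook you would call `read_metrics(METRICS_CSV)` instead.

```python
import csv
import json
import tempfile
from pathlib import Path

def read_metrics(path, name=None):
    """Load rows from a metrics CSV, optionally keeping only one metric name."""
    rows = []
    with Path(path).open(newline='') as f:
        for rec in csv.DictReader(f):
            if name is not None and rec['name'] != name:
                continue
            rec['value'] = float(rec['value'])        # CSV stores everything as text
            rec['context'] = json.loads(rec['context'])
            rows.append(rec)
    return rows

# self-contained demo on a throwaway file with the same header as metrics.csv
demo_path = Path(tempfile.mkdtemp()) / 'metrics_demo.csv'
with demo_path.open('w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['ts', 'name', 'value', 'layer', 'model_version', 'window', 'context'])
    w.writerow(['2024-01-01T00:00:00Z', 'job_success', 1.0, 'system', 'v1', '1d',
                json.dumps({'job': 'nightly_batch'})])
    w.writerow(['2024-01-01T00:01:00Z', 'rolling_mae', 0.04, 'model', 'v1', '20obs',
                json.dumps({'step': 19})])

mae_rows = read_metrics(demo_path, name='rolling_mae')
print(mae_rows[0]['value'], mae_rows[0]['context'])
```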

## 3. Simulate Predictions & Rolling MAE
We simulate a prediction task and compute a rolling mean absolute error (MAE) over the last 20 observations, which we log as a model metric.

In [None]:
random.seed(42)
y_true, y_pred = [], []
for t in range(120):
    y = random.random()  # ground truth in [0,1]
    # simulate a model whose error jumps after t > 80 (noise goes from 0.05 to 0.15)
    noise = 0.05 if t <= 80 else 0.15
    pred = min(1.0, max(0.0, y + random.uniform(-noise, noise)))
    y_true.append(y)
    y_pred.append(pred)
    if t >= 19:
        window_true = y_true[t-19:t+1]
        window_pred = y_pred[t-19:t+1]
        mae = sum(abs(a-b) for a,b in zip(window_true, window_pred))/20
        row = log_metric('rolling_mae', mae, layer='model', window='20obs', step=t)
        # fake latency (ms): a single random draw standing in for a measured p95
        latency = random.randint(60, 160) if t <= 80 else random.randint(120, 300)
        log_metric('p95_latency_ms', latency, layer='system', window='1m', step=t)
row[:3]
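
The loop above slices the full history on every step. As an alternative sketch, a fixed-size `collections.deque` keeps only the errors that are still inside the window. The class name `RollingMAE` and the tiny demo values are hypothetical and only illustrate the idea.

```python
from collections import deque

class RollingMAE:
    """Mean absolute error over the most recent `window` observations."""

    def __init__(self, window=20):
        self.errors = deque(maxlen=window)  # old errors fall off automatically

    def update(self, y_true, y_pred):
        """Add one observation; return the MAE once the window is full, else None."""
        self.errors.append(abs(y_true - y_pred))
        if len(self.errors) < self.errors.maxlen:
            return None
        return sum(self.errors) / len(self.errors)

# small demo with a window of 3 so the arithmetic is easy to follow
tracker = RollingMAE(window=3)
pairs = [(1.0, 1.5), (2.0, 2.0), (3.0, 2.0), (4.0, 4.0)]
results = [tracker.update(t, p) for t, p in pairs]
print(results)
```

The first two calls return `None` because the window is not yet full, which mirrors the `if t >= 19` guard in the cell above.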

## 4. Simple Threshold Checks
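Toy rules: raise an alert when rolling MAE > 0.12 or p95 latency > 250 ms. The thresholds are illustrative: in the simulation above, uniform noise in ±0.15 has an expected absolute error of only 0.075, so the MAE rule is unlikely to fire as configured.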

In [None]:
RULES = {
    'rolling_mae': {'op': '>', 'value': 0.12, 'severity': 'warning'},
    'p95_latency_ms': {'op': '>', 'value': 250, 'severity': 'warning'}
}

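def check_threshold(name, value):
    """Return status 'alert' when the value breaches the rule for this metric, else 'ok'."""
    r = RULES.get(name)
    if not r:
        return {'status': 'ok'}  # no rule defined for this metric
    thresh = r['value']
    # the op names the alert condition: '>' alerts above the threshold, any other op alerts below it
    breached = (value > thresh) if r['op'] == '>' else (value < thresh)
    return {'status': 'alert' if breached else 'ok', 'rule': r}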

# example
check_threshold('rolling_mae', 0.15)
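
To evaluate many observations at once, the same rule format can be applied in a loop. The sketch below is one way to do that; `find_alerts`, `OPS`, and the demo observations are hypothetical names and values, and the rule dictionary simply repeats the `RULES` convention so the block runs on its own.

```python
import operator

# the op in each rule names the alert condition (same convention as RULES)
OPS = {'>': operator.gt, '<': operator.lt}

def find_alerts(observations, rules):
    """Return one alert dict for every (name, value) pair that breaches its rule."""
    alerts = []
    for name, value in observations:
        rule = rules.get(name)
        if rule is None:
            continue  # metrics without a rule never alert
        if OPS[rule['op']](value, rule['value']):
            alerts.append({'name': name, 'value': value,
                           'threshold': rule['value'], 'severity': rule['severity']})
    return alerts

demo_rules = {
    'rolling_mae': {'op': '>', 'value': 0.12, 'severity': 'warning'},
    'p95_latency_ms': {'op': '>', 'value': 250, 'severity': 'warning'},
}
demo_observations = [
    ('rolling_mae', 0.05),
    ('p95_latency_ms', 280),
    ('rolling_mae', 0.15),
    ('row_count', 1000),   # no rule, so it is skipped
]
alerts = find_alerts(demo_observations, demo_rules)
print([a['name'] for a in alerts])
```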

## 5. Batch Scoring Skeleton (pseudocode)

```python
def nightly_batch():
    data = load_inputs()
    preds = model.predict(data)
    write_output(preds)
    # monitoring artifacts
    log_metric('row_count', len(data), layer='data', window='1d')
    drift = compute_simple_drift(data)  # e.g., population mean shift
    log_metric('feature_mean_shift', drift['amount'], layer='data', window='1d', feature=drift['feature'])
```

Key takeaway: even simple logs provide early warning signals.
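
The skeleton above leaves its helpers undefined. A minimal runnable sketch, under stated assumptions, might look like the following: the loader generates synthetic rows with a single feature `x`, the "model" is a placeholder multiplication, drift is the difference between the batch mean and a stored baseline mean, and logging goes through a `log` callable (here `print`). All of these are stand-ins chosen for illustration, not the course's actual pipeline.

```python
import random
import statistics

def load_inputs(n=200, shift=0.0, seed=0):
    """Hypothetical loader: returns rows with one numeric feature 'x'."""
    rng = random.Random(seed)
    return [{'x': rng.gauss(shift, 1.0)} for _ in range(n)]

def compute_simple_drift(data, baseline_means):
    """Return the feature whose batch mean moved furthest from its baseline mean."""
    shifts = {
        feature: statistics.fmean([row[feature] for row in data]) - base
        for feature, base in baseline_means.items()
    }
    feature = max(shifts, key=lambda k: abs(shifts[k]))
    return {'feature': feature, 'amount': shifts[feature]}

def nightly_batch(data, baseline_means, log=print):
    """Score a batch with a stand-in model and emit the monitoring metrics."""
    preds = [2.0 * row['x'] for row in data]  # placeholder for model.predict(data)
    log('row_count', len(data))
    drift = compute_simple_drift(data, baseline_means)
    log('feature_mean_shift', drift['amount'], drift['feature'])
    return preds, drift

# simulate a batch whose feature mean has moved by roughly 0.5
preds, drift = nightly_batch(load_inputs(shift=0.5), baseline_means={'x': 0.0})
```

In the notebook, `log` could be replaced by a small wrapper around `log_metric` that supplies `layer='data'` and `window='1d'`.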

## 6. Dashboard Contents (conceptual)
- **Data**: freshness minutes, %nulls, schema hash
- **Model**: rolling MAE/AUC, calibration, stability index
- **System**: p95 latency, error rate, availability
- **Business**: approval rate, bad-rate, revenue per decision
- **Runbook**: on-call, remediation steps, suppression windows
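
As an example of how two of the system-layer numbers could be derived from raw request data, the sketch below computes a p95 latency and an error rate for one window. The function names, the sample window, and the choice to count only 5xx status codes as errors are assumptions made for illustration.

```python
import statistics

def p95(values):
    """95th percentile using linear interpolation between observed values."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    # statistics.quantiles needs at least two data points.
    return statistics.quantiles(values, n=100, method='inclusive')[94]

def error_rate(status_codes):
    """Share of requests with a 5xx status code (assumed definition of an error)."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s >= 500) / len(status_codes)

# hypothetical one-minute window of requests
latencies_ms = list(range(1, 101))            # 1, 2, ..., 100 ms
status_codes = [200] * 97 + [500, 503, 404]   # two server errors, one client error

window_p95 = p95(latencies_ms)
window_error_rate = error_rate(status_codes)
print(window_p95, window_error_rate)
```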

## 7. Handoff README — What to Include
- Endpoint(s) & auth
- Data contracts & schema versioning
- Model versioning & rollback
- Monitoring metrics & alert thresholds
- Ownership: contacts, escalation
- Change log

> This course does **not** require real deployment; we prepare you to collaborate with platform teams.