# Drift Detection with IBM Qiskit Runtime

This notebook demonstrates **production monitoring and drift detection** workflows.

We:
1. Build a **calibration-sensitive benchmark circuit**
2. Establish a **baseline** run (your "known-good" reference)
3. Execute **candidate** runs under different conditions
4. Use devqubit to:
   - capture artifacts and device snapshots
   - compute distribution distance (**TVD**)
   - enforce a **VerifyPolicy** (CI/CD-style)
   - generate a **JUnit** report

> We use **FakeManilaV2** (local testing mode, no IBM credentials needed).  
> In production, point the same code at a real backend/session.

In [None]:
from importlib.metadata import entry_points

if not any(
    ep.name == "qiskit-runtime" for ep in entry_points(group="devqubit.adapters")
):
    raise ImportError(
        "devqubit Qiskit Runtime adapter is not installed.\n"
        "Install with: pip install 'devqubit[qiskit-runtime]'"
    )
else:
    print("Qiskit Runtime adapter available!")

In [None]:
from pathlib import Path
import shutil
import numpy as np

from qiskit import QuantumCircuit
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import SamplerV2
from qiskit_ibm_runtime.fake_provider import FakeManilaV2

# Check for Aer (optional, enables drift simulation)
try:
    from qiskit_aer import AerSimulator
    from qiskit_aer.noise import NoiseModel, depolarizing_error

    AER_AVAILABLE = True
except ImportError:
    AER_AVAILABLE = False

from devqubit import Config, set_config, track
from devqubit.compare import diff, verify_baseline, VerifyPolicy
from devqubit.runs import set_baseline, load_run, get_baseline, list_runs
from devqubit.ci import write_junit

print(f"Aer available: {AER_AVAILABLE} (needed for calibration drift simulation)")

### Setup

Configuration for drift detection:

| Parameter | Value | Purpose |
|-----------|-------|--------|
| `SHOTS_BASELINE` | 4096 | High precision for reference |
| `TVD_THRESHOLD` | 0.03 | Maximum allowed distribution drift |
| `DRIFT_DEPOL_1Q` | 0.01 | 1-qubit depolarizing error for drift simulation |
| `DRIFT_DEPOL_2Q` | 0.03 | 2-qubit depolarizing error for drift simulation |

In [None]:
# Configuration
PROJECT = "production_monitoring"
N_QUBITS = 3
OPTIMIZATION_LEVEL = 2
SEED = 42

# Shots for different scenarios
SHOTS_BASELINE = 4096
SHOTS_LOW = 256
SHOTS_HIGH = 8192

# Drift detection threshold
TVD_THRESHOLD = 0.03

# Drift simulation parameters (for Aer)
DRIFT_DEPOL_1Q = 0.01
DRIFT_DEPOL_2Q = 0.03

# Workspace
WORKSPACE = Path(".devqubit_drift_demo")
if WORKSPACE.exists():
    shutil.rmtree(WORKSPACE)
WORKSPACE.mkdir(parents=True)

set_config(Config(root_dir=WORKSPACE))
np.random.seed(SEED)

# Backend (local testing mode)
fake_backend = FakeManilaV2()
print(f"Backend: {fake_backend.name} ({fake_backend.num_qubits} qubits)")
print(f"TVD threshold: {TVD_THRESHOLD:.1%}")
print(f"Workspace: {WORKSPACE.resolve()}")

---
## 1. Benchmark Circuit

We design a **calibration-sensitive benchmark** that responds to:
- Gate errors (RX, RZ rotations)
- Two-qubit gate errors and crosstalk (CNOT gates)
- Readout errors (measurement)

The output distribution becomes our "health signal" over time.

In [None]:
def create_benchmark_circuit(n_qubits: int) -> QuantumCircuit:
    """Create a calibration-sensitive benchmark circuit."""
    qc = QuantumCircuit(n_qubits, name="calibration_benchmark")

    # Layer 1: superposition
    for q in range(n_qubits):
        qc.h(q)

    # Layer 2: entanglement chain
    for q in range(n_qubits - 1):
        qc.cx(q, q + 1)

    # Layer 3: rotations (gate-error sensitive)
    for q in range(n_qubits):
        qc.rz(np.pi / 4, q)
        qc.rx(np.pi / 3, q)

    # Layer 4: more entanglement (depth)
    for q in range(n_qubits - 1):
        qc.cx(q, q + 1)

    qc.measure_all()
    return qc


def concentration_topk(counts: dict, k: int = 3) -> float:
    """Fraction of probability in top-k outcomes."""
    total = sum(counts.values())
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total if total > 0 else 0.0


def shannon_entropy(counts: dict) -> float:
    """Shannon entropy in bits."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = np.array(list(counts.values())) / total
    return float(-np.sum(probs * np.log2(probs + 1e-12)))

In [None]:
# Build and transpile circuit once
circuit = create_benchmark_circuit(N_QUBITS)
print("Original circuit:")
print(circuit.draw())

pm = generate_preset_pass_manager(
    backend=fake_backend,
    optimization_level=OPTIMIZATION_LEVEL,
)
circuit_isa = pm.run(circuit)

print("\nTranspiled (ISA) circuit:")
print(circuit_isa.draw(fold=300))

### Transpilation in devqubit

**Qiskit Runtime V2 primitives require ISA-compatible circuits** - they do not perform layout, routing, or translation internally.

devqubit handles this automatically with three transpilation modes:

| Mode | Behavior |
|------|----------|
| `'auto'` (default) | Transpile only if circuit is not ISA-compatible |
| `'managed'` | Always transpile through devqubit |
| `'manual'` | Never transpile (user is responsible) |

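**How it works:** When you call `run.wrap(sampler)`, devqubit checks whether each circuit is ISA-compatible with the backend target. In `'auto'` mode it transpiles only the circuits that are not; in `'managed'` mode it always transpiles. In both cases it uses `generate_preset_pass_manager`.

**In this notebook:** We transpile manually to show the ISA circuit for educational purposes, and `run_benchmark` passes `devqubit_transpilation_mode="manual"`, so devqubit does not transpile again. In the default `'auto'` mode, devqubit would detect that the circuit is already ISA-compatible and skip re-transpilation.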

```python
# Option A: Manual transpilation (what we do here)
circuit_isa = pm.run(circuit)
job = tracked_sampler.run([circuit_isa])

# Option B: Let devqubit transpile (simpler)
job = tracked_sampler.run([circuit])  # devqubit transpiles in 'auto' mode
```
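
The mode table above can be read as a small decision function. The sketch below is purely illustrative: `needs_transpile` and its boolean `is_isa` input are hypothetical names, and this is not devqubit's actual implementation.

```python
# Illustrative sketch only: `needs_transpile` is a hypothetical helper that
# mirrors the mode table above; it is not part of devqubit.
VALID_MODES = ("auto", "managed", "manual")


def needs_transpile(mode: str, is_isa: bool) -> bool:
    """Return True if the circuit would be transpiled under the given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown transpilation mode: {mode!r}")
    if mode == "managed":
        return True  # always transpile
    if mode == "manual":
        return False  # never transpile; the caller must supply an ISA circuit
    return not is_isa  # 'auto': transpile only when the circuit is not ISA


for mode in VALID_MODES:
    for is_isa in (True, False):
        decision = needs_transpile(mode, is_isa)
        print(f"{mode:>8s}  is_isa={is_isa!s:<5}  transpile={decision}")
```

Only `'manual'` can send a non-ISA circuit to the primitive unchanged, which is why that mode leaves responsibility with the user.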

---
## 2. Establish Baseline

A baseline is the "known-good" distribution we compare against later.

**Important:** We use `run.wrap()` to track the sampler execution. This captures:
- Circuit artifacts
- Device snapshot
- Execution metadata
- Result counts

In [None]:
def make_sampler(mode, shots: int) -> SamplerV2:
    """Create a SamplerV2 with specified shot budget."""
    sampler = SamplerV2(mode=mode)
    sampler.options.default_shots = shots
    return sampler


def run_benchmark(
    *,
    project: str,
    sampler: SamplerV2,
    circuit: QuantumCircuit,
    params: dict,
    tags: dict,
) -> str:
    """Execute benchmark with full devqubit tracking."""

    with track(project=project) as run:
        # Wrap the sampler to capture execution context
        tracked_sampler = run.wrap(sampler)

        run.log_params(params)
        run.set_tags(tags)

        # Execute via tracked sampler
        job = tracked_sampler.run([circuit], devqubit_transpilation_mode="manual")
        pub_result = job.result()[0]
        counts = pub_result.data.meas.get_counts()

        # Log distribution metrics
        run.log_metrics(
            {
                "concentration_top3": concentration_topk(counts, k=3),
                "unique_outcomes": len(counts),
                "entropy_bits": shannon_entropy(counts),
            }
        )

        return run.run_id

In [None]:
baseline_sampler = make_sampler(mode=fake_backend, shots=SHOTS_BASELINE)

baseline_id = run_benchmark(
    project=PROJECT,
    sampler=baseline_sampler,
    circuit=circuit_isa,
    params={
        "circuit_name": "calibration_benchmark",
        "n_qubits": N_QUBITS,
        "shots": SHOTS_BASELINE,
        "optimization_level": OPTIMIZATION_LEVEL,
        "backend": fake_backend.name,
    },
    tags={"role": "baseline", "scenario": "reference"},
)

# Register as project baseline
set_baseline(PROJECT, baseline_id)
print(f"Baseline run set: {baseline_id}")

## 3. Inspect Device Snapshot & Fingerprints

devqubit captures a **device snapshot** that helps answer:
- Did the hardware configuration/calibration change between runs?
- Is this the same circuit structure?

**Fingerprints** enable fast comparison without loading full artifacts.
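
To make the idea of a fingerprint concrete, here is a minimal sketch that hashes a canonical JSON encoding of a snapshot. The `fingerprint` helper, the choice of SHA-256, and the sample fields are assumptions for illustration; devqubit's actual fingerprinting scheme may differ.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Hash a canonical JSON encoding so equal content gives an equal digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up snapshot fields, for illustration only.
snap_a = {"backend_name": "example_backend", "num_qubits": 5, "readout_error": 0.021}
snap_b = {"num_qubits": 5, "readout_error": 0.021, "backend_name": "example_backend"}
snap_c = {**snap_a, "readout_error": 0.034}  # simulated recalibration

print(fingerprint(snap_a) == fingerprint(snap_b))  # same content, different key order
print(fingerprint(snap_a) == fingerprint(snap_c))  # one value changed
```

Comparing two short digests is enough to decide whether the full artifacts need to be loaded and diffed.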

In [None]:
baseline_rec = load_run(baseline_id)

print("Device Snapshot")
print("=" * 60)

device_snap = baseline_rec.record.get("device_snapshot", {})
if device_snap:
    print(f"Backend:     {device_snap.get('backend_name')}")
    print(f"Captured at: {str(device_snap.get('captured_at') or 'N/A')[:19]}")
    print(f"Qubits:      {device_snap.get('num_qubits')}")

    cal = device_snap.get("calibration", {})
    if cal:
        print("\nCalibration summary (median):")
        for key in ("t1_us", "t2_us", "readout_error"):
            if key in cal:
                print(f"  {key:>14s}: {cal[key].get('median', 'N/A')}")
else:
    print("(Device snapshot not available for this backend)")

print("\nFingerprints:")
for key, value in baseline_rec.fingerprints.items():
    print(f"  {key}: {value[:48]}...")

## 4. Candidate Runs (Production Scenarios)

We run the same circuit under different conditions:

| Scenario | Shots | Backend | Expected Result |
|----------|-------|---------|----------------|
| `low_shots` | 256 | FakeManila | Higher TVD (shot noise) |
| `stable` | 4096 | FakeManila | Should pass |
| `high_shots` | 8192 | FakeManila | Lower TVD |
| `calibration_drift` | 4096 | Aer + noise | Should fail (drift) |
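
The "shot noise" entries in this table can be sanity-checked without any quantum backend. The sketch below draws two samples from the same distribution and averages their TVD. It assumes, purely for illustration, a uniform distribution over the 8 outcomes of 3 qubits; the benchmark's real distribution is not uniform, so treat the numbers as rough orders of magnitude. The helper names are hypothetical.

```python
import random
from collections import Counter


def tvd(counts_p: dict, counts_q: dict) -> float:
    """Total variation distance between two count dictionaries."""
    n_p, n_q = sum(counts_p.values()), sum(counts_q.values())
    keys = set(counts_p) | set(counts_q)
    return 0.5 * sum(
        abs(counts_p.get(k, 0) / n_p - counts_q.get(k, 0) / n_q) for k in keys
    )


def mean_shot_noise_tvd(
    shots_ref: int, shots_cand: int, trials: int = 20, seed: int = 42
) -> float:
    """Average TVD between two samples drawn from the SAME distribution."""
    rng = random.Random(seed)
    outcomes = [format(i, "03b") for i in range(8)]  # assumed uniform, 3 qubits
    total = 0.0
    for _ in range(trials):
        ref = Counter(rng.choices(outcomes, k=shots_ref))
        cand = Counter(rng.choices(outcomes, k=shots_cand))
        total += tvd(ref, cand)
    return total / trials


for shots in (256, 4096, 8192):
    estimate = mean_shot_noise_tvd(4096, shots)
    print(f"{shots:>5d} shots vs 4096-shot reference: mean TVD ~ {estimate:.4f}")
```

Under this assumption, sampling noise alone yields a nonzero TVD that shrinks as the candidate's shot count grows. This suggests a low-shot run can exceed a tight threshold even when the device has not changed.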

In [None]:
def make_drift_backend(depol_1q: float, depol_2q: float, seed: int):
    """Create an Aer backend with depolarizing gate noise (drift simulation)."""
    if not AER_AVAILABLE:
        return None

    noise_model = NoiseModel()

    # IBM basis gates after transpilation
    noise_model.add_all_qubit_quantum_error(
        depolarizing_error(depol_1q, 1), ["rz", "sx", "x", "id"]
    )
    noise_model.add_all_qubit_quantum_error(depolarizing_error(depol_2q, 2), ["cx"])

    return AerSimulator(noise_model=noise_model, seed_simulator=seed)


# Build scenarios
scenarios = [
    {
        "name": "low_shots",
        "desc": "Reduced shot budget",
        "shots": SHOTS_LOW,
        "mode": fake_backend,
        "backend_name": fake_backend.name,
    },
    {
        "name": "stable",
        "desc": "Normal operation",
        "shots": SHOTS_BASELINE,
        "mode": fake_backend,
        "backend_name": fake_backend.name,
    },
    {
        "name": "high_shots",
        "desc": "Higher precision run",
        "shots": SHOTS_HIGH,
        "mode": fake_backend,
        "backend_name": fake_backend.name,
    },
]

# Add drift scenario if Aer available
drift_backend = make_drift_backend(DRIFT_DEPOL_1Q, DRIFT_DEPOL_2Q, SEED)
if drift_backend:
    scenarios.append(
        {
            "name": "calibration_drift",
            "desc": "Simulated calibration degradation (Aer noise)",
            "shots": SHOTS_BASELINE,
            "mode": drift_backend,
            "backend_name": "aer_simulator",
        }
    )
    print("Added Aer-based drift scenario.")
else:
    print("Drift scenario skipped (qiskit-aer not installed).")

In [None]:
candidate_ids = []

print("\nExecuting production scenarios...")
print("-" * 60)

for sc in scenarios:
    sampler = make_sampler(mode=sc["mode"], shots=sc["shots"])

    run_id = run_benchmark(
        project=PROJECT,
        sampler=sampler,
        circuit=circuit_isa,
        params={
            "circuit_name": "calibration_benchmark",
            "n_qubits": N_QUBITS,
            "shots": sc["shots"],
            "optimization_level": OPTIMIZATION_LEVEL,
            "backend": sc["backend_name"],
            "scenario_description": sc["desc"],
        },
        tags={"role": "candidate", "scenario": sc["name"]},
    )

    candidate_ids.append({"name": sc["name"], "run_id": run_id})
    print(f"{sc['name']:>18s} -> {run_id}")

## 5. Drift Analysis (TVD)

**Total Variation Distance** measures how different two distributions are:

$\text{TVD}(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|$

| TVD | Interpretation |
|-----|---------------|
| 0.00 | Identical distributions |
| 0.01-0.02 | Expected shot noise |
| 0.03-0.05 | Investigate |
| > 0.05 | Significant drift |

devqubit computes TVD automatically in `diff()` and `verify_baseline()`.
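
As a concrete check of the formula, a minimal pure-Python sketch of TVD over two count dictionaries follows. The `tvd_from_counts` helper and the sample counts are illustrative and do not show how devqubit computes the value internally.

```python
def tvd_from_counts(counts_p: dict, counts_q: dict) -> float:
    """TVD(P, Q) = 1/2 * sum_x |P(x) - Q(x)|, with P and Q normalized from counts."""
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    if total_p == 0 or total_q == 0:
        raise ValueError("both count dictionaries must contain at least one shot")
    outcomes = set(counts_p) | set(counts_q)  # outcomes missing on one side count as 0
    return 0.5 * sum(
        abs(counts_p.get(x, 0) / total_p - counts_q.get(x, 0) / total_q)
        for x in outcomes
    )


# Made-up counts for a 2-qubit example (1000 shots each).
baseline = {"00": 480, "01": 20, "10": 30, "11": 470}
candidate = {"00": 450, "01": 40, "10": 50, "11": 460}

print(f"TVD = {tvd_from_counts(baseline, candidate):.3f}")
```

The per-outcome probability differences here are 0.03, 0.02, 0.02, and 0.01. Half their sum is 0.04, which falls in the "Investigate" band of the table above.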

In [None]:
print("Drift Analysis")
print("=" * 60)
print(f"Threshold: TVD ≤ {TVD_THRESHOLD:.1%}\n")

for cand in candidate_ids:
    comp = diff(baseline_id, cand["run_id"])

    sc = load_run(cand["run_id"]).tags.get("scenario", "?")
    tvd = comp.tvd

    if tvd is None:
        print(f"{sc:>18s}: TVD not available")
        continue

    status = "[OK]" if tvd <= TVD_THRESHOLD else "[!] DRIFT"
    print(f"{sc:>18s}: TVD = {tvd:.4f}  {status}")

## 6. CI/CD Verification (Policy)

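In pipelines you want **automated pass/fail**. Define a `VerifyPolicy`, verify the candidate against the project baseline, and fail the job if the result is not OK:

```python
# Example CI usage (candidate_id is the run ID produced by the pipeline):
result = verify_baseline(
    candidate=load_run(candidate_id), project=PROJECT, policy=production_policy
)
assert result.ok, f"Drift detected: {result.failures}"
```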
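
To show what a pass/fail policy check amounts to, here is a minimal stand-alone sketch. `SketchPolicy`, `SketchResult`, and `check` are hypothetical names invented for this example; they only mimic the `ok` and `failures` attributes used in this notebook and are not devqubit's `VerifyPolicy` implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SketchPolicy:
    """Hypothetical stand-in for a verification policy (not devqubit's VerifyPolicy)."""

    tvd_max: float
    program_must_match: bool = True


@dataclass
class SketchResult:
    failures: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures


def check(policy: SketchPolicy, *, tvd: float, program_matches: bool) -> SketchResult:
    """Collect every violated rule so the CI log shows all failures at once."""
    result = SketchResult()
    if policy.program_must_match and not program_matches:
        result.failures.append("program differs from baseline")
    if tvd > policy.tvd_max:
        result.failures.append(f"TVD {tvd:.4f} exceeds tvd_max {policy.tvd_max:.4f}")
    return result


sketch_policy = SketchPolicy(tvd_max=0.03)
for label, tvd in [("stable", 0.012), ("drifted", 0.081)]:
    outcome = check(sketch_policy, tvd=tvd, program_matches=True)
    print(label, "PASSED" if outcome.ok else f"FAILED: {outcome.failures}")
```

Collecting failures in a list, rather than stopping at the first violated rule, keeps a single CI run informative when several checks fail together.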

In [None]:
production_policy = VerifyPolicy(
    params_must_match=False,  # Allow different shot counts
    program_must_match=True,  # Same circuit required
    fingerprint_must_match=False,  # Hardware snapshots can change
    tvd_max=TVD_THRESHOLD,
)

print("CI/CD Verification")
print("=" * 60)

verification_results = []

for cand in candidate_ids:
    candidate_rec = load_run(cand["run_id"])
    sc = candidate_rec.tags.get("scenario", "?")

    result = verify_baseline(
        candidate=candidate_rec,
        project=PROJECT,
        policy=production_policy,
    )

    verification_results.append({"name": sc, "result": result})

    status = "[OK] PASSED" if result.ok else "[X] FAILED"
    print(f"\n{sc}: {status}")

    if not result.ok:
        for failure in result.failures:
            print(f"  └─ {failure}")

## 7. JUnit Report

Export to JUnit XML for CI systems (Jenkins, GitHub Actions, GitLab CI, Azure DevOps).
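
For readers unfamiliar with the format, the sketch below builds a minimal JUnit-style document with the standard library. The `junit_xml` helper, the suite name, and the failure message are made up for illustration; the file that `write_junit` produces may use different attributes and structure.

```python
import xml.etree.ElementTree as ET


def junit_xml(suite_name: str, cases: dict) -> str:
    """Build a minimal JUnit document; `cases` maps test name -> failure message or None."""
    failures = sum(1 for msg in cases.values() if msg is not None)
    suite = ET.Element(
        "testsuite", name=suite_name, tests=str(len(cases)), failures=str(failures)
    )
    for name, message in cases.items():
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=name)
        if message is not None:
            failure = ET.SubElement(case, "failure", message=message)
            failure.text = message
    return ET.tostring(suite, encoding="unicode")


# Made-up verification outcomes, for illustration only.
report = junit_xml(
    "drift_verification",
    {"stable": None, "calibration_drift": "TVD 0.0810 exceeds tvd_max 0.0300"},
)
print(report)
```

CI systems typically read the `failures` count and the `<failure>` elements to mark a job as failed and to surface the message in the test summary.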

In [None]:
# Export the last scenario's result
last_result = verification_results[-1]["result"]

junit_path = WORKSPACE / "test-results.xml"
write_junit(last_result, junit_path)

print(f"JUnit report written to: {junit_path}")
print("\n--- test-results.xml ---")
print(junit_path.read_text())

## 8. Detailed Comparison

When something fails, get a human-readable diff for debugging.

In [None]:
# Compare baseline vs. last candidate
last_candidate_id = candidate_ids[-1]["run_id"]
comparison = diff(baseline_id, last_candidate_id)
print(comparison)

## 9. Monitoring Dashboard

A simple dashboard that pulls all runs from the registry and verifies each candidate against the baseline.

In [None]:
all_runs = list_runs(project=PROJECT)
baseline_info = get_baseline(PROJECT)

print("Production Monitoring Dashboard")
print("=" * 80)
print(
    f"Baseline: {baseline_info['run_id'][:16]}... (set: {baseline_info['set_at'][:10]})"
)
print(f"Policy: TVD ≤ {TVD_THRESHOLD:.1%}\n")

print(
    f"{'Run ID':<18} {'Role':<10} {'Scenario':<18} {'Entropy':>8} {'TVD':>8} {'Status':>8}"
)
print("-" * 80)

for run_info in all_runs:
    rec = load_run(run_info["run_id"])
    role = rec.tags.get("role", "N/A")
    scenario = rec.tags.get("scenario", "-")
    entropy = rec.metrics.get("entropy_bits", 0.0)

    if role == "candidate":
        v = verify_baseline(candidate=rec, project=PROJECT, policy=production_policy)
        comp = diff(baseline_id, rec.run_id)
        # Compare against None so that a TVD of exactly 0.0 is still displayed
        tvd_str = f"{comp.tvd:.4f}" if comp.tvd is not None else "N/A"
        status = "[OK]" if v.ok else "[!]"
    else:
        tvd_str = "-"
        status = "REF"

    print(
        f"{rec.run_id[:16]:<18} {role:<10} {scenario:<18} {entropy:>8.3f} {tvd_str:>8} {status:>8}"
    )

### Summary

| Component | Purpose |
|-----------|--------|
| **Benchmark circuit** | Calibration-sensitive health signal |
| **Transpilation** | devqubit auto-transpiles non-ISA circuits (or use manual) |
| **run.wrap()** | Capture execution context automatically |
| **Device snapshot** | Track hardware/calibration changes |
| **Fingerprints** | Fast comparison without full data |
| **TVD analysis** | Quantify distribution drift |
| **VerifyPolicy** | Automated pass/fail for CI/CD |
| **JUnit export** | Integration with CI systems |

**Key insight:** The `calibration_drift` scenario (included only when qiskit-aer is installed) demonstrates drift detection on simulated drift: the injected depolarizing noise changes the output distribution enough to exceed our TVD threshold, which triggers a verification failure.

In [None]:
shutil.rmtree(WORKSPACE)
print(f"Workspace cleaned up: {WORKSPACE}")