v0.5.3: Adaptive episode sampling
What's new in v0.5.3
Adaptive episode sampling: CI-width-guaranteed convergence
Previously, `n_episodes` was a single fixed number. With `--adaptive`, the audit runs in batches and keeps collecting episodes until every scenario's 95% bootstrap CI on the return ratio is no wider than `--target-ci-width`, or until `--max-episodes` is reached. This gives you a controlled CI width without over- or under-sampling.
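To illustrate the stopping rule, here is a minimal standalone sketch. It is not the library's implementation. It assumes the return ratio is mean perturbed return divided by mean nominal return, uses a percentile bootstrap, and introduces a hypothetical `batch_size` parameter along with the hypothetical helpers `bootstrap_ci_width` and `adaptive_sample`. The two lambdas stand in for real episode rollouts.

```python
import random
from statistics import fmean


def bootstrap_ci_width(nominal, perturbed, n_boot=500, alpha=0.05, rng=None):
    """Width of a percentile-bootstrap CI on mean(perturbed) / mean(nominal)."""
    rng = rng or random.Random(0)
    ratios = []
    for _ in range(n_boot):
        # Resample each set of episode returns with replacement.
        nom = rng.choices(nominal, k=len(nominal))
        per = rng.choices(perturbed, k=len(perturbed))
        ratios.append(fmean(per) / fmean(nom))
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_boot)]
    hi = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return hi - lo


def adaptive_sample(run_nominal, run_perturbed, target_ci_width=0.10,
                    max_episodes=500, batch_size=20, seed=0):
    """Collect episodes in batches until the CI is narrow enough or the cap is hit."""
    rng = random.Random(seed)
    nominal, perturbed = [], []
    width = float("inf")
    while len(perturbed) < max_episodes:
        n = min(batch_size, max_episodes - len(perturbed))
        nominal.extend(run_nominal() for _ in range(n))
        perturbed.extend(run_perturbed() for _ in range(n))
        width = bootstrap_ci_width(nominal, perturbed, rng=rng)
        if width <= target_ci_width:
            break  # converged before reaching the hard cap
    return len(perturbed), width


# Synthetic stand-ins for episode returns (assumed values, not real audit data).
env_rng = random.Random(1)
n_used, width = adaptive_sample(
    lambda: env_rng.gauss(100, 10),  # nominal scenario
    lambda: env_rng.gauss(80, 10),   # perturbed scenario
    target_ci_width=0.05,
    max_episodes=300,
)
print(n_used, round(width, 3))
```

A noisier scenario needs more episodes to reach the same CI width, which is why a fixed `n_episodes` tends to over-sample stable scenarios and under-sample noisy ones.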
Usage:
```bash
deltatau-audit audit-sb3 --model m.zip --algo ppo --env CartPole-v1 \
    --adaptive --target-ci-width 0.05 --max-episodes 300
```
New flags (on `audit`, `audit-sb3`, `audit-cleanrl`, `audit-hf`):
| Flag | Default | Description |
|---|---|---|
| `--adaptive` | off | Enable adaptive sampling |
| `--target-ci-width` | 0.10 | Target 95% CI width on return ratio |
| `--max-episodes` | 500 | Hard cap on episodes per scenario |
Python API:
```python
result = run_full_audit(
    adapter, env_factory,
    adaptive=True,
    target_ci_width=0.05,
    max_episodes=300,
)
n_used = result["robustness"]["n_episodes_used"]  # per-scenario count
```
When `adaptive=True`, `result["robustness"]["n_episodes_used"]` contains the actual number of episodes used per scenario. The non-adaptive default path is unchanged.
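Because `--max-episodes` is a hard cap, a scenario that used the full budget may have stopped before reaching the target CI width. The sketch below shows one way to flag such scenarios. It assumes `n_episodes_used` is a mapping from scenario name to episode count, which these notes do not confirm, and the scenario names and counts are made up for illustration.

```python
# Hypothetical result shape: scenario name -> episodes used (assumption).
result = {
    "robustness": {
        "n_episodes_used": {"scenario_a": 120, "scenario_b": 300, "scenario_c": 80},
    }
}
max_episodes = 300  # the same cap that was passed to the audit

n_used = result["robustness"]["n_episodes_used"]

# Scenarios that consumed the whole budget may not have converged.
capped = sorted(name for name, n in n_used.items() if n >= max_episodes)
print(capped)
```

Any scenario listed here is a candidate for a rerun with a larger `max_episodes` or a looser `target_ci_width`.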
Flaky test fix
`test_run_full_audit_strict_threshold_changes_quadrant` now uses `seed=42` for deterministic results.
11 new tests (263 total)
```bash
pip install -U deltatau-audit
```
Full Changelog: v0.5.2...v0.5.3