v0.5.3: Adaptive episode sampling

@maruyamakoju maruyamakoju released this 19 Feb 15:43
· 39 commits to main since this release

What's new in v0.5.3

Adaptive episode sampling: CI-width-guaranteed convergence

Previously, `n_episodes` was a single fixed number. With `--adaptive`, the audit runs in batches and keeps collecting episodes until every scenario's 95% bootstrap CI on the return ratio is no wider than `--target-ci-width`, or until that scenario reaches `--max-episodes`. Below the cap, this guarantees the requested CI width without over- or under-sampling.
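The stopping rule described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `bootstrap_ci_width`, `sample_adaptively`, `run_episode`, `batch_size`, and `n_boot` are hypothetical names, the percentile bootstrap is an assumed choice of CI method, and `run_episode` is assumed to return one per-episode return ratio.

```python
import numpy as np


def bootstrap_ci_width(ratios, n_boot=1000, alpha=0.05, rng=None):
    """Width of the percentile bootstrap CI on the mean of `ratios`."""
    rng = rng if rng is not None else np.random.default_rng(0)
    ratios = np.asarray(ratios, dtype=float)
    # Resample with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, len(ratios), size=(n_boot, len(ratios)))
    means = ratios[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(hi - lo)


def sample_adaptively(run_episode, target_ci_width=0.10, max_episodes=500,
                      batch_size=20, seed=0):
    """Collect return ratios in batches until the CI is narrow enough.

    `run_episode(rng)` is a hypothetical callable returning one return ratio.
    Stops early once the CI width is at or below `target_ci_width`, and
    always stops at `max_episodes`.
    """
    rng = np.random.default_rng(seed)
    ratios = []
    while len(ratios) < max_episodes:
        n_new = min(batch_size, max_episodes - len(ratios))
        ratios.extend(run_episode(rng) for _ in range(n_new))
        if bootstrap_ci_width(ratios, rng=rng) <= target_ci_width:
            break
    return ratios
```

In this sketch a low-variance scenario stops after the first batch, while a noisy one keeps sampling until it hits the cap, which is why the cap matters as a cost bound.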

Usage:
```bash
deltatau-audit audit-sb3 --model m.zip --algo ppo --env CartPole-v1 \
  --adaptive --target-ci-width 0.05 --max-episodes 300
```

New flags (on `audit`, `audit-sb3`, `audit-cleanrl`, `audit-hf`):

| Flag | Default | Description |
|------|---------|-------------|
| `--adaptive` | off | Enable adaptive sampling |
| `--target-ci-width` | 0.10 | Target 95% CI width on return ratio |
| `--max-episodes` | 500 | Hard cap on episodes per scenario |

Python API:
```python
result = run_full_audit(
    adapter, env_factory,
    adaptive=True,
    target_ci_width=0.05,
    max_episodes=300,
)
n_used = result["robustness"]["n_episodes_used"]  # per-scenario count
```

When `adaptive=True`, `result["robustness"]["n_episodes_used"]` contains the actual number of episodes used per scenario. The non-adaptive default path is unchanged.
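One use of the per-scenario counts is to spot scenarios that stopped because they hit the cap rather than because the CI converged. The sketch below assumes `n_episodes_used` is a mapping from scenario name to episode count, which the notes above do not spell out. `scenarios_at_cap` is a hypothetical helper, and the scenario names and counts are made up for illustration.

```python
def scenarios_at_cap(n_used, max_episodes):
    """Return the names of scenarios whose episode count reached the cap."""
    return sorted(name for name, n in n_used.items() if n >= max_episodes)


# Hand-written stand-in for result["robustness"]["n_episodes_used"];
# the names and counts are illustrative, not real audit output.
n_used = {"scenario_a": 80, "scenario_b": 300, "scenario_c": 140}

capped = scenarios_at_cap(n_used, max_episodes=300)
print(capped)  # ['scenario_b']
```

A scenario that reaches the cap may still have a CI wider than the target, so its result is worth reading with more caution, or rerunning with a higher `max_episodes`.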

Flaky test fix

`test_run_full_audit_strict_threshold_changes_quadrant` now uses `seed=42` for deterministic results.

11 new tests (263 total)

```bash
pip install -U deltatau-audit
```

Full Changelog: v0.5.2...v0.5.3