v0.5.2: Richer failure diagnostics

@maruyamakoju maruyamakoju released this 19 Feb 14:58
· 40 commits to main since this release

What's new in v0.5.2

Failure diagnostics: pattern → root cause → fix

Every audit now produces a structured failure analysis. When the agent fails or degrades under a timing perturbation, the report names the failure pattern, explains the root cause, and recommends a fix, instead of showing only a bare score.

5 named failure patterns:

| Pattern | Scenario | Fix summary |
|---|---|---|
| Speed Jitter Sensitivity | jitter | Retrain with JitterWrapper |
| Observation Recency Dependency | delay | Add ObservationDelayWrapper to training |
| Frequency Spike Fragility | spike | Train with PiecewiseSwitchWrapper |
| Observation Noise Sensitivity | obs_noise | Add ObsNoiseWrapper to training |
| Extreme Frequency Fragility | speed_5x | Wide speed randomization + Δτ module |
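
The five patterns can also be kept as a lookup keyed by scenario name, for example to annotate your own dashboards or logs. This is a minimal sketch: `FAILURE_PATTERNS` and `describe_failures` are hypothetical names and not part of deltatau-audit. Only the scenario names, pattern names, and fix summaries are taken from the list above.

```python
# Hypothetical lookup built from the release notes; not part of deltatau-audit.
# Maps scenario name -> (pattern name, fix summary).
FAILURE_PATTERNS = {
    "jitter": ("Speed Jitter Sensitivity", "Retrain with JitterWrapper"),
    "delay": ("Observation Recency Dependency", "Add ObservationDelayWrapper to training"),
    "spike": ("Frequency Spike Fragility", "Train with PiecewiseSwitchWrapper"),
    "obs_noise": ("Observation Noise Sensitivity", "Add ObsNoiseWrapper to training"),
    "speed_5x": ("Extreme Frequency Fragility", "Wide speed randomization + Δτ module"),
}


def describe_failures(failing_scenarios):
    """Return one 'scenario: pattern -> fix' line per failing scenario."""
    lines = []
    for scenario in failing_scenarios:
        # Scenarios outside the five named patterns get a neutral placeholder.
        pattern, fix = FAILURE_PATTERNS.get(
            scenario, ("Unknown pattern", "No fix summary available")
        )
        lines.append(f"{scenario}: {pattern} -> {fix}")
    return lines


print("\n".join(describe_failures(["jitter", "speed_5x"])))
```

The fallback for unknown scenario names is a design choice of this sketch, so that a scenario added in a later release does not raise a `KeyError`.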

Terminal output (_print_summary) now shows:

  Failure Analysis  (1 FAIL scenario: jitter)
  ──────────────────────────────────────────────────────
  Pattern:  Speed Jitter Sensitivity  [FAIL]
  Cause:    The agent is sensitive to step-by-step speed fluctuations...
  Fix:      Retrain with speed jitter: randomize the environment speed...

--format markdown outputs a blockquote section with pattern, cause, and fix.

HTML reports include a styled amber Failure Analysis card after the Prescription section.

run_full_audit() result gains a diagnosis key:

result["diagnosis"]["primary_pattern"]     # "Speed Jitter Sensitivity"
result["diagnosis"]["root_cause"]          # explanation string
result["diagnosis"]["fix_recommendation"]  # fix string
result["diagnosis"]["failing_scenarios"]   # ["jitter"]
result["diagnosis"]["issues"]              # full list, sorted by severity
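
A minimal sketch of consuming these keys, for example to print a short summary in a CI job. `format_diagnosis` and `sample_result` are hypothetical: the sample dict only mimics the key layout listed above and was not produced by a real audit. These notes do not say what `diagnosis` contains when every scenario passes, so the helper assumes that a missing or empty value means no failures.

```python
def format_diagnosis(result):
    """Build a short text summary from the 'diagnosis' key of an audit result."""
    # Assumption: a missing/empty diagnosis means nothing failed.
    diagnosis = result.get("diagnosis") or {}
    failing = diagnosis.get("failing_scenarios") or []
    if not failing:
        return "No failing scenarios reported."
    return "\n".join(
        [
            f"Failing:  {', '.join(failing)}",
            f"Pattern:  {diagnosis.get('primary_pattern', 'n/a')}",
            f"Cause:    {diagnosis.get('root_cause', 'n/a')}",
            f"Fix:      {diagnosis.get('fix_recommendation', 'n/a')}",
        ]
    )


# Hand-written stand-in for a run_full_audit() result; values are placeholders.
sample_result = {
    "diagnosis": {
        "primary_pattern": "Speed Jitter Sensitivity",
        "root_cause": "<explanation string>",
        "fix_recommendation": "<fix string>",
        "failing_scenarios": ["jitter"],
        "issues": [],
    }
}

print(format_diagnosis(sample_result))
```

The sketch deliberately leaves `issues` untouched, since the shape of its entries is not described here.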

17 new tests (252 total)

pip install -U deltatau-audit

Full Changelog: v0.5.1...v0.5.2