PUMA reproducibility anchor (v2.7.0-baseline-anchor)
Canonical, SHA-pinned reproducibility anchor for the PUMA empirical baseline.
Commit: 6671108
Verified reproducible metrics (qwen2.5:3b, N=200, seed=42, temperature=0.0):
- triage_jira F1-macro = 0.5867 +/- 0.01 (contextual-anchoring)
- estimation_tawos MAE = 5.7150 +/- 0.05 story points (zero-shot)
Reproduce:
puma validate-baseline --expected-f1 0.5867 --tolerance 0.01
puma validate-baseline --expected-mae 5.7150 --tolerance 0.05
The reproducibility gates (F1 = 0.5867, MAE = 5.7150) are bit-exact stable across
releases v2.4.0, v2.5.0, v2.6.0, and v2.7.0.
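
For readers who want to see what a tolerance gate of this kind amounts to, here is a minimal Python sketch. It is not PUMA's implementation: the names `within_tolerance`, `check_gates`, and `GATES` are hypothetical, and the inclusive comparison `abs(observed - expected) <= tolerance` is an assumption about how `--expected-*` and `--tolerance` are combined. Only the expected values and tolerances come from the notes above.

```python
# Illustrative sketch only; names and comparison rule are assumptions,
# not taken from the PUMA code base.

def within_tolerance(observed: float, expected: float, tolerance: float) -> bool:
    """Return True when observed lies within expected +/- tolerance (inclusive)."""
    return abs(observed - expected) <= tolerance


# Expected value and tolerance for each gate, as stated in the release notes.
GATES = {
    "triage_jira_f1_macro": (0.5867, 0.01),
    "estimation_tawos_mae": (5.7150, 0.05),
}


def check_gates(observed: dict) -> dict:
    """Map each gate name to True (pass) or False (fail) for the observed metrics."""
    return {
        name: within_tolerance(observed[name], expected, tolerance)
        for name, (expected, tolerance) in GATES.items()
    }


if __name__ == "__main__":
    # Hypothetical observed metrics from a local run.
    run = {"triage_jira_f1_macro": 0.5867, "estimation_tawos_mae": 5.80}
    for gate, passed in check_gates(run).items():
        print(f"{gate}: {'PASS' if passed else 'FAIL'}")
```

In this sketch a run that reproduces F1 exactly but reports MAE = 5.80 passes the first gate and fails the second, since 5.80 differs from 5.7150 by more than 0.05.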