PUMA reproducibility anchor (v2.7.0-baseline-anchor)

pumacp released this 04 Jun 09:42

Canonical, SHA-pinned reproducibility anchor for the PUMA empirical baseline.

Commit: 6671108

Verified reproducible metrics (qwen2.5:3b, N=200, seed=42, temperature=0.0):

  • triage_jira F1-macro = 0.5867 +/- 0.01 (contextual-anchoring)
  • estimation_tawos MAE = 5.7150 +/- 0.05 SP (zero-shot)

Reproduce:
puma validate-baseline --expected-f1 0.5867 --tolerance 0.01
puma validate-baseline --expected-mae 5.7150 --tolerance 0.05

The reproducibility gates F1 = 0.5867 and MAE = 5.7150 are bit-exact stable
across releases v2.4.0, v2.5.0, v2.6.0, and v2.7.0.
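
The --expected-* and --tolerance flags in the Reproduce commands suggest a simple acceptance rule: a run passes when the observed metric lies within the stated tolerance of the anchored value. A minimal sketch of that rule follows, assuming an absolute-difference comparison. The helper `within_tolerance` and the constant names are hypothetical and are not part of the puma CLI, whose actual implementation may differ.

```python
# Hypothetical sketch of a tolerance gate; not taken from the PUMA code base.

def within_tolerance(observed: float, expected: float, tolerance: float) -> bool:
    """Return True when observed lies within +/- tolerance of expected."""
    return abs(observed - expected) <= tolerance


# Anchored values and tolerances quoted in this release.
F1_EXPECTED, F1_TOLERANCE = 0.5867, 0.01    # triage_jira, F1-macro
MAE_EXPECTED, MAE_TOLERANCE = 5.7150, 0.05  # estimation_tawos, MAE in SP

if __name__ == "__main__":
    # Placeholder observations; substitute the metrics from your own run.
    observed_f1 = 0.5867
    observed_mae = 5.7150
    print("F1 gate passed:", within_tolerance(observed_f1, F1_EXPECTED, F1_TOLERANCE))
    print("MAE gate passed:", within_tolerance(observed_mae, MAE_EXPECTED, MAE_TOLERANCE))
```

Under this reading, a bit-exact rerun passes trivially, and the tolerance only matters when hardware or library differences shift the metric slightly.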