v7.7.7 — leaderboard CLI + concrete baselines + CI auto-verification
styxx 7.7.7 — styxx leaderboard CLI + concrete reference baselines + CI auto-verification
The empirical floor is now a runnable, terminal-accessible, CI-verified public challenge.
Added
- `styxx leaderboard`: lightweight CLI that displays the current gauntlet leaderboard. It reads the bundled `LEADERBOARD.md` from `styxx/_data/`, so it works on a clean `pip install`. The `--rows-only` flag filters the output to just the leaderboard rows for quick scanning.
- `submissions/baseline_002_classifier/`: the shipped dark-core classifier wrapped in the gauntlet. Passes 1/3 bars (K2 accuracy 0.77 ✓; K1 F1 0.42 ✗; K3 F1 0.36 ✗).
- `submissions/baseline_003_length/`: a deliberately bad length-only heuristic. Passes 0/3 bars and anchors the leaderboard floor with a real numeric row.
- `.github/workflows/gauntlet-pr.yml`: CI workflow that auto-verifies external submission PRs (1e-3 float tolerance).
- `submissions/GAUNTLET.md`: submission protocol documentation.
- `styxx/_data/LEADERBOARD.md` bundled as package data.
- 4 new tests; full suite: 1083 passed, 8 skipped.
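The CI workflow's 1e-3 tolerance check can be pictured with a minimal sketch. It assumes that the tolerance is absolute and that both the reported and the re-computed metrics arrive as name-to-value dicts. The function name `verify_submission` and the metric keys are hypothetical. This is not the code in `gauntlet-pr.yml`.

```python
import math

# Hypothetical sketch of a tolerance check; the real workflow's logic may differ.
# ASSUMPTION: the 1e-3 tolerance is absolute, not relative.
TOLERANCE = 1e-3


def verify_submission(reported: dict[str, float],
                      recomputed: dict[str, float],
                      tol: float = TOLERANCE) -> list[str]:
    """Return human-readable mismatches; an empty list means every number checks out."""
    problems = []
    for metric, value in reported.items():
        if metric not in recomputed:
            # A reported metric the CI re-run did not produce cannot be verified.
            problems.append(f"{metric}: not produced by re-run")
        elif not math.isclose(value, recomputed[metric], rel_tol=0.0, abs_tol=tol):
            problems.append(
                f"{metric}: reported {value}, re-computed {recomputed[metric]}"
            )
    return problems


# Illustrative numbers only, not real gauntlet results; metric key is hypothetical.
print(verify_submission({"k2_accuracy": 0.5}, {"k2_accuracy": 0.5004}))  # → []
```

With `rel_tol=0.0`, `math.isclose` reduces to `abs(reported - recomputed) <= 1e-3`. That plain absolute check suits metrics bounded in [0, 1], such as accuracy and F1.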
What this delivers
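```bash
pip install styxx==7.7.7
styxx leaderboard --rows-only  # see the floor
# ... write method.py + submission.json ...
styxx gauntlet --method submissions.<name>.method:predict --task classification
# ... open PR; CI re-verifies reported numbers ...
```

The leaderboard is trustworthy by construction. CI re-verifies every number reported in a submission PR to within a 1e-3 float tolerance, so scores are checked rather than taken on trust.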
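The steps above ask for a `method.py` that exposes a `predict` callable, addressed as `submissions.<name>.method:predict`. The minimal sketch below shows what such a module might look like, using a length-only rule of the same kind as `baseline_003_length`. The call signature (one text in, one 0/1 label out) and `LENGTH_THRESHOLD` are assumptions for illustration. `submissions/GAUNTLET.md` defines the real contract, and this is not the shipped baseline's code.

```python
# submissions/<name>/method.py  (hypothetical sketch, not the shipped baseline)
# ASSUMPTION: the gauntlet passes one text string to predict() and expects a
# binary 0/1 label back; see submissions/GAUNTLET.md for the actual interface.

LENGTH_THRESHOLD = 200  # hypothetical cutoff in characters, chosen only for illustration


def predict(text: str) -> int:
    """Length-only heuristic: label a text positive iff it is longer than the cutoff."""
    return 1 if len(text) > LENGTH_THRESHOLD else 0
```

A rule like this ignores content entirely, so it should sit near the bottom of the leaderboard. That is the role the release assigns to `baseline_003_length`.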
🤖 Generated with Claude Code
Zenodo DOI: 10.5281/zenodo.20418532 (v24 in the concept chain at 10.5281/zenodo.19326174; published 2026-05-27).