v7.7.7 — leaderboard CLI + concrete baselines + CI auto-verification

@github-actions github-actions released this 27 May 20:28

styxx 7.7.7 — styxx leaderboard CLI + concrete reference baselines + CI auto-verification

The empirical floor is now a runnable, terminal-accessible, CI-verified public challenge.

Added

  • styxx leaderboard — lightweight CLI that displays the current gauntlet leaderboard. It reads the bundled LEADERBOARD.md from styxx/_data/, so it works on a clean pip install. The --rows-only flag filters the output to just the leaderboard rows for quick scanning.
  • submissions/baseline_002_classifier/ — the shipped dark-core classifier wrapped in the gauntlet. 1/3 bars passed (K2 accuracy 0.77 ✓; K1 F1 0.42 ✗; K3 F1 0.36 ✗).
  • submissions/baseline_003_length/ — a deliberately bad length-only heuristic. 0/3 bars; anchors the leaderboard floor with a real numeric row.
  • .github/workflows/gauntlet-pr.yml — CI workflow that auto-verifies external submission PRs (1e-3 float tolerance).
  • submissions/GAUNTLET.md — submission protocol documentation.
  • styxx/_data/LEADERBOARD.md bundled as package data.
  • 4 new tests; full suite: 1083 passed, 8 skipped.
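
The 1e-3 float tolerance used by the CI check can be pictured with a minimal sketch. It is not the code in gauntlet-pr.yml: the function name, the metric keys, the example numbers, and the choice of an absolute (rather than relative) tolerance are all assumptions made for illustration.

```python
import math

TOLERANCE = 1e-3  # float tolerance stated in the release notes


def verify_reported(reported, recomputed, tol=TOLERANCE):
    """Return the metric names whose reported value is missing from the
    recomputed results or differs from them by more than `tol`.

    Hypothetical helper: an absolute tolerance is assumed here.
    """
    mismatches = []
    for name, claimed in reported.items():
        actual = recomputed.get(name)
        if actual is None or not math.isclose(
            claimed, actual, rel_tol=0.0, abs_tol=tol
        ):
            mismatches.append(name)
    return mismatches


# Illustrative numbers only, not taken from any real submission.
reported = {"accuracy": 0.770, "f1": 0.420}
recomputed = {"accuracy": 0.7704, "f1": 0.431}
print(verify_reported(reported, recomputed))  # ['f1']
```

An empty list would mean every reported number was reproduced within tolerance; any entry in the list is a metric the re-run could not confirm.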

What this delivers

pip install styxx==7.7.7
styxx leaderboard --rows-only         # see the floor
# ... write method.py + submission.json ...
styxx gauntlet --method submissions.<name>.method:predict --task classification
# ... open PR; CI re-verifies reported numbers ...
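
For the "write method.py" step, a length-only heuristic in the spirit of baseline_003_length might look like the sketch below. The predict signature (one text in, one integer label out) and the threshold are assumptions; submissions/GAUNTLET.md is the place to check the actual contract.

```python
# method.py (hypothetical layout; the input and output types of predict
# are assumed, not taken from the styxx documentation)

LENGTH_THRESHOLD = 200  # characters; arbitrary illustrative cutoff


def predict(text: str) -> int:
    """Label a text 1 if it is longer than the threshold, else 0."""
    return 1 if len(text) > LENGTH_THRESHOLD else 0
```

A method like this ignores content entirely, which is why a length-only baseline is useful as a floor: any serious submission should beat it.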

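The leaderboard is trustworthy by construction: CI re-verifies the reported numbers of each external submission PR to within a 1e-3 float tolerance, so rows do not rest on self-reported results alone.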

🤖 Generated with Claude Code


Zenodo DOI: 10.5281/zenodo.20418532 (v24 in the concept chain at 10.5281/zenodo.19326174; published 2026-05-27).