Release v7.7.5 — styxx gauntlet: the empirical floor as a public challenge · fathom-lab/styxx

styxx 7.7.5 — `styxx gauntlet`: the empirical floor as a public challenge with deployable tooling

The seven-method empirical floor shipped in 7.7.3 is now a runnable public challenge. Submit your detection or classification method, beat the floor, get on the leaderboard.

Added

- `styxx gauntlet --method <module:attr>`: public-challenge runner. Loads any user-supplied detection or classification method, runs it against the labeled benchmark, scores it against pre-registered bars.
- `styxx.gauntlet` module: programmatic API for use in scripts and CI.
- `LEADERBOARD.md`: public leaderboard with the seven-method floor as Baseline-001. Full submission protocol, locked bars, sanity submissions (majority + zero detector), honest scope statement, citation block.
- 20 new tests covering metric primitives, baseline failures, perfect-oracle upper bounds (validates the bars are beatable when given perfect signal), error handling, end-to-end CLI invocation. Full suite: 1078 passed, 8 skipped.
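
The `--method <module:attr>` argument reads like the common Python `module:attr` convention. As a rough illustration of how such a spec can be resolved, here is a minimal sketch. It assumes the conventional meaning (import the module, then look up the attribute); `resolve_method` is a hypothetical helper, not part of styxx, and the gauntlet's actual loader may behave differently.

```python
import importlib
from typing import Any, Callable


def resolve_method(spec: str) -> Callable[..., Any]:
    """Resolve a 'module:attr' string to a callable.

    Hypothetical helper for illustration only; not styxx's loader.
    """
    module_name, sep, attr_name = spec.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    # Import the module by its dotted name, then fetch the attribute from it.
    module = importlib.import_module(module_name)
    obj = getattr(module, attr_name)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return obj
```

Under this assumption, a spec such as `my_method:predict` would point at a function `predict` defined in an importable file `my_method.py` (both names are placeholders).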

The frame

The seven-method floor we shipped is the public bar. We assert that none of the seven methods we tested could beat it, and the gauntlet invites the field to try. If anyone beats it, the synthesis gets revised; if nobody can, each failed submission strengthens the floor.

Two task modes

| task | method signature | bars (PASS = all) |
| --- | --- | --- |
| classification | `predict(question: str) -> {"class": ...}` | K1 folklore F1 ≥ 0.70, K2 accuracy ≥ 0.65, K3 cross-corpus F1 ≥ 0.60 |
| detection | `detect(question: str, response: str) -> {"score": float}` | D1 misconception AUC ≥ 0.70, D2 folklore AUC ≥ 0.70 |
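
To make the two signatures and the "PASS = all" rule concrete, here is a minimal sketch. The thresholds are copied from the table above; everything else is an assumption for illustration: the `BARS` layout, the `passes_all_bars` helper, and the stub return values are hypothetical and are not styxx's API. In particular, the release notes do not list the valid class labels, so the stub returns an explicit placeholder.

```python
from typing import Dict

# Thresholds copied from the table above; the dictionary layout is an assumption.
BARS: Dict[str, Dict[str, float]] = {
    "classification": {"K1": 0.70, "K2": 0.65, "K3": 0.60},
    "detection": {"D1": 0.70, "D2": 0.70},
}


def predict(question: str) -> dict:
    """Classification-mode stub.

    The label is a placeholder: the valid class labels are not
    given in the release notes.
    """
    return {"class": "PLACEHOLDER_LABEL"}


def detect(question: str, response: str) -> dict:
    """Detection-mode stub that always returns a constant score."""
    return {"score": 0.0}


def passes_all_bars(task: str, scores: Dict[str, float]) -> bool:
    """Return True only if every bar for the task is met (score >= threshold).

    A missing score counts as a failed bar.
    """
    bars = BARS[task]
    return all(scores.get(name, float("-inf")) >= bar for name, bar in bars.items())
```

Because a submission must clear every bar, a classification method scoring K1 = 0.90 and K2 = 0.90 but K3 = 0.59 would still fail under this reading.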

Install

```
pip install -U "styxx[mcp,nli]==7.7.5"
```

Submit

See LEADERBOARD.md for the submission protocol.

🤖 Generated with Claude Code
