Releases: scasella/society-of-thought-bench
v0.1.0-preview
This is the first public preview of society-of-thought-bench.
What is included:
- the benchmark package
- the current best paper-faithful adapter release
- the main debate-vs-monologue comparison result
- helper scripts for inspection, evaluation, and training
What this release claims:
- the benchmark can measure visible multi-persona debate inside exposed thinking traces
- the current best model shows this paper-style debate behavior in an inspectable way
- on this benchmark, the debate version beats the monologue version by a meaningful margin
What this release does not claim:
- that the model is fully reliable
- that the benchmark is final
- that the gain is already proven outside this benchmark
Links: