Releases: scasella/society-of-thought-bench

v0.1.0-preview

31 Mar 21:20

This is the first public preview of society-of-thought-bench.

What is included:

  • the benchmark package
  • the current best paper-faithful adapter release
  • the main debate-vs-monologue comparison result
  • helper scripts for inspection, evaluation, and training

What this release claims:

  • the benchmark can measure visible multi-persona debate inside exposed thinking traces
  • the current best model exhibits this paper-style debate behavior in an inspectable way
  • on this benchmark, the debate version beats the monologue version by a meaningful margin

What this release does not claim:

  • that the model is fully reliable
  • that the benchmark is final
  • that the gain is already proven outside this benchmark

Links: