Release Verity Benchmark v0.1 · lfglabs-dev/ethereum-verification-benchmark

Detailed per-task benchmark artifacts for benchmark version 0.1.

Assets are compressed per-model bundles from results/runs plus a JSON manifest containing run counts, pass/fail summaries, token totals, byte sizes, and SHA-256 hashes.
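Because the manifest records byte sizes and SHA-256 hashes, a downloaded bundle can be checked against it before use. The following is a minimal sketch, not the project's own tooling: the manifest layout is not documented here, so the top-level `assets` list and the per-entry keys `name`, `bytes`, and `sha256` are hypothetical and should be adjusted to the real schema.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_assets(manifest_path: Path, asset_dir: Path) -> dict[str, bool]:
    """Compare each asset's size and hash with its manifest entry.

    ASSUMPTION: the manifest looks like
    {"assets": [{"name": ..., "bytes": ..., "sha256": ...}, ...]}.
    These key names are illustrative, not taken from the release.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    results: dict[str, bool] = {}
    for entry in manifest["assets"]:
        asset = asset_dir / entry["name"]
        results[entry["name"]] = (
            asset.is_file()
            and asset.stat().st_size == entry["bytes"]
            and sha256_of(asset) == entry["sha256"].lower()
        )
    return results
```

Calling `verify_assets(Path("manifest.json"), Path("downloads"))` (both paths are placeholders) returns a mapping from asset name to `True` or `False`. A missing file, a size mismatch, or a hash mismatch all yield `False`.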

Important caveats: Some assets are partial, in-progress snapshots; the manifest records completed and nonzero-usage counts. Kimi K2.7 was run through kimi/kimi-for-coding, which the provider forces to temperature=1, and was rate-paced through the proxy, so its results are not strictly comparable to the temperature=0, high-throughput runs. The grok asset is the run made under the grok alias, which routed to grok-build-0.1; the actual xai/grok-4.3 run is uploaded separately as xai-grok-4-3.
