
Verity Benchmark v0.1

Th0rgal released this on 16 Jun at 06:50.

Detailed per-task artifacts for benchmark version 0.1.

The assets are compressed per-model bundles of results/runs, plus a JSON manifest that records run counts, pass/fail summaries, token totals, byte sizes, and SHA-256 hashes.
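Because the manifest lists a byte size and a SHA-256 hash for the assets, a downloaded bundle can be checked before use. The sketch below is a minimal, hypothetical helper using only the Python standard library: it streams the file through SHA-256 and compares both values. It takes the expected size and hash as arguments because the manifest's exact JSON field names are not documented here; the function names are illustrative, not part of this release.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large bundles need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_asset(asset: Path, expected_sha256: str, expected_bytes: int) -> bool:
    """Return True if the asset's size and SHA-256 match the manifest values.

    expected_sha256 and expected_bytes are assumed to be copied from the
    manifest entry for this asset (field names there are not assumed).
    """
    # Cheap size check first; only hash the file if the size matches.
    if asset.stat().st_size != expected_bytes:
        return False
    return sha256_of(asset) == expected_sha256.lower()
```

A size mismatch usually indicates a truncated download, so it is checked before the more expensive hash. A matching hash confirms the file is the one described in the manifest, but it says nothing about whether that bundle is a partial snapshot.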

Important caveats: some assets are partial, in-progress snapshots; the manifest records how many runs completed and how many had nonzero usage. The Kimi K2.7 run used kimi/kimi-for-coding, for which the provider forces temperature=1, and it was rate-paced through the proxy, so it is not strictly comparable to the temperature=0, high-throughput runs. The grok asset is the run under the grok alias, which routed to grok-build-0.1; the actual xai/grok-4.3 run is uploaded separately as xai-grok-4-3.