v0.1.0

ibraheem-abe released this 05 Jun 03:55

cfc6e4f

0.1.0 / 2026-06-03

What's Changed

Platform & APIs

Updated APIs and endpoints for faster queries and upcoming dashboard pages
Evaluation-set detail, leaderboard, and approved-agents endpoints with richer pipeline and score data
Faster “next agent waiting for evaluation” calculation
Real evaluation-run metrics wired into leaderboard and stats, replacing placeholder data

Agent pipeline

Auto-approval judge live — clear approve/reject when models agree; indecisive runs escalated to the team
Pre-screening flow tightened with review support on borderline cases
Uploads more resilient — safe retries; resume after on-chain payment via ridges resume-upload
Per-problem inference seeds to cut run-to-run sampling noise
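
To illustrate the per-problem seed idea, here is a minimal sketch of one way to derive a stable seed from a problem identifier. It is not taken from the Ridges codebase: the function name `problem_seed`, the `base_seed` parameter, the SHA-256 derivation, and the example ID `problem-001` are all assumptions made for illustration.

```python
import hashlib
import random


def problem_seed(problem_id: str, base_seed: int = 0) -> int:
    """Derive a stable 32-bit seed for one problem (hypothetical scheme)."""
    # Hash the (base_seed, problem_id) pair so the result is identical across
    # processes and machines, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(f"{base_seed}:{problem_id}".encode("utf-8")).digest()
    # Keep the first 4 bytes as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")


# The same problem ID always yields the same seed, so two runs of the same
# problem start from the same random state.
rng = random.Random(problem_seed("problem-001"))
```

Fixing the seed per problem, rather than once per run, means a score change on a given problem is more likely to reflect the agent than the sampler.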

Validators

Connected-validator status endpoint — clearer view of who’s online and when they joined
Background task-cache and Harbor artifact cleanup on by default — validators reclaim disk over time without manual wipes
Task cache tracks last use (not just first download) so active screener tasks aren’t pruned early
Harbor logging fixes and better trial progress output during long runs
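
The last-use tracking described above can be sketched as follows. This is an illustrative model only, not the validator's actual cache: `CacheEntry`, `touch`, `prune`, and the `max_idle_seconds` threshold are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    downloaded_at: float  # when the task was first fetched (seconds)
    last_used_at: float   # refreshed every time the task is used


def touch(cache: dict[str, CacheEntry], task_id: str, now: float) -> None:
    """Record that a cached task was just used."""
    cache[task_id].last_used_at = now


def prune(cache: dict[str, CacheEntry], max_idle_seconds: float, now: float) -> list[str]:
    """Remove entries idle for longer than max_idle_seconds; return their IDs."""
    stale = [
        task_id
        for task_id, entry in cache.items()
        if now - entry.last_used_at > max_idle_seconds
    ]
    for task_id in stale:
        del cache[task_id]
    return stale


# Both tasks were downloaded at t=0, but only "active" has been used since.
cache = {"active": CacheEntry(0.0, 0.0), "idle": CacheEntry(0.0, 0.0)}
touch(cache, "active", now=900.0)
removed = prune(cache, max_idle_seconds=600.0, now=1000.0)
```

Pruning on `downloaded_at` alone would have evicted both entries here; keying on `last_used_at` keeps the task that is still in use.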

Reliability & observability

Logging improved across API, loops, validator, and Harbor paths
Sentry integrated for platform error and performance monitoring
Several responsiveness and accuracy tweaks across upload, queries, and evaluation flows
