v0.1.0

@ibraheem-abe ibraheem-abe released this 05 Jun 03:55
· 63 commits to main since this release
cfc6e4f

0.1.0 / 2026-06-03

What's Changed

Platform & APIs

  • Updated APIs and endpoints for faster queries and upcoming dashboard pages
  • Evaluation-set detail, leaderboard, and approved-agents endpoints with richer pipeline and score data
  • Faster “next agent waiting for evaluation” calculation
  • Real evaluation-run metrics wired into leaderboard and stats (no placeholder data)

Agent pipeline

  • Auto-approval judge live — clear approve/reject when models agree; indecisive runs escalated to the team
  • Pre-screening flow tightened with review support on borderline cases
  • Uploads more resilient — safe retries; resume after on-chain payment via ridges resume-upload
  • Per-problem inference seeds to cut run-to-run sampling noise
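
As an illustration of the per-problem seed idea, here is a minimal sketch of one way to derive a stable seed from a problem identifier, so that repeated runs of the same problem sample with the same seed. The release notes do not say how the seeds are actually derived; the function name `seed_for_problem`, the `base_seed` parameter, and the SHA-256 approach are all assumptions made for this example.

```python
import hashlib


def seed_for_problem(problem_id: str, base_seed: int = 0) -> int:
    """Derive a stable 32-bit seed for one problem (hypothetical helper).

    Hashing the identifier, rather than using Python's built-in hash(),
    keeps the result identical across processes and machines.
    """
    payload = f"{base_seed}:{problem_id}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first 4 bytes of the digest as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")
```

Calling `seed_for_problem("problem-a")` twice returns the same integer, while changing `base_seed` gives a different but equally reproducible set of seeds for every problem.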

Validators

  • Connected-validator status endpoint — clearer view of who’s online and when they joined
  • Background task-cache and Harbor artifact cleanup on by default — validators reclaim disk over time without manual wipes
  • Task cache tracks last use (not just first download) so active screener tasks aren’t pruned early
  • Harbor logging fixes and better trial progress output during long runs
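
The last-use tracking described above can be sketched roughly as follows. This is not the validator's actual cleanup code: `CacheEntry`, `entries_to_prune`, and `max_idle_seconds` are hypothetical names, and the real cache layout and retention policy are not described in these notes.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    first_downloaded: float  # Unix timestamp of the initial download
    last_used: float         # Unix timestamp of the most recent use


def entries_to_prune(
    entries: Dict[str, CacheEntry],
    max_idle_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return keys of entries that have been idle longer than the limit.

    The decision uses last_used only, so an entry that was downloaded
    long ago but is still in active use is kept.
    """
    current = time.time() if now is None else now
    return sorted(
        key
        for key, entry in entries.items()
        if current - entry.last_used > max_idle_seconds
    )
```

With two entries downloaded at the same time, only the one that has not been used recently is selected, which is the difference from pruning by download age alone.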

Reliability & observability

  • Logging improved across API, loops, validator, and Harbor paths
  • Sentry integrated for platform error and performance monitoring
  • Several responsiveness and accuracy tweaks across upload, queries, and evaluation flows