A public dashboard that ranks open-source repos by how friendly they are for AI coding agents — per model.
Next.js 16 + SQLite (better-sqlite3), styled with Tailwind CSS 4. Spans GitHub, GitLab, and Bitbucket out of the box. Current release: 0.1.0.
AI coding agents — Claude Code, Cursor, Devin, GPT-5 Codex — succeed dramatically more often on some repos than others. The difference is rarely the agent; it's the repo. A codebase with fast tests, a clear AGENTS.md, a Makefile, and CI is a massively different environment than one without.
Goal: a public leaderboard where anyone can look up a repo and see:
- How agent-friendly is it overall?
- How friendly is it for my agent? (Claude Code weights AGENTS.md heavily; Devin cares about CI + reproducible envs; Cursor prefers strong types + a good README.)
- Why does it rank there, and what would it take to improve for my agent? — top-3 gaps ranked by score-gain.
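The "top-3 gaps ranked by score-gain" idea can be sketched as a pure function over an assumed weighted rubric. The signal ids and weights below are illustrative, not the project's real ones:

```typescript
type Gap = { signal: string; gain: number };

// For each signal the model weights, the potential gain is the weight times
// how far the repo is from a perfect value on that signal.
function topGaps(
  results: Record<string, number>, // signal id -> value in [0, 1]
  weights: Record<string, number>, // per-model weights
  n = 3,
): Gap[] {
  return Object.entries(weights)
    .map(([signal, w]) => ({ signal, gain: w * (1 - (results[signal] ?? 0)) }))
    .filter((g) => g.gain > 0)
    .sort((a, b) => b.gain - a.gain)
    .slice(0, n);
}
```

A repo with great CI but no AGENTS.md would surface "add AGENTS.md" first for a model that weights it heavily.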
Two audiences:
- Maintainers — a ranked checklist of what to fix to make their repo friendlier to agents.
- Agent users — when picking between forks, packages, or alternatives, agent-friendliness is a real dependency-choice signal.
| Project | What it does | What we do differently |
|---|---|---|
| Factory.ai Agent Readiness | Single-tenant scanner: you point it at your repo and get a score with auto-fix PRs. 8 pillars, 5 maturity levels. | Public + cross-forge + per-model. Factory rates your own repo in isolation — we rank across repos, on GitHub/GitLab/Bitbucket, and the ranking changes based on which agent you care about. |
| kodustech/agent-readiness | OSS alternative to Factory — static checks, local scan. | We're a public ranking service, not a local scanner. Scoring logic is similar in spirit; the product is the leaderboard and the per-model lens. |
| jpequegn/agent-readiness-score | Explicitly "inspired by Factory.ai" — OSS framework to measure codebase readiness. | Same delta as above — single-tenant vs. public + per-model. |
| viktor-silakov/ai-ready | 39 checks, 7 pillars, 10+ languages. Scanner. | Same delta. |
| ambient-code/agentready | Assesses git repos against evidence-based attributes. | Same delta. |
| Cloudflare Agent Readiness | Rates websites for agent consumption. | Wrong object — we rate code repos. |
| Fern Agent Score | Public leaderboard rating documentation sites for AI-readiness. | Closest in shape — public + leaderboard — but scores docs, not code. |
| Clarvia | Scoring platform for MCP servers (Agent Experience Optimization). | Adjacent — rates tools, not repos. |
| SWE-Bench (Verified / Pro), GitTaskBench, FeatureBench, HAL, PR Arena | Rank agents on a fixed set of repos. | We want the transpose: rank repos per agent. Our measurement story (once the benchmark harness lands) looks a lot like these, with the axes flipped. |
| GitHub Trending, ossinsight | Popularity / activity rankings. | Stars ≠ agent-friendliness. |
Our differentiators, in one line: cross-forge, public, per-model, and explainable — every score decomposes to signals, and every repo page shows what to improve next for the selected model.
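A minimal sketch of the "every score decomposes to signals" claim, under an assumed weighted-average rubric — the signal ids and per-model weights here are illustrative, not the shipped ones:

```typescript
type SignalResult = { id: string; value: number }; // value in [0, 1]
type ModelWeights = Record<string, number>;

// Illustrative weights for one model; each model gets its own table.
const claudeCodeWeights: ModelWeights = {
  "agents-md": 0.4,
  "fast-tests": 0.3,
  ci: 0.2,
  makefile: 0.1,
};

// Weighted average of signal values, normalized to 0–100 so repos with
// differing signal coverage stay comparable.
function scoreForModel(signals: SignalResult[], weights: ModelWeights): number {
  let total = 0;
  let weightSum = 0;
  for (const s of signals) {
    const w = weights[s.id] ?? 0;
    total += w * s.value;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : Math.round((total / weightSum) * 100);
}

const example = scoreForModel(
  [
    { id: "agents-md", value: 1 },
    { id: "fast-tests", value: 0.5 },
    { id: "ci", value: 1 },
    { id: "makefile", value: 0 },
  ],
  claudeCodeWeights,
);
// example === 75
```

Swapping the weights table is what makes the same repo rank differently per model.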
Not pretending the idea is free of risk:
- Per-model scoring is the hardest part and the easiest to fake. Today the weights are illustrative. Real "Claude ranks this higher than GPT-5" requires actually running each agent on each repo. That's tasks/0.3.0/01-benchmark-harness.md.
- Factory.ai is already in this space. Differentiation has to stay sharp.
- Public-shaming risk. Ranking #47,823 without consent invites angry maintainers. Planned mitigation: tasks/0.4.0/04-opt-out-claim-flow.md.
- Score gaming. Once public, people add boilerplate AGENTS.md to pass the rubric without being useful. Dynamic (actually-run-an-agent) checks are the counter — see benchmark harness.
- Freshness. Scores decay with every push. Webhook-driven rescoring is on the roadmap.
See /methodology in the running app for a candid walkthrough of what's measured today and what isn't.
Short answer: low risk. The app:
- Only reads files after a shallow clone; never executes anything from the cloned tree (no `npm install`, no post-clone hooks).
- Uses `--depth 1 --single-branch` and never clones submodules.
- Runs all SQL via prepared statements.
- Renders through React (auto-escaping); no `dangerouslySetInnerHTML`.
- Has no auth and no writable API endpoints — read-only dashboard.
Operational concerns for a public launch (not code-level security):
- Disk quotas for tmp-clones/.
- Rate limiting the public API.
- Sandbox the cloner in a container (future-proofing against hypothetical git CVEs).
Auth and per-maintainer controls land with the opt-out / claim flow in v0.4.0.
```shell
bun install
bun run prepare-hooks   # once — installs lefthook pre-commit (Biome + tsc)
bun run seed            # score the curated set (~28 repos) across GH / GL / BB
bun run dev             # http://localhost:3000
```

Score a single repo:

```shell
bun run score https://github.com/vercel/next.js
bun run score https://gitlab.com/gitlab-org/cli
bun run score https://bitbucket.org/snakeyaml/snakeyaml
bun run score /path/to/local/checkout
```

Optional: set GITHUB_TOKEN / GITLAB_TOKEN in the environment to raise API rate limits.
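The `score` command above implies a cross-forge URL parser. A hypothetical sketch of that parsing step — the real parser lives in the repo and may differ:

```typescript
type Forge = "github" | "gitlab" | "bitbucket";
interface RepoRef {
  forge: Forge;
  owner: string;
  name: string;
}

const FORGE_HOSTS: Record<string, Forge> = {
  "github.com": "github",
  "gitlab.com": "gitlab",
  "bitbucket.org": "bitbucket",
};

function parseRepoUrl(input: string): RepoRef | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a URL — the caller can treat it as a local path
  }
  const forge = FORGE_HOSTS[url.hostname];
  if (!forge) return null;
  // Strip leading slash, trailing slash, and a trailing .git suffix.
  // GitLab allows nested groups, so everything after the owner is the name.
  const parts = url.pathname.replace(/^\/|\.git$|\/$/g, "").split("/");
  const [owner, ...rest] = parts;
  if (!owner || rest.length === 0) return null;
  return { forge, owner, name: rest.join("/") };
}
```

Anything that fails to parse as a forge URL falls through to the local-checkout path.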
`lib/version.ts` and `package.json` carry the current release number (currently 0.1.0). Bumps happen only when we actually cut a release — never when merging intermediate work. The version pill in the header surfaces the number directly; /changelog lists what each release shipped.
| Choice | Why | When we'd revisit |
|---|---|---|
| Next.js 16 (App Router) | Future features (filters, charts, auth, diff views) are React's territory. File-based routing + API routes replace hand-rolled HTTP cleanly. | Unlikely. The core scorer is stack-agnostic, only app/ depends on this. |
| Node runtime (with tsx for CLI scripts) | Matches Vercel's serverless runtime — no Bun-only imports in prod. Bun still works locally as a fast package manager. | Unlikely — only if the deployment target changes. |
| Tailwind CSS 4 | Zero-config via @theme tokens, no tailwind.config.* needed. Tight bundle output. | Would only leave for something with a stronger design-system story. |
| better-sqlite3 | Single file, inspectable, zero ops overhead. Node-native so Vercel's serverless runtime can load it directly. | Postgres when concurrent writers / access control arrive (tasks/1.0.0/01-postgres-migration.md). |
| Server components + links, no client JS | Cheap, fast, SEO-friendly. | When a feature genuinely needs interactivity — e.g. live filter combinators. |
| Shallow git clones (--depth 1 --single-branch) | Bandwidth + speed. Current signals don't need history. | History-aware signals → host APIs or --filter=blob:none partial clones. |
| Exact-pinned deps | Deterministic scoring across environments. | Never. |
| One file per signal | Each signal is a small, independent concern — keeps git log and code review focused. | When we bundle signals into dynamic checks (then the unit becomes the bundle). |
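The shallow-clone choice can be sketched as follows — the flags mirror what the table documents, but the real cloner in lib/clients/ may differ:

```typescript
import { execFileSync } from "node:child_process";

// Build the argv for a shallow clone. --depth 1 fetches only the latest
// commit (no history); --single-branch fetches only the default branch.
// Submodules are not fetched: git clone skips them unless
// --recurse-submodules is passed explicitly.
function cloneArgs(repoUrl: string, dest: string): string[] {
  return ["clone", "--depth", "1", "--single-branch", repoUrl, dest];
}

// Run the clone; nothing from the cloned tree is ever executed.
function shallowClone(repoUrl: string, dest: string): void {
  execFileSync("git", cloneArgs(repoUrl, dest), { stdio: "ignore" });
}
```

Keeping the argv builder pure makes the flag set trivially testable without touching the network.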
- Static signals need to read file contents (AGENTS.md length, pyproject.toml [tool.X] sections, package.json scripts count) — not just existence.
- One clone is faster than N API calls for content-heavy scoring, and respects rate limits.
- Any real version of this dashboard needs dynamic signals (run tests, run an agent). Those absolutely need code on disk.
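An illustrative shape for a "one file per signal" static check that reads file contents from the cloned tree, not just existence. The interface and thresholds here are assumptions, not the project's real signal API:

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface Signal {
  id: string;
  evaluate(repoRoot: string): number; // 0..1
}

// One small, independent concern per file: this one checks that AGENTS.md
// exists and has substance, not just a stub.
const agentsMdSignal: Signal = {
  id: "agents-md",
  evaluate(repoRoot) {
    const path = join(repoRoot, "AGENTS.md");
    if (!existsSync(path)) return 0;
    const length = readFileSync(path, "utf8").trim().length;
    // Stub file → partial credit; substantive file → full credit.
    return length > 500 ? 1 : length > 0 ? 0.5 : 0;
  },
};
```

Because each signal only reads from the cloned tree, the scorer stays pure and the cloned code is never executed.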
```
app/                Next.js App Router — pages + API + SEO
  layout.tsx        root layout, root metadata (OG + Twitter cards)
  page.tsx          leaderboard
  repo/[id]/        repo detail (includes generateMetadata for shareable titles)
  methodology/      how scoring works today
  roadmap/          upcoming versions (from lib/roadmap.ts)
  changelog/        what's shipped (from lib/changelog.ts)
  api/              repos + repo/[id] JSON routes
  robots.ts         /robots.txt — allows "/", blocks "/api/"
  sitemap.ts        /sitemap.xml — static routes + every repo
  globals.css       Tailwind import + @theme tokens
components/         React components (Tailwind-styled)
lib/
  scoring/          signals, weights, scorer — pure, no I/O outside the cloned tree
  clients/          git clone, host API
  constants/        thresholds, host labels, sort keys
  utils/            format + score-tier helpers
  db.ts             better-sqlite3 schema + queries (all SQL lives here)
  version.ts        APP_NAME, APP_VERSION, APP_URL, APP_DESCRIPTION, REPO_URL
  changelog.ts / roadmap.ts
scripts/            CLI entries run via tsx (Node) — score, seed, init-db
tasks/              Per-version task breakdown (agent-readable)
public/             Static assets — demo/ screenshots used by the README + OG image
.claude/            settings.json, hooks/ (Stop guard), skills/
data/               rank.db (committed — shipped as a build artifact; rescoring runs locally)
tmp-clones/         Shallow clones (gitignored)
AGENTS.md           Agent instructions (source of truth)
CONTRIBUTING.md     Human-contributor guide — PR workflow, review bar
CLAUDE.md           Pointer → AGENTS.md
LICENSE             MIT
```

See /roadmap in the running app or the per-version tasks/ folders for the full picture.
- 0.2.0 — complete the dogfood: tests for scorer / signals / URL parser + self-score ≥ 90 on this repo's own rubric.
- 0.3.0 — real per-model weights: benchmark harness actually runs agents on scoped tasks per repo; current illustrative weights get replaced with measured ones.
- 0.4.0 — ecosystem integration: badge endpoint for READMEs, GitHub Action that comments score delta on PRs, webhook-driven rescoring, OAuth-gated opt-out / claim flow for maintainers.
- 0.5.0 — alternative recommender: "repo Y does the same thing and ranks higher for your model."
- 0.6.0 — package-registry overlay: rank npm / PyPI / Cargo packages by source-repo friendliness.
- 0.7.0 — history-aware signals: extend the scorer with maintenance recency, commit velocity, and contributor activity — closing the gap that the shallow clone leaves today.
- 1.0.0 — production stability: Postgres migration for concurrent writers; from here on, breaking API changes require a MAJOR bump.
- 1.1.0 — at-scale GitHub indexing: flip from a curated seed list to an auto-discovered crawl (GitHub search + trending + submissions). Target 10k repos on first delivery.
The score isn't defensible. The evaluation harness + the cross-forge dataset + the maintainer network are. Open-source the harness, publish the weights, keep the data + dashboard + badge network as the product.
MIT — see LICENSE.
See CONTRIBUTING.md for setup, branch/commit style, the PR description template, and the changelog discipline.

