A public dashboard that ranks open-source repos by how friendly they are for AI coding agents — per model.
Next.js 16 + SQLite (better-sqlite3), styled with Tailwind CSS 4. Spans GitHub, GitLab, and Bitbucket out of the box. Current release: 0.1.0.
AI coding agents — Claude Code, Cursor, Devin, GPT-5 Codex — succeed dramatically more often on some repos than others. The difference is rarely the agent; it's the repo. A codebase with fast tests, a clear AGENTS.md, a Makefile, and CI is a massively different environment than one without.
Goal: a public leaderboard where anyone can look up a repo and see:
- How agent-friendly is it overall?
- How friendly is it for my agent? (Claude Code weights AGENTS.md heavily; Devin cares about CI + reproducible envs; Cursor prefers strong types + a good README.)
- Why does it rank there, and what would it take to improve for my agent? — top-3 gaps ranked by score-gain.
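The "top-3 gaps ranked by score-gain" idea can be sketched as a pure function over an assumed weighted rubric. The signal ids and weights below are illustrative, not the project's real ones:

```typescript
type Gap = { signal: string; gain: number };

// For each signal the model weights, the potential gain is the weight times
// how far the repo is from a perfect value on that signal.
function topGaps(
  results: Record<string, number>, // signal id -> value in [0, 1]
  weights: Record<string, number>, // per-model weights
  n = 3,
): Gap[] {
  return Object.entries(weights)
    .map(([signal, w]) => ({ signal, gain: w * (1 - (results[signal] ?? 0)) }))
    .filter((g) => g.gain > 0)
    .sort((a, b) => b.gain - a.gain)
    .slice(0, n);
}
```

A repo with great CI but no AGENTS.md would surface "add AGENTS.md" first for a model that weights it heavily.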
Two audiences:
- Maintainers — a ranked checklist of what to fix to make their repo friendlier to agents.
- Agent users — when picking between forks, packages, or alternatives, agent-friendliness is a real dependency-choice signal.
| Project | What it does | What we do differently |
|---|---|---|
| Factory.ai Agent Readiness | Single-tenant scanner: you point it at your repo and get a score with auto-fix PRs. 8 pillars, 5 maturity levels. | Public + cross-forge + per-model. Factory rates your own repo in isolation — we rank across repos, on GitHub/GitLab/Bitbucket, and the ranking changes based on which agent you care about. |
| kodustech/agent-readiness | OSS alternative to Factory — static checks, local scan. | We're a public ranking service, not a local scanner. Scoring logic is similar in spirit; the product is the leaderboard and the per-model lens. |
| jpequegn/agent-readiness-score | Explicitly "inspired by Factory.ai" — OSS framework to measure codebase readiness. | Same delta as above — single-tenant vs. public + per-model. |
| viktor-silakov/ai-ready | 39 checks, 7 pillars, 10+ languages. Scanner. | Same delta. |
| ambient-code/agentready | Assesses git repos against evidence-based attributes. | Same delta. |
| Cloudflare Agent Readiness | Rates websites for agent consumption. | Wrong object — we rate code repos. |
| Fern Agent Score | Public leaderboard rating documentation sites for AI-readiness. | Closest in shape — public + leaderboard — but scores docs, not code. |
| Clarvia | Scoring platform for MCP servers (Agent Experience Optimization). | Adjacent — rates tools, not repos. |
| SWE-Bench (Verified / Pro), GitTaskBench, FeatureBench, HAL, PR Arena | Rank agents on a fixed set of repos. | We want the transpose: rank repos per agent. Our measurement story (once the benchmark harness lands) looks a lot like these, with the axes flipped. |
| GitHub Trending, ossinsight | Popularity / activity rankings. | Stars ≠ agent-friendliness. |
Our differentiators, in one line: cross-forge, public, per-model, and explainable — every score decomposes to signals, and every repo page shows what to improve next for the selected model.
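A minimal sketch of the "every score decomposes to signals" claim, under an assumed weighted-average rubric — the signal ids and per-model weights here are illustrative, not the shipped ones:

```typescript
type SignalResult = { id: string; value: number }; // value in [0, 1]
type ModelWeights = Record<string, number>;

// Illustrative weights for one model; each model gets its own table.
const claudeCodeWeights: ModelWeights = {
  "agents-md": 0.4,
  "fast-tests": 0.3,
  ci: 0.2,
  makefile: 0.1,
};

// Weighted average of signal values, normalized to 0–100 so repos with
// differing signal coverage stay comparable.
function scoreForModel(signals: SignalResult[], weights: ModelWeights): number {
  let total = 0;
  let weightSum = 0;
  for (const s of signals) {
    const w = weights[s.id] ?? 0;
    total += w * s.value;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : Math.round((total / weightSum) * 100);
}

const example = scoreForModel(
  [
    { id: "agents-md", value: 1 },
    { id: "fast-tests", value: 0.5 },
    { id: "ci", value: 1 },
    { id: "makefile", value: 0 },
  ],
  claudeCodeWeights,
);
// example === 75
```

Swapping the weights table is what makes the same repo rank differently per model.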
Not pretending the idea is free of risk:
- Per-model scoring is the hardest part and the easiest to fake. Today the weights are illustrative. Real "Claude ranks this higher than GPT-5" requires actually running each agent on each repo. That's tasks/0.3.0/01-benchmark-harness.md.
- Factory.ai is already in this space. Differentiation has to stay sharp.
- Public-shaming risk. Ranking #47,823 without consent invites angry maintainers. Planned mitigation: tasks/0.4.0/04-opt-out-claim-flow.md.
- Score gaming. Once public, people add boilerplate AGENTS.md to pass the rubric without being useful. Dynamic (actually-run-an-agent) checks are the counter — see benchmark harness.
- Freshness. Scores decay with every push. Webhook-driven rescoring is on the roadmap.
See /methodology in the running app for a candid walkthrough of what's measured today and what isn't.
Short answer: low risk. The app:
- Only reads files after a shallow clone; never executes anything from the cloned tree (no `npm install`, no post-clone hooks).
- Uses `--depth 1 --single-branch` and never clones submodules.
- Runs all SQL via prepared statements.
- Renders through React (auto-escaping); no `dangerouslySetInnerHTML`.
- Has no auth and no writable API endpoints — read-only dashboard.
Operational concerns for a public launch (not code-level security):
- Disk quotas for tmp-clones/.
- Rate limiting the public API.
- Sandbox the cloner in a container (future-proofing against hypothetical git CVEs).
Auth and per-maintainer controls land with the opt-out / claim flow in v0.4.0.
```shell
bun install
bun run prepare-hooks   # once — installs lefthook pre-commit (Biome + tsc)
bun run seed            # score the curated set (~28 repos) across GH / GL / BB
bun run dev             # http://localhost:3000
```

Score a single repo:

```shell
bun run score https://github.com/vercel/next.js
bun run score https://gitlab.com/gitlab-org/cli
bun run score https://bitbucket.org/snakeyaml/snakeyaml
bun run score /path/to/local/checkout
```

Optional: set GITHUB_TOKEN / GITLAB_TOKEN in the environment to raise API rate limits.
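The `score` command above implies a cross-forge URL parser. A hypothetical sketch of that parsing step — the real parser lives in the repo and may differ:

```typescript
type Forge = "github" | "gitlab" | "bitbucket";
interface RepoRef {
  forge: Forge;
  owner: string;
  name: string;
}

const FORGE_HOSTS: Record<string, Forge> = {
  "github.com": "github",
  "gitlab.com": "gitlab",
  "bitbucket.org": "bitbucket",
};

function parseRepoUrl(input: string): RepoRef | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a URL — the caller can treat it as a local path
  }
  const forge = FORGE_HOSTS[url.hostname];
  if (!forge) return null;
  // Strip leading slash, trailing slash, and a trailing .git suffix.
  // GitLab allows nested groups, so everything after the owner is the name.
  const parts = url.pathname.replace(/^\/|\.git$|\/$/g, "").split("/");
  const [owner, ...rest] = parts;
  if (!owner || rest.length === 0) return null;
  return { forge, owner, name: rest.join("/") };
}
```

Anything that fails to parse as a forge URL falls through to the local-checkout path.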
`lib/version.ts` and `package.json` carry the current release number (currently 0.1.0). Bumps happen only when we actually cut a release — never when merging intermediate work. The version pill in the header surfaces the number directly; /changelog lists what each release shipped.
| Choice | Why | When we'd revisit |
|---|---|---|
| Next.js 16 (App Router) | Future features (filters, charts, auth, diff views) are React's territory. File-based routing + API routes replace hand-rolled HTTP cleanly. | Unlikely. The core scorer is stack-agnostic, only app/ depends on this. |
| Node runtime (with tsx for CLI scripts) | Matches Vercel's serverless runtime — no Bun-only imports in prod. Bun still works locally as a fast package manager. | Unlikely — only if the deployment target changes. |
| Tailwind CSS 4 | Zero-config via @theme tokens, no tailwind.config.* needed. Tight bundle output. | Would only leave for something with a stronger design-system story. |
| better-sqlite3 | Single file, inspectable, zero ops overhead. Node-native so Vercel's serverless runtime can load it directly. | Postgres when concurrent writers / access control arrive (tasks/1.0.0/01-postgres-migration.md). |
| Server components + links, no client JS | Cheap, fast, SEO-friendly. | When a feature genuinely needs interactivity — e.g. live filter combinators. |
| Shallow git clones (--depth 1 --single-branch) | Bandwidth + speed. Current signals don't need history. | History-aware signals → host APIs or --filter=blob:none partial clones. |
| Exact-pinned deps | Deterministic scoring across environments. | Never. |
| One file per signal | Each signal is a small, independent concern — keeps git log and code review focused. | When we bundle signals into dynamic checks (then the unit becomes the bundle). |
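The shallow-clone choice can be sketched as follows — the flags mirror what the table documents, but the real cloner in lib/clients/ may differ:

```typescript
import { execFileSync } from "node:child_process";

// Build the argv for a shallow clone. --depth 1 fetches only the latest
// commit (no history); --single-branch fetches only the default branch.
// Submodules are not fetched: git clone skips them unless
// --recurse-submodules is passed explicitly.
function cloneArgs(repoUrl: string, dest: string): string[] {
  return ["clone", "--depth", "1", "--single-branch", repoUrl, dest];
}

// Run the clone; nothing from the cloned tree is ever executed.
function shallowClone(repoUrl: string, dest: string): void {
  execFileSync("git", cloneArgs(repoUrl, dest), { stdio: "ignore" });
}
```

Keeping the argv builder pure makes the flag set trivially testable without touching the network.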
- Static signals need to read file contents (AGENTS.md length, pyproject.toml [tool.X] sections, package.json scripts count) — not just existence.
- One clone is faster than N API calls for content-heavy scoring, and respects rate limits.
- Any real version of this dashboard needs dynamic signals (run tests, run an agent). Those absolutely need code on disk.
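An illustrative shape for a "one file per signal" static check that reads file contents from the cloned tree, not just existence. The interface and thresholds here are assumptions, not the project's real signal API:

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface Signal {
  id: string;
  evaluate(repoRoot: string): number; // 0..1
}

// One small, independent concern per file: this one checks that AGENTS.md
// exists and has substance, not just a stub.
const agentsMdSignal: Signal = {
  id: "agents-md",
  evaluate(repoRoot) {
    const path = join(repoRoot, "AGENTS.md");
    if (!existsSync(path)) return 0;
    const length = readFileSync(path, "utf8").trim().length;
    // Stub file → partial credit; substantive file → full credit.
    return length > 500 ? 1 : length > 0 ? 0.5 : 0;
  },
};
```

Because each signal only reads from the cloned tree, the scorer stays pure and the cloned code is never executed.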
```
app/                Next.js App Router — pages + API + SEO
  layout.tsx        root layout, root metadata (OG + Twitter cards)
  page.tsx          leaderboard
  repo/[id]/        repo detail (includes generateMetadata for shareable titles)
  methodology/      how scoring works today
  roadmap/          upcoming versions (from lib/roadmap.ts)
  changelog/        what's shipped (from lib/changelog.ts)
  api/              repos + repo/[id] JSON routes
  robots.ts         /robots.txt — allows "/", blocks "/api/"
  sitemap.ts        /sitemap.xml — static routes + every repo
  globals.css       Tailwind import + @theme tokens
components/         React components (Tailwind-styled)
lib/
  scoring/          signals, weights, scorer — pure, no I/O outside the cloned tree
  clients/          git clone, host API
  constants/        thresholds, host labels, sort keys
  utils/            format + score-tier helpers
  db.ts             better-sqlite3 schema + queries (all SQL lives here)
  version.ts        APP_NAME, APP_VERSION, APP_URL, APP_DESCRIPTION, REPO_URL
  changelog.ts / roadmap.ts
scripts/            CLI entries run via tsx (Node) — score, seed, init-db
tasks/              Per-version task breakdown (agent-readable)
public/             Static assets — demo/ screenshots used by the README + OG image
.claude/            settings.json, hooks/ (Stop guard), skills/
data/               rank.db (committed — shipped as a build artifact; rescoring runs locally)
tmp-clones/         Shallow clones (gitignored)
AGENTS.md           Agent instructions (source of truth)
CONTRIBUTING.md     Human-contributor guide — PR workflow, review bar
CLAUDE.md           Pointer → AGENTS.md
LICENSE             MIT
```

See /roadmap in the running app or the per-version tasks/ folders for the full picture.
- 0.2.0 — complete the dogfood: tests for scorer / signals / URL parser + self-score ≥ 90 on this repo's own rubric.
- 0.3.0 — real per-model weights: benchmark harness actually runs agents on scoped tasks per repo; current illustrative weights get replaced with measured ones.
- 0.4.0 — ecosystem integration: badge endpoint for READMEs, GitHub Action that comments score delta on PRs, webhook-driven rescoring, OAuth-gated opt-out / claim flow for maintainers.
- 0.5.0 — alternative recommender: "repo Y does the same thing and ranks higher for your model."
- 0.6.0 — package-registry overlay: rank npm / PyPI / Cargo packages by source-repo friendliness.
- 0.7.0 — history-aware signals: extend the scorer with maintenance recency, commit velocity, and contributor activity — closing the gap that the shallow clone leaves today.
- 1.0.0 — production stability: Postgres migration for concurrent writers; from here on, breaking API changes require a MAJOR bump.
- 1.1.0 — at-scale GitHub indexing: flip from a curated seed list to an auto-discovered crawl (GitHub search + trending + submissions). Target 10k repos on first delivery.
The score isn't defensible. The evaluation harness + the cross-forge dataset + the maintainer network are. Open-source the harness, publish the weights, keep the data + dashboard + badge network as the product.
MIT — see LICENSE.
See CONTRIBUTING.md for setup, branch/commit style, the PR description template, and the changelog discipline.

