
Agent Friendly Code

Release · License: MIT · Next.js 16 · Node ≥20.9

A public dashboard that ranks open-source repos by how friendly they are for AI coding agents — per model.

Next.js 16 + SQLite (better-sqlite3), styled with Tailwind CSS 4. Spans GitHub, GitLab, and Bitbucket out of the box. Current release: 0.1.0.

Agent Friendly Code — leaderboard

Dark mode

Agent Friendly Code — leaderboard (dark theme)

Follows prefers-color-scheme automatically — same tokens, different values.
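The same-tokens, different-values approach can be sketched roughly like this (token names here are invented for illustration; the repo's actual globals.css will differ):

```css
/* Illustrative only — hypothetical token names, not the project's real ones. */
@import "tailwindcss";

@theme {
  --color-surface: #ffffff;
  --color-ink: #111827;
}

/* Same tokens, different values — no separate dark-mode class needed. */
@media (prefers-color-scheme: dark) {
  :root {
    --color-surface: #0b0f14;
    --color-ink: #e5e7eb;
  }
}
```

Because Tailwind 4 utilities resolve through CSS custom properties, overriding the variables in a media query flips the palette without touching any class names.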


The idea

AI coding agents — Claude Code, Cursor, Devin, GPT-5 Codex — succeed dramatically more often on some repos than others. The difference is rarely the agent; it's the repo. A codebase with fast tests, a clear AGENTS.md, a Makefile, and CI is a massively different environment than one without.

Goal: a public leaderboard where anyone can look up a repo and see:

  1. How agent-friendly is it overall?
  2. How friendly is it for my agent? (Claude Code weights AGENTS.md heavily; Devin cares about CI + reproducible envs; Cursor prefers strong types + a good README.)
  3. Why does it rank there, and what would it take to improve for my agent? — top-3 gaps ranked by score-gain.

Two audiences:

  • Maintainers — a ranked checklist of what to fix to make their repo friendlier to agents.
  • Agent users — when picking between forks, packages, or alternatives, agent-friendliness is a real dependency-choice signal.

Prior art + what we do differently

| Project | What it does | What we do differently |
| --- | --- | --- |
| Factory.ai Agent Readiness | Single-tenant scanner: you point it at your repo and get a score with auto-fix PRs. 8 pillars, 5 maturity levels. | Public + cross-forge + per-model. Factory rates your own repo in isolation; we rank across repos, on GitHub/GitLab/Bitbucket, and the ranking changes based on which agent you care about. |
| kodustech/agent-readiness | OSS alternative to Factory: static checks, local scan. | We're a public ranking service, not a local scanner. Scoring logic is similar in spirit; the product is the leaderboard and the per-model lens. |
| jpequegn/agent-readiness-score | Explicitly "inspired by Factory.ai": an OSS framework to measure codebase readiness. | Same delta as above: single-tenant vs. public + per-model. |
| viktor-silakov/ai-ready | 39 checks, 7 pillars, 10+ languages. Scanner. | Same delta. |
| ambient-code/agentready | Assesses git repos against evidence-based attributes. | Same delta. |
| Cloudflare Agent Readiness | Rates websites for agent consumption. | Wrong object: we rate code repos. |
| Fern Agent Score | Public leaderboard rating documentation sites for AI-readiness. | Closest in shape (public + leaderboard), but it scores docs, not code. |
| Clarvia | Scoring platform for MCP servers (Agent Experience Optimization). | Adjacent: rates tools, not repos. |
| SWE-Bench (Verified / Pro), GitTaskBench, FeatureBench, HAL, PR Arena | Rank agents on a fixed set of repos. | We want the transpose: rank repos per agent. Our measurement story (once the benchmark harness lands) looks a lot like these, with the axes flipped. |
| GitHub Trending, ossinsight | Popularity / activity rankings. | Stars ≠ agent-friendliness. |

Our differentiators, in one line: cross-forge, public, per-model, and explainable — every score decomposes to signals, and every repo page shows what to improve next for the selected model.
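The "every score decomposes to signals" claim can be sketched as a weighted sum over signal values, with a different weight table per model. The signal names and weights below are made up for illustration; the real ones live in lib/scoring and differ from these:

```typescript
// Illustrative per-model score decomposition — hypothetical signal ids and
// weights, not the repo's actual lib/scoring implementation.
type Signal = { id: string; value: number }; // value normalized to [0, 1]

type ModelWeights = Record<string, number>;

// e.g. a Claude-flavored weighting that favors AGENTS.md (invented numbers)
const claudeWeights: ModelWeights = { agentsMd: 0.5, ci: 0.2, fastTests: 0.3 };

function score(signals: Signal[], weights: ModelWeights): number {
  let total = 0;
  let weightSum = 0;
  for (const s of signals) {
    const w = weights[s.id] ?? 0; // unknown signals contribute nothing
    total += w * s.value;
    weightSum += w;
  }
  // Normalize to 0–100 so scores are comparable across weight tables.
  return weightSum === 0 ? 0 : Math.round((total / weightSum) * 100);
}

const signals: Signal[] = [
  { id: "agentsMd", value: 1 },    // AGENTS.md present and substantial
  { id: "ci", value: 0 },          // no CI detected
  { id: "fastTests", value: 0.5 }, // tests exist but are slow
];
// (0.5·1 + 0.2·0 + 0.3·0.5) / 1.0 = 0.65 → 65
```

Because the decomposition is additive, the "top-3 gaps ranked by score-gain" view falls out directly: sort the zero-valued signals by their weight under the selected model.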

Honest product concerns

Not pretending the idea is free of risk:

  • Per-model scoring is the hardest part and the easiest to fake. Today's weights are illustrative; a real "Claude ranks this higher than GPT-5" claim requires actually running each agent on each repo. That work is tracked in tasks/0.3.0/01-benchmark-harness.md.
  • Factory.ai is already in this space. Differentiation has to stay sharp.
  • Public-shaming risk. Ranking a repo #47,823 without consent invites angry maintainers. An opt-out is planned via tasks/0.4.0/04-opt-out-claim-flow.md.
  • Score gaming. Once the leaderboard is public, people will add boilerplate AGENTS.md files that pass the rubric without being useful. Dynamic (actually-run-an-agent) checks are the counter; see the benchmark harness.
  • Freshness. Scores decay with every push. Webhook-driven rescoring is on the roadmap.

See /methodology in the running app for a candid walkthrough of what's measured today and what isn't.

Security posture (FAQ)

Short answer: low risk. The app:

  • Only reads files after a shallow clone; never executes anything from the cloned tree (no npm install, no post-clone hooks).
  • Uses --depth 1 --single-branch and never clones submodules.
  • Runs all SQL via prepared statements.
  • Renders through React (auto-escaping); no dangerouslySetInnerHTML.
  • Has no auth and no writable API endpoints — read-only dashboard.
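The shallow-clone flags above can be sketched as the argv handed to git. The flags are real git options; the helper name and exact assembly are illustrative, not the repo's actual cloner in lib/clients:

```typescript
// Illustrative sketch of the read-only clone invocation — hypothetical
// helper, not the repo's actual lib/clients implementation.
function cloneArgs(url: string, dest: string): string[] {
  return [
    "clone",
    "--depth", "1",     // shallow: latest commit only, no history
    "--single-branch",  // default branch only
    url,
    dest,
  ];
}
```

Spawned directly (no shell), submodules are simply never initialized, and nothing from the cloned tree is ever executed afterward.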

Operational concerns for a public launch (not code-level security):

  • Disk quotas for tmp-clones/.
  • Rate limiting the public API.
  • Sandbox the cloner in a container (future-proofing against hypothetical git CVEs).

Auth and per-maintainer controls land with the opt-out / claim flow in v0.4.0.

Quickstart

```bash
bun install
bun run prepare-hooks  # once — installs lefthook pre-commit (Biome + tsc)
bun run seed           # score the curated set (~28 repos) across GH / GL / BB
bun run dev            # http://localhost:3000
```

Score a single repo:

```bash
bun run score https://github.com/vercel/next.js
bun run score https://gitlab.com/gitlab-org/cli
bun run score https://bitbucket.org/snakeyaml/snakeyaml
bun run score /path/to/local/checkout
```

Optional: GITHUB_TOKEN / GITLAB_TOKEN in env to raise API rate limits.
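Accepting all three forges from one command means normalizing the URL first. A minimal sketch of that parsing step (the repo's real URL parser handles more cases, e.g. local paths and GitLab subgroups):

```typescript
// Illustrative URL normalization — hypothetical function, not the
// repo's actual parser.
type Host = "github" | "gitlab" | "bitbucket";

const HOSTS: Record<string, Host> = {
  "github.com": "github",
  "gitlab.com": "gitlab",
  "bitbucket.org": "bitbucket",
};

function parseRepoUrl(
  input: string
): { host: Host; owner: string; repo: string } | null {
  const m = input.match(
    /^https:\/\/(github\.com|gitlab\.com|bitbucket\.org)\/([^/]+)\/([^/]+?)(?:\.git)?\/?$/
  );
  if (!m) return null; // not a recognized forge URL
  return { host: HOSTS[m[1]], owner: m[2], repo: m[3] };
}
```

Once the host is known, the scorer can pick the matching API client (and the matching token env var) for rate-limited metadata calls.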

Versioning

lib/version.ts and package.json carry the current release number (currently 0.1.0). Bumps happen only when we actually cut a release — never when merging intermediate work. The version pill in the header surfaces the number directly; /changelog lists what each release shipped.
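The shape of lib/version.ts follows from the export names listed in the layout section below; every value except APP_VERSION is a placeholder here, not the file's real contents:

```typescript
// Sketch of lib/version.ts — export names match the layout section;
// all values except APP_VERSION are placeholders.
export const APP_NAME = "Agent Friendly Code";
export const APP_VERSION = "0.1.0"; // bumped only when a release is cut
export const APP_URL = "https://example.com"; // placeholder
export const APP_DESCRIPTION =
  "Ranks open-source repos by how friendly they are for AI coding agents."; // placeholder
export const REPO_URL = "https://example.com/repo"; // placeholder
```

Keeping the number in one importable module is what lets the header's version pill, the OG metadata, and the changelog page all read the same value.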

Stack & rationale

| Choice | Why | When we'd revisit |
| --- | --- | --- |
| Next.js 16 (App Router) | Future features (filters, charts, auth, diff views) are React's territory. File-based routing + API routes replace hand-rolled HTTP cleanly. | Unlikely. The core scorer is stack-agnostic; only app/ depends on this. |
| Node runtime (with tsx for CLI scripts) | Matches Vercel's serverless runtime; no Bun-only imports in prod. Bun still works locally as a fast package manager. | Unlikely; only if the deployment target changes. |
| Tailwind CSS 4 | Zero-config via @theme tokens, no tailwind.config.* needed. Tight bundle output. | Would only leave for something with a stronger design-system story. |
| better-sqlite3 | Single file, inspectable, zero ops overhead. Node-native, so Vercel's serverless runtime can load it directly. | Postgres when concurrent writers / access control arrive (tasks/1.0.0/01-postgres-migration.md). |
| Server components + links, no client JS | Cheap, fast, SEO-friendly. | When a feature genuinely needs interactivity, e.g. live filter combinators. |
| Shallow git clones (--depth 1 --single-branch) | Bandwidth + speed. Current signals don't need history. | History-aware signals → host APIs or --filter=blob:none partial clones. |
| Exact-pinned deps | Deterministic scoring across environments. | Never. |
| One file per signal | Each signal is a small, independent concern; keeps git log and code review focused. | When we bundle signals into dynamic checks (then the unit becomes the bundle). |

Why do we clone at all (instead of host APIs)?

  • Static signals need to read file contents (AGENTS.md length, pyproject.toml [tool.X] sections, package.json scripts count) — not just existence.
  • One clone is faster than N API calls for content-heavy scoring, and respects rate limits.
  • Any real version of this dashboard needs dynamic signals (run tests, run an agent). Those absolutely need code on disk.

Layout

```text
app/          Next.js App Router — pages + API + SEO
  layout.tsx       root layout, root metadata (OG + Twitter cards)
  page.tsx         leaderboard
  repo/[id]/       repo detail (includes generateMetadata for shareable titles)
  methodology/     how scoring works today
  roadmap/         upcoming versions (from lib/roadmap.ts)
  changelog/       what's shipped (from lib/changelog.ts)
  api/             repos + repo/[id] JSON routes
  robots.ts        /robots.txt — allows "/", blocks "/api/"
  sitemap.ts       /sitemap.xml — static routes + every repo
  globals.css      Tailwind import + @theme tokens
components/   React components (Tailwind-styled)
lib/
  scoring/    signals, weights, scorer — pure, no I/O outside the cloned tree
  clients/    git clone, host API
  constants/  thresholds, host labels, sort keys
  utils/      format + score-tier helpers
  db.ts       better-sqlite3 schema + queries (all SQL lives here)
  version.ts  APP_NAME, APP_VERSION, APP_URL, APP_DESCRIPTION, REPO_URL
  changelog.ts / roadmap.ts
scripts/      CLI entries run via tsx (Node) — score, seed, init-db
tasks/        Per-version task breakdown (agent-readable)
public/       Static assets — demo/ screenshots used by the README + OG image
.claude/      settings.json, hooks/ (Stop guard), skills/
data/         rank.db (committed — shipped as a build artifact; rescoring runs locally)
tmp-clones/   Shallow clones (gitignored)
AGENTS.md     Agent instructions (source of truth)
CONTRIBUTING.md  Human-contributor guide — PR workflow, review bar
CLAUDE.md     Pointer → AGENTS.md
LICENSE       MIT
```

Roadmap (high-level)

See /roadmap in the running app or the per-version tasks/ folders for the full picture.

  • 0.2.0 — complete the dogfood: tests for scorer / signals / URL parser + self-score ≥ 90 on this repo's own rubric.
  • 0.3.0 — real per-model weights: benchmark harness actually runs agents on scoped tasks per repo; current illustrative weights get replaced with measured ones.
  • 0.4.0 — ecosystem integration: badge endpoint for READMEs, GitHub Action that comments score delta on PRs, webhook-driven rescoring, OAuth-gated opt-out / claim flow for maintainers.
  • 0.5.0 — alternative recommender: "repo Y does the same thing and ranks higher for your model."
  • 0.6.0 — package-registry overlay: rank npm / PyPI / Cargo packages by source-repo friendliness.
  • 0.7.0 — history-aware signals: extend the scorer with maintenance recency, commit velocity, and contributor activity — closing the gap that the shallow clone leaves today.
  • 1.0.0 — production stability: Postgres migration for concurrent writers; from here on, breaking API changes require a MAJOR bump.
  • 1.1.0 — at-scale GitHub indexing: flip from a curated seed list to an auto-discovered crawl (GitHub search + trending + submissions). Target 10k repos on first delivery.

Defensibility

The score isn't defensible. The evaluation harness + the cross-forge dataset + the maintainer network are. Open-source the harness, publish the weights, keep the data + dashboard + badge network as the product.

License

MIT — see LICENSE.

Contributing

See CONTRIBUTING.md for setup, branch/commit style, the PR description template, and the changelog discipline.
