Turn any GitHub repo into a navigable, shareable website and answer "what did this commit mean?" — for humans and AI agents.
CodeScope indexes a codebase once, then keeps that understanding live — incremental re-index on every push, intent-level diffs ("the validate() function lost its rate-limit check"), and an MCP server so Claude Code, Cursor, or Codex can read the same understanding you do.
Bring-your-own keys. Your Anthropic / OpenAI / GitHub credits. Nothing leaves your Postgres.
```bash
# 1. Postgres + pgvector running locally
docker compose up -d
npx codescope-ai --database-url postgres://codescope:codescope@localhost:5433/codescope

# 2. Or with a CLI install:
npm i -g codescope-ai
codescope-ai ./path/to/repo
codescope-ai vercel/next.js                      # GitHub repo
codescope-ai https://github.com/vercel/next.js
```

The package is published as `codescope-ai` on npm (the unscoped `codescope` was rejected by npm's name-similarity policy). The product itself is still called CodeScope.
The CLI clones (or re-uses a local checkout), runs the indexer, starts a
Next.js dev server on the first free port from 5173, and opens your
browser at /<owner>/<name>. Re-running the same repo is incremental —
only files whose content_hash changed get re-walked.
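The incremental check can be pictured as a hash comparison. A minimal sketch, assuming `content_hash` is a SHA-256 hex digest of each file's content; `contentHash` and `changedFiles` are hypothetical names for illustration, not CodeScope's actual code:

```typescript
import { createHash } from "node:crypto";

// Hash file content the same way on every run so unchanged files compare equal.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical helper: given the current files (path -> content) and the hashes
// stored by the previous run (path -> content_hash), decide what to re-walk.
function changedFiles(
  current: Map<string, string>,
  previousHashes: Map<string, string>,
): { toReindex: string[]; deleted: string[] } {
  const toReindex: string[] = [];
  for (const [path, content] of current) {
    // New files have no stored hash; edited files have a different one.
    if (previousHashes.get(path) !== contentHash(content)) {
      toReindex.push(path);
    }
  }
  // Files that were indexed before but no longer exist in the checkout.
  const deleted = [...previousHashes.keys()].filter((p) => !current.has(p));
  return { toReindex, deleted };
}
```

Only `toReindex` would go back through parsing and LLM understanding; `deleted` paths would be dropped from the per-repo tables.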
- Overview — Sonnet-synthesised identity, top-5 files by importance, layer breakdown, recent commits, PDF export.
- Architecture — toggle between a collapsible Tree view and a Hanging force graph (`react-force-graph-2d`). Click any node for imports / exports / callers / callees + AI explanation.
- Tours — pre-generated AI walkthroughs grounded in the real call graph. Ask any question, get a step-by-step answer with code highlights.
After re-indexing, the Recent changes drawer shows:
- Sonnet 2-sentence top-line summary
- Per-layer breakdown (added / modified / deleted counts)
- Files grouped by Added / Modified / Renamed / Deleted, each with a Haiku *was X / now Y* narration
- ⚠ Regression badge when Haiku flags a guard removal, scope narrowing, capability loss, or vulnerability surface
- Click any file row → FileViewer opens directly into a unified-diff view with green/red line tinting
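The per-layer breakdown is essentially a group-and-count over the run's changed files. A sketch assuming each change record carries its layer and a status; the `FileChange` shape and `perLayerBreakdown` are hypothetical, and this version also tallies renames:

```typescript
type ChangeStatus = "added" | "modified" | "renamed" | "deleted";

// Hypothetical record for one changed file in an index run.
interface FileChange {
  path: string;
  layer: string; // e.g. "api", "ui", "db"
  status: ChangeStatus;
}

type LayerCounts = Record<ChangeStatus, number>;

// Group changes by layer and count each status within that layer.
function perLayerBreakdown(changes: FileChange[]): Map<string, LayerCounts> {
  const byLayer = new Map<string, LayerCounts>();
  for (const change of changes) {
    let counts = byLayer.get(change.layer);
    if (!counts) {
      counts = { added: 0, modified: 0, renamed: 0, deleted: 0 };
      byLayer.set(change.layer, counts);
    }
    counts[change.status] += 1;
  }
  return byLayer;
}
```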
Subscribe a repo with one click and CodeScope re-indexes on every push.
The handler verifies GitHub's HMAC signature on every webhook POST and is idempotent on `X-GitHub-Delivery`, so retried deliveries are not processed twice. A live delivery feed shows each event in the UI.
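GitHub signs each webhook delivery with an `X-Hub-Signature-256: sha256=<hex>` header (HMAC-SHA256 of the raw request body, keyed by the webhook secret) and identifies it with a GUID in `X-GitHub-Delivery`. A minimal sketch of verification plus delivery-ID dedup, assuming an in-memory `Set`; `verifyGitHubSignature` and `handleWebhook` are hypothetical, not CodeScope's handler:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify GitHub's X-Hub-Signature-256 header ("sha256=<hex>") against the raw
// request body. The body must be exactly what GitHub sent, not re-serialised JSON.
function verifyGitHubSignature(secret: string, rawBody: string, header: string | undefined): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = Buffer.from(
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex"),
    "utf8",
  );
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Hypothetical in-memory dedup; a real handler would persist delivery IDs
// (e.g. in a table such as webhook_events) so retries are ignored across restarts.
const seenDeliveries = new Set<string>();

type WebhookResult = "rejected" | "duplicate" | "accepted";

function handleWebhook(
  secret: string,
  rawBody: string,
  headers: Record<string, string | undefined>,
): WebhookResult {
  if (!verifyGitHubSignature(secret, rawBody, headers["x-hub-signature-256"])) return "rejected";
  const delivery = headers["x-github-delivery"];
  if (!delivery) return "rejected";
  if (seenDeliveries.has(delivery)) return "duplicate";
  seenDeliveries.add(delivery);
  // A real handler would enqueue the incremental re-index for this push here.
  return "accepted";
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.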
- Tiered models — Sonnet for synthesis (1 call/repo) + top-importance files; Haiku for the long tail. ~10× cheaper than all-Sonnet.
- Globally keyed caches — `explain_cache` and `embedding_cache` are content-addressable. Two repos with identical content (forks, vendored deps) share one cached answer.
- Re-track idempotency — re-asking "what did commit X do?" returns the cached `index_runs` diff for free.
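Content addressing means the cache key is derived only from the inputs that determine the answer (model, request kind, exact content) and never from the repo name or path. A sketch with an in-memory `Map` and a SHA-256 key; `ContentCache` is a hypothetical name, and the real `explain_cache` / `embedding_cache` live in Postgres:

```typescript
import { createHash } from "node:crypto";

// Hypothetical content-addressable cache. Because the key ignores repo and path,
// identical files in forks or vendored deps resolve to the same entry.
class ContentCache<V> {
  private store = new Map<string, V>();
  hits = 0;
  misses = 0;

  // NUL separators keep ("ab", "c") and ("a", "bc") from colliding.
  static key(model: string, kind: string, content: string): string {
    return createHash("sha256").update(`${model}\0${kind}\0${content}`, "utf8").digest("hex");
  }

  getOrCompute(model: string, kind: string, content: string, compute: () => V): V {
    const k = ContentCache.key(model, kind, content);
    if (this.store.has(k)) {
      this.hits += 1;
      return this.store.get(k) as V;
    }
    this.misses += 1;
    const value = compute(); // e.g. an LLM or embedding call in the real system
    this.store.set(k, value);
    return value;
  }
}
```

Including the model in the key means switching from Haiku to Sonnet for a file produces a fresh answer instead of reusing the cheaper model's output.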
```bash
codescope-ai                          # index current dir + serve
codescope-ai <path>                   # index local checkout + serve
codescope-ai <owner>/<name>           # clone + index + serve
codescope-ai https://github.com/…     # same
codescope-ai track <repo>[@<sha>]     # pin to a commit, print intent-diff (no server)
```

Flags (also work via env vars; see `--help`):
| Flag | Env var | Purpose |
|---|---|---|
| `--anthropic-key` | `ANTHROPIC_API_KEY` | Tours, intent-diff narrations, regression flags, line explanations |
| `--openai-key` | `OPENAI_API_KEY` | Semantic search via `text-embedding-3-small`. Without it, search degrades to keyword `LIKE` |
| `--github-token` | `GITHUB_TOKEN` | Bumps clone rate limit from 60/hr to 5,000/hr; required for private repos |
| `--database-url` | `DATABASE_URL` | Postgres connection string |
| `-p, --port` | — | Dev server port (default: first free port from 5173) |
| `--no-open` | — | Skip opening the browser |
| `--no-index` | — | Skip indexing, just serve |
CodeScope ships with a hosted embedding gateway at
codescope-embed.harsh-185.workers.dev that proxies to Voyage AI's voyage-code-3
— the highest-scoring code embedding model available. Authentication
piggybacks on your Anthropic key: paste it into CodeScope, the gateway
verifies it's real, and you get OpenAI-or-better-quality embeddings
without a second API key.
| What you have | Embedding provider | Quality vs OpenAI text-embedding-3-small |
|---|---|---|
| Anthropic key (default) | `voyage-code-3` via `codescope-embed.harsh-185.workers.dev` | clearly better (~68 vs 51.6 MTEB code retrieval) |
| Anthropic + `CODESCOPE_EMBED_PROVIDER=openai` | OpenAI `text-embedding-3-small` | baseline |
| OpenAI only (no Anthropic) | OpenAI text-embedding-3-small | baseline (no AI tours/intent-diff) |
| Neither | none — search degrades to FTS over LLM purposes if Anthropic is set | usable but limited |
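The provider table above boils down to a small decision function. A sketch assuming only the environment variables named in this section; `pickEmbeddingProvider` is hypothetical, the gateway's request path is not documented here, and returning `none` when `CODESCOPE_EMBED_PROVIDER=openai` is set without an OpenAI key is a guess:

```typescript
type EmbeddingProvider =
  | { kind: "voyage-gateway"; model: "voyage-code-3"; endpoint: string }
  | { kind: "openai"; model: "text-embedding-3-small" }
  | { kind: "none" };

interface EmbedEnv {
  ANTHROPIC_API_KEY?: string;
  OPENAI_API_KEY?: string;
  CODESCOPE_EMBED_PROVIDER?: string;
  CODESCOPE_EMBED_ENDPOINT?: string;
}

// Hosted gateway host; CODESCOPE_EMBED_ENDPOINT overrides it for a self-hosted worker.
const DEFAULT_GATEWAY = "https://codescope-embed.harsh-185.workers.dev";

function pickEmbeddingProvider(env: EmbedEnv): EmbeddingProvider {
  const wantsOpenAI = env.CODESCOPE_EMBED_PROVIDER === "openai";
  // Default path: an Anthropic key unlocks voyage-code-3 through the gateway.
  if (env.ANTHROPIC_API_KEY && !wantsOpenAI) {
    return {
      kind: "voyage-gateway",
      model: "voyage-code-3",
      endpoint: env.CODESCOPE_EMBED_ENDPOINT ?? DEFAULT_GATEWAY,
    };
  }
  // Explicit opt-in, or no Anthropic key at all: use the caller's own OpenAI account.
  if (env.OPENAI_API_KEY) {
    return { kind: "openai", model: "text-embedding-3-small" };
  }
  // No usable embedding key (assumption): search falls back to full-text search.
  return { kind: "none" };
}
```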
Free tier: the hosted gateway gives every validated Anthropic key
10,000 embeddings/day for free. A typical user indexing 5–10
repos/week never approaches the cap. Above that, self-host the gateway
in 5 minutes (HOSTING.md) — it's a 150-line
Cloudflare Worker open-sourced in this repo at cloudflare-worker/.
Privacy: code text passes through codescope-embed.harsh-185.workers.dev on its way
to Voyage. The gateway hashes your Anthropic key (sha256) for rate
limiting; never stores raw keys; never logs request bodies. For full
privacy: self-host the worker (point CODESCOPE_EMBED_ENDPOINT at
your URL), use CODESCOPE_EMBED_PROVIDER=openai (your own OpenAI
account), or skip embeddings entirely (FTS-only search still works).
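The rate limit described above can key its counters on a hash of the Anthropic key rather than the key itself. A sketch assuming a per-UTC-day counter and the 10,000/day cap; it uses `node:crypto` and an in-memory `Map` for portability, whereas a Cloudflare Worker would use `crypto.subtle.digest` and durable storage. `DailyQuota` and `keyId` are hypothetical, and key validation against Anthropic is not shown:

```typescript
import { createHash } from "node:crypto";

const DAILY_LIMIT = 10_000;

// Only this digest is ever stored; the raw API key is discarded after hashing.
function keyId(apiKey: string): string {
  return createHash("sha256").update(apiKey, "utf8").digest("hex");
}

class DailyQuota {
  private counts = new Map<string, number>();

  // Try to spend `n` embeddings for this key on the given UTC day ("YYYY-MM-DD").
  // Returns false, without spending anything, if the request would exceed the cap.
  trySpend(apiKey: string, n: number, utcDay: string): boolean {
    const slot = `${keyId(apiKey)}:${utcDay}`;
    const used = this.counts.get(slot) ?? 0;
    if (used + n > DAILY_LIMIT) return false;
    this.counts.set(slot, used + n);
    return true;
  }
}
```

Keying the slot by day means the counter resets naturally at UTC midnight without a cleanup job, at the cost of stale slots that a real store would expire.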
Just run `codescope-ai`. On first run with missing keys, the CLI prompts you interactively:

```text
Some API keys aren't set yet. Want to enter them now?
All optional. Press Enter to skip any. Saved to ~/.codescope/keys.env (0600).

Anthropic (Claude) API key
Unlocks: AI tours, intent-diff narration, regression flagging, line explanations
Get one: https://console.anthropic.com/settings/keys
Paste ANTHROPIC_API_KEY (Enter to skip): █
```
Paste the key once → saved to ~/.codescope/keys.env (0600 perms,
only readable by you) → all subsequent runs skip the prompt.
Precedence (highest wins):
- CLI flag (`--anthropic-key sk-ant-…`)
- Environment variable (`ANTHROPIC_API_KEY`)
- `~/.codescope/keys.env` (saved from a previous prompt)
- Interactive prompt (TTY only, never in CI)
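Resolving a key in that precedence order is a chain of fallbacks. A sketch assuming `keys.env` holds plain `KEY=VALUE` lines; `parseKeysEnv`, `resolveKey`, and the `KeySources` shape are hypothetical helpers, not the CLI's actual code:

```typescript
// Parse a simple KEY=VALUE file such as ~/.codescope/keys.env.
// Assumption: one assignment per line, '#' comments, optional surrounding quotes.
function parseKeysEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value[0];
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

interface KeySources {
  flag?: string;                            // e.g. the --anthropic-key value
  env: Record<string, string | undefined>;  // e.g. process.env
  saved: Record<string, string>;            // parsed keys.env
  isTTY: boolean;
  noPrompt: boolean;                        // --no-prompt
}

type Resolved =
  | { value: string; source: "flag" | "env" | "keys.env" }
  | { value: undefined; source: "prompt" | "unset" };

// Highest wins: flag > environment > keys.env > interactive prompt (TTY only).
function resolveKey(name: string, s: KeySources): Resolved {
  if (s.flag) return { value: s.flag, source: "flag" };
  const fromEnv = s.env[name];
  if (fromEnv) return { value: fromEnv, source: "env" };
  const fromFile = s.saved[name];
  if (fromFile) return { value: fromFile, source: "keys.env" };
  // Signal "prompt" only when attached to a terminal and not opted out.
  return { value: undefined, source: s.isTTY && !s.noPrompt ? "prompt" : "unset" };
}
```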
Helpful flags:
- `--no-prompt`: skip the prompt entirely (CI-safe)
- `--reset-keys`: wipe `~/.codescope/keys.env` and exit
The MCP server never prompts (stdin/stdout are reserved for
JSON-RPC) but does read ~/.codescope/keys.env, so once you've
saved keys via the CLI, the MCP server picks them up automatically.
Without any LLM key, indexing still works — heuristic-only purposes, keyword search, no AI tours. The startup banner shows exactly what's wired.
CodeScope uses the standard API key flow. Some notes on subscriptions:
- Anthropic Pro / Max — generate a Claude OAuth API key from the same console page. Calls bill against your subscription rather than separate pay-as-you-go credits. Paste it into the prompt like any other key.
- OpenAI ChatGPT Plus / Team — these subscriptions are separate from the API; you'll still need a pay-as-you-go API key. It is used only for embeddings (`text-embedding-3-small` ≈ $0.02 per 1M tokens; indexing 1000 files costs cents).
- Why we don't auto-detect Claude Code / Codex auth — those tools store OAuth tokens in `~/.claude/` and `~/.codex/` respectively, but reading another tool's credential store is fragile and may violate ToS. Cleaner: each tool generates its own key from the same console.
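A worked version of the "costs cents" estimate. The price is the one quoted above; the ~1,500 tokens per file is an assumption, so scale it for your codebase:

```typescript
// Back-of-envelope embedding cost at the quoted text-embedding-3-small price.
const USD_PER_MILLION_TOKENS = 0.02;

function embeddingCostUSD(files: number, avgTokensPerFile: number): number {
  const tokens = files * avgTokensPerFile;
  return (tokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}

// 1,000 files at an assumed ~1,500 tokens each is 1.5M tokens, about three cents.
console.log(embeddingCostUSD(1000, 1500).toFixed(2));
```

Even at ten times the assumed file size the bill stays around $0.30, and the content-addressed `embedding_cache` means unchanged files are not re-embedded on later runs.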
CodeScope exposes the same understanding to coding agents over a stdio MCP server. 11 tools that fit the agent's natural workflow:
| Tool | Use case |
|---|---|
| `list_repos` | Discover what's been indexed |
| `get_overview` | Repo identity + top files + layers (orientation before editing) |
| `list_layers` | Files grouped by responsibility |
| `search_files` | pgvector cosine semantic search |
| `get_file` | Content + imports + exports + callers + callees |
| `explain_lines` | Claude line-range explanation |
| `list_tours` / `get_tour` | Guided walkthroughs |
| `index_local_repo` | Bootstrap a fresh checkout |
| `get_intent_diff` | Latest run's per-file *was X / now Y* + regression flags + summary |
| `track_commit` | Pin to an arbitrary SHA, return its intent-diff |
For Claude Code, `~/.claude.json` (global) or `.mcp.json` (project):

```json
{
  "mcpServers": {
    "codescope": {
      "command": "npx",
      "args": ["-y", "-p", "codescope-ai", "codescope-ai-mcp"],
      "env": {
        "DATABASE_URL": "postgres://codescope:codescope@localhost:5433/codescope",
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}
```

For Cursor, `~/.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "codescope": {
      "command": "npx",
      "args": ["-y", "-p", "codescope-ai", "codescope-ai-mcp"],
      "env": {
        "DATABASE_URL": "postgres://codescope:codescope@localhost:5433/codescope",
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}
```

A typical agent workflow:
- Agent edits files + commits
- Calls `track_commit local/repo` (or `index_local_repo`)
- Reads back `get_intent_diff` to confirm what its own changes meant
- Decides whether the diff matches intent, and uses the 2-sentence summary + per-file drift narrations as PR description fodder
This is the loop CodeScope was designed for. Agents stop guessing what their edits did to the system; they read it back in plain English.
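Under the hood, each step of that loop is a JSON-RPC 2.0 `tools/call` request written to the MCP server's stdin, one JSON object per line. A sketch of the two calls, assuming the client library has already completed the `initialize` handshake; the `repo` argument name and the `toolCall` helper are hypothetical (query `tools/list` for each tool's real input schema):

```typescript
// Shape of an MCP "tools/call" request. Over the stdio transport each message
// is written as a single line of JSON terminated by "\n".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

function toolCall(name: string, args: Record<string, unknown>): string {
  const req: ToolCallRequest = {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify without indentation escapes any newlines inside strings,
  // so the message stays on one line as the stdio framing requires.
  return JSON.stringify(req) + "\n";
}

// The agent loop as two messages sent after committing (argument names hypothetical).
const loop = [
  toolCall("track_commit", { repo: "local/repo" }),
  toolCall("get_intent_diff", { repo: "local/repo" }),
];
```

This framing is also why the MCP server never prompts for keys: anything else written to stdout would corrupt the JSON-RPC stream.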
```bash
# After installing codescope-ai + Postgres
codescope-ai --database-url postgres://codescope:codescope@localhost:5433/codescope --no-index --no-open &
# (or just have CodeScope's Next.js dev server running)

# In another terminal:
git clone https://github.com/harsh-185/CodeScope && cd CodeScope
npm install
npm run demo:setup   # scaffolds 5-commit URL-shortener exercising every feature
npm run demo:index   # indexes the demo repo + opens browser
```

Then walk through `docs/DEMO.md`: 12 tours covering indexing, intent-diff, regression flagging, rename detection, line-level overlay, webhook auto-sync, MCP, and global cache hits in action.
```text
clone/walk → parse (Babel AST) → score → embed (OpenAI) → understand (Haiku per file)
  → analyze (Sonnet synthesis) → ready
                                 │
                                 ├── Postgres + pgvector (per-repo tables + global caches)
                                 ├── Next.js 15 + React 19 (Overview / Architecture / Tours)
                                 └── MCP stdio server (11 tools) ──→ Claude Code, Cursor, Codex

On push:
GitHub webhook ──HMAC verify──> trackCommit ──> incremental re-index ──> intent-diff
                                                                         └─→ regression flags
```
See docs/BUILD.md for the full architecture + file-by-
file module map.
| Concept | Tables |
|---|---|
| Per-repo | repos, layers, files, file_edges, file_embeddings, functions, function_calls, tours, tour_steps, commits |
| Audit + diff intent | index_runs, file_purpose_history |
| Auto-sync | webhook_subscriptions, webhook_events |
| Global caches | explain_cache (content-keyed), embedding_cache (content-hash PK) |
- Frontend — Next.js 15 (App Router) · React 19 · TypeScript · bespoke CSS (no Tailwind)
- Database — Postgres 16 + pgvector · Drizzle ORM · drizzle-kit migrations
- Indexer — Babel AST · isomorphic-git · token-bucket-style content-hash dedup
- LLM — Anthropic SDK (Claude Sonnet 4.6 + Haiku 4.5) · OpenAI text-embedding-3-small
- MCP — `@modelcontextprotocol/sdk` over stdio
- Diff — `diff` package for line-level rendering, custom shingle/Jaccard for rename detection
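Rename detection pairs a deleted file with an added file when their contents overlap heavily. A sketch of shingle/Jaccard matching on whitespace-separated tokens; the shingle size `k = 3`, the `0.5` threshold, the greedy pairing, and the helper names are illustrative assumptions rather than CodeScope's settings:

```typescript
// Build the set of k-token shingles (overlapping runs of k consecutive tokens).
function shingles(text: string, k = 3): Set<string> {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const out = new Set<string>();
  if (tokens.length < k) {
    if (tokens.length > 0) out.add(tokens.join(" "));
    return out;
  }
  for (let i = 0; i + k <= tokens.length; i++) {
    out.add(tokens.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; two empty files count as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const s of a) if (b.has(s)) intersection++;
  return intersection / (a.size + b.size - intersection);
}

interface Rename { from: string; to: string; similarity: number }

// Greedily pair each deleted file with its most similar unclaimed added file.
// Greedy matching is simple but not globally optimal across many candidates.
function detectRenames(
  deleted: Map<string, string>, // path -> old content
  added: Map<string, string>,   // path -> new content
  threshold = 0.5,
): Rename[] {
  const addedShingles = [...added].map(([path, text]) => ({ path, set: shingles(text) }));
  const claimed = new Set<string>();
  const renames: Rename[] = [];
  for (const [from, oldText] of deleted) {
    const oldSet = shingles(oldText);
    let best: Rename | undefined;
    for (const { path, set } of addedShingles) {
      if (claimed.has(path)) continue;
      const sim = jaccard(oldSet, set);
      if (sim >= threshold && (!best || sim > best.similarity)) best = { from, to: path, similarity: sim };
    }
    if (best) {
      claimed.add(best.to);
      renames.push(best);
    }
  }
  return renames;
}
```

A one-token edit only disturbs the (at most k) shingles that contain it, so a moved file with small edits still scores well above the threshold.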
```bash
git clone https://github.com/harsh-185/CodeScope
cd CodeScope
npm install
docker compose up -d
cp .env.example .env   # set ANTHROPIC_API_KEY + OPENAI_API_KEY
npm run db:migrate
npm run dev            # http://localhost:5173
```

Useful scripts:

```bash
npm run cli         # dev mode CLI (no install)
npm run mcp         # dev mode MCP server (no install)
npm run typecheck   # tsc --noEmit
npm run db:studio   # Drizzle web UI for the database
```

See `docs/PUBLISHING.md` for the full walkthrough of publishing to npm and submitting to the MCP server registry.
Shipped: Phase 0 (foundation) · 1 (incremental + audit) · 2 (intent-diff) · 2.5 (commit tracker + cache globalization) · 3.1 (auto-sync) · 3.2 (line-level diff) · 3.3 (regressions) · 3.4 (rename detection)
Next (Phase 4 — bigger swings, independent):
- 4.1 Multi-repo auth (next-auth + per-user repo access)
- 4.2 Function-level diff (track function identity across edits)
- 4.3 VS Code extension (sidebar webview + jump-to-line)
- 4.4 Browser overlay (chrome extension on github.com)
- 4.5 Streaming SSE for `explain_lines`
- 4.6 Slack / Linear regression notifications
See docs/FEATURES.md for the canonical Done/Next
list and docs/BUILD.md §7 for the per-feature
implementation plan.
MIT — see LICENSE.
PRs welcome. Run npm run typecheck + npm run lint before opening.
For larger changes, please open an issue first to discuss the design —
docs/BUILD.md §8 has the "how to pick up the next
chunk" recipe.