CodeScope

Turn any GitHub repo into a navigable, shareable website and answer "what did this commit mean?" — for humans and AI agents.

CodeScope indexes a codebase once, then keeps that understanding live — incremental re-index on every push, intent-level diffs ("the validate() function lost its rate-limit check"), and an MCP server so Claude Code, Cursor, or Codex can read the same understanding you do.

Bring-your-own keys. Your Anthropic / OpenAI / GitHub credits. Nothing leaves your Postgres.

Quick start

# 1. Postgres + pgvector running locally
docker compose up -d
npx codescope-ai --database-url postgres://codescope:codescope@localhost:5433/codescope

# 2. Or with a CLI install:
npm i -g codescope-ai
codescope-ai ./path/to/repo
codescope-ai vercel/next.js                  # GitHub repo
codescope-ai https://github.com/vercel/next.js

The package is published as codescope-ai on npm (the unscoped codescope was rejected by npm's name-similarity policy). The product itself is still called CodeScope.

The CLI clones (or re-uses a local checkout), runs the indexer, starts a Next.js dev server on the first free port from 5173, and opens your browser at /<owner>/<name>. Re-running the same repo is incremental — only files whose content_hash changed get re-walked.
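
The incremental behaviour described above can be illustrated with a minimal TypeScript sketch. It assumes content_hash is a SHA-256 digest of the file body and that the previous run's hashes are available as a plain path-to-hash map. The hash choice and the names contentHash and filesToReindex are illustrative assumptions, not CodeScope's actual indexer code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: repo-relative path -> file content.
type FileMap = Record<string, string>;

/** SHA-256 hex digest of a file's content (assumed hash function). */
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

/**
 * Return the paths that need re-walking: files that are new, or whose
 * current hash differs from the hash stored by the previous run.
 */
function filesToReindex(
  previousHashes: Record<string, string>,
  current: FileMap,
): string[] {
  return Object.keys(current)
    .filter((path) => previousHashes[path] !== contentHash(current[path]))
    .sort();
}

// Usage: a.ts is unchanged, b.ts was edited, c.ts is new.
const previous = {
  "src/a.ts": contentHash("export const a = 1;\n"),
  "src/b.ts": contentHash("export const b = 2;\n"),
};
const now: FileMap = {
  "src/a.ts": "export const a = 1;\n",
  "src/b.ts": "export const b = 3;\n",
  "src/c.ts": "export const c = 4;\n",
};
// Only src/b.ts and src/c.ts are returned; src/a.ts is skipped.
console.log(filesToReindex(previous, now));
```

Comparing hashes rather than modification times keeps the result stable across fresh clones, where timestamps carry no information.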

What you get

Three views per indexed repo

Overview — Sonnet-synthesised identity, top-5 files by importance, layer breakdown, recent commits, PDF export.
Architecture — toggle between collapsible Tree view and a Hanging force graph (react-force-graph-2d). Click any node for imports / exports / callers / callees + AI explanation.
Tours — pre-generated AI walkthroughs grounded in the real call graph. Ask any question, get a step-by-step answer with code highlights.

Intent-diff for every commit

After re-indexing, the Recent changes drawer shows:

Sonnet 2-sentence top-line summary
Per-layer breakdown (added / modified / deleted counts)
Files grouped by Added / Modified / Renamed / Deleted with Haiku was X / now Y narration
⚠ Regression badge when Haiku flags a guard removal, scope narrowing, capability loss, or vulnerability surface
Click any file row → FileViewer opens directly into a unified-diff view with green/red line tinting

GitHub auto-sync

Subscribe a repo with one click and CodeScope re-indexes on every push. HMAC-verified, signed-POST handler, idempotent on X-GitHub-Delivery retries. Live delivery feed in the UI.
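
As a rough illustration of the HMAC check and the retry idempotency mentioned above, here is a minimal sketch. GitHub sends the signature in the X-Hub-Signature-256 header as "sha256=" followed by the hex HMAC-SHA256 of the raw body. The function names and the in-memory Set are assumptions made for the example, and a real handler would presumably persist delivery IDs (for instance in webhook_events) rather than keep them in memory.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/**
 * Compare GitHub's X-Hub-Signature-256 header against an HMAC-SHA256 of
 * the raw request body, using a constant-time comparison.
 */
function verifySignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// In-memory stand-in for a persisted delivery log (hypothetical).
const seenDeliveries = new Set<string>();

type Outcome = "rejected" | "duplicate" | "accepted";

/**
 * Decide what to do with one webhook POST. deliveryId is the value of the
 * X-GitHub-Delivery header, which GitHub repeats when it redelivers.
 */
function handleDelivery(
  secret: string,
  deliveryId: string,
  rawBody: string,
  signature: string | undefined,
): Outcome {
  if (!verifySignature(secret, rawBody, signature)) return "rejected";
  if (seenDeliveries.has(deliveryId)) return "duplicate"; // retry: do nothing
  seenDeliveries.add(deliveryId);
  return "accepted"; // caller would trigger the re-index here
}
```

The signature must be computed over the raw bytes of the body, before any JSON parsing, or the digests will not match.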

Cost-controlled, BYO keys

Tiered models — Sonnet for synthesis (1 call/repo) + top-importance files; Haiku for the long tail. ~10× cheaper than all-Sonnet.
Globally-keyed caches — explain_cache and embedding_cache are content-addressable. Two repos with identical content (forks, vendored deps) share one cached answer.
Re-track idempotency — re-asking "what did commit X do?" returns the cached index_runs diff for free.
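
To make the content-addressable idea concrete, the following sketch keys a cache on a SHA-256 digest of the content itself, so two repos holding the same bytes hit the same entry. The ContentCache class, its in-memory Map, and the hash choice are assumptions for illustration only. The real explain_cache and embedding_cache are Postgres tables.

```typescript
import { createHash } from "node:crypto";

/** A cache keyed by the hash of the content, not by repo or path. */
class ContentCache<V> {
  private readonly store = new Map<string, V>();
  hits = 0;
  misses = 0;

  /** Cache key: SHA-256 hex digest of the content (assumed hash). */
  static keyFor(content: string): string {
    return createHash("sha256").update(content, "utf8").digest("hex");
  }

  /** Return the cached value for this content, computing it only once. */
  getOrCompute(content: string, compute: (content: string) => V): V {
    const key = ContentCache.keyFor(content);
    if (this.store.has(key)) {
      this.hits++;
      return this.store.get(key) as V;
    }
    this.misses++;
    const value = compute(content);
    this.store.set(key, value);
    return value;
  }
}

// Usage: the same vendored file in two repos costs one (pretend) LLM call.
const cache = new ContentCache<string>();
const explain = (src: string) => `explanation of ${src.length} chars`;
cache.getOrCompute("export const x = 1;", explain); // repo A: miss
cache.getOrCompute("export const x = 1;", explain); // repo B (fork): hit
console.log(cache.hits, cache.misses);
```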

CLI

codescope-ai                            # index current dir + serve
codescope-ai <path>                     # index local checkout + serve
codescope-ai <owner>/<name>             # clone + index + serve
codescope-ai https://github.com/…       # same
codescope-ai track <repo>[@<sha>]       # pin to a commit, print intent-diff (no server)

Flags (also work via env vars — see --help):

Flag	Env var	Purpose
`--anthropic-key`	`ANTHROPIC_API_KEY`	Tours, intent-diff narrations, regression flags, line explanations
`--openai-key`	`OPENAI_API_KEY`	Semantic search via `text-embedding-3-small`. Without it, search degrades to keyword `LIKE`
`--github-token`	`GITHUB_TOKEN`	Raises the GitHub API rate limit from 60/hr to 5000/hr; required for private repos
`--database-url`	`DATABASE_URL`	Postgres connection string
`-p`, `--port`	—	Dev server port (default: first free port from 5173)
`--no-open`	—	Skip opening browser
`--no-index`	—	Skip indexing, just serve

One key. Same quality. No second signup.

CodeScope ships with a hosted embedding gateway at codescope-embed.harsh-185.workers.dev that proxies to Voyage AI's voyage-code-3 — the highest-scoring code embedding model available. Authentication piggybacks on your Anthropic key: paste it into CodeScope, the gateway verifies it's real, and you get OpenAI-or-better-quality embeddings without a second API key.

What you have	Embedding provider	Quality vs OpenAI text-embedding-3-small
Anthropic key (default)	`voyage-code-3` via codescope-embed.harsh-185.workers.dev	clearly better (~68 vs 51.6 MTEB code retrieval)
Anthropic + `CODESCOPE_EMBED_PROVIDER=openai`	OpenAI text-embedding-3-small	baseline
OpenAI only (no Anthropic)	OpenAI text-embedding-3-small	baseline (no AI tours/intent-diff)
Neither	none — search degrades to FTS over LLM purposes if Anthropic is set	usable but limited

Free tier: the hosted gateway gives every validated Anthropic key 10,000 embeddings/day for free. A typical user indexing 5–10 repos/week never approaches the cap. Above that, self-host the gateway in 5 minutes (HOSTING.md) — it's a 150-line Cloudflare Worker open-sourced in this repo at cloudflare-worker/.

Privacy: code text passes through codescope-embed.harsh-185.workers.dev on its way to Voyage. The gateway hashes your Anthropic key (sha256) for rate limiting; never stores raw keys; never logs request bodies. For full privacy: self-host the worker (point CODESCOPE_EMBED_ENDPOINT at your URL), use CODESCOPE_EMBED_PROVIDER=openai (your own OpenAI account), or skip embeddings entirely (FTS-only search still works).
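
The combination of a hashed key and a daily cap can be sketched roughly as follows. This is not the worker's code: the in-memory Map, the UTC day boundary, and the tryConsume name are assumptions, and an actual Cloudflare Worker would use Web Crypto and a durable store rather than node:crypto and process memory.

```typescript
import { createHash } from "node:crypto";

const DAILY_LIMIT = 10_000; // embeddings per key per day, per the free tier above

// Counter store keyed by "<sha256 of key>:<YYYY-MM-DD>". The raw key is never stored.
const usage = new Map<string, number>();

/**
 * Try to reserve `count` embeddings for this API key on the given day.
 * Returns false (and reserves nothing) if the request would exceed the cap.
 */
function tryConsume(apiKey: string, count: number, now: Date = new Date()): boolean {
  const keyHash = createHash("sha256").update(apiKey, "utf8").digest("hex");
  const day = now.toISOString().slice(0, 10); // UTC date, assumed reset boundary
  const bucket = `${keyHash}:${day}`;
  const used = usage.get(bucket) ?? 0;
  if (used + count > DAILY_LIMIT) return false;
  usage.set(bucket, used + count);
  return true;
}
```

Because only the digest is used as the bucket name, the limiter can count requests per key without ever holding the key itself.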

What if you don't have a key yet?

Just run codescope-ai. On first run with missing keys, the CLI prompts you interactively:

  Some API keys aren't set yet. Want to enter them now?
  All optional. Press Enter to skip any. Saved to ~/.codescope/keys.env (0600).

  Anthropic (Claude) API key
    Unlocks: AI tours, intent-diff narration, regression flagging, line explanations
    Get one: https://console.anthropic.com/settings/keys
    Paste ANTHROPIC_API_KEY (Enter to skip): █

Paste the key once → saved to ~/.codescope/keys.env (0600 perms, only readable by you) → all subsequent runs skip the prompt.

Precedence (highest wins):

CLI flag (--anthropic-key sk-ant-…)
Environment variable (ANTHROPIC_API_KEY)
~/.codescope/keys.env (saved from a previous prompt)
Interactive prompt — TTY only, never in CI

Helpful flags:

--no-prompt — skip the prompt entirely (CI-safe)
--reset-keys — wipe ~/.codescope/keys.env and exit
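
The precedence order and the --no-prompt behaviour above can be sketched as a small resolver. Everything here is illustrative: the ResolveInput shape, the resolveKey and parseKeysEnv names, and the assumption that keys.env holds simple KEY=VALUE lines are not taken from CodeScope's source.

```typescript
type KeySource = "flag" | "env" | "file" | "prompt" | "none";

interface ResolveInput {
  flagValue?: string;   // e.g. the value passed to --anthropic-key
  envValue?: string;    // e.g. process.env.ANTHROPIC_API_KEY
  savedValue?: string;  // e.g. the entry read from ~/.codescope/keys.env
  isTTY: boolean;       // true when running in an interactive terminal
  noPrompt: boolean;    // true when --no-prompt was given
  prompt: () => string; // asks the user; returns "" when they press Enter
}

/** Walk the sources from highest to lowest precedence; first non-empty wins. */
function resolveKey(input: ResolveInput): { value?: string; source: KeySource } {
  if (input.flagValue) return { value: input.flagValue, source: "flag" };
  if (input.envValue) return { value: input.envValue, source: "env" };
  if (input.savedValue) return { value: input.savedValue, source: "file" };
  // Prompt only on a TTY and only if the user did not opt out (CI-safe).
  if (input.isTTY && !input.noPrompt) {
    const typed = input.prompt().trim();
    if (typed) return { value: typed, source: "prompt" };
  }
  return { source: "none" };
}

/** Parse KEY=VALUE lines (assumed file format); skips blanks and # comments. */
function parseKeysEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq > 0) out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}
```

Passing the prompt in as a function keeps the resolver free of terminal I/O, which also matches the note below that the MCP server must never prompt.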

The MCP server never prompts (stdin/stdout are reserved for JSON-RPC) but does read ~/.codescope/keys.env, so once you've saved keys via the CLI, the MCP server picks them up automatically.

Without any LLM key, indexing still works — heuristic-only purposes, keyword search, no AI tours. The startup banner shows exactly what's wired.

Pro / Max subscribers (Claude Code, ChatGPT Plus)

CodeScope uses the standard API key flow. Some notes on subscriptions:

Anthropic Pro / Max — generate a Claude OAuth API key from the same console page. Calls bill against your subscription rather than separate pay-as-you-go credits. Paste it into the prompt like any other key.
OpenAI ChatGPT Plus / Team — these subscriptions are separate from the API; you'll still need a pay-as-you-go API key. Used only for embeddings (text-embedding-3-small ≈ $0.02 per 1M tokens — indexing 1000 files costs cents).
Why we don't auto-detect Claude Code / Codex auth — those tools store OAuth tokens in ~/.claude/ and ~/.codex/ respectively, but reading another tool's credential store is fragile + may violate ToS. Cleaner: each tool generates its own key from the same console.

MCP server (Claude Code, Cursor, Codex)

CodeScope exposes the same understanding to coding agents over a stdio MCP server. 11 tools that fit the agent's natural workflow:

Tool	Use case
`list_repos`	Discover what's been indexed
`get_overview`	Repo identity + top files + layers — orientation before editing
`list_layers`	Files grouped by responsibility
`search_files`	pgvector cosine semantic search
`get_file`	Content + imports + exports + callers + callees
`explain_lines`	Claude line-range explanation
`list_tours` / `get_tour`	Guided walkthroughs
`index_local_repo`	Bootstrap a fresh checkout
`get_intent_diff`	Latest run's per-file `was X / now Y` + regression flags + summary
`track_commit`	Pin to an arbitrary SHA, return its intent-diff
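
For readers unfamiliar with the ranking behind search_files: pgvector's `<=>` operator returns cosine distance, which is one minus the cosine similarity of two vectors. The sketch below reproduces that ranking in memory for a handful of vectors. The EmbeddedFile shape and the function names are assumptions for the example, and the real tool runs the comparison inside Postgres rather than in application code.

```typescript
// Hypothetical in-memory row: a file path plus its embedding vector.
interface EmbeddedFile {
  path: string;
  embedding: number[];
}

/** Cosine similarity of two equal-length vectors; 0 if either is all zeros. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Return the `limit` files most similar to the query, best match first. */
function rankBySimilarity(
  query: number[],
  files: EmbeddedFile[],
  limit: number,
): Array<{ path: string; score: number }> {
  return files
    .map((f) => ({ path: f.path, score: cosineSimilarity(query, f.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```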

Setup in Claude Code

~/.claude.json (global) or .mcp.json (project):

{
  "mcpServers": {
    "codescope": {
      "command": "npx",
      "args": ["-y", "-p", "codescope-ai", "codescope-ai-mcp"],
      "env": {
        "DATABASE_URL": "postgres://codescope:codescope@localhost:5433/codescope",
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}

Setup in Cursor

~/.cursor/mcp.json:

{
  "mcpServers": {
    "codescope": {
      "command": "npx",
      "args": ["-y", "-p", "codescope-ai", "codescope-ai-mcp"],
      "env": { "...same as above..." }
    }
  }
}

The agent's post-edit verification flow

Agent edits files + commits
Calls track_commit local/repo (or index_local_repo)
Reads back get_intent_diff to confirm what its own changes meant
Decides whether the diff matches intent — and uses the 2-sentence summary + per-file drift narrations as PR description fodder

This is the loop CodeScope was designed for. Agents stop guessing what their edits did to the system; they read it back in plain English.
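
The last step of that loop, turning an intent-diff into PR description text, might look something like the sketch below. The IntentDiff and FileDrift types are a guess at what a get_intent_diff result could contain, based only on the fields named in this README (summary, per-file was/now narration, regression flag). The real payload shape may differ.

```typescript
// Hypothetical shape for a get_intent_diff result; not the tool's actual schema.
interface FileDrift {
  path: string;
  was: string;         // narration of the old behaviour
  now: string;         // narration of the new behaviour
  regression: boolean; // true when the change was flagged as a regression
}

interface IntentDiff {
  summary: string;     // the 2-sentence top-line summary
  files: FileDrift[];
}

/** Paths of files flagged as regressions, for the agent to re-check. */
function flaggedPaths(diff: IntentDiff): string[] {
  return diff.files.filter((f) => f.regression).map((f) => f.path);
}

/** Build a plain-text PR description from the summary and per-file drift. */
function formatPrDescription(diff: IntentDiff): string {
  const lines = [diff.summary, "", "Changes:"];
  for (const f of diff.files) {
    const flag = f.regression ? " [possible regression]" : "";
    lines.push(`- ${f.path}: was ${f.was} / now ${f.now}${flag}`);
  }
  return lines.join("\n");
}
```

An agent could stop and ask for review whenever flaggedPaths returns anything, and otherwise paste the formatted text into the PR body.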

Demo (10 minutes, exercises every feature)

# After installing codescope-ai + Postgres
codescope-ai --database-url postgres://codescope:codescope@localhost:5433/codescope --no-index --no-open &
# (or just have CodeScope's Next.js dev server running)

# In another terminal:
git clone https://github.com/harsh-185/CodeScope && cd CodeScope
npm install
npm run demo:setup     # scaffolds 5-commit URL-shortener exercising every feature
npm run demo:index     # indexes the demo repo + opens browser

Then walk through docs/DEMO.md — 12 tours covering indexing, intent-diff, regression flagging, rename detection, line-level overlay, webhook auto-sync, MCP, and the global cache hits in action.

Architecture (one-liner per layer)

clone/walk → parse (Babel AST) → score → embed (OpenAI) → understand (Haiku per file)
        → analyze (Sonnet synthesis) → ready
                  │
                  ├── Postgres + pgvector (per-repo tables + global caches)
                  ├── Next.js 15 + React 19 (Overview / Architecture / Tours)
                  └── MCP stdio server (11 tools) ──→ Claude Code, Cursor, Codex

On push:
  GitHub webhook ──HMAC verify──> trackCommit ──> incremental re-index ──> intent-diff
                                                                     └─→ regression flags

See docs/BUILD.md for the full architecture + file-by-file module map.

Storage

Concept	Tables
Per-repo	`repos`, `layers`, `files`, `file_edges`, `file_embeddings`, `functions`, `function_calls`, `tours`, `tour_steps`, `commits`
Audit + diff intent	`index_runs`, `file_purpose_history`
Auto-sync	`webhook_subscriptions`, `webhook_events`
Global caches	`explain_cache` (content-keyed), `embedding_cache` (content-hash PK)

Tech stack

Frontend — Next.js 15 (App Router) · React 19 · TypeScript · bespoke CSS (no Tailwind)
Database — Postgres 16 + pgvector · Drizzle ORM · drizzle-kit migrations
Indexer — Babel AST · isomorphic-git · token-bucket-style content-hash dedup
LLM — Anthropic SDK (Claude Sonnet 4.6 + Haiku 4.5) · OpenAI text-embedding-3-small
MCP — @modelcontextprotocol/sdk over stdio
Diff — diff package for line-level rendering, custom shingle/Jaccard for rename detection
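
As an illustration of shingle/Jaccard rename detection in general, the sketch below treats each file as a set of overlapping k-line shingles and pairs a deleted path with an added path when the Jaccard similarity of their sets clears a threshold. The shingle size of 3 lines, the 0.7 threshold, the greedy pairing, and all function names are assumptions chosen for the example, not CodeScope's actual parameters.

```typescript
/** Set of overlapping k-line windows over the non-blank, trimmed lines. */
function lineShingles(text: string, k: number = 3): Set<string> {
  const lines = text
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const out = new Set<string>();
  if (lines.length === 0) return out;
  if (lines.length < k) {
    out.add(lines.join("\n")); // short file: one shingle holding everything
    return out;
  }
  for (let i = 0; i + k <= lines.length; i++) {
    out.add(lines.slice(i, i + k).join("\n"));
  }
  return out;
}

/** Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical. */
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

/** Greedily pair each deleted file with its most similar unclaimed added file. */
function detectRenames(
  deleted: Record<string, string>,
  added: Record<string, string>,
  threshold: number = 0.7,
): Array<{ from: string; to: string; score: number }> {
  const claimed = new Set<string>();
  const renames: Array<{ from: string; to: string; score: number }> = [];
  for (const from of Object.keys(deleted).sort()) {
    const source = lineShingles(deleted[from]);
    let bestPath: string | undefined;
    let bestScore = 0;
    for (const to of Object.keys(added).sort()) {
      if (claimed.has(to)) continue;
      const score = jaccard(source, lineShingles(added[to]));
      if (score >= threshold && score > bestScore) {
        bestPath = to;
        bestScore = score;
      }
    }
    if (bestPath !== undefined) {
      claimed.add(bestPath);
      renames.push({ from, to: bestPath, score: bestScore });
    }
  }
  return renames;
}
```

Working on sets of shingles rather than exact content means a file that was moved and lightly edited in the same commit can still be reported as a rename instead of an unrelated delete plus add.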

Development

git clone https://github.com/harsh-185/CodeScope
cd CodeScope
npm install
docker compose up -d
cp .env.example .env  # set ANTHROPIC_API_KEY + OPENAI_API_KEY
npm run db:migrate
npm run dev           # http://localhost:5173

Useful scripts:

npm run cli           # dev mode CLI (no install)
npm run mcp           # dev mode MCP server (no install)
npm run typecheck     # tsc --noEmit
npm run db:studio     # Drizzle web UI for the database

Publishing

See docs/PUBLISHING.md for the full walkthrough of publishing to npm + submitting to the MCP server registry.

Roadmap

Shipped: Phase 0 (foundation) · 1 (incremental + audit) · 2 (intent-diff) · 2.5 (commit tracker + cache globalization) · 3.1 (auto-sync) · 3.2 (line-level diff) · 3.3 (regressions) · 3.4 (rename detection)

Next (Phase 4 — bigger swings, independent):

4.1 Multi-repo auth (next-auth + per-user repo access)
4.2 Function-level diff (track function identity across edits)
4.3 VS Code extension (sidebar webview + jump-to-line)
4.4 Browser overlay (chrome extension on github.com)
4.5 Streaming SSE for explain_lines
4.6 Slack / Linear regression notifications

See docs/FEATURES.md for the canonical Done/Next list and docs/BUILD.md §7 for the per-feature implementation plan.

License

MIT — see LICENSE.

Contributing

PRs welcome. Run npm run typecheck + npm run lint before opening. For larger changes, please open an issue first to discuss the design — docs/BUILD.md §8 has the "how to pick up the next chunk" recipe.
