CodeScope

Turn any GitHub repo into a navigable, shareable website and answer "what did this commit mean?" — for humans and AI agents.


CodeScope indexes a codebase once, then keeps that understanding live — incremental re-index on every push, intent-level diffs ("the validate() function lost its rate-limit check"), and an MCP server so Claude Code, Cursor, or Codex can read the same understanding you do.

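Bring your own keys: CodeScope runs on your Anthropic / OpenAI / GitHub credits, and the index lives only in your own Postgres. Code text is still sent to the model and embedding providers you configure (see Privacy below).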


Quick start

# 1. Postgres + pgvector running locally
docker compose up -d
npx codescope-ai --database-url postgres://codescope:codescope@localhost:5433/codescope

# 2. Or with a CLI install:
npm i -g codescope-ai
codescope-ai ./path/to/repo
codescope-ai vercel/next.js                  # GitHub repo
codescope-ai https://github.com/vercel/next.js

The package is published as codescope-ai on npm (the unscoped codescope was rejected by npm's name-similarity policy). The product itself is still called CodeScope.

The CLI clones (or re-uses a local checkout), runs the indexer, starts a Next.js dev server on the first free port from 5173, and opens your browser at /<owner>/<name>. Re-running the same repo is incremental — only files whose content_hash changed get re-walked.
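
The incremental check can be sketched roughly as follows. This is an illustration only: it assumes content_hash is a SHA-256 of the file text (the actual hash function is not stated here), and the names contentHash and filesToReindex are hypothetical, not CodeScope's API.

```typescript
import { createHash } from "node:crypto";

// Assumption: content_hash is a SHA-256 hex digest of the file text.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// previous: path -> content_hash stored by the last index run.
// current:  path -> file text on disk now.
// Returns the paths that are new or whose hash no longer matches.
function filesToReindex(
  previous: Map<string, string>,
  current: Map<string, string>,
): string[] {
  const changed: string[] = [];
  current.forEach((content, path) => {
    if (previous.get(path) !== contentHash(content)) changed.push(path);
  });
  return changed;
}
```

Unchanged files never reach the parser or the LLM, which is what makes a re-run cheap.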

What you get

Three views per indexed repo

  • Overview — Sonnet-synthesised identity, top-5 files by importance, layer breakdown, recent commits, PDF export.
  • Architecture — toggle between collapsible Tree view and a Hanging force graph (react-force-graph-2d). Click any node for imports / exports / callers / callees + AI explanation.
  • Tours — pre-generated AI walkthroughs grounded in the real call graph. Ask any question, get a step-by-step answer with code highlights.

Intent-diff for every commit

After re-indexing, the Recent changes drawer shows:

  • Sonnet 2-sentence top-line summary
  • Per-layer breakdown (added / modified / deleted counts)
  • Files grouped by Added / Modified / Renamed / Deleted with Haiku was X / now Y narration
  • ⚠ Regression badge when Haiku flags a guard removal, scope narrowing, capability loss, or vulnerability surface
  • Click any file row → FileViewer opens directly into a unified-diff view with green/red line tinting

GitHub auto-sync

Subscribe a repo with one click and CodeScope re-indexes on every push. HMAC-verified, signed-POST handler, idempotent on X-GitHub-Delivery retries. Live delivery feed in the UI.
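
A minimal sketch of the two checks named above, assuming the handler has the raw request body and GitHub's X-Hub-Signature-256 and X-GitHub-Delivery headers. The function names and the in-memory Set are hypothetical; a real handler would persist delivery IDs (for example in webhook_events).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// GitHub sends "sha256=<hex HMAC-SHA256 of the raw body>" in X-Hub-Signature-256.
function signBody(secret: string, rawBody: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function verifySignature(secret: string, rawBody: string, header: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

type Outcome = "rejected" | "duplicate" | "accepted";

// Hypothetical in-memory record of X-GitHub-Delivery IDs already processed.
const seenDeliveries = new Set<string>();

function handleDelivery(
  secret: string,
  rawBody: string,
  signatureHeader: string,
  deliveryId: string,
): Outcome {
  if (!verifySignature(secret, rawBody, signatureHeader)) return "rejected";
  if (seenDeliveries.has(deliveryId)) return "duplicate"; // retry: do nothing twice
  seenDeliveries.add(deliveryId);
  return "accepted";
}
```

Checking the signature before the delivery ID means an unauthenticated request can never mark a delivery as seen.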

Cost-controlled, BYO keys

  • Tiered models: Sonnet for synthesis (1 call/repo) + top-importance files; Haiku for the long tail. ~10× cheaper than all-Sonnet.
  • Globally-keyed caches: explain_cache and embedding_cache are content-addressable. Two repos with identical content (forks, vendored deps) share one cached answer.
  • Re-track idempotency: re-asking "what did commit X do?" returns the cached index_runs diff for free.
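
To illustrate why a content-addressable cache is shared across repos, here is a small in-memory sketch. It assumes the cache key is a SHA-256 of the content alone, with no repo identifier in it; the ContentCache class is hypothetical and stands in for tables such as explain_cache.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a content-keyed cache table.
class ContentCache<V> {
  private store = new Map<string, V>();
  hits = 0;
  misses = 0;

  // The key depends only on the content, so any repo holding the same
  // bytes (a fork, a vendored dependency) maps to the same entry.
  getOrCompute(content: string, compute: (content: string) => V): V {
    const key = createHash("sha256").update(content, "utf8").digest("hex");
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const value = compute(content);
    this.store.set(key, value);
    return value;
  }
}
```

In this sketch the expensive compute callback (an LLM call or an embedding request) runs once per distinct content, however many repos contain it.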

CLI

codescope-ai                            # index current dir + serve
codescope-ai <path>                     # index local checkout + serve
codescope-ai <owner>/<name>             # clone + index + serve
codescope-ai https://github.com/…       # same
codescope-ai track <repo>[@<sha>]       # pin to a commit, print intent-diff (no server)

Flags (also work via env vars — see --help):

| Flag | Env var | Purpose |
| --- | --- | --- |
| --anthropic-key | ANTHROPIC_API_KEY | Tours, intent-diff narrations, regression flags, line explanations |
| --openai-key | OPENAI_API_KEY | Semantic search via text-embedding-3-small. Without it, search degrades to keyword LIKE |
| --github-token | GITHUB_TOKEN | Bumps clone rate limit 60/hr → 5000/hr; required for private repos |
| --database-url | DATABASE_URL | Postgres connection string |
| -p, --port | | Dev server port (default: first free port from 5173) |
| --no-open | | Skip opening browser |
| --no-index | | Skip indexing, just serve |

One key. Same quality. No second signup.

CodeScope ships with a hosted embedding gateway at codescope-embed.harsh-185.workers.dev that proxies to Voyage AI's voyage-code-3 — the highest-scoring code embedding model available. Authentication piggybacks on your Anthropic key: paste it into CodeScope, the gateway verifies it's real, and you get OpenAI-or-better-quality embeddings without a second API key.

| What you have | Embedding provider | Quality vs OpenAI text-embedding-3-small |
| --- | --- | --- |
| Anthropic key (default) | voyage-code-3 via codescope-embed.harsh-185.workers.dev | clearly better (~68 vs 51.6 MTEB code retrieval) |
| Anthropic + CODESCOPE_EMBED_PROVIDER=openai | OpenAI text-embedding-3-small | baseline |
| OpenAI only (no Anthropic) | OpenAI text-embedding-3-small | baseline (no AI tours/intent-diff) |
| Neither | none; search degrades to FTS over LLM purposes if Anthropic is set | usable but limited |

Free tier: the hosted gateway gives every validated Anthropic key 10,000 embeddings/day for free. A typical user indexing 5–10 repos/week never approaches the cap. Above that, self-host the gateway in 5 minutes (HOSTING.md) — it's a 150-line Cloudflare Worker open-sourced in this repo at cloudflare-worker/.

Privacy: code text passes through codescope-embed.harsh-185.workers.dev on its way to Voyage. The gateway hashes your Anthropic key (sha256) for rate limiting; never stores raw keys; never logs request bodies. For full privacy: self-host the worker (point CODESCOPE_EMBED_ENDPOINT at your URL), use CODESCOPE_EMBED_PROVIDER=openai (your own OpenAI account), or skip embeddings entirely (FTS-only search still works).
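
A rough sketch of per-key rate limiting that never stores the raw key, under stated assumptions: the counter is keyed by the SHA-256 of the key plus a calendar day string, and the cap is the 10,000 embeddings/day mentioned above. The DailyQuota class and the way the day is passed in are hypothetical; the actual worker in cloudflare-worker/ may do this differently.

```typescript
import { createHash } from "node:crypto";

const DAILY_LIMIT = 10_000; // embeddings per validated key per day

// Hypothetical in-memory quota tracker; only the hash of the key is kept.
class DailyQuota {
  private used = new Map<string, number>();

  // Returns true and records the usage if `count` embeddings still fit
  // into this key's quota for `day` (e.g. "2025-01-01").
  tryConsume(apiKey: string, count: number, day: string): boolean {
    const keyHash = createHash("sha256").update(apiKey, "utf8").digest("hex");
    const bucket = keyHash + ":" + day;
    const current = this.used.get(bucket) ?? 0;
    if (current + count > DAILY_LIMIT) return false;
    this.used.set(bucket, current + count);
    return true;
  }
}
```

Because the bucket name includes the day, quotas in this sketch reset simply by moving to a new bucket.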

What if you don't have a key yet?

Just run codescope-ai. On first run with missing keys, the CLI prompts you interactively:

  Some API keys aren't set yet. Want to enter them now?
  All optional. Press Enter to skip any. Saved to ~/.codescope/keys.env (0600).

  Anthropic (Claude) API key
    Unlocks: AI tours, intent-diff narration, regression flagging, line explanations
    Get one: https://console.anthropic.com/settings/keys
    Paste ANTHROPIC_API_KEY (Enter to skip): █

Paste the key once → saved to ~/.codescope/keys.env (0600 perms, only readable by you) → all subsequent runs skip the prompt.

Precedence (highest wins):

  1. CLI flag (--anthropic-key sk-ant-…)
  2. Environment variable (ANTHROPIC_API_KEY)
  3. ~/.codescope/keys.env (saved from a previous prompt)
  4. Interactive prompt — TTY only, never in CI

Helpful flags:

  • --no-prompt — skip the prompt entirely (CI-safe)
  • --reset-keys — wipe ~/.codescope/keys.env and exit
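
The precedence order and the --no-prompt behaviour can be sketched as a single lookup. This is an illustration under assumptions: the KeyInputs shape, the resolveKey name, and the synchronous prompt callback are all hypothetical and not the CLI's actual code.

```typescript
type KeySource = "flag" | "env" | "file" | "prompt" | "none";

// Hypothetical inputs gathered by the CLI before resolving one key.
interface KeyInputs {
  flag?: string;   // e.g. value of --anthropic-key
  env?: string;    // e.g. process.env.ANTHROPIC_API_KEY
  file?: string;   // value loaded from ~/.codescope/keys.env
  isTTY: boolean;  // false in CI and under the MCP server
  noPrompt: boolean; // true when --no-prompt is passed
  prompt: () => string | undefined; // returns undefined when the user presses Enter
}

function resolveKey(inputs: KeyInputs): { value?: string; source: KeySource } {
  if (inputs.flag) return { value: inputs.flag, source: "flag" };
  if (inputs.env) return { value: inputs.env, source: "env" };
  if (inputs.file) return { value: inputs.file, source: "file" };
  // Prompt only as a last resort, and only on an interactive terminal.
  if (inputs.isTTY && !inputs.noPrompt) {
    const entered = inputs.prompt();
    if (entered) return { value: entered, source: "prompt" };
  }
  return { source: "none" };
}
```

Returning the source alongside the value makes it easy to print a startup banner that says where each key came from.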

The MCP server never prompts (stdin/stdout are reserved for JSON-RPC) but does read ~/.codescope/keys.env, so once you've saved keys via the CLI, the MCP server picks them up automatically.

Without any LLM key, indexing still works — heuristic-only purposes, keyword search, no AI tours. The startup banner shows exactly what's wired.

Pro / Max subscribers (Claude Code, ChatGPT Plus)

CodeScope uses the standard API key flow. Some notes on subscriptions:

  • Anthropic Pro / Max — generate a Claude OAuth API key from the same console page. Calls bill against your subscription rather than separate pay-as-you-go credits. Paste it into the prompt like any other key.
  • OpenAI ChatGPT Plus / Team — these subscriptions are separate from the API; you'll still need a pay-as-you-go API key. Used only for embeddings (text-embedding-3-small ≈ $0.02 per 1M tokens — indexing 1000 files costs cents).
  • Why we don't auto-detect Claude Code / Codex auth — those tools store OAuth tokens in ~/.claude/ and ~/.codex/ respectively, but reading another tool's credential store is fragile + may violate ToS. Cleaner: each tool generates its own key from the same console.

MCP server (Claude Code, Cursor, Codex)

CodeScope exposes the same understanding to coding agents over a stdio MCP server. 11 tools that fit the agent's natural workflow:

| Tool | Use case |
| --- | --- |
| list_repos | Discover what's been indexed |
| get_overview | Repo identity + top files + layers; orientation before editing |
| list_layers | Files grouped by responsibility |
| search_files | pgvector cosine semantic search |
| get_file | Content + imports + exports + callers + callees |
| explain_lines | Claude line-range explanation |
| list_tours / get_tour | Guided walkthroughs |
| index_local_repo | Bootstrap a fresh checkout |
| get_intent_diff | Latest run's per-file was X / now Y + regression flags + summary |
| track_commit | Pin to an arbitrary SHA, return its intent-diff |
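
For readers unfamiliar with cosine search, the ranking that search_files relies on can be sketched in memory. pgvector's cosine distance is 1 minus cosine similarity; the rankByCosine helper and the tiny 2-dimensional vectors are hypothetical and only show the ordering, not how CodeScope queries Postgres.

```typescript
// Cosine distance: 1 - (a · b) / (|a| * |b|). Smaller means more similar.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedFile {
  path: string;
  embedding: number[];
}

// Hypothetical helper: return the k paths closest to the query embedding.
function rankByCosine(query: number[], files: EmbeddedFile[], k: number): string[] {
  return files
    .map((f) => ({ path: f.path, distance: cosineDistance(query, f.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map((f) => f.path);
}
```

Cosine distance ignores vector length, so two files count as close when their embeddings point in a similar direction.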

Setup in Claude Code

~/.claude.json (global) or .mcp.json (project):

{
  "mcpServers": {
    "codescope": {
      "command": "npx",
      "args": ["-y", "-p", "codescope-ai", "codescope-ai-mcp"],
      "env": {
        "DATABASE_URL": "postgres://codescope:codescope@localhost:5433/codescope",
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}

Setup in Cursor

~/.cursor/mcp.json:

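{
  "mcpServers": {
    "codescope": {
      "command": "npx",
      "args": ["-y", "-p", "codescope-ai", "codescope-ai-mcp"],
      "env": {
        "DATABASE_URL": "postgres://codescope:codescope@localhost:5433/codescope",
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}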

The agent's post-edit verification flow

  1. Agent edits files + commits
  2. Calls track_commit local/repo (or index_local_repo)
  3. Reads back get_intent_diff to confirm what its own changes meant
  4. Decides whether the diff matches intent — and uses the 2-sentence summary + per-file drift narrations as PR description fodder

This is the loop CodeScope was designed for. Agents stop guessing what their edits did to the system; they read it back in plain English.
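
Step 4 of the flow above might look like this on the agent side. The IntentDiff and FileDrift shapes are assumptions made for illustration; the real get_intent_diff payload may use different field names, and reviewIntentDiff is a hypothetical helper.

```typescript
// Hypothetical shape of what get_intent_diff returns.
interface FileDrift {
  path: string;
  was: string;
  now: string;
  regression: boolean;
}

interface IntentDiff {
  summary: string; // the 2-sentence top-line summary
  files: FileDrift[];
}

// Decide whether the change looks safe and draft a PR description from it.
function reviewIntentDiff(diff: IntentDiff): { ok: boolean; prBody: string } {
  const flagged = diff.files.filter((f) => f.regression);
  const lines: string[] = [diff.summary, ""];
  for (const f of diff.files) {
    const mark = f.regression ? " [regression]" : "";
    lines.push(`- ${f.path}${mark}: was ${f.was} / now ${f.now}`);
  }
  return { ok: flagged.length === 0, prBody: lines.join("\n") };
}
```

An agent could stop and ask for review whenever ok is false, and otherwise paste prBody into the pull request.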


Demo (10 minutes, exercises every feature)

# After installing codescope-ai + Postgres
codescope-ai --database-url postgres://codescope:codescope@localhost:5433/codescope --no-index --no-open &
# (or just have CodeScope's Next.js dev server running)

# In another terminal:
git clone https://github.com/harsh-185/CodeScope && cd CodeScope
npm install
npm run demo:setup     # scaffolds 5-commit URL-shortener exercising every feature
npm run demo:index     # indexes the demo repo + opens browser

Then walk through docs/DEMO.md — 12 tours covering indexing, intent-diff, regression flagging, rename detection, line-level overlay, webhook auto-sync, MCP, and the global cache hits in action.


Architecture (one-liner per layer)

clone/walk → parse (Babel AST) → score → embed (OpenAI) → understand (Haiku per file)
        → analyze (Sonnet synthesis) → ready
                  │
                  ├── Postgres + pgvector (per-repo tables + global caches)
                  ├── Next.js 15 + React 19 (Overview / Architecture / Tours)
                  └── MCP stdio server (11 tools) ──→ Claude Code, Cursor, Codex

On push:
  GitHub webhook ──HMAC verify──> trackCommit ──> incremental re-index ──> intent-diff
                                                                     └─→ regression flags

See docs/BUILD.md for the full architecture + file-by-file module map.

Storage

| Concept | Tables |
| --- | --- |
| Per-repo | repos, layers, files, file_edges, file_embeddings, functions, function_calls, tours, tour_steps, commits |
| Audit + diff intent | index_runs, file_purpose_history |
| Auto-sync | webhook_subscriptions, webhook_events |
| Global caches | explain_cache (content-keyed), embedding_cache (content-hash PK) |

Tech stack

  • Frontend: Next.js 15 (App Router) · React 19 · TypeScript · bespoke CSS (no Tailwind)
  • Database: Postgres 16 + pgvector · Drizzle ORM · drizzle-kit migrations
  • Indexer: Babel AST · isomorphic-git · token-bucket-style content-hash dedup
  • LLM: Anthropic SDK (Claude Sonnet 4.6 + Haiku 4.5) · OpenAI text-embedding-3-small
  • MCP: @modelcontextprotocol/sdk over stdio
  • Diff: diff package for line-level rendering, custom shingle/Jaccard for rename detection
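
As an illustration of shingle/Jaccard rename detection, here is a minimal sketch. The shingle size (3 lines), the 0.8 threshold, and all function names are assumptions chosen for the example; CodeScope's custom implementation may differ.

```typescript
// Build the set of k-line shingles (overlapping windows) of a file,
// ignoring blank lines and leading/trailing whitespace.
function shingles(text: string, k: number = 3): Set<string> {
  const lines = text
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const out = new Set<string>();
  if (lines.length > 0 && lines.length < k) out.add(lines.join("\n"));
  for (let i = 0; i + k <= lines.length; i++) {
    out.add(lines.slice(i, i + k).join("\n"));
  }
  return out;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Treat a deleted/added pair as a rename when their content mostly overlaps.
function isLikelyRename(deleted: string, added: string, threshold: number = 0.8): boolean {
  return jaccard(shingles(deleted), shingles(added)) >= threshold;
}
```

Comparing shingles rather than whole files lets a file that was moved and lightly edited still be paired with its old path.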

Development

git clone https://github.com/harsh-185/CodeScope
cd CodeScope
npm install
docker compose up -d
cp .env.example .env  # set ANTHROPIC_API_KEY + OPENAI_API_KEY
npm run db:migrate
npm run dev           # http://localhost:5173

Useful scripts:

npm run cli           # dev mode CLI (no install)
npm run mcp           # dev mode MCP server (no install)
npm run typecheck     # tsc --noEmit
npm run db:studio     # Drizzle web UI for the database

Publishing

See docs/PUBLISHING.md for the full walkthrough of publishing to npm + submitting to the MCP server registry.


Roadmap

Shipped: Phase 0 (foundation) · 1 (incremental + audit) · 2 (intent-diff) · 2.5 (commit tracker + cache globalization) · 3.1 (auto-sync) · 3.2 (line-level diff) · 3.3 (regressions) · 3.4 (rename detection)

Next (Phase 4 — bigger swings, independent):

  • 4.1 Multi-repo auth (next-auth + per-user repo access)
  • 4.2 Function-level diff (track function identity across edits)
  • 4.3 VS Code extension (sidebar webview + jump-to-line)
  • 4.4 Browser overlay (chrome extension on github.com)
  • 4.5 Streaming SSE for explain_lines
  • 4.6 Slack / Linear regression notifications

See docs/FEATURES.md for the canonical Done/Next list and docs/BUILD.md §7 for the per-feature implementation plan.


License

MIT — see LICENSE.


Contributing

PRs welcome. Run npm run typecheck + npm run lint before opening. For larger changes, please open an issue first to discuss the design — docs/BUILD.md §8 has the "how to pick up the next chunk" recipe.
