For developers: You joined a new team. The README says "it's pretty simple." There are 47 open TODOs, a file that's 2,800 lines long, and everyone goes quiet when you ask about the payments module. You could spend three days reading files — or run this and know in an hour what to avoid, who owns what, and what to touch first.
For non-technical users: Someone told you the codebase is "in good shape." Before you say that in a board meeting, sign off on a launch, or make a vendor decision — run this. It maps the system in plain language, surfaces the real risk areas, and generates questions you can actually ask in your next engineering meeting without sounding like you're guessing.
Stop guessing. Build the right mental model before you break something.
New repo. Inherited codebase. Your own code after six months away. The instinct is to start reading files. That's slow, incomplete, and leaves you blind to the things that will actually burn you — the undocumented env var, the file everyone avoids, the test suite that breaks when run in parallel.
This skill does the archaeology. Claude runs the investigation, maps the architecture, hunts for gotchas, generates a working local dev guide, and produces a living CODEBASE.md. Then it stays useful: check a file before touching it, catch problems before pushing, map a ticket to the codebase before writing a line.
| Mode | Use when |
|---|---|
| join | First day on a team, inherited repo, colleague's codebase |
| return | Your own code you haven't touched in 3+ months |
| audit | Evaluating an OSS project before contributing |
| touch | About to modify a specific file — get a risk assessment first |
| preflight | About to push a PR — catch what reviewers will catch, before review |
| task | Assigned a ticket or feature — map it to the codebase before starting |
touch, preflight, and task are ongoing — they require an existing CODEBASE.md.
Phase 0 Bootstrap README, CI config, open issues, package manifests
Phase 1 Critical Paths Entry points, data stores, Mermaid architecture diagram
Phase 2 Conventions What git history reveals vs. what the README claims
Phase 3 Danger Zones High-churn files, debt clusters, frequently reverted code
Phase 4 Gotcha Detector Undocumented env vars, pre-commit/CI gaps, test traps
Phase 5 Local Dev Guide Step-by-step to get it running — real commands, common failures [technical]
Phase 6 Team Questions 1:1 format with priority tiers [technical]
Meeting Questions Sprint planning / roadmap / board framing [non-technical]
Phase 7 Executive Brief One-page health summary in business language [non-technical]
Phase 8 First Contribution Specific file + line + fix — not just a category [technical]
Phase 8b Ramp-up Timeline Week-by-week gates derived from findings — not a template [technical]
────────────────────────────────────────────────────────────────────────────────────────
Phase 9 Archaeology return only — why decisions were made, not just what they are
Phase 10 Contributor Signal audit only — merge rate, PR velocity, go/no-go
Before any phase runs, Claude asks two questions:
1. Technical or non-technical?
- Technical: file paths, code snippets, git commands, local dev guide, PR preflight
- Non-technical: plain language throughout, shareable architecture diagram, executive brief, questions framed for meetings — not for debugging sessions
2. What's your goal?
- Technical examples: make a contribution, take ownership, security review, evaluate OSS
- Non-technical examples: understand what the system does, assess risk before a launch, prepare for a roadmap or board conversation
The same investigation runs either way. The output is completely different.
Every section carries a confidence tag:
✅ Verified Based on CI config, git history, or explicit documentation
⚠️ Inferred Based on patterns — likely but not confirmed
❓ Gap Couldn't assess from code — needs a human answer
Gap sections automatically become Team Questions. If something is tagged ❓, there's a corresponding question to ask.
Example sections:
## Danger Zones ✅ Verified
| File / Area | Why dangerous | When to touch |
|---------------------|---------------------------------------|----------------|
| src/core/engine.go | 2,847 lines, 47 TODOs, in 89% of PRs | After 4+ weeks |
| migrations/ | Schema changes need team coordination | Never solo |
| auth/ | No tests, last touched 18 months ago | With review |
## Gotchas ✅ Verified
- `STRIPE_WEBHOOK_SECRET` required but absent from `.env.example` —
payments fail silently without it
- Pre-commit runs `eslint --fix`; CI runs `eslint` — passes locally,
fails CI if you don't re-stage after the hook fires
- `auth/` tests share a singleton — `pytest -n 4` causes random failures;
always run `pytest -p no:xdist auth/`
## Local Dev Guide ✅ Verified
1. `cp .env.example .env`
2. Set missing variables:
- `STRIPE_WEBHOOK_SECRET` — ask alice@example.com for the dev key
- `JWT_SECRET` — any 32-char string works locally
3. `npm install`
4. `docker-compose up -d postgres redis`
5. `npm run db:migrate`
6. `node scripts/seed.sh` ← not in README; required for tests
7. `npm run dev` → http://localhost:3000
Verify: `curl http://localhost:3000/health` → `{"status":"ok"}`For engineers:
graph LR
Client -->|HTTP| API[api/routes.go]
API --> Auth[auth/middleware.go]
Auth --> Handler[handlers/user.go]
Handler --> DB[(postgres)]
Handler --> Cache[(redis)]
For non-technical stakeholders — same investigation, plain language:
graph LR
User -->|sends request| API[Web API]
API --> Auth[Login Check]
Auth --> Logic[Business Logic]
Logic --> DB[(Database)]
Logic --> Cache[(Fast Cache)]
For technical users (1:1 format):
### 🔴 Blocking (ask in the first hour)
1. `STRIPE_WEBHOOK_SECRET` is in code but not `.env.example`. Shared dev key?
2. CI runs `pytest -x`, README says `make test`. Which for local dev?
### 🟡 Important (this week)
3. `payments/sync.go` reverted 3× in 6 months — active fix, or avoided?
### 🟢 Nice-to-know
4. `core/engine.go` is 2,400 lines — plan to split it, or intentional?For non-technical users (meeting format):
### For your next sprint planning
- The payment module has broken 3 times this year — what's the risk
if we ship features that touch it this sprint?
### For a board or investor conversation
- How would you describe the overall health of the engineering foundation?Generated after Phase 8. Every checkpoint references actual files, people, and question numbers found during the investigation — not generic milestones.
## Ramp-up Timeline ⚠️ Inferred
### Week 1 — get oriented and unblocked
□ Local dev running: `curl http://localhost:3000/health` → {"status":"ok"}
□ STRIPE_WEBHOOK_SECRET and JWT_SECRET added to .env (ask alice@example.com)
□ Can explain Client → API → Auth → Handler → DB without CODEBASE.md
□ Blocking team questions answered (questions 1 and 2 — see Team Questions 🔴)
□ First safe contribution submitted (target: test_auth.py line 47)
### Week 2 — know how the team works
□ First PR merged without commit message feedback (conventional commits format)
□ PR size within team norm (under 400 lines, based on git log)
□ Know who to ping: auth → alice@example.com, payments/API → bob@example.com
□ Important questions answered (questions 3–5 — see Team Questions 🟡)
### Week 4 — own the codebase
□ Can name all 3 Danger Zones without looking at CODEBASE.md
□ Touch mode no longer needed outside Danger Zones
□ Ready to review a teammate's PR for convention compliance
□ CODEBASE.md updated with anything that was wrong or missingReturn mode adds a recovery gate at end of Week 1: archaeology complete, changes since your absence absorbed, prior mental model assumptions flagged as outdated.
## Executive Brief
### Codebase health
| Area | Status | Business impact |
|------|--------|----------------|
| Core engine | 🔴 High risk | Changes here are slow and bug-prone |
| Payments | 🟡 Unstable | Has broken 3× in 6 months |
| Auth | 🟡 Untested | No safety net; bugs affect all users |
| API | 🟢 Healthy | Well-maintained, stable |
### Top risks
1. Payment processing has broken and been reverted three times — any change
here carries meaningful risk of customer-facing outage.
2. Authentication has no automated tests — bugs affect every user.
### Overall assessment
Medium risk. The API layer is healthy, but two critical areas (payments
and auth) need investment before safely shipping major new features."I'm about to modify
auth/middleware.go— run touch mode."
Before You Touch: auth/middleware.go
Risk level: HIGH — listed in Danger Zones
Recent commits:
3 days ago fix: token expiry edge case alice@example.com
2 weeks ago REVERT: "refactor auth flow" — broke staging
Who to ping: alice@example.com (14 of last 20 commits)
Known issues:
Line 47 TODO refresh token rotation not implemented
Line 203 FIXME breaks with multiple active sessions
Watch out for:
Session singleton on line 89 — caused the revert two weeks ago
"Run preflight on my current changes."
Every ❌ includes the corrected version and exact command. Every
PR Pre-flight: feat/add-rate-limiting
Branch: feat/add-rate-limiting (3 source files, 0 test files changed)
────────────────────────────────────────
COMMIT MESSAGE
────────────────────────────────────────
❌ Doesn't follow conventional commits (used in 28 of last 30 commits)
Current: "add rate limiting"
Fix to: "feat(api): add rate limiting to middleware"
Command: git commit --amend -m "feat(api): add rate limiting to middleware"
────────────────────────────────────────
FILES CHANGED
────────────────────────────────────────
✅ api/routes.go — not a Danger Zone
✅ api/middleware.go — not a Danger Zone
⚠️ auth/middleware.go — DANGER ZONE
Why: No tests, last touched 18 months ago, security-sensitive
Action: alice@example.com must review (14 of last 20 commits here)
Watch for: session singleton on line 89 — caused a revert last month
────────────────────────────────────────
TEST COVERAGE
────────────────────────────────────────
❌ No test files in diff (convention: every commit that touches source touches tests)
Add tests to:
api/middleware.go → tests/api/middleware_test.go
auth/middleware.go → tests/auth/middleware_test.go
Pattern to follow: tests/api/logging_test.go
(added with "feat(api): add request logging" — same structure)
────────────────────────────────────────
GOTCHAS
────────────────────────────────────────
⚠️ Pre-commit runs eslint --fix; CI runs eslint without fix
After hook fires: git add api/middleware.go && git push
⚠️ auth/ tests share a singleton — don't run with -n
Use: pytest -p no:xdist tests/auth/
────────────────────────────────────────
VERDICT: ⚠️ ADDRESS BEFORE PUSHING
────────────────────────────────────────
1. git commit --amend -m "feat(api): add rate limiting to middleware"
2. Add tests to tests/api/middleware_test.go and tests/auth/middleware_test.go
(follow tests/api/logging_test.go)
3. Re-stage after pre-commit hook fires, then push
"I need to add rate limiting to the API — where do I start?"
Task: Add rate limiting to the API
Relevant files:
api/routes.go — entry point; where rate limiting hooks in
api/middleware.go — existing pattern to follow ← start here
Danger Zone proximity:
auth/middleware.go ⚠️ adjacent — avoid unless necessary
Similar past work:
"feat(api): add request logging middleware" — bob@example.com, 3 months ago
Same pattern: middleware.go, not routes.go
Conventions:
Every new middleware needs an integration test in tests/api/
Who to loop in: bob@example.com (owns api/, built existing middleware)
Risk level: LOW — api/ is not a Danger Zone, pattern is established
mkdir -p ~/.claude/skills/codebase-onboarding
curl -o ~/.claude/skills/codebase-onboarding/SKILL.md \
https://raw.githubusercontent.com/googlarz/codebase-onboarding/main/SKILL.md/codebase-onboarding join # new team or repo
/codebase-onboarding return # your own code after months away
/codebase-onboarding audit # evaluating OSS before contributing
/codebase-onboarding touch # before modifying a file
/codebase-onboarding preflight # before pushing a PR
/codebase-onboarding task # when starting any new piece of work
See CONTRIBUTING.md.