A skill for AI coding agents (Claude) that retroactively generates MADR-format Architectural Decision Records from an existing codebase's git history, and annotates relevant code with inline @ADR reference tags.
Coding agents work great on greenfield projects — give them a spec and they build with context. But on existing codebases, they're flying blind. They see what the code does, but not why it looks the way it does. Why is Redis used here instead of Postgres? Why is this service async? Why did the team move away from REST?
That context lives in the heads of engineers who wrote the code, or buried in PRs nobody reads. New developers ramp up slowly. Agents make decisions that contradict ones already made.
ADRs solve this — but nobody writes them retroactively by hand for a three-year-old codebase.
This skill does it automatically.
- Reads your git history — commit by commit, PR by PR
- Detects what kind of history you have — merge commits, squash merges, rebase/linear, or mixed
- Groups commits into logical decision chunks — not just by time, but by what subsystems they touched and what they changed — then clusters related chunks across PR boundaries, so one decision spread over several PRs becomes a single ADR instead of three redundant ones
- Classifies which chunks represent architectural decisions — filtering out bug fixes, lint, version bumps, and test-only changes using diff signals, not just commit messages
- Generates a MADR file per decision — with context, considered options (only what's evidenced in the diff), outcome, and links back to the original commits
- Places
@ADRinline tags on the relevant functions and classes — so a coding agent grepping the codebase finds the decision log right at the code it affected
docs/decisions/
├── 0001-redis-cache-strategy.md
├── 0002-async-event-queue.md
├── 0003-grpc-inter-service-transport.md
└── ...
Each file follows the MADR template with a retroactive notice and links to original commits.
Pass --json to analyze.py to get machine-readable candidates — each entry includes score, classification, and a diff_summary (files added/deleted/modified) so the agent can rank and triage without running git diff manually. Edge specificity auto-tunes from commit count by default, so clustering is sane out-of-box with no extra config.
# @ADR-0002-async-event-queue: switched from sync to async event processing — see docs/decisions/0002-async-event-queue.md
async def process_events(self, batch: list[Event]) -> None:// @ADR-0003-grpc-transport: gRPC adopted over REST for inter-service comms — see docs/decisions/0003-grpc-inter-service-transport.md
func NewServiceClient(addr string) (*ServiceClient, error) {A coding agent can now grep -r "@ADR" and surface every architectural decision site in the codebase.
For small/medium repos the full multi-step loop is overkill. The agent runs analyze.py once as a scored classifier, reads its JSON output (each candidate carries score, classification, and a diff_summary), triages without calling git diff by hand, writes MARDs directly, then runs judge.py as the safety net.
"Run adr-generator on this repo"
The agent detects everything and runs the full chunking → clustering → classification → generation → verification loop without interruption. Best for large repos (≈500+ commits) or histories with many interleaved decisions across PR boundaries.
"Run adr-generator on this repo in autonomous mode, scope to src/payments since v2.0.0"
The agent pauses at each major step — chunking, classification, draft review, tag placement — and waits for your confirmation before proceeding. Best when you care about quality over speed, or when commit history is messy.
"Run adr-generator in assisted mode on the full repo"
History scope
| Option | What it does |
|---|---|
full |
Entire git history |
since:<commit-or-tag> |
From a specific commit or tag onward (e.g. since:v2.0.0) |
Code scope
| Option | What it does |
|---|---|
repo |
Entire repository |
module:<path> |
Specific folder only (e.g. module:src/payments) |
Any combination works. For large repos or monorepos, module: + since: is the recommended starting point.
Good fit:
- Onboarding new engineers or agents to an existing codebase
- Codebase has meaningful git history with decent commit messages
- You want to run a coding agent on legacy code without it undoing past decisions
- Modules with clear boundaries (payments, auth, infra) that you want to document independently
- Repos using squash-merge or merge-commit workflows (clear PR boundaries to chunk from — though a single decision may still span several PRs, which the clustering pass bundles back together)
Not a good fit:
- Shallow clones — the skill will stop and tell you to unshallow first
- Repos where every commit is
"fix"or"update"with no diff discipline — the output will be low quality - Repos with >500 commits and no clear versioning — use
since:to scope it down first - Codebases where all the decision context lives in Confluence/Notion and never touched the commit history
One of the trickier parts of retroactive ADR generation is that repos use git very differently. The skill detects your workflow automatically:
| Workflow | Detection | Chunking |
|---|---|---|
| Merge commits (classic PR flow) | git log --merges |
Each merge commit = one chunk |
| Squash merges (GitHub default) | Commits with (#NNN) in subject |
Each squash commit = one chunk |
| Rebase / linear history | No merges, no PR refs | Group by file-path affinity |
| Mixed | Both patterns present | Merge-boundary where possible, squash elsewhere |
Note: temporal proximity (grouping commits by time) is explicitly disabled for rebase workflows, since rebasing destroys original timestamps.
The biggest risk with retroactive documentation is the AI confidently making things up. The skill has explicit rules to prevent this:
- Considered Options only lists alternatives that are directly evidenced in the diff (code being replaced) or the commit message ("switched from X to Y"). If none are found, it writes
"No alternatives recorded in commit history."— not a plausible guess. - Rationale is only drawn from commit messages and diff context. If unclear, the MADR says so explicitly rather than fabricating a reason.
- Every generated MADR includes a header note:
"This ADR was retroactively generated from commit history. Rationale accuracy depends on commit message quality."
Beyond the in-generation rules above, a verification pass (scripts/judge.py) re-checks the output instead of trusting the generator to police itself.
Considered Options — citation-grounded and fully deterministic. The generator must cite each option inline (a hidden <!-- evidence: <sha> deleted:path --> comment). Because the git history is a closed corpus, every check is an exact lookup, not a model call:
- Confirm the cited SHA exists, then confirm the claimed change is real: file
deleted/added/renamed(name-status),removed:"line"(grep the commit's diff for that deleted line — this covers code replaced inside a modified file), or amessage:phrase. A fabricated SHA or a change that didn't happen → the option is dropped. An uncited option, or one citing an evidence type that can't be checked → dropped. - There is no LLM on the verification path. An option that can only be argued semantically is dropped, not adjudicated — for a retroactive ADR, if you can't point at the bytes, it shouldn't be asserted. (This replaced an earlier LLM-judge design: a model verdict over terse commit metadata measures sample variance, not the judge's systematic over-inference, so citation grounding is both stronger and reproducible.)
Surviving options are kept; the rest are dropped (collapsing to "No alternatives recorded" if none survive), and the ADR's frontmatter is stamped evidence-verified: true. A downstream agent reading that flag should trust the prose and not re-investigate the cited commits — the citations are an audit trail, not a re-check prompt.
Tag syntax check (deterministic). After @ADR tags are inserted, each touched file is parse-checked with the language's own compiler so a misplaced comment can't break the build. No model is involved.
The deterministic syntax check supports a fixed set of languages. Files outside this set still get tags placed, but the syntax check skips them (it never false-fails):
| Language | @ADR tags placed |
Syntax-checked |
|---|---|---|
| Python | ✅ | ✅ |
JavaScript (.js / .mjs / .cjs) |
✅ | ✅ |
| Ruby | ✅ | ✅ |
Shell (.sh / .bash) |
✅ | ✅ |
| Go | ✅ | ✅ (requires gofmt on PATH) |
TypeScript (.ts / .tsx) |
✅ | ❌ skipped |
| Java / Kotlin | ✅ | ❌ skipped |
| Rust | ✅ | ❌ skipped |
If a checker's tool isn't installed on the host (e.g. gofmt, node, ruby), that file is also skipped rather than failed.
The skill requires three components to be co-located: the skill definition (SKILL.md), the deterministic scripts (scripts/), and the reference files (references/). Clone the repo and symlink or copy the whole directory into your Claude skills folder:
git clone https://github.com/hailcpy/gen-adr.git
cp -r gen-adr ~/.claude/skills/adr-generatorThe resulting layout should be:
~/.claude/skills/
└── adr-generator/
├── SKILL.md
├── scripts/
│ ├── analyze.py ← deterministic chunking/clustering/classification
│ └── judge.py ← citation verifier and tag syntax checker
└── references/
├── madr-examples.md
└── language-tag-patterns.md
Requirements: Python 3 (stdlib only — no pip install needed). For Go syntax checking, gofmt must be on PATH; all other checked languages use built-in tools (node, ruby, bash -n). If a checker tool is absent, that file is skipped rather than failed.
The agent will pick up the skill automatically when you ask it to generate ADRs or document architectural decisions.
Using the adr-generator skill, run in assisted mode on the module src/billing,
since the tag v1.4.0. I want to review each classification before you generate anything.
Clustering is connected-components over a pairwise-affinity graph. A "hot" file or token touched by many commits can bridge otherwise-unrelated commits into one giant cluster.
Signs something is wrong:
- A candidate with
WARN: oversized clusterin its output - One candidate that absorbs >25% of all commits
Fix: re-run with a config file that caps edge specificity so only rare shared files/tokens/directories count as affinity:
python3 scripts/analyze.py <repo> --config knobs.json --json{ "file_df_max": 4, "token_df_max": 4, "dir_df_max": 4 }Start at roughly max(3, commits/40) for each cap and tighten until the largest cluster represents a believable single decision. These default to null (off), preserving the baseline for repos that don't need it.
- Not a substitute for writing ADRs as you go. Retroactive ADRs are always an approximation. They're better than nothing, but they'll miss context that lived in Slack threads, meetings, or engineers' heads.
- Quality scales with commit discipline. Repos with descriptive commit messages and clean PR history produce much better ADRs than repos full of
"wip"and"fix stuff"commits. - The
@ADRtag format is an extension of MADR, not part of the official spec. It's designed for grep-ability and agent discoverability, not for standard MADR tooling.
