refactor(audit): replace git worktree add with git archive | tar -x for audit-cache#89
Conversation
…tree add `codemap audit --base <ref>` materialised the sha into `.codemap/audit-cache/<sha>/` via `git worktree add --detach`. That created two downstream-cleanup footguns: - `git clean -xdf` refuses to descend into registered worktrees; consumers had to escalate to `-ff`, which also nukes legitimate nested repos. - Plain `rm -rf` left dangling registrations in `<repo>/.git/worktrees/`; consumers had to remember `git worktree prune` after cleanup. Switch the materialization primitive to `git archive --format=tar <sha>` piped through `tar -x`. Same on-disk content (identical bytes from the git object store), no `.git` pointer file inside the cache, no registered worktree. `git clean -xdf` and `rm -rf` both just work. Existing invariants preserved: - Cache path layout `<projectRoot>/.codemap/audit-cache/<sha>/`. - Cache-hit detection (`<sha>/.codemap/index.db` exists). - Atomic populate (per-pid temp + POSIX rename). - LRU eviction (5 entries / 500 MiB; orphan `.tmp.*` sweep > 10 min). - WorktreeError code names (`worktree-add-failed`, `reindex-failed`, ...) for API stability — external consumers discriminate on `code`, not primitive name. Migration for existing consumers: one-time `git worktree prune` after upgrade clears dangling registrations from old cache entries. Harmless if skipped — registrations are inert when the path is gone. Adds a regression test asserting the cache entry has no `.git` artifact and never appears in `git worktree list --porcelain`.
…v-parse
`indexFiles` ends every full rebuild with `setMeta('last_indexed_commit',
getCurrentCommit())`, which shells out `git rev-parse HEAD` in the project
root. After the audit-cache switch to `git archive | tar -x`, the cache
dir has no `.git` of its own, so that shell-out fails (status 128, empty
stdout) — the cache DB ends up with `last_indexed_commit = ""` and a
`fatal: not a git repository` line leaks to stderr.
Thread the resolved sha through:
- `RunIndexOptions.commit?: string` — when set, `indexFiles` stamps that
value instead of calling `getCurrentCommit()`.
- `ReindexFn` widens to `(worktreePath, commit?) => Promise<void>`;
`makeWorktreeReindex` forwards `commit` into `runCodemapIndex`.
- `populateWorktree` invokes `opts.reindex(tmpPath, opts.sha)` — the sha
is already resolved by the time we materialise the tree.
Existing test stubs typed as `(wp) => …` keep working — the second arg
is optional. Adds a regression test asserting the reindex callback
receives the sha.
Also slims the Slice 1 docstrings per `.agents/rules/concise-comments` —
the "why no git worktree" historical note belongs in the changeset, not
in every reader's eyeline forever.
Glossary entry, MCP `audit` tool description, and CLI `--base` help all described the cache as a `git worktree add` materialisation. Update each to reflect `git archive | tar -x` and the cleanup-UX consequences (plain tree, `git clean -xdf` and `rm -rf` both work). Adds a patch changeset covering this PR's two functional commits (cache primitive switch + sha plumbing into the reindex callback) and the one-time `git worktree prune` migration note for existing consumers.
🦋 Changeset detectedLatest commit: c50c7d3 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR replaces ChangesAudit cache materialization from git worktree to archive extraction
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cli/cmd-audit.ts (1)
254-254:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate overview text to reflect the new git archive approach.
This line still describes the old behavior ("materialises a worktree + reindex"), but the PR switches to
git archive | tar -xmaterialization. The detailed flag description at lines 261-266 correctly describes the new approach, but this overview text is now inconsistent.📝 Suggested update
-or against a git ref (\`--base <ref>\` materialises a worktree + reindex), and emit structural deltas +or against a git ref (\`--base <ref>\` materialises via git archive + reindex), and emit structural deltas🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/cmd-audit.ts` at line 254, Update the overview help text that currently reads "materialises a worktree + reindex" to reflect the new git archive approach—replace that phrase with something like "materialises via git archive | tar -x (no worktree; reindex after extract)" so the short overview matches the detailed flag description; locate and edit the help/usage string in src/cli/cmd-audit.ts that contains the phrase "materialises a worktree + reindex" and update it accordingly.
🧹 Nitpick comments (1)
.changeset/audit-cache-cleanup.md (1)
12-12: 💤 Low valueConsider rephrasing to avoid redundancy.
The phrase "silently silences" is redundant. Consider using "eliminates" or "prevents" instead for clearer prose.
✏️ Suggested rewording
-Also: the cache reindex now stamps `meta.last_indexed_commit` with the resolved sha directly instead of shelling out to `git rev-parse HEAD` inside the cache dir — silently silences a `fatal: not a git repository` stderr line that older revisions leaked. +Also: the cache reindex now stamps `meta.last_indexed_commit` with the resolved sha directly instead of shelling out to `git rev-parse HEAD` inside the cache dir — eliminates a `fatal: not a git repository` stderr line that older revisions leaked.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.changeset/audit-cache-cleanup.md at line 12, The sentence describing the cache reindex behavior uses redundant phrasing "silently silences"; update the line so it reads e.g. "Also: the cache reindex now stamps `meta.last_indexed_commit` with the resolved sha directly instead of shelling out to `git rev-parse HEAD` inside the cache dir — preventing a `fatal: not a git repository` stderr line that older revisions leaked." Replace "silently silences" with a single clearer verb such as "prevents" or "eliminates" and keep references to `meta.last_indexed_commit` and `git rev-parse HEAD` intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/cli/cmd-audit.ts`:
- Line 254: Update the overview help text that currently reads "materialises a
worktree + reindex" to reflect the new git archive approach—replace that phrase
with something like "materialises via git archive | tar -x (no worktree; reindex
after extract)" so the short overview matches the detailed flag description;
locate and edit the help/usage string in src/cli/cmd-audit.ts that contains the
phrase "materialises a worktree + reindex" and update it accordingly.
---
Nitpick comments:
In @.changeset/audit-cache-cleanup.md:
- Line 12: The sentence describing the cache reindex behavior uses redundant
phrasing "silently silences"; update the line so it reads e.g. "Also: the cache
reindex now stamps `meta.last_indexed_commit` with the resolved sha directly
instead of shelling out to `git rev-parse HEAD` inside the cache dir —
preventing a `fatal: not a git repository` stderr line that older revisions
leaked." Replace "silently silences" with a single clearer verb such as
"prevents" or "eliminates" and keep references to `meta.last_indexed_commit` and
`git rev-parse HEAD` intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c4c61722-4f8f-427b-801c-a57caaedf766
📒 Files selected for processing (9)
.changeset/audit-cache-cleanup.mddocs/glossary.mdsrc/application/audit-engine.tssrc/application/audit-worktree.test.tssrc/application/audit-worktree.tssrc/application/index-engine.tssrc/application/mcp-server.tssrc/application/run-index.tssrc/cli/cmd-audit.ts
- `cmd-audit.ts:254` overview help still said "materialises a worktree + reindex" — Slice 3 updated only the detailed flag section. Sync to `git archive | tar -x` for consistency. - Drop the redundant "silently silences" in the changeset's stderr note. Addresses CodeRabbit review on #89.
|
Both findings ✅ applied in
Thanks for the catches. |
Summary
codemap audit --base <ref>materialised the sha-keyed cache into.codemap/audit-cache/<sha>/viagit worktree add --detach. That created two downstream-cleanup footguns:git clean -xdfrefuses to descend into registered worktrees → consumers had to escalate to-ff, which also nukes unrelated nested repos.rm -rfleft dangling registrations in<repo>/.git/worktrees/→ consumers had to remembergit worktree pruneafter cleanup.This PR swaps the materialization primitive to
git archive --format=tar <sha>piped throughtar -x. The cache entry is now a plain extracted tree (no.gitartifact, no registered worktree) —git clean -xdfandrm -rfboth just work.Tracer-bullet slices
acd64a3populateWorktreefromgit worktree addto `git archivedb8b9bcReindexFn→runCodemapIndex({ commit })→indexFiles({ commit }), so the cache DB'smeta.last_indexed_commitis stamped without shelling out togit rev-parse HEADin the (.git-less) cache dir. Silences afatal: not a git repositorystderr line that older revisions leaked. Regression test asserts the reindex callback receives the sha.993dae2audittool description, and CLI--basehelp; add patch changeset.Invariants preserved
<projectRoot>/.codemap/audit-cache/<sha>/.<sha>/.codemap/index.dbexists).rename; orphan.tmp.*sweep > 10 min).WorktreeErrorcode names (worktree-add-failed,reindex-failed, …) kept — external consumers discriminate oncode, not the underlying primitive.DEFAULT_EXCLUDE_DIR_NAMESstill excludesaudit-cacheso codemap doesn't index snapshot copies during normal runs.Migration
Existing consumers with
.codemap/audit-cache/<sha>/worktrees from earlier versions can rungit worktree pruneonce after upgrade to clear dangling registrations. Harmless if skipped — registrations are inert when the path is gone.Test plan
bun test— 953 pass, 0 fail (2 skip, pre-existing).bun run typecheckclean.bun run lintclean.bun run format:checkclean..gitartifact and nogit worktree list --porcelainentry.Summary by CodeRabbit
Changes
git archiveextraction instead ofgit worktree, enabling standard cleanup tools (git clean -xdf/rm -rf) without worktree-specific pruning.Documentation
Migration
git worktree pruneonce after upgrade to clean up pre-existing cache worktrees.