
refactor(audit): replace git worktree add with git archive | tar -x for audit-cache#89

Merged
SutuSebastian merged 4 commits into main from feat/audit-cache-no-worktree on May 15, 2026

Conversation

@SutuSebastian
Contributor

@SutuSebastian SutuSebastian commented May 15, 2026

Summary

codemap audit --base <ref> materialised the sha-keyed cache into .codemap/audit-cache/<sha>/ via git worktree add --detach. That created two downstream-cleanup footguns:

  • git clean -xdf refuses to descend into registered worktrees → consumers had to escalate to -ff, which also nukes unrelated nested repos.
  • Plain rm -rf left dangling registrations in <repo>/.git/worktrees/ → consumers had to remember git worktree prune after cleanup.

This PR swaps the materialization primitive to git archive --format=tar <sha> piped through tar -x. The cache entry is now a plain extracted tree (no .git artifact, no registered worktree) — git clean -xdf and rm -rf both just work.
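The pipeline described above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not codemap's actual `populateWorktree`: the helper names `archivePipeline`, `waitForExit`, and `extractCommit` are hypothetical, and the explicit `-f -` and `-C <dir>` flags on `tar` are assumptions added here so the sketch reads the archive from stdin and extracts into a chosen directory.

```typescript
import { spawn } from "node:child_process";
import { mkdir } from "node:fs/promises";

// Hypothetical helper: builds the argv for both halves of the pipeline.
// `-f -` (read from stdin) and `-C` (target dir) are assumptions of this sketch.
function archivePipeline(sha: string, destDir: string) {
  return {
    archive: { cmd: "git", args: ["archive", "--format=tar", sha] },
    extract: { cmd: "tar", args: ["-x", "-f", "-", "-C", destDir] },
  };
}

// Resolves when the child exits with code 0, rejects otherwise.
function waitForExit(name: string, child: ReturnType<typeof spawn>): Promise<void> {
  return new Promise((resolve, reject) => {
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`${name} exited with code ${code}`)),
    );
  });
}

// Hypothetical helper: runs `git archive <sha> | tar -x` into destDir.
// The result is a plain tree: no .git file, no registered worktree.
async function extractCommit(repoRoot: string, sha: string, destDir: string): Promise<void> {
  await mkdir(destDir, { recursive: true });
  const { archive, extract } = archivePipeline(sha, destDir);
  const git = spawn(archive.cmd, archive.args, { cwd: repoRoot });
  const tar = spawn(extract.cmd, extract.args);
  git.stdout.pipe(tar.stdin); // the "|" in `git archive | tar -x`
  git.stderr.pipe(process.stderr, { end: false });
  tar.stderr.pipe(process.stderr, { end: false });
  await Promise.all([waitForExit("git archive", git), waitForExit("tar -x", tar)]);
}
```

Because the extracted directory carries no git metadata, removing it afterwards needs nothing beyond an ordinary recursive delete.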

Tracer-bullet slices

| Commit | Slice |
| --- | --- |
| acd64a3 | Slice 1: switch populateWorktree from git worktree add to git archive … |
| db8b9bc | Slice 2: plumb the resolved sha through ReindexFn → runCodemapIndex({ commit }) → indexFiles({ commit }), so the cache DB's meta.last_indexed_commit is stamped without shelling out to git rev-parse HEAD in the (.git-less) cache dir. Silences a fatal: not a git repository stderr line that older revisions leaked. Regression test asserts the reindex callback receives the sha. |
| 993dae2 | Slice 3: sync glossary, MCP audit tool description, and CLI --base help; add patch changeset. |
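A rough sketch of the Slice 2 plumbing, for readers who want to see the shape of the change. The type names `ReindexFn` and `RunIndexOptions` and the `commit` field come from this PR; `resolveStampedCommit` and the stub bodies are hypothetical stand-ins, since the real `indexFiles` is not shown here.

```typescript
// Shapes named in the PR; only the fields mentioned there are modelled.
type RunIndexOptions = { mode: "full"; commit?: string };
type ReindexFn = (worktreePath: string, commit?: string) => Promise<void>;

// Hypothetical stand-in for the stamping step: prefer the caller-supplied sha,
// and only fall back to asking git (getCurrentCommit) when none was passed.
function resolveStampedCommit(opts: RunIndexOptions, getCurrentCommit: () => string): string {
  return opts.commit ?? getCurrentCommit();
}

// Because `commit` is optional, a pre-existing one-argument stub still type-checks.
const seenPaths: string[] = [];
const legacyStub: ReindexFn = async (wp) => {
  seenPaths.push(wp);
};

// A newer callback can read the sha that the cache layer already resolved.
const seenCommits: Array<string | undefined> = [];
const shaAwareStub: ReindexFn = async (_wp, commit) => {
  seenCommits.push(commit);
};
```

The design point is that the cache layer already knows the sha when it materialises the tree, so passing it down avoids running a git command in a directory that is no longer a git checkout.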

Invariants preserved

  • Cache path layout <projectRoot>/.codemap/audit-cache/<sha>/.
  • Cache-hit detection (<sha>/.codemap/index.db exists).
  • Atomic populate (per-pid temp + POSIX rename; orphan .tmp.* sweep > 10 min).
  • LRU eviction (5 entries / 500 MiB).
  • WorktreeError code names (worktree-add-failed, reindex-failed, …) kept — external consumers discriminate on code, not the underlying primitive.
  • DEFAULT_EXCLUDE_DIR_NAMES still excludes audit-cache so codemap doesn't index snapshot copies during normal runs.
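The atomic-populate invariant above (per-pid temp directory, then a rename into place) might look something like the sketch below. It is an illustration under assumptions: `populateAtomically` is a hypothetical helper, the `<sha>.tmp.<pid>` naming is a guess based on the `.tmp.*` sweep pattern mentioned in the list, and the orphan sweep itself is omitted.

```typescript
import { existsSync, mkdirSync, mkdtempSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper, not codemap's populateWorktree.
// Readers never observe a half-written entry: the final path only appears
// once `fill` has finished and the temp dir has been renamed into place.
function populateAtomically(
  cacheRoot: string,
  sha: string,
  fill: (tmpPath: string) => void,
): string {
  const finalPath = join(cacheRoot, sha);
  // Cache hit as described in the PR: <sha>/.codemap/index.db exists.
  if (existsSync(join(finalPath, ".codemap", "index.db"))) return finalPath;

  const tmpPath = join(cacheRoot, `${sha}.tmp.${process.pid}`); // assumed naming
  mkdirSync(tmpPath, { recursive: true });
  try {
    fill(tmpPath); // extract the tree and build the index here
    renameSync(tmpPath, finalPath); // atomic on POSIX within one filesystem
  } catch (err) {
    rmSync(tmpPath, { recursive: true, force: true });
    throw err;
  }
  return finalPath;
}

// Usage: the second call is a cache hit, so the fill callback runs only once.
const demoRoot = mkdtempSync(join(tmpdir(), "audit-cache-demo-"));
let fillCalls = 0;
const fakeFill = (tmpPath: string): void => {
  fillCalls += 1;
  mkdirSync(join(tmpPath, ".codemap"), { recursive: true });
  writeFileSync(join(tmpPath, ".codemap", "index.db"), "");
};
const firstPath = populateAtomically(demoRoot, "abc123", fakeFill);
const secondPath = populateAtomically(demoRoot, "abc123", fakeFill);
```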

Migration

Existing consumers with .codemap/audit-cache/<sha>/ worktrees from earlier versions can run git worktree prune once after upgrade to clear dangling registrations. Harmless if skipped — registrations are inert when the path is gone.

Test plan

  • bun test — 953 pass, 0 fail (2 skip, pre-existing).
  • bun run typecheck clean.
  • bun run lint clean.
  • bun run format:check clean.
  • New regression: cache entry has no .git artifact and no git worktree list --porcelain entry.
  • New regression: reindex callback receives the resolved sha as 2nd arg.
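The first new regression check could be approximated as below. This is a sketch, not the test in `audit-worktree.test.ts`: `parseWorktreePaths` and `isRegisteredWorktree` are hypothetical helpers that take the text of `git worktree list --porcelain` (whose records each begin with a `worktree <path>` line) and report whether a directory is registered. The sample output is made up for illustration.

```typescript
import { resolve } from "node:path";

// Hypothetical helper: pull the registered paths out of porcelain output.
function parseWorktreePaths(porcelain: string): string[] {
  return porcelain
    .split("\n")
    .filter((line) => line.startsWith("worktree "))
    .map((line) => line.slice("worktree ".length).trim());
}

// Hypothetical helper: true when `dir` appears as a registered worktree.
function isRegisteredWorktree(porcelain: string, dir: string): boolean {
  const target = resolve(dir);
  return parseWorktreePaths(porcelain).some((p) => resolve(p) === target);
}

// Made-up sample: the main checkout plus one old-style cache worktree.
const samplePorcelain = [
  "worktree /repo",
  "HEAD 1111111111111111111111111111111111111111",
  "branch refs/heads/main",
  "",
  "worktree /repo/.codemap/audit-cache/oldsha",
  "HEAD 2222222222222222222222222222222222222222",
  "detached",
  "",
].join("\n");

const oldEntryRegistered = isRegisteredWorktree(samplePorcelain, "/repo/.codemap/audit-cache/oldsha");
const newEntryRegistered = isRegisteredWorktree(samplePorcelain, "/repo/.codemap/audit-cache/newsha");
```

In a real test the porcelain text would come from running the git command in the project root, paired with a check that the cache directory contains no `.git` entry.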

Summary by CodeRabbit

  • Changes

    • Audit cache now materializes using git archive extraction instead of git worktree, enabling standard cleanup tools (git clean -xdf / rm -rf) without worktree-specific pruning.
  • Documentation

    • Updated audit help text and glossary to reflect the new cache materialization approach.
  • Migration

    • Users should run git worktree prune once after upgrade to clean up pre-existing cache worktrees.

Review Change Stack

…tree add

`codemap audit --base <ref>` materialised the sha into
`.codemap/audit-cache/<sha>/` via `git worktree add --detach`. That
created two downstream-cleanup footguns:

- `git clean -xdf` refuses to descend into registered worktrees; consumers
  had to escalate to `-ff`, which also nukes legitimate nested repos.
- Plain `rm -rf` left dangling registrations in `<repo>/.git/worktrees/`;
  consumers had to remember `git worktree prune` after cleanup.

Switch the materialization primitive to `git archive --format=tar <sha>`
piped through `tar -x`. Same on-disk content (identical bytes from the
git object store), no `.git` pointer file inside the cache, no registered
worktree. `git clean -xdf` and `rm -rf` both just work.

Existing invariants preserved:
- Cache path layout `<projectRoot>/.codemap/audit-cache/<sha>/`.
- Cache-hit detection (`<sha>/.codemap/index.db` exists).
- Atomic populate (per-pid temp + POSIX rename).
- LRU eviction (5 entries / 500 MiB; orphan `.tmp.*` sweep > 10 min).
- WorktreeError code names (`worktree-add-failed`, `reindex-failed`, ...)
  for API stability — external consumers discriminate on `code`, not
  primitive name.
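For reference, the LRU bound listed above (5 entries / 500 MiB) can be expressed as a small selection function. This is an illustrative sketch only: `CacheEntry`, `selectEvictions`, and the `lastUsedMs` field are hypothetical, and how codemap actually tracks recency and size is not stated in this PR.

```typescript
// Hypothetical description of one cache entry on disk.
type CacheEntry = { sha: string; bytes: number; lastUsedMs: number };

const MAX_ENTRIES = 5;
const MAX_BYTES = 500 * 1024 * 1024; // 500 MiB

// Returns the shas to evict, least recently used first, until both
// the entry-count bound and the total-size bound are satisfied.
function selectEvictions(
  entries: CacheEntry[],
  maxEntries: number = MAX_ENTRIES,
  maxBytes: number = MAX_BYTES,
): string[] {
  const oldestFirst = [...entries].sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  let count = oldestFirst.length;
  let totalBytes = oldestFirst.reduce((sum, e) => sum + e.bytes, 0);
  const evict: string[] = [];
  for (const entry of oldestFirst) {
    if (count <= maxEntries && totalBytes <= maxBytes) break;
    evict.push(entry.sha);
    count -= 1;
    totalBytes -= entry.bytes;
  }
  return evict;
}
```

With plain extracted trees, acting on the returned list is just a recursive delete per entry; no worktree deregistration step is involved.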

Migration for existing consumers: one-time `git worktree prune` after
upgrade clears dangling registrations from old cache entries. Harmless
if skipped — registrations are inert when the path is gone.

Adds a regression test asserting the cache entry has no `.git` artifact
and never appears in `git worktree list --porcelain`.
…v-parse

`indexFiles` ends every full rebuild with `setMeta('last_indexed_commit',
getCurrentCommit())`, which shells out `git rev-parse HEAD` in the project
root. After the audit-cache switch to `git archive | tar -x`, the cache
dir has no `.git` of its own, so that shell-out fails (status 128, empty
stdout) — the cache DB ends up with `last_indexed_commit = ""` and a
`fatal: not a git repository` line leaks to stderr.

Thread the resolved sha through:

- `RunIndexOptions.commit?: string` — when set, `indexFiles` stamps that
  value instead of calling `getCurrentCommit()`.
- `ReindexFn` widens to `(worktreePath, commit?) => Promise<void>`;
  `makeWorktreeReindex` forwards `commit` into `runCodemapIndex`.
- `populateWorktree` invokes `opts.reindex(tmpPath, opts.sha)` — the sha
  is already resolved by the time we materialise the tree.

Existing test stubs typed as `(wp) => …` keep working — the second arg
is optional. Adds a regression test asserting the reindex callback
receives the sha.

Also slims the Slice 1 docstrings per `.agents/rules/concise-comments` —
the "why no git worktree" historical note belongs in the changeset, not
in every reader's eyeline forever.

Glossary entry, MCP `audit` tool description, and CLI `--base` help all
described the cache as a `git worktree add` materialisation. Update each
to reflect `git archive | tar -x` and the cleanup-UX consequences (plain
tree, `git clean -xdf` and `rm -rf` both work).

Adds a patch changeset covering this PR's two functional commits (cache
primitive switch + sha plumbing into the reindex callback) and the
one-time `git worktree prune` migration note for existing consumers.
@changeset-bot

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: c50c7d3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @stainless-code/codemap | Patch |


@coderabbitai

coderabbitai Bot commented May 15, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: abb9cd17-d123-4217-b0a9-063dc31c1b5f

📥 Commits

Reviewing files that changed from the base of the PR and between 993dae2 and c50c7d3.

📒 Files selected for processing (2)
  • .changeset/audit-cache-cleanup.md
  • src/cli/cmd-audit.ts
📝 Walkthrough

This PR replaces git worktree-based audit cache materialization with git archive | tar -x extraction, enabling simpler cleanup via rm -rf and introducing explicit commit-SHA passing through the reindex callback into the indexing pipeline for accurate metadata stamping.

Changes

Audit cache materialization from git worktree to archive extraction

| Layer / File(s) | Summary |
| --- | --- |
| Audit worktree population and cleanup implementation: src/application/audit-worktree.ts, src/application/audit-worktree.test.ts | populateWorktree now extracts cache entries via git archive piped to tar -x instead of git worktree add, calls the reindex callback with both path and resolved commit SHA, and simplified cleanup from git worktree remove --force to plain rm -rf across all eviction and test-wipe paths. |
| Reindex callback contract and indexing integration: src/application/audit-engine.ts, src/application/index-engine.ts, src/application/run-index.ts | ReindexFn type now accepts optional commit parameter; makeWorktreeReindex forwards it into runCodemapIndex with { mode: "full", commit }, and the commit flows through RunIndexOptions into indexFiles to stamp meta.last_indexed_commit directly instead of invoking git rev-parse HEAD. |
| Tests validating plain-tree cache behavior and commit passing: src/application/audit-worktree.test.ts | New test cases assert that archive-extracted cache entries contain no .git artifacts or git worktree registrations, and that populateWorktree correctly forwards the resolved commit SHA as the second callback argument. |
| Documentation and user-facing help text updates: .changeset/audit-cache-cleanup.md, docs/glossary.md, src/application/mcp-server.ts, src/cli/cmd-audit.ts | Changeset, glossary, MCP tool description, and CLI help text updated to reflect git archive … |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • stainless-code/codemap#52: Builds directly on the worktree-based --base caching implementation by changing the materialization strategy from git worktree add to archive extraction and updating the reindex callback to pass the resolved commit SHA.

Poem

🐰 A worktree once rooted, now archive-extracted,
Tar files unrolled, cleanup perfected—
The commit flows cleanly through reindex's call,
No git pointer lingering, just archives for all! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: replacing git worktree add with git archive \| tar -x for audit-cache materialization, which is the core refactor across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli/cmd-audit.ts (1)

254-254: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update overview text to reflect the new git archive approach.

This line still describes the old behavior ("materialises a worktree + reindex"), but the PR switches to git archive | tar -x materialization. The detailed flag description at lines 261-266 correctly describes the new approach, but this overview text is now inconsistent.

📝 Suggested update
-or against a git ref (\`--base <ref>\` materialises a worktree + reindex), and emit structural deltas
+or against a git ref (\`--base <ref>\` materialises via git archive + reindex), and emit structural deltas
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli/cmd-audit.ts` at line 254, Update the overview help text that
currently reads "materialises a worktree + reindex" to reflect the new git
archive approach—replace that phrase with something like "materialises via git
archive | tar -x (no worktree; reindex after extract)" so the short overview
matches the detailed flag description; locate and edit the help/usage string in
src/cli/cmd-audit.ts that contains the phrase "materialises a worktree +
reindex" and update it accordingly.
🧹 Nitpick comments (1)
.changeset/audit-cache-cleanup.md (1)

12-12: 💤 Low value

Consider rephrasing to avoid redundancy.

The phrase "silently silences" is redundant. Consider using "eliminates" or "prevents" instead for clearer prose.

✏️ Suggested rewording
-Also: the cache reindex now stamps `meta.last_indexed_commit` with the resolved sha directly instead of shelling out to `git rev-parse HEAD` inside the cache dir — silently silences a `fatal: not a git repository` stderr line that older revisions leaked.
+Also: the cache reindex now stamps `meta.last_indexed_commit` with the resolved sha directly instead of shelling out to `git rev-parse HEAD` inside the cache dir — eliminates a `fatal: not a git repository` stderr line that older revisions leaked.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.changeset/audit-cache-cleanup.md at line 12, The sentence describing the
cache reindex behavior uses redundant phrasing "silently silences"; update the
line so it reads e.g. "Also: the cache reindex now stamps
`meta.last_indexed_commit` with the resolved sha directly instead of shelling
out to `git rev-parse HEAD` inside the cache dir — preventing a `fatal: not a
git repository` stderr line that older revisions leaked." Replace "silently
silences" with a single clearer verb such as "prevents" or "eliminates" and keep
references to `meta.last_indexed_commit` and `git rev-parse HEAD` intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c4c61722-4f8f-427b-801c-a57caaedf766

📥 Commits

Reviewing files that changed from the base of the PR and between 7c30c6c and 993dae2.

📒 Files selected for processing (9)
  • .changeset/audit-cache-cleanup.md
  • docs/glossary.md
  • src/application/audit-engine.ts
  • src/application/audit-worktree.test.ts
  • src/application/audit-worktree.ts
  • src/application/index-engine.ts
  • src/application/mcp-server.ts
  • src/application/run-index.ts
  • src/cli/cmd-audit.ts

- `cmd-audit.ts:254` overview help still said "materialises a worktree +
  reindex" — Slice 3 updated only the detailed flag section. Sync to
  `git archive | tar -x` for consistency.
- Drop the redundant "silently silences" in the changeset's stderr note.

Addresses CodeRabbit review on #89.
@SutuSebastian
Contributor Author

Both findings ✅ applied in c50c7d3:

  • cmd-audit.ts:254 overview help text now reads git archive | tar -x (the detailed flag section at 261-266 was already updated in Slice 3; the overview at 254 slipped through).
  • Changeset: dropped the redundant "silently silences" → just "silences".

Thanks for the catches.

@SutuSebastian SutuSebastian merged commit 6e53458 into main May 15, 2026
11 checks passed
@SutuSebastian SutuSebastian deleted the feat/audit-cache-no-worktree branch May 15, 2026 18:07
@github-actions github-actions Bot mentioned this pull request May 15, 2026