fix(index): refuse $HOME, system roots, and .lumenignore-blanketed dirs #160

Merged
aeneasr merged 3 commits into main from fix/refuse-unindexable-roots
May 13, 2026
@aeneasr aeneasr commented May 13, 2026

Summary

Indexing $HOME (or any tree the user has blanketed with a catch-all .lumenignore) walked the whole subtree, ignored every file, and produced an empty index. On macOS the walk also triggered TCC prompts for ~/Desktop, ~/Documents, ~/Library, etc. along the way. Multiple Claude/Conductor sessions starting from $HOME amplified this into many concurrent indexers, each repeating the same walk.

This adds merkle.IsRootUnindexable(dir), called from the two places that pick index roots:

  • findAncestorIndex now skips candidates that are unindexable, so walking up from a non-git subdirectory can no longer resolve to $HOME or to a user-disabled root.
  • runIndex refuses such targets before any config/embedder setup, so lumen index $HOME fails fast with a clear error.
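The ancestor-walk guard can be pictured with a minimal sketch. The function name `findAncestor` and the two callbacks (`hasIndex`, `unindexable`) are hypothetical stand-ins for the project's real lookups, and starting the walk at the parent directory is an assumption; only the skip condition reflects what the description above states.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// findAncestor walks upward from dir's parent and returns the nearest
// ancestor that has an index and is not refused. Both callbacks are
// hypothetical stand-ins for the real index lookup and root check.
func findAncestor(dir string, hasIndex, unindexable func(string) bool) (string, bool) {
	for cur := filepath.Dir(filepath.Clean(dir)); ; cur = filepath.Dir(cur) {
		// An indexed ancestor only counts if it is also indexable.
		if hasIndex(cur) && !unindexable(cur) {
			return cur, true
		}
		if parent := filepath.Dir(cur); parent == cur {
			return "", false // reached the filesystem root without a match
		}
	}
}

func main() {
	indexed := map[string]bool{"/home/u": true, "/home/u/work": true}
	refused := map[string]bool{"/home/u": true}
	hasIndex := func(p string) bool { return indexed[p] }
	unindexable := func(p string) bool { return refused[p] }

	// Resolves to the nearest indexable ancestor.
	fmt.Println(findAncestor("/home/u/work/notes/drafts", hasIndex, unindexable))
	// The only indexed ancestor is refused, so nothing is returned.
	fmt.Println(findAncestor("/home/u/misc/drafts", hasIndex, unindexable))
}
```

Without the `!unindexable(cur)` term, the second lookup would resolve to `/home/u`, which is the failure mode this PR describes.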

The check combines:

  1. A hardcoded refusal list: /, $HOME, /Users, /tmp, /var, /etc, /usr, /Applications, /Library and their macOS /private/* twins.
  2. A .lumenignore catch-all probe: the compiled patterns are matched against the sentinel paths lumen-root-probe-… and lumen-root-probe-…/…, so patterns like **, **/*, */*, or bare * register as "ignore everything" while specific patterns like node_modules/ do not.
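A rough sketch of how the two checks might combine, under several assumptions: the refusal map is abbreviated, `probeName` is a made-up sentinel (the real probe name is elided above), and `ignoreFunc` stands in for whatever matcher the project compiles from .lumenignore. The sketch treats a match on either sentinel as a catch-all; that choice is inferred from `*/*` being listed above (it can only match the nested sentinel) and may differ from the real implementation. The reason strings are the ones quoted later in this thread.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// refusedRoots is an abbreviated copy of the hardcoded list described above.
var refusedRoots = map[string]bool{
	"/": true, "/Users": true, "/tmp": true, "/var": true, "/etc": true, "/usr": true,
}

// probeName is a hypothetical sentinel, not the name used by the project.
const probeName = "example-root-probe"

// ignoreFunc reports whether a root-relative path is ignored (assumed shape).
type ignoreFunc func(rel string) bool

// isCatchAll probes a top-level sentinel and a nested one.
func isCatchAll(ignored ignoreFunc) bool {
	return ignored(probeName) || ignored(probeName+"/"+probeName)
}

// isRootUnindexable returns whether dir is refused and, if so, why.
func isRootUnindexable(dir string, ignored ignoreFunc) (bool, string) {
	clean := filepath.Clean(dir)
	if refusedRoots[clean] {
		return true, "hardcoded system root"
	}
	if home, err := os.UserHomeDir(); err == nil && clean == filepath.Clean(home) {
		return true, "user home directory"
	}
	if ignored != nil && isCatchAll(ignored) {
		return true, ".lumenignore catch-all pattern"
	}
	return false, ""
}

// demoPatterns returns two hand-written matchers for illustration.
func demoPatterns() (catchAll, specific ignoreFunc) {
	catchAll = func(string) bool { return true } // behaves like "**"
	specific = func(rel string) bool { // behaves like "node_modules/"
		return rel == "node_modules" || strings.HasPrefix(rel, "node_modules/")
	}
	return catchAll, specific
}

func main() {
	catchAll, specific := demoPatterns()
	fmt.Println(isRootUnindexable("/tmp/", nil))
	fmt.Println(isRootUnindexable("/srv/project", catchAll))
	fmt.Println(isRootUnindexable("/srv/project", specific))
}
```

The sentinel names are chosen so that no realistic, specific pattern would match them; only a pattern broad enough to swallow arbitrary names trips the probe.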

Test plan

  • go test ./internal/merkle/ -run TestIsRootUnindexable — covers no-file, empty file, specific patterns, doublestar, combined, single-*, comments-only, hardcoded paths, $HOME, and a sibling-of-home regression.
  • go test ./cmd/ -run TestFindAncestorIndex — new sub-test verifies ancestor walk skips a candidate whose .lumenignore declares it un-indexable.
  • go test ./cmd/ -run TestRunIndex_RefusesUnindexableRoot — verifies runIndex returns an error mentioning .lumenignore before any config/embedder work.
  • make lint — 0 issues.
  • make build-local — clean build.

Out of scope (follow-up)

  • Spawn deduplication (per-path pid lockfile) — separate concern, tracked separately.
  • Teaching DiscoverNestedGitRepos to honor the parent's .lumenignore — only matters when a refused root somehow slips through, which the two new guards prevent.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Directories can be marked unindexable via .lumenignore patterns to prevent indexing and skip them during ancestor resolution.
  • Bug Fixes

    • Indexing now refuses roots marked unindexable early and re-checks after path normalization to prevent bypass.
  • Tests

    • Added tests validating unindexable-root detection and ancestor-resolution behavior when roots are catch-all ignored.

Indexing $HOME (or another tree the user has blanketed with a catch-all
.lumenignore) walks the whole subtree, ignores every file, produces an
empty index, and on macOS triggers TCC prompts for ~/Desktop, ~/Documents,
~/Library, etc. along the way. Multiple Claude sessions starting from $HOME
amplify this into many concurrent indexers, each repeating the same walk.

Add merkle.IsRootUnindexable(dir), checked in two places that pick index
roots:

- findAncestorIndex now skips candidates that are unindexable, so walking
  up from a non-git subdirectory can no longer resolve to $HOME or to a
  user-disabled root.
- runIndex refuses such targets before any config/embedder setup, so
  `lumen index $HOME` fails fast with a clear error.

The check combines a hardcoded refusal list (/, $HOME, /Users, /tmp,
/var, /etc, /usr, /Applications, /Library and macOS /private/* twins)
with a .lumenignore catch-all probe (matches "**", "**/*", "*", etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
coderabbitai Bot commented May 13, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7c88d929-6119-4356-990e-35c38529c5ca

📥 Commits

Reviewing files that changed from the base of the PR and between 94c13a8 and a53d441.

📒 Files selected for processing (2)
  • internal/merkle/ignore.go
  • internal/merkle/ignore_test.go
Walkthrough

Adds IsRootUnindexable(dir) to detect hardcoded/home/.lumenignore catch-all roots and enforces this check in runIndex and findAncestorIndex; tests exercise detection, index refusal, and ancestor traversal skipping.

Changes

Un-indexable root detection and enforcement

| Layer / File(s) | Summary |
| --- | --- |
| Core un-indexability detection: `internal/merkle/ignore.go`, `internal/merkle/ignore_test.go` | Adds `refusedRoots` and `IsRootUnindexable(dir) (bool, string)`, which rejects hardcoded system roots and the user home directory, compiles .lumenignore when present, and probes sentinel paths to detect catch-all patterns; tests cover missing/empty/specific/catch-all .lumenignore patterns and hardcoded/home refusals. |
| Index command guard: `cmd/index.go`, `cmd/index_test.go` | Imports `merkle`; `runIndex` now resolves `projectPath` to an absolute path early, returns an error if `IsRootUnindexable` is true before config/logger setup, removes a later redundant `filepath.Abs` block, and re-checks after normalizing to a git repo root; test asserts `runIndex` refuses catch-all .lumenignore roots. |
| Ancestor walk filtering: `cmd/ancestor.go`, `cmd/ancestor_test.go` | Imports `merkle`; `findAncestorIndex` extends its skip condition to also skip candidates where `IsRootUnindexable(candidate)` is true, preventing resolution to un-indexable ancestors; test verifies an ancestor with catch-all .lumenignore is not returned. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: preventing indexing of unindexable roots including $HOME, system directories, and directories with blanket .lumenignore patterns. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/merkle/ignore_test.go (1)

86-193: ⚡ Quick win

Use a table-driven structure for TestIsRootUnindexable.

This block repeats setup/assert logic across many inputs; a table-driven form will reduce duplication and make future case additions safer.

As per coding guidelines, **/*_test.go: Use table-driven tests for multiple test cases in Go.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/merkle/ignore_test.go` around lines 86 - 193, Refactor
TestIsRootUnindexable into a table-driven test: create a slice of test cases
(struct with name, ignoreContents string, rootPath string or a flag to use
t.TempDir, and expected bool) and iterate with t.Run for each case; move
repeated setup (dir := t.TempDir(), writeFile calls, os.UserHomeDir handling)
into per-case setup functions and call IsRootUnindexable for assertions; keep
special hardcoded-paths and home-dir checks as separate cases in the table
(referencing IsRootUnindexable, writeFile, t.TempDir, and os.UserHomeDir) so
each scenario is a single table entry and duplicated logic is eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/index.go`:
- Around line 57-63: The initial unindexable check uses
merkle.IsRootUnindexable(projectPath) on the raw CLI path but projectPath is
later normalized/rewritten (e.g., to a git root), so re-run the same check after
any normalization step that mutates projectPath; specifically, after the code
that adjusts projectPath (the block around where projectPath is
reassigned/normalized) call merkle.IsRootUnindexable(projectPath) again and
return the same fmt.Errorf(...) if true so the protection cannot be bypassed
(apply the same fix to the later spot referenced by lines 83-87).
- Around line 61-63: The refusal message wrongly blames .lumenignore for all
cases; update the code around merkle.IsRootUnindexable and projectPath so the
error text is accurate: either change merkle.IsRootUnindexable to return a
reason (e.g., (bool, string) or an error) and include that reason in the
fmt.Errorf, or replace the hardcoded message with a generic one such as
"refusing to index %s: root is unindexable (possible reasons: .lumenignore,
hardcoded root, or $HOME)"; reference merkle.IsRootUnindexable and projectPath
when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ac57aeee-be4c-463e-95ba-4d47958a67f1

📥 Commits

Reviewing files that changed from the base of the PR and between 0537eae and c7fb2fd.

📒 Files selected for processing (6)
  • cmd/ancestor.go
  • cmd/ancestor_test.go
  • cmd/index.go
  • cmd/index_test.go
  • internal/merkle/ignore.go
  • internal/merkle/ignore_test.go

Comment thread cmd/index.go Outdated
Comment thread cmd/index.go Outdated
…ason

Addresses CodeRabbit review on PR #160:

1. `git.RepoRoot` could resolve `projectPath` upward into an un-indexable
   root (e.g. a git repo whose root is $HOME), bypassing the pre-
   normalization guard. Add a second `IsRootUnindexable` check immediately
   after normalization. `findAncestorIndex` already filters internally, but
   the git branch did not.

2. `IsRootUnindexable` now returns `(bool, string)` where the second value
   is the refusal reason ("hardcoded system root", "user home directory",
   ".lumenignore catch-all pattern"). `runIndex` includes that reason in
   its error so the message is accurate regardless of which branch fired.

Tests refactored to table-driven form for the .lumenignore scenarios.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
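To make the double guard from this commit concrete, here is a minimal sketch. `resolveIndexRoot`, `checkFunc`, and the `repoRoot` callback are hypothetical condensations: the callback stands in for `git.RepoRoot`, whose real signature is not shown in this thread, and the error wording is illustrative rather than the project's actual message.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// checkFunc mirrors the (bool, string) shape described above: refused, reason.
type checkFunc func(dir string) (bool, string)

// resolveIndexRoot guards the path twice: once as supplied, and once after
// normalization may have moved it upward (assumed flow, not the real runIndex).
func resolveIndexRoot(path string, check checkFunc, repoRoot func(string) (string, error)) (string, error) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	// First guard: the path exactly as the user supplied it.
	if bad, reason := check(abs); bad {
		return "", fmt.Errorf("refusing to index %s: %s", abs, reason)
	}
	// Normalization may rewrite the target, for example to a git repo root.
	if root, err := repoRoot(abs); err == nil {
		abs = root
	}
	// Second guard: the path that would actually be walked.
	if bad, reason := check(abs); bad {
		return "", fmt.Errorf("refusing to index %s: %s", abs, reason)
	}
	return abs, nil
}

// demoCheck refuses a pretend home directory.
func demoCheck(dir string) (bool, string) {
	if dir == "/home/u" {
		return true, "user home directory"
	}
	return false, ""
}

// demoRepoRoot pretends /home/u is itself a git repository.
func demoRepoRoot(dir string) (string, error) {
	if strings.HasPrefix(dir, "/home/u/") {
		return "/home/u", nil
	}
	return "", errors.New("not a git repository")
}

func main() {
	_, err := resolveIndexRoot("/home/u/dotfiles", demoCheck, demoRepoRoot)
	fmt.Println(err)
	root, err := resolveIndexRoot("/srv/project", demoCheck, demoRepoRoot)
	fmt.Println(root, err)
}
```

In the first call the supplied path passes the first guard, and only the second guard catches that normalization landed on the refused root. Carrying the reason string through keeps the error accurate whichever branch fired.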
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/merkle/ignore.go (1)

291-306: ⚡ Quick win

Consider adding /System to the macOS refused roots.

The refusedRoots map includes macOS-specific paths like /Applications and /Library but is missing /System, which contains macOS system files. While System Integrity Protection (SIP) prevents modification of /System, adding it to the refused list would provide users with a clear, immediate error message rather than letting the indexer attempt a slow walk that would ultimately fail.

📝 Suggested addition
 var refusedRoots = map[string]bool{
 	"/":            true,
 	"/Users":       true,
 	"/tmp":         true,
 	"/private/tmp": true,
 	"/var":         true,
 	"/private/var": true,
 	"/etc":         true,
 	"/usr":         true,
 	"/Applications": true,
 	"/Library":     true,
+	"/System":      true,
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/merkle/ignore.go` around lines 291 - 306, Update the refusedRoots
map in internal/merkle/ignore.go (the variable refusedRoots) to include the
macOS system path "/System" with a true value so the indexer immediately refuses
that root instead of attempting a slow walk; add the entry alongside the
existing macOS paths like "/Library" and "/Applications".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/merkle/ignore.go`:
- Around line 291-306: refusedRoots currently lists only Unix-style
forward-slash paths but IsRootUnindexable() uses filepath.Clean() on the
incoming path, which on Windows produces backslash paths, so lookups miss
Windows system roots; update the refusedRoots map to include canonical Windows
entries (e.g., "C:\\", "C:\\Windows", "C:\\Program Files", "C:\\Program Files
(x86)") and optionally add "/System" and "/opt" for macOS/Unix, and ensure keys
are created using filepath.Clean or filepath.FromSlash so the map keys use the
same normalization as IsRootUnindexable() (or alternatively normalize the
checked path with filepath.ToSlash before map lookup) to make matching reliable
across platforms.
- Around line 325-332: IsRootUnindexable currently uses filepath.Clean only, so
symlinked paths (e.g. a symlink to $HOME) bypass the root checks; update
IsRootUnindexable to first call filepath.EvalSymlinks on the incoming dir and
use the resolved path for comparisons (falling back to filepath.Clean if
EvalSymlinks returns an error), and also resolve the user's home directory via
filepath.EvalSymlinks after os.UserHomeDir() before comparing; ensure all
subsequent checks in IsRootUnindexable (e.g., refusedRoots lookup and home-dir
equality) use the resolved/cleaned path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e09130cb-a90a-416a-b6d4-52b13a7ea091

📥 Commits

Reviewing files that changed from the base of the PR and between c7fb2fd and 94c13a8.

📒 Files selected for processing (5)
  • cmd/ancestor.go
  • cmd/index.go
  • cmd/index_test.go
  • internal/merkle/ignore.go
  • internal/merkle/ignore_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • cmd/ancestor.go
  • cmd/index_test.go
  • internal/merkle/ignore_test.go
  • cmd/index.go

Comment thread internal/merkle/ignore.go
Comment thread internal/merkle/ignore.go
The refusal map previously held only Unix paths and IsRootUnindexable
relied on filepath.Clean alone, so on Windows the system roots were
never matched and a symlink to $HOME or a system root bypassed the
guard.

- Add Windows entries (C:\, C:\Windows, C:\Program Files, etc.) and
  the missing /opt, /home, /System Unix roots.
- Resolve symlinks via filepath.EvalSymlinks (fallback to Clean when
  the target does not exist) and compare both forms against the map
  so /etc still matches on macOS where it symlinks to /private/etc.
- Test the symlink-to-home case and scope the hardcoded-root test to
  paths that actually exist on the host OS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
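The "compare both forms" idea from this commit can be sketched as follows. The helper names (`candidateForms`, `isRefused`, `demo`) are hypothetical, and the refusal map is built at run time from a temporary directory rather than from the real list. The demo creates a symlink, so it assumes a Unix-like host.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidateForms returns the cleaned path plus its symlink-resolved form.
// If EvalSymlinks fails (for example, the path does not exist), only the
// cleaned form is returned.
func candidateForms(dir string) []string {
	clean := filepath.Clean(dir)
	forms := []string{clean}
	if resolved, err := filepath.EvalSymlinks(clean); err == nil && resolved != clean {
		forms = append(forms, resolved)
	}
	return forms
}

// isRefused reports whether any form of dir appears in the refusal map.
func isRefused(dir string, refused map[string]bool) bool {
	for _, form := range candidateForms(dir) {
		if refused[form] {
			return true
		}
	}
	return false
}

// demo builds a refused directory and a symlink to it inside a temp dir, then
// checks the symlink and a nonexistent sibling against the refusal map.
func demo() (viaLink, missing bool, err error) {
	tmp, err := os.MkdirTemp("", "refuse-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(tmp)
	// Resolve the temp dir first: it may itself sit behind a symlink.
	base, err := filepath.EvalSymlinks(tmp)
	if err != nil {
		return false, false, err
	}
	target := filepath.Join(base, "protected")
	if err = os.Mkdir(target, 0o755); err != nil {
		return false, false, err
	}
	link := filepath.Join(base, "alias")
	if err = os.Symlink(target, link); err != nil {
		return false, false, err
	}
	refused := map[string]bool{target: true}
	return isRefused(link, refused), isRefused(filepath.Join(base, "absent"), refused), nil
}

func main() {
	fmt.Println(demo())
}
```

Checking the unresolved form as well matters when the map key is itself a symlink, as in the /etc case mentioned above: the resolved path would no longer equal the listed key.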
@aeneasr aeneasr merged commit fc9eea2 into main May 13, 2026
10 checks passed