Skip to content

fix(embed): handle absolute file paths from native engine#783

Merged
carlos-alm merged 3 commits intomainfrom
fix/760-embed-absolute-paths
Apr 3, 2026
Merged

fix(embed): handle absolute file paths from native engine#783
carlos-alm merged 3 commits intomainfrom
fix/760-embed-absolute-paths

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Fix buildEmbeddings doubling file paths when the native engine stores absolute paths in the DB (e.g., /tmp/foo/bar.js became /tmp/foo/tmp/foo/bar.js)
  • Add path.isAbsolute() guard before joining rootDir with the DB file path
  • Add regression test that inserts nodes with absolute file paths and verifies embeddings are produced

Root Cause

buildEmbeddings in generator.ts unconditionally called path.join(rootDir, file) on line 103. When the native Rust engine stored absolute file paths in the DB nodes.file column, this doubled the path, causing ENOENT for every file read — silently producing 0 embeddings while still writing metadata.

CI evidence from the failing run showed:

[codegraph WARN] Cannot read /tmp/codegraph-embed-regression-MIdrzl/utils.js for embeddings: 
  ENOENT: open '/tmp/codegraph-embed-regression-MIdrzl/tmp/codegraph-embed-regression-MIdrzl/utils.js'

Closes #760

Test plan

  • embedding-strategy.test.ts — new test absolute file paths in DB (#760) passes
  • embedding-regression.test.ts — all 8 existing tests still pass
  • Biome lint clean on changed files

The native engine stores absolute file paths in the DB. buildEmbeddings
unconditionally joined rootDir + file, doubling the path and causing
ENOENT for every symbol — producing 0 embeddings silently.

Check path.isAbsolute() before joining, and add a regression test that
inserts nodes with absolute paths to prevent recurrence.

Closes #760
@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @carlos-alm's task —— View job


Code Review Complete ⭐⭐⭐⭐ (4/5)

Analyzing PR #783: fix(embed): handle absolute file paths from native engine

Review Checklist

  • Read and analyze changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Phase 5: Backlog compliance rating
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: The buildEmbeddings function in generator.ts:103 was unconditionally calling path.join(rootDir, file) on file paths stored in the database. When the native Rust engine stores absolute paths (e.g., /tmp/foo/bar.js), this resulted in path doubling (/tmp/foo/tmp/foo/bar.js), causing ENOENT errors during file reads and silent embedding generation failures (0 embeddings produced while metadata was still written).

Why This Approach: The fix adds a simple path.isAbsolute(file) guard before path joining. This is the correct approach because:

  1. Surgical: Only changes the path construction logic, not the broader architecture
  2. Platform-agnostic: path.isAbsolute() correctly handles both Unix and Windows absolute paths
  3. Preserves existing behavior: Relative paths continue to work exactly as before
  4. Addresses root cause: Handles the fundamental issue of engine inconsistency rather than masking symptoms

Risk Assessment: Very low risk. This is a defensive fix that makes the code more robust to different path storage formats from the dual-engine architecture.

Configuration Change Detection ✓

HIGH-RISK changes: None detected.

  • No biome/linting rule changes
  • No test timeout or coverage threshold modifications
  • No build configuration changes
  • No TypeScript strict mode flag changes

Test Integrity ✓

Test quality: Excellent regression test coverage.

  • New test absolute file paths in DB (#760): Directly reproduces the issue by inserting nodes with absolute paths and verifying embedding generation succeeds (lines 292-328)
  • Test isolation: Uses separate temporary directory with proper cleanup
  • Comprehensive assertions: Verifies both embedding generation (EMBEDDED_TEXTS.length === 1) and database storage (count === 1)
  • Realistic scenario: Simulates actual native engine behavior
  • No weakened assertions or reduced test strictness
  • All existing tests continue to pass

Code Quality Assessment

Strengths:

  1. Minimal surface area: Single-line change with maximum impact
  2. Correct API usage: path.isAbsolute() is the standard way to detect absolute paths
  3. Preserves existing semantics: No behavior change for relative paths
  4. Self-documenting: The intent is clear from the conditional logic
  5. Robust test coverage: Direct regression test prevents future regressions

Technical Analysis:

  • Line 103: The conditional path.isAbsolute(file) ? file : path.join(rootDir, file) correctly handles both path types
  • No edge cases missed: path.isAbsolute() handles all platform-specific absolute path formats
  • Error handling preserved: The existing try/catch for file reading remains intact
  • Performance impact: Negligible - path.isAbsolute() is a simple string check

Potential Concerns: None identified

  • No security implications
  • No performance degradation
  • No breaking changes to the API
  • Error messages remain clear and actionable

Backlog Compliance

  • Zero-dep: ✓ — Uses Node.js built-in path.isAbsolute(), no new dependencies
  • Foundation-aligned: ✓ — Maintains fast local computation (P1), keeps core functionality zero-cost (P4), no architectural changes
  • Problem-fit: 4 — Directly prevents embedding generation failures that would cause agents to miss semantic search results, reducing their ability to navigate large codebases effectively
  • Breaking: No — Purely additive fix, no changes to API surface or behavior for existing workflows
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None. This is a well-executed bug fix.

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning:
    • Addresses a real bug with a correct, minimal fix
    • Excellent regression test prevents future regressions
    • No architectural concerns or side effects
    • Aligns perfectly with project principles
    • Clean, professional implementation

Minor enhancement suggestion (not blocking): The warning message on line 108 could be slightly more specific by mentioning whether the path was treated as absolute or relative, but this is a very minor improvement and not necessary for approval.

This PR exemplifies quality software engineering: identify the root cause, apply the minimal correct fix, and add comprehensive test coverage to prevent regression.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

Adds a one-line path.isAbsolute() guard in buildEmbeddings so that absolute file paths stored by the native Rust engine are used directly instead of being re-joined with rootDir, fixing the silent zero-embedding case caused by the doubled path (/tmp/foo/tmp/foo/bar.js). The regression test in embedding-strategy.test.ts inserts a node with an absolute path and asserts that exactly one embedding is produced, directly reproducing the failure mode described in the PR.

Confidence Score: 5/5

Safe to merge — minimal, targeted fix with a direct regression test and no observable side effects.

The change is a single conditional expression that preserves all prior behaviour for relative paths while correctly handling absolute paths. The regression test reproduces the exact failure mode and confirms the fix. No P0/P1 issues found.

No files require special attention.

Important Files Changed

Filename Overview
src/domain/search/generator.ts One-line guard added: path.isAbsolute(file) ? file : path.join(rootDir, file) prevents double-joining when the native engine stores absolute paths. Change is minimal and correct.
tests/search/embedding-strategy.test.ts New absolute file paths in DB (#760) describe block reproduces the bug (inserts a node with a full absolute path, calls buildEmbeddings, asserts 1 embedding in both EMBEDDED_TEXTS and the DB). Cleanup and state-reset patterns are consistent with the rest of the file.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant G as buildEmbeddings()
    participant DB as SQLite (nodes)
    participant FS as Filesystem

    C->>G: buildEmbeddings(rootDir, model, dbPath)
    G->>DB: SELECT * FROM nodes WHERE kind IN (...)
    DB-->>G: [{file: "/abs/path/math.js", ...}]  (native engine: absolute)
    Note over G: path.isAbsolute(file)?
    alt file is absolute
        G->>FS: readFileSync("/abs/path/math.js")  ✅
    else file is relative
        G->>FS: readFileSync(path.join(rootDir, file))  ✅
    end
    FS-->>G: source lines
    G->>G: build embedding text
    G->>DB: INSERT INTO embeddings ...
Loading

Reviews (1): Last reviewed commit: "style: fix biome formatting in test" | Re-trigger Greptile

@carlos-alm carlos-alm merged commit aad7874 into main Apr 3, 2026
12 of 13 checks passed
@carlos-alm carlos-alm deleted the fix/760-embed-absolute-paths branch April 3, 2026 04:22
@github-actions github-actions bot locked and limited conversation to collaborators Apr 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(test): embedding-regression test fails — buildEmbeddings produces 0 embeddings

1 participant