Skip to content

fix(build): warn before --no-incremental wipes embeddings#986

Merged
carlos-alm merged 3 commits intomainfrom
fix/warn-embeddings-wipe-on-full-rebuild
Apr 21, 2026
Merged

fix(build): warn before --no-incremental wipes embeddings#986
carlos-alm merged 3 commits intomainfrom
fix/warn-embeddings-wipe-on-full-rebuild

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

codegraph build --no-incremental (and any full rebuild promoted by engine/schema/version mismatch) silently drops every row in the embeddings table. Users who spent minutes on codegraph embed lose their embeddings with no warning; the next codegraph search returns zero hits with no clue why.

Add a warnOnEmbeddingsWipe step at the pipeline entry — just after the incremental/forceFullRebuild decision is made. When the build is full AND the embeddings table is non-empty, it logs:

[codegraph WARN] Full rebuild will discard N embeddings; re-run `codegraph embed` after the build.

Because the warning sits at the shared pipeline entry point, it fires uniformly regardless of which path (native orchestrator or JS fallback) actually performs the delete.

Found during

Dogfooding v3.9.4 — see #982

Test plan

  • Full rebuild with embeddings present → warning fires with correct count
  • Full rebuild with empty/missing embeddings table → no warning
  • Incremental build (even if promoted to full by mismatch) → warning still fires only when embeddings exist
  • npm run typecheck clean
  • npm run lint clean

A full rebuild (either via --no-incremental or when promoted by an engine/
schema/version mismatch) silently drops every row from the embeddings
table. Users who just spent minutes generating embeddings lose them with
no warning, and subsequent \`codegraph search\` returns zero hits with no
indication why.

Add a warnOnEmbeddingsWipe step to setupPipeline that runs after the
incremental/forceFullRebuild decision is made. When the build will be
full AND the embeddings table is non-empty, log:

  Full rebuild will discard N embeddings; re-run \`codegraph embed\`
  after the build.

The warning sits at the pipeline entry, so it fires uniformly for the
native orchestrator and the JS fallback path. It is silent when the
embeddings table is missing, empty, or when the build is incremental.

Closes #982
@claude
Copy link
Copy Markdown

claude Bot commented Apr 21, 2026

Claude finished @carlos-alm's task —— View job


Root Cause Analysis

Problem Being Solved: Users lose embeddings silently during full rebuilds (codegraph build --no-incremental or automatic full rebuilds triggered by engine/schema/version mismatches). After spending time running codegraph embed, users get zero search results with no indication that embeddings were wiped.

Why This Approach: Adding warning at the pipeline entry point (warnOnEmbeddingsWipe) is well-positioned - it fires after the incremental/forceFullRebuild decision but before the actual deletion. The placement at line 169 in setupPipeline() ensures it catches both explicit --no-incremental and promoted full rebuilds.

Risk Assessment: Very low risk. This is a purely additive warning with proper error handling and no behavioral changes.

Backlog Compliance

  • Zero-dep: ✓ - No new dependencies added
  • Foundation-aligned: ✓ - Improves user experience without violating any principles
  • Problem-fit: 3 - Nice UX improvement but doesn't directly address AI agent navigation problems
  • Breaking: No - Purely additive warning message
  • Tier: 1

Code Quality Assessment

Strengths

  1. Correct placement: warnOnEmbeddingsWipe() at pipeline.ts:169 executes after the incremental decision (line 168: checkEngineSchemaMismatch) but before any actual deletion
  2. Proper error handling: Try-catch around COUNT(*) query handles missing embeddings table gracefully (lines 113-117)
  3. Accurate conditions: Checks both !ctx.incremental and ctx.forceFullRebuild to catch all full rebuild scenarios
  4. Good message quality: Clear warning includes count and remediation (re-run codegraph embed after the build)
  5. Engine parity: Sits at the unified entry point, so both native and JS pipelines get the same warning

Technical Details

  • Function implementation (lines 109-122): Clean, focused function with single responsibility
  • SQL query: Simple COUNT(*) query is efficient for this use case
  • Integration point: Perfect placement in setupPipeline() after mismatch detection but before actual work

No Issues Found

  • No performance concerns (single COUNT query is negligible)
  • No security issues
  • No potential for false positives/negatives
  • Error handling covers all edge cases appropriately

Test Coverage Verification

The PR description claims comprehensive test coverage:

  • ✓ Full rebuild with embeddings → warning fires with correct count
  • ✓ Full rebuild with empty/missing embeddings table → no warning
  • ✓ Incremental build → warning only fires when embeddings exist
  • ✓ Clean lint and typecheck

This covers all the logical branches and edge cases properly.

Comparison with Similar Patterns

Looking at the codebase, this warning pattern is consistent with existing UX improvements:

  • Similar to the workspace detection logging at line 176: info(\Detected ${workspaces.size} workspace packages`)`
  • Follows the established logger pattern using warn() from infrastructure/logger.js

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-implemented, properly placed user experience improvement that solves a real pain point. The code is clean, safe, and follows established patterns. Only reason it's not 5 stars is that it's a relatively minor UX improvement rather than a fundamental enhancement.

This PR effectively addresses the silent data loss issue that was identified during dogfooding. The implementation is technically sound and the placement ensures consistent behavior across both execution paths.


@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

Adds a warnOnEmbeddingsWipe guard that queries the embeddings row count before any full rebuild (explicit --no-incremental or mismatch-promoted) and logs a human-readable warning so users know to re-run codegraph embed. The two-site placement — once in setupPipeline and once in the native-orchestrator fallback catch block — correctly covers every promotion path, and the existing !ctx.forceFullRebuild guard on the fallback path prevents the warning from firing twice.

Confidence Score: 5/5

Safe to merge — the change is additive (read-only DB query + log), all promotion paths are covered, and the only finding is a P2 style suggestion about the catch scope.

The sole finding is a P2 style/hardening suggestion (broad catch). No logic errors or data-integrity risks were found; the warning logic is correct across all identified forceFullRebuild promotion paths.

No files require special attention.

Important Files Changed

Filename Overview
src/domain/graph/builder/pipeline.ts Adds warnOnEmbeddingsWipe called at pipeline setup and again after the native-orchestrator late-promotion path; logic correctly covers --no-incremental and mismatch-promoted builds. Minor: broad catch in the new function silences non-table-missing DB errors.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[buildGraph called] --> B[setupPipeline]
    B --> C[checkEngineSchemaMismatch\nsets forceFullRebuild]
    C --> D[warnOnEmbeddingsWipe\n!incremental OR forceFullRebuild?]
    D -->|No| E[loadAliases / continue]
    D -->|Yes| F[SELECT COUNT from embeddings]
    F -->|count > 0| G["WARN: Full rebuild will discard N embeddings"]
    F -->|count == 0 or table missing| E
    G --> E
    E --> H{tryNativeOrchestrator}
    H -->|success| I[Done]
    H -->|throws| J{incremental &&\n!forceFullRebuild?}
    J -->|No| K[runPipelineStages]
    J -->|Yes| L[Late version check\nforceFullRebuild = true]
    L --> M[warnOnEmbeddingsWipe again]
    M --> K
    K --> I
Loading

Fix All in Claude Code

Reviews (2): Last reviewed commit: "fix(build): re-check embeddings on nativ..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 21, 2026

Codegraph Impact Analysis

3 functions changed8 callers affected across 6 files

  • warnOnEmbeddingsWipe in src/domain/graph/builder/pipeline.ts:109 (7 transitive callers)
  • setupPipeline in src/domain/graph/builder/pipeline.ts:144 (6 transitive callers)
  • buildGraph in src/domain/graph/builder/pipeline.ts:946 (6 transitive callers)

@carlos-alm carlos-alm mentioned this pull request Apr 21, 2026
2 tasks
…ch (#986)

The catch block that handles native-orchestrator throws performs a late
promotion to a full rebuild when the codegraph_version mismatch is
detected. warnOnEmbeddingsWipe already ran in setupPipeline before
forceFullRebuild was set here, so this path previously wiped embeddings
silently — exactly the scenario the initial guard was meant to close.

Call warnOnEmbeddingsWipe again after setting ctx.forceFullRebuild = true
so the late-promotion path surfaces the same warning as the other full
rebuild paths.

Impact: 1 functions changed, 6 affected
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed Greptile's P1 feedback on the late-promotion gap:

  • Added a warnOnEmbeddingsWipe(ctx) call inside the catch block at src/domain/graph/builder/pipeline.ts:972-980, right after setting ctx.forceFullRebuild = true on a JS CODEGRAPH_VERSION mismatch.
  • Now the silent-data-loss guard also fires for the native-orchestrator-throws \u2192 version-mismatch path, not just the initial setupPipeline evaluation.
  • Comment in the code cross-references this scenario so future edits do not regress.

Pushed in b4c907b.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit e3edd63 into main Apr 21, 2026
14 checks passed
@carlos-alm carlos-alm deleted the fix/warn-embeddings-wipe-on-full-rebuild branch April 21, 2026 03:44
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 21, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant