Found during dogfooding v3.9.4
Severity: High
Command: codegraph build --no-incremental
Running a full rebuild after codegraph embed silently drops every row from the embeddings table. There is no warning, no prompt, no opt-out. Users who have spent minutes generating embeddings (especially with the larger Jina models) lose all of them on the next --no-incremental build.
Reproduction
mkdir embed-test && cd embed-test
cat > a.js <<'EOF'
export function alpha() { return 1; }
export function beta() { return alpha(); }
EOF
npx codegraph build .
npx codegraph embed . -m minilm
# Stored 2 embeddings (384d, ...) in graph.db
node -e "const db = require('better-sqlite3')('.codegraph/graph.db'); \
console.log('before:', db.prepare('SELECT COUNT(*) c FROM embeddings').get().c);"
# before: 2
npx codegraph build . --no-incremental
node -e "const db = require('better-sqlite3')('.codegraph/graph.db'); \
console.log('after:', db.prepare('SELECT COUNT(*) c FROM embeddings').get().c);"
# after: 0
Expected behavior
One of:
- Preserve embeddings whose node_id still maps to a live node in the rebuilt graph (ideal).
- Warn the user before wiping, e.g. "[codegraph WARN] --no-incremental will discard N embeddings; re-run codegraph embed after the build."
- Require an explicit --wipe-embeddings flag to opt into destruction.
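A minimal sketch of options 2 and 3 combined: a pre-rebuild check that refuses to wipe a non-empty embeddings table unless the proposed --wipe-embeddings flag is passed, and returns the warning text in both cases. The function name planFullRebuild and its inputs are hypothetical. The sketch assumes the caller has already counted the rows in embeddings and parsed the flag. Neither hook exists in codegraph today.

```javascript
// Hypothetical pre-rebuild guard for `codegraph build --no-incremental`.
// Assumes the caller ran SELECT COUNT(*) FROM embeddings and parsed the
// proposed --wipe-embeddings flag; neither hook exists in codegraph today.
function planFullRebuild({ embeddingCount, wipeEmbeddings }) {
  if (embeddingCount === 0) {
    // Nothing to lose: proceed without a warning.
    return { proceed: true, warning: null };
  }
  const warning =
    `[codegraph WARN] --no-incremental will discard ${embeddingCount} embeddings; ` +
    're-run codegraph embed after the build.';
  if (wipeEmbeddings) {
    // Explicit opt-in (option 3): proceed, but still report what is dropped (option 2).
    return { proceed: true, warning };
  }
  // Default: refuse instead of destroying data silently.
  return { proceed: false, warning: warning + ' Pass --wipe-embeddings to confirm.' };
}
```

With the repro above (2 stored embeddings and no flag), this returns proceed: false, so the build would stop before the table is emptied.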
Actual behavior
Embeddings table is emptied silently. Subsequent codegraph search returns zero results with no hint about why.
Suggested fix
The simplest safe change is option 2: warn before wiping. For option 1, keep the embeddings table intact, and after the full rebuild either (a) leave embeddings keyed to their old node_ids that no longer exist (and let a downstream validator prune), or (b) re-key embeddings by symbol signature (name + file + kind) so they survive re-identification.
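Option (b) could run as a pass after the full rebuild. It maps each saved embedding's old node_id to a (name, file, kind) signature and then looks that signature up among the rebuilt nodes. The row shapes and function names below are hypothetical and do not reflect codegraph's schema. The sketch drops any signature that matches more than one node on either side instead of guessing, because name + file + kind is not guaranteed to be unique. For example, two classes in one file can each have a method with the same name.

```javascript
// Sketch of suggested fix (b): carry embeddings across a full rebuild by
// re-keying them on a symbol signature (name + file + kind) instead of node_id.
// Row shapes ({ id, name, file, kind } / { nodeId, vector }) are assumptions,
// not codegraph's actual schema.
function signature(node) {
  // JSON encoding avoids collisions when names or paths contain a delimiter.
  return JSON.stringify([node.file, node.kind, node.name]);
}

// Map signature -> node id; signatures shared by several nodes map to null
// (ambiguous) so an embedding is never attached to the wrong symbol.
function indexBySignature(nodes) {
  const index = new Map();
  for (const node of nodes) {
    const key = signature(node);
    index.set(key, index.has(key) ? null : node.id);
  }
  return index;
}

function rekeyEmbeddings(oldNodes, newNodes, embeddings) {
  const oldIdToSig = new Map(oldNodes.map((n) => [n.id, signature(n)]));
  const oldIndex = indexBySignature(oldNodes);
  const newIndex = indexBySignature(newNodes);
  const kept = [];
  let dropped = 0;
  for (const { nodeId, vector } of embeddings) {
    const sig = oldIdToSig.get(nodeId);
    const newId = sig === undefined ? undefined : newIndex.get(sig);
    // Drop when the old node is unknown, the symbol no longer exists,
    // or the signature is ambiguous in either the old or the new graph.
    if (newId == null || oldIndex.get(sig) == null) {
      dropped++;
      continue;
    }
    kept.push({ nodeId: newId, vector });
  }
  return { kept, dropped };
}
```

In the repro, alpha and beta would keep their vectors even if the rebuild assigned them new node_ids. Only the dropped count would need a re-run of codegraph embed.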
Related
Full rebuild also invalidates any external consumers that cached node_ids from the old DB. A migration note in the release notes would help, but the silent data loss is the bigger issue.