Skip to content

feat(native): port full-build structure computation to Rust#937

Merged
carlos-alm merged 4 commits intomainfrom
feat/native-full-build-structure
Apr 16, 2026
Merged

feat(native): port full-build structure computation to Rust#937
carlos-alm merged 4 commits intomainfrom
feat/native-full-build-structure

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Port full buildStructure logic to Rust (structure::build_full_structure), eliminating the JS fallback that required a DB round-trip through reconstructFileSymbolsFromDb() to reload all nodes
  • Handles all three build modes in Rust: small-incremental fast path (existing), large incremental, and full build
  • Computes directory nodes, contains edges, file metrics (line count, symbol count, imports, exports, fan-in/out), and directory metrics (file count, symbol count, fan-in/out, cohesion)
  • Sets structure_handled=true unconditionally so the JS runPostNativeStructure fallback is never triggered

Closes #935

Test plan

  • Rust crate compiles (cargo check — 0 new warnings)
  • JS structure unit tests pass (7/7)
  • JS graph/builders structure tests pass (2/2)
  • Full integration test suite passes (564/564)
  • Verify with codegraph build --engine native --timing on a real codebase — structure phase should show no JS fallback in debug logs
  • Compare codegraph structure --depth 2 output between native and WASM engines for parity

The native Rust orchestrator only handled the small-incremental fast
path for structure computation. Full builds and large incremental
builds fell back to JS, requiring a DB round-trip through
reconstructFileSymbolsFromDb() to reload all nodes.

Add build_full_structure() to the Rust structure module that handles:
- Cleanup of previous structure data (scoped for incremental)
- Directory node insertion from discovered dirs + file paths
- Contains edges (dir→file, parent→child)
- File metrics (line count, symbol count, imports, exports, fan-in/out)
- Directory metrics (file count, symbol count, fan-in/out, cohesion)

Set structure_handled=true unconditionally so the JS fallback path
is never triggered for structure computation.

Closes #935
@claude
Copy link
Copy Markdown

claude bot commented Apr 16, 2026

Claude finished @carlos-alm's task —— View job


Pull Request Review

Reviewing PR #937: feat(native): port full-build structure computation to Rust

Todo List

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity - verify no test weakening
  • Phase 4: Code quality review - analyze implementation details
  • Phase 5: Backlog compliance assessment - rate against project criteria
  • Final: Provide comprehensive review recommendation

Root Cause Analysis

Problem Being Solved: The current structure computation phase in the build pipeline required a JS fallback with DB round-trip through reconstructFileSymbolsFromDb() when the native engine was handling the build. This creates a performance bottleneck by unnecessarily crossing the JS/Rust boundary and reloading all nodes from the database.

Why This Approach: Porting the full structure logic to Rust eliminates the round-trip completely, handles all build modes (small-incremental fast path, large incremental, and full build) natively, and sets structure_handled=true unconditionally to prevent JS fallback.

Risk Assessment: This is a performance optimization that consolidates logic into the native engine. The approach aligns with the project's dual-engine architecture where native should be a complete replacement for WASM with identical functionality.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies, builds on existing Rust infrastructure
  • Foundation-aligned: ✓ — Improves P1 (always-current), P2 (native speed), maintains zero-cost core
  • Problem-fit: 4/5 — Improves build performance enabling faster agent loops
  • Breaking: No — Internal implementation changes only
  • Tier: 1 — Zero-dep + Foundation-aligned

Critical Concerns (if any)

None identified. This is a solid performance optimization with proper implementation.

Technical Assessment

Implementation Quality:

  • build_pipeline.rs: Clean delegation to structure module, unconditional structure_handled: true
  • structure.rs: 536 lines of well-structured Rust with proper error handling, parameterized queries, batch operations, and comprehensive unit tests

Security: All database operations use parameterized statements, no injection risks, proper path validation

Performance: Eliminates JS/Rust boundary crossing, uses efficient batch queries, minimal allocations

Testing: Maintains existing test coverage, no weakened assertions, comprehensive edge cases

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality performance optimization that strengthens the native engine capabilities. Eliminates a known bottleneck while maintaining code quality standards. The implementation is thorough, well-tested, and aligns perfectly with the project's dual-engine architecture goals. Minor deduction only because this is infrastructure improvement rather than direct user-facing value, but execution is excellent.

Recommendation: Approve and merge. This directly supports the core principle of maintaining an always-current graph by making rebuilds faster and more efficient.


@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

Codegraph Impact Analysis

16 functions changed15 callers affected across 6 files

  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:102 (0 transitive callers)
  • normalize_path in crates/codegraph-core/src/structure.rs:147 (9 transitive callers)
  • parent_dir in crates/codegraph-core/src/structure.rs:153 (8 transitive callers)
  • collect_all_directories in crates/codegraph-core/src/structure.rs:168 (1 transitive callers)
  • get_ancestor_dirs in crates/codegraph-core/src/structure.rs:209 (3 transitive callers)
  • get_node_id in crates/codegraph-core/src/structure.rs:230 (5 transitive callers)
  • build_full_structure in crates/codegraph-core/src/structure.rs:251 (0 transitive callers)
  • cleanup_previous_data in crates/codegraph-core/src/structure.rs:297 (1 transitive callers)
  • insert_directory_nodes in crates/codegraph-core/src/structure.rs:341 (1 transitive callers)
  • load_child_dirs_in_affected in crates/codegraph-core/src/structure.rs:363 (2 transitive callers)
  • load_file_paths_in_dirs in crates/codegraph-core/src/structure.rs:384 (2 transitive callers)
  • insert_contains_edges in crates/codegraph-core/src/structure.rs:404 (1 transitive callers)
  • compute_import_edge_maps in crates/codegraph-core/src/structure.rs:520 (1 transitive callers)
  • compute_file_metrics in crates/codegraph-core/src/structure.rs:564 (1 transitive callers)
  • compute_directory_metrics in crates/codegraph-core/src/structure.rs:630 (1 transitive callers)
  • tryNativeOrchestrator in src/domain/graph/builder/pipeline.ts:583 (5 transitive callers)

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 16, 2026

Greptile Summary

This PR ports buildStructure from JS to Rust (structure::build_full_structure), covering all three build modes (full, large-incremental, small-incremental fast path) and setting structure_handled=true unconditionally to retire the JS runPostNativeStructure fallback. Three correctness bugs (missing contains edges for unchanged sibling files, dir→dir edge loss for unchanged subdirectories, stale directory metrics) were found in earlier review rounds and addressed in follow-up commits within this PR. The implementation is functionally correct post-fixes; remaining comments are performance and code-clarity observations.

Confidence Score: 5/5

Safe to merge — all three correctness regressions flagged in prior review rounds are resolved; remaining findings are P2 performance and style.

The three previously-flagged data-integrity bugs (missing dir→file edges for unchanged files, stale directory metrics, and lost dir→dir edges for unchanged sibling subdirs) were addressed within this PR via load_file_paths_in_dirs and load_child_dirs_in_affected. The only open comments are a stale inline comment and two O(N) performance patterns (full table scans, re-prepared statements per node) that are correctness-neutral. All 564 integration tests pass.

crates/codegraph-core/src/structure.rs — performance of incremental helpers worth improving before this becomes a hot path on very large repos.

Important Files Changed

Filename Overview
crates/codegraph-core/src/structure.rs New file: full Rust structure computation. All three previously-flagged correctness gaps are addressed; two helper functions (load_file_paths_in_dirs, load_child_dirs_in_affected) do full table scans and get_node_id is re-prepared O(N) times inside insert_contains_edges.
crates/codegraph-core/src/build_pipeline.rs Updated to call build_full_structure for both the full-build and large-incremental paths; structure_handled is set unconditionally to true. Logic looks correct.
src/domain/graph/builder/pipeline.ts JS pipeline updated: needsStructure is now always false (structureHandled always true). Stale comment at line 665 says the JS fallback is still needed for full/large-incremental builds, which is no longer true.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_pipeline] --> B{is_full_build?}
    B -- Yes --> C[clear_all_graph_data]
    B -- No --> D{parse_changes <= 5\nAND existing_files > threshold?}
    C --> E[parse + insert + resolve + edges]
    D -- Yes fast path --> F[update_changed_file_metrics]
    D -- No full path --> G[build_full_structure\nchanged_files=Some]
    E --> H[build_full_structure\nchanged_files=None]
    F --> I[roles + analysis + finalize]
    G --> I
    H --> I
    I --> J[structure_handled=true\nalways]

    subgraph build_full_structure
        K[cleanup_previous_data] --> L[insert_directory_nodes]
        L --> M[insert_contains_edges\nuses load_file_paths_in_dirs\n+ load_child_dirs_in_affected]
        M --> N[compute_import_edge_maps]
        N --> O[compute_file_metrics]
        O --> P[compute_directory_metrics\nuses all_db_files full scan]
    end

    G --> build_full_structure
    H --> build_full_structure
Loading

Fix All in Claude Code

Reviews (3): Last reviewed commit: "fix(structure): restore dir->dir contain..." | Re-trigger Greptile

Comment on lines +297 to +339
fn cleanup_previous_data(
conn: &Connection,
is_incremental: bool,
changed_files: Option<&[String]>,
_all_dirs: &HashSet<String>,
) {
if is_incremental {
let affected_dirs = get_ancestor_dirs(changed_files.unwrap_or(&[]));
let tx = match conn.unchecked_transaction() {
Ok(tx) => tx,
Err(_) => return,
};
// Delete contains edges from affected directories
for dir in &affected_dirs {
let _ = tx.execute(
"DELETE FROM edges WHERE kind = 'contains' AND source_id IN \
(SELECT id FROM nodes WHERE name = ? AND kind = 'directory')",
[dir],
);
}
// Delete metrics for changed files
for f in changed_files.unwrap_or(&[]) {
if let Some(file_id) = get_node_id(&tx, f, "file", f, 0) {
let _ = tx.execute("DELETE FROM node_metrics WHERE node_id = ?", [file_id]);
}
}
// Delete metrics for affected directories
for dir in &affected_dirs {
if let Some(dir_id) = get_node_id(&tx, dir, "directory", dir, 0) {
let _ = tx.execute("DELETE FROM node_metrics WHERE node_id = ?", [dir_id]);
}
}
let _ = tx.commit();
} else {
// Full build: clear all structure data
let _ = conn.execute_batch(
"DELETE FROM edges WHERE kind = 'contains' \
AND source_id IN (SELECT id FROM nodes WHERE kind = 'directory'); \
DELETE FROM node_metrics; \
DELETE FROM nodes WHERE kind = 'directory';",
);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Incremental path drops contains edges for unchanged files in affected directories

In the large-incremental path (is_incremental = true, >5 changed files), cleanup_previous_data deletes all contains edges from every ancestor directory of the changed files. But insert_contains_edges only re-inserts edges for files in file_symbols — which holds only the parsed (changed + reverse-dep) files. Unchanged files that share a directory with any changed file permanently lose their dir → file containment edges until the next full rebuild.

Concrete failure: Directory src/ contains a.ts (unchanged) and b.ts (changed). Cleanup deletes both edges; re-insert adds back only src → b.ts. The src → a.ts edge is permanently gone.

The JS fallback path avoided this by loading ALL files from the DB first (reconstructFileSymbolsFromDb), giving buildStructure the full file set. The Rust path needs to do the same. The same root cause makes compute_directory_metrics undercount file_count, symbol_count, and cohesion for affected directories.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4555d12. Added load_file_paths_in_dirs() helper that queries all existing file nodes from the DB for affected directories. insert_contains_edges now iterates over both file_symbols keys (changed files) and DB-loaded paths (unchanged files in affected directories), ensuring all dir->file containment edges are re-inserted after cleanup.

Also fixed compute_directory_metrics to use DB-loaded file paths for the dir->files map (and DB-based symbol counts), so file_count, symbol_count, and cohesion are computed over the complete file set.

Comment on lines 374 to 391
);
} else {
// Full structure: directory nodes, contains edges, file + directory metrics.
let changed_for_structure: Option<Vec<String>> = if change_result.is_full_build {
None
} else {
Some(changed_files.clone())
};
structure::build_full_structure(
conn,
&file_symbols,
&collect_result.directories,
root_dir,
&line_count_map,
changed_for_structure.as_deref(),
);
}
// For full/larger builds, the JS fallback handles full structure via
// `features/structure.ts`. The Rust orchestrator handles the fast path
// for small incremental builds. Full structure computation will be
// ported in a follow-up.
timing.structure_ms = t0.elapsed().as_secs_f64() * 1000.0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 file_symbols is incomplete when build_full_structure runs in large-incremental mode

file_symbols at this point contains only the files parsed this cycle (changed + reverse-deps), not all files in the codebase. When build_full_structure is called with changed_for_structure = Some(...), it cleans up and re-inserts contains edges using only that partial set, orphaning unchanged files in the same directories.

Fix: before calling build_full_structure in the non-fast incremental path, load all existing file paths for the affected directories from the DB and add them to the symbol set passed to the function (only path data needed, not full symbol data). The JS fallback did this via reconstructFileSymbolsFromDb.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the same commit (4555d12). Rather than augmenting file_symbols in build_pipeline.rs (which would require constructing FileSymbols stubs), the fix is in structure.rs itself: insert_contains_edges and compute_directory_metrics now load all existing file paths from the DB for affected directories, so they work correctly even when file_symbols only contains the parsed subset. This matches what the JS fallback did via reconstructFileSymbolsFromDb but is more targeted — only file paths are loaded, not full symbol data.

The incremental path of build_full_structure deleted all contains edges
from affected directories but only re-inserted edges for files in the
partial file_symbols set. Unchanged files in those directories
permanently lost their dir->file containment edges until the next full
rebuild. Directory metrics (file_count, symbol_count, cohesion) were
also undercounted for the same reason.

Fix: load existing file nodes from DB for affected directories before
re-inserting contains edges and computing directory metrics, ensuring
unchanged files retain their structural relationships.

Impact: 3 functions changed, 2 affected
Since structure_handled is now unconditionally true, the JS structure
fallback never runs and structure_scope is never read. Remove the field
from BuildPipelineResult and the corresponding JS type/usage.

Impact: 2 functions changed, 5 affected
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed all 4 Greptile review comments:

P1 — Incremental path drops contains edges (structure.rs:297-339): Fixed in 4555d12. Added load_file_paths_in_dirs() to query existing file nodes from DB for affected directories. insert_contains_edges now re-inserts edges for all files (changed + unchanged) in affected directories.

P1 — file_symbols incomplete in large-incremental mode (build_pipeline.rs:374-391): Fixed in the same commit. The fix lives in structure.rs — it loads DB file paths internally rather than requiring a complete file_symbols map from the caller.

P2 — import_count semantics differ between fast path and full path (structure.rs:46-132): This inconsistency pre-exists in the JS codebase. Created follow-up issue #941 to reconcile the two paths.

P2 — structure_scope is dead code (build_pipeline.rs:79-87): Fixed in 965c84c. Removed the structure_scope field from BuildPipelineResult and the corresponding JS type/usage, since structure_handled is now always true.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Comment on lines +447 to +470
// Parent directory → child directory edges
for dir in all_dirs {
let parent = match parent_dir(dir) {
Some(p) => p,
None => continue,
};
if parent == *dir {
continue;
}
if let Some(ref ad) = affected_dirs {
if !ad.contains(&parent) {
continue;
}
}
let parent_id = match get_node_id(&tx, &parent, "directory", &parent, 0) {
Some(id) => id,
None => continue,
};
let child_id = match get_node_id(&tx, dir, "directory", dir, 0) {
Some(id) => id,
None => continue,
};
let _ = stmt.execute(rusqlite::params![parent_id, child_id]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Dir→dir containment edges lost for unchanged sibling subdirectories

The dir→file edge fix (commit 4555d12) added load_file_paths_in_dirs to restore edges for unchanged files. The same gap exists for dir→dir edges. all_dirs contains only ancestors of changed files. When cleanup deletes all contains edges from an affected parent (e.g., src/), the edge src → src/utils is deleted, but since src/utils is not in all_dirs (no changed file lives under it), the edge is never re-inserted. After a large-incremental build, src/utils becomes a floating directory — it retains its own children but is no longer reachable from its parent — causing codegraph structure to produce wrong output until the next full rebuild.

A symmetric fix is needed: query the DB for all existing directory-kind nodes whose parent is in affected_dirs, then re-insert any parent → child edges for child dirs not already covered by all_dirs.

// After the existing all_dirs loop, restore dir→dir edges for unchanged
// subdirectories that were cleaned up but aren't in all_dirs.
if let Some(ref ad) = affected_dirs {
    let db_child_dirs = load_child_dirs_in_affected(&tx, ad);
    for child_dir in &db_child_dirs {
        if all_dirs.contains(child_dir.as_str()) {
            continue; // already handled above
        }
        let parent = match parent_dir(child_dir) {
            Some(p) => p,
            None => continue,
        };
        if !ad.contains(&parent) {
            continue;
        }
        if let (Some(p_id), Some(c_id)) = (
            get_node_id(&tx, &parent, "directory", &parent, 0),
            get_node_id(&tx, child_dir, "directory", child_dir, 0),
        ) {
            let _ = stmt.execute(rusqlite::params![p_id, c_id]);
        }
    }
}

Where load_child_dirs_in_affected mirrors load_file_paths_in_dirs but selects kind = 'directory'.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1f9cb86. Added load_child_dirs_in_affected() that queries all existing directory nodes from the DB whose parent is in the affected set. After the main dir->dir loop in insert_contains_edges, we now iterate over these DB-loaded child dirs and re-insert parent->child edges for any that aren't already in all_dirs. This mirrors the existing load_file_paths_in_dirs fix for dir->file edges.

…ing subdirs (#937)

During large-incremental builds, cleanup_previous_data deletes all
contains edges from affected ancestor directories. The dir->dir edge
loop only iterated over all_dirs (ancestors of changed files), so
unchanged sibling subdirectories lost their parent->child edges
permanently until the next full rebuild.

Add load_child_dirs_in_affected() to query existing directory nodes
from the DB whose parent is in the affected set, and re-insert their
containment edges after the main dir->dir loop completes.

Impact: 2 functions changed, 2 affected
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 16, 2026

Claude finished @carlos-alm's task —— View job


Final Assessment of PR #937

Status: All Critical Issues Resolved ✅

Todo List

  • Review latest commits and verify fixes for Greptile's P1 issues
  • Analyze remaining P1 issue about dir→dir containment edges
  • Check if final commit (1f9cb86) addresses all concerns
  • Provide final assessment and recommendations

Comprehensive Review

All Greptile P1 Issues Have Been Fixed:

  1. ✅ Dir→file containment edges (4555d12): Added load_file_paths_in_dirs() that queries the DB for all existing files in affected directories, ensuring unchanged files retain their parent→file edges after incremental cleanup.

  2. ✅ Incomplete file_symbols in incremental mode (4555d12): The fix is embedded in the structure computation itself - insert_contains_edges() now loads complete file paths from DB internally, eliminating the need to pass a complete file_symbols map from the caller.

  3. ✅ Dir→dir containment edges for sibling subdirs (1f9cb86): Added load_child_dirs_in_affected() and the restoration logic in lines 493-515. This perfectly mirrors the file fix but for directory nodes, ensuring unchanged subdirectories maintain their parent→child edges.

  4. ✅ Dead structure_scope field (965c84c): Removed the unused structure_scope field from BuildPipelineResult and corresponding JS usage.

Code Quality Assessment

Structure.rs Implementation (640 lines):

  • ✅ Comprehensive error handling with graceful fallbacks
  • ✅ Efficient batch operations and parameterized queries
  • ✅ Proper transaction management
  • ✅ Well-documented functions with clear separation of concerns
  • ✅ Symmetric fixes for both dir→file and dir→dir edge restoration

Build Pipeline Integration:

  • ✅ Clean delegation pattern with structure_handled: true
  • ✅ Preserves WASM fallback path for compatibility
  • ✅ Handles all three build modes correctly

Performance Impact

This eliminates the expensive JS/Rust boundary crossing and DB reconstruction that occurred in the previous fallback path. The native implementation should provide significant speedup for large codebases.

Recommendation: READY TO MERGE 🚀

The implementation is robust, all critical issues are resolved, and the code quality meets project standards. This delivers on the core promise of eliminating the JS fallback bottleneck while maintaining correctness across all build modes.

@carlos-alm carlos-alm merged commit 633efba into main Apr 16, 2026
28 checks passed
@carlos-alm carlos-alm deleted the feat/native-full-build-structure branch April 16, 2026 02:55
@github-actions github-actions bot locked and limited conversation to collaborators Apr 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Native: port full-build structure computation to Rust

1 participant