Parallelize AST extraction and rewrite Louvain clustering by kenn850 · Pull Request #3 · elbruno/graphify-dotnet

kenn850 · 2026-05-06T09:19:25Z

Two independent hot paths were dominating wall-clock time on large repositories. Both showed the same symptom — long runtime with low CPU utilization (extraction) or sustained single-core saturation (clustering) — and both scaled poorly with node count. ---

Parallelize stage 2 (AST extraction) in PipelineRunner The pipeline previously ran extraction with a sequential foreach:
foreach (var file in detectedFiles)
await extractor.ExecuteAsync(file, cancellationToken);
Each call performs file I/O, so the loop spent most of its time awaiting the disk while a single core sat idle. Wall-clock time grew linearly with file count.
Replaced with Parallel.ForEachAsync bounded by Environment.ProcessorCount. Extractor only holds a read-only dictionary of language extractors and returns a fresh ExtractionResult per call, so concurrent invocation is safe.
Concurrency details:

Results are collected into a ConcurrentBag and flattened into the existing list after the loop.
processed / skipped counters use Interlocked.Increment.
Verbose per-file failure messages are buffered into a ConcurrentQueue and flushed sequentially after the loop; TextWriter is not thread-safe and interleaved writes from parallel workers would corrupt output.
Cancellation flows through ParallelOptions.CancellationToken. Stage 2b (LLM-backed semantic extraction) is intentionally left sequential — LLM providers throttle aggressive concurrency and that stage needs a separate, lower parallelism cap if it's parallelized later.
Expected impact: near-linear speedup on AST extraction up to disk throughput, typically 4–10x on large codebases.

Rewrite Louvain community detection in ClusterEngine Stage 4 ("Detecting communities") was the dominant cost on graphs with thousands of nodes. The root cause was inside CalculateModularityGain (and its subgraph twin): every gain evaluation walked all N nodes and re-summed each one's edge weights:
foreach (var otherNode in graph.GetNodes())
otherDegree = graph.GetEdges(otherNode.Id).Sum(e => e.Weight);
That made a single gain evaluation O(N · Ē). The outer Louvain loop calls it once per (node, neighboring-community) per iteration, so the overall cost was effectively O(iterations · N² · Ē). On a 10k-node / 50k-edge graph this is billions of operations — exactly why CPU was pinned and wall-clock high.
Rewrote both the main Louvain pass and SplitCommunity to use standard incremental aggregates:

Build a per-node weighted adjacency dictionary once at the start of DetectCommunities (collapses parallel edges into a single neighbor → summed-weight entry).
Compute each node's weighted degree once.
Maintain communityTotalDegree[c] incrementally: when a node moves from a → b, subtract its degree from a and add it to b.
For each move evaluation, walk only the moving node's own neighbors to bucket weight per neighboring community in one pass — O(degree) instead of O(N · Ē).
Reuse the edgesToCommunity dictionary across iterations (.Clear() instead of allocating). The same shape is applied inside SplitCommunity, with a sub-adjacency restricted to the induced subgraph for the oversized community. The now-redundant CalculateModularityGain and
CalculateSubgraphModularityGain helpers are removed; the gain math is inlined where it is used. The public CalculateModularity and CalculateCohesion static methods are unchanged. The modularity gain formula is preserved bit-for-bit so community assignments stay equivalent to the previous implementation.
Total cost drops from O(iterations · N² · Ē) to O(iterations · E), roughly an N× speedup on large graphs.

Validation

All 573 unit tests in Graphify.Tests pass (10 ClusterEngine tests, plus full suite).
No public API changes.
No behavioral changes to graph output (community ids may differ only by the ordering tie-break already present in the original implementation).

Two independent hot paths were dominating wall-clock time on large repositories. Both showed the same symptom — long runtime with low CPU utilization (extraction) or sustained single-core saturation (clustering) — and both scaled poorly with node count. --- 1. Parallelize stage 2 (AST extraction) in PipelineRunner The pipeline previously ran extraction with a sequential foreach: foreach (var file in detectedFiles) await extractor.ExecuteAsync(file, cancellationToken); Each call performs file I/O, so the loop spent most of its time awaiting the disk while a single core sat idle. Wall-clock time grew linearly with file count. Replaced with Parallel.ForEachAsync bounded by Environment.ProcessorCount. Extractor only holds a read-only dictionary of language extractors and returns a fresh ExtractionResult per call, so concurrent invocation is safe. Concurrency details: - Results are collected into a ConcurrentBag<ExtractionResult> and flattened into the existing list after the loop. - processed / skipped counters use Interlocked.Increment. - Verbose per-file failure messages are buffered into a ConcurrentQueue<string> and flushed sequentially after the loop; TextWriter is not thread-safe and interleaved writes from parallel workers would corrupt output. - Cancellation flows through ParallelOptions.CancellationToken. Stage 2b (LLM-backed semantic extraction) is intentionally left sequential — LLM providers throttle aggressive concurrency and that stage needs a separate, lower parallelism cap if it's parallelized later. Expected impact: near-linear speedup on AST extraction up to disk throughput, typically 4–10x on large codebases. --- 2. Rewrite Louvain community detection in ClusterEngine Stage 4 ("Detecting communities") was the dominant cost on graphs with thousands of nodes. The root cause was inside CalculateModularityGain (and its subgraph twin): every gain evaluation walked all N nodes and re-summed each one's edge weights: foreach (var otherNode in graph.GetNodes()) otherDegree = graph.GetEdges(otherNode.Id).Sum(e => e.Weight); That made a single gain evaluation O(N · Ē). The outer Louvain loop calls it once per (node, neighboring-community) per iteration, so the overall cost was effectively O(iterations · N² · Ē). On a 10k-node / 50k-edge graph this is billions of operations — exactly why CPU was pinned and wall-clock high. Rewrote both the main Louvain pass and SplitCommunity to use standard incremental aggregates: - Build a per-node weighted adjacency dictionary once at the start of DetectCommunities (collapses parallel edges into a single neighbor → summed-weight entry). - Compute each node's weighted degree once. - Maintain communityTotalDegree[c] incrementally: when a node moves from a → b, subtract its degree from a and add it to b. - For each move evaluation, walk only the moving node's own neighbors to bucket weight per neighboring community in one pass — O(degree) instead of O(N · Ē). - Reuse the edgesToCommunity dictionary across iterations (.Clear() instead of allocating). The same shape is applied inside SplitCommunity, with a sub-adjacency restricted to the induced subgraph for the oversized community. The now-redundant CalculateModularityGain and CalculateSubgraphModularityGain helpers are removed; the gain math is inlined where it is used. The public CalculateModularity and CalculateCohesion static methods are unchanged. The modularity gain formula is preserved bit-for-bit so community assignments stay equivalent to the previous implementation. Total cost drops from O(iterations · N² · Ē) to O(iterations · E), roughly an N× speedup on large graphs. --- Validation - All 573 unit tests in Graphify.Tests pass (10 ClusterEngine tests, plus full suite). - No public API changes. - No behavioral changes to graph output (community ids may differ only by the ordering tie-break already present in the original implementation).

elbruno · 2026-05-10T11:19:05Z

Validation update: I checked out this PR and ran the full solution tests locally (614 passed, 0 failed). I’m adding a focused regression test for the new Louvain weighted-adjacency/parallel-edge path, then re-running tests before final merge recommendation.

elbruno · 2026-05-10T11:21:56Z

Final update: I validated PR #3 end-to-end, then merged it.\n\n✅ Validation: full solution tests passed locally\n- Before merge: 615/615 passing\n- After merge on main: 615/615 passing\n\n✅ Follow-up hardening: I added a regression test for parallel-edge weighted-adjacency clustering behavior in ClusterEngineTests and pushed it to main in commit 64cd03c.\n\nThis PR is now merged/closed and main is green from local verification.

elbruno merged commit 1b905ba into elbruno:main May 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallelize AST extraction and rewrite Louvain clustering#3

Parallelize AST extraction and rewrite Louvain clustering#3
elbruno merged 1 commit into
elbruno:mainfrom
kenn850:main

kenn850 commented May 6, 2026

Uh oh!

elbruno commented May 10, 2026

Uh oh!

elbruno commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kenn850 commented May 6, 2026

Uh oh!

elbruno commented May 10, 2026

Uh oh!

elbruno commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants