refactor(features): decompose complexity/structure/owners; reduce cfg/cochange/feature-warnings complexity#1237
refactor(features): decompose complexity/structure/owners; reduce cfg/cochange/feature-warnings complexity#1237carlos-alm wants to merge 11 commits into
Conversation
…, structure-query, and owners Internal refactor — no public API or behaviour change, so docs check acknowledged. - complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from computeHalsteadMetrics. - structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared buildActiveFilesSet + buildClassifierInput helpers. - graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode, selectFileSeedNodes. - structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern (HOTSPOT_ORDER_BY) and mapNative/JsHotspotRow helpers. - owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles, buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.
|
Claude finished @carlos-alm's task —— View job PR Review: Titan Decomposition Features ✅Reviewing this refactoring PR following the 5-phase review process for codegraph. Todo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: This PR is part of the Titan Paradigm cleanup (9/10 in merge order), systematically decomposing large, complex feature modules to improve maintainability and reduce cognitive load. The Titan Paradigm approach uses automated tooling to audit and refactor codebases for "Google-standard shape." Phase 2: Configuration Change Detection ✅CLEAN - No configuration changes detected:
Phase 3: Test Integrity ✅CLEAN - No test modifications:
Phase 4: Code Quality Review ✅Refactoring Pattern Analysis: Positive patterns observed:
Phase 5: Backlog Compliance ✅Root Cause AnalysisProblem Being Solved: Systematic complexity reduction as part of Titan Paradigm cleanup - dogfooding codegraph on itself to prove the tool's effectiveness. Backlog Compliance
Critical ConcernsNone identified. This is a clean, well-executed refactoring. Final Recommendation
Recommended next steps: Merge when CI passes. This refactoring should measurably reduce complexity metrics when codegraph analyzes itself. |
Codegraph Impact Analysis110 functions changed → 63 callers affected across 33 files
|
Greptile SummaryThis PR is a pure complexity-reduction refactor across 12 feature files — no new behavior, no schema changes. Long functions are decomposed into focused, named helpers, redundant code is deduplicated, and a few latent quality issues are addressed along the way.
Confidence Score: 5/5All 12 files are purely structural refactors with no logic, schema, or public API changes. Every behavioral path was traced: the allNative/bulkInsertCfg fast path in cfg.ts, the read-before-clear ordering in cochange.ts, the partial-rows-on-fallback handling in complexity.ts, and the BFS frame sharing in flow/dataflow/sequence. All preserve original semantics. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["buildCFGData()"] --> B{allNative?}
B -- "yes" --> C["tryNativeBulkInsertCfg()"]
C -- "nativeDb.bulkInsertCfg exists" --> D["buildNativeCfgEntry() per def"]
D --> E["nativeDb.bulkInsertCfg(entries)"]
E --> F[return early]
C -- "no bulkInsertCfg" --> G[false]
B -- "no" --> H["initCfgParsers()"]
G --> I["prepareCfgInsertStatements()"]
H --> I
I --> J["persistAllFileCfgs()"]
J --> K{allNative and no tree?}
K -- "yes" --> L["persistNativeFileCfg()"]
K -- "no" --> M["persistVisitorFileCfg()"]
L --> N[log analyzed count]
M --> N
Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile |
| /** Decision outcome for a single definition during native bulk-row collection. | ||
| * - 'skip': the definition is legitimately ignorable (non-function, missing line, | ||
| * interface stub, unsupported language). | ||
| * - 'fallback': a genuine function body is missing precomputed complexity — | ||
| * the whole native fast path must abort to JS. | ||
| * - 'emit': the definition has complexity data; the row was appended. */ | ||
| type NativeRowDecision = 'skip' | 'fallback' | 'emit'; | ||
|
|
||
| /** Classify a definition relative to the native bulk path. Returns | ||
| * 'skip' to ignore it, 'fallback' to bail out, or 'emit' if the row was added. */ | ||
| function classifyDefinitionForNativeBulk( | ||
| def: FileSymbols['definitions'][0], | ||
| langSupported: boolean, | ||
| ): 'skip' | 'fallback' | 'has-data' { | ||
| if (def.kind !== 'function' && def.kind !== 'method') return 'skip'; | ||
| if (!def.line) return 'skip'; | ||
| if (!def.complexity) { | ||
| // Interface/type property signatures and single-line stubs are extracted | ||
| // as methods but the native engine correctly never assigns complexity. | ||
| // Mirror the leniency in initWasmParsersIfNeeded to avoid bailing out | ||
| // of the native bulk-insert path for every TypeScript codebase (#846). | ||
| if (def.name.includes('.') || !def.endLine || def.endLine <= def.line) return 'skip'; | ||
| // Languages without complexity rules will never have data — skip them | ||
| // rather than bailing out of the entire native bulk path. | ||
| if (!langSupported) return 'skip'; | ||
| return 'fallback'; // genuine function body missing complexity — needs JS fallback | ||
| } | ||
| return 'has-data'; | ||
| } |
There was a problem hiding this comment.
Inconsistent
'has-data' vs 'emit' naming
NativeRowDecision declares the three variants as 'skip' | 'fallback' | 'emit', and its JSDoc explains 'emit' as "the definition has complexity data; the row was appended." However, classifyDefinitionForNativeBulk is typed as returning 'skip' | 'fallback' | 'has-data' and its JSDoc also mis-states the third value as 'emit'. The type NativeRowDecision is never used as the return annotation here, so the two sets of string literals silently diverge. A reader tracing decision === 'has-data' in collectFileBulkRows won't find it in the documented type, making the intent of the abstraction harder to follow.
There was a problem hiding this comment.
Fixed in d46198d — classifyDefinitionForNativeBulk now returns NativeRowDecision directly (the third variant renamed from 'has-data' to 'emit' to match the documented type). The return annotation is now NativeRowDecision instead of the inline 'skip' | 'fallback' | 'has-data' literal, so the type and its callers are fully consistent.
| const srcOwners = matchOwners(e.srcFile, rules); | ||
| const tgtOwners = matchOwners(e.tgtFile, rules); | ||
| // Cross-boundary: different owner sets (sort for deterministic comparison) | ||
| const srcKey = [...srcOwners].sort().join(','); | ||
| const tgtKey = [...tgtOwners].sort().join(','); | ||
| if (srcKey === tgtKey) continue; | ||
| boundaries.push({ | ||
| from: { | ||
| name: e.srcName, | ||
| kind: e.srcKind, | ||
| file: e.srcFile, | ||
| line: e.srcLine, | ||
| owners: srcOwners, | ||
| }, | ||
| to: { name: e.tgtName, kind: e.tgtKind, file: e.tgtFile, line: e.tgtLine, owners: tgtOwners }, | ||
| edgeKind: e.edgeKind, | ||
| }); |
There was a problem hiding this comment.
Silent change in
owners sort order inside returned boundaries
The original code called srcOwners.sort() (mutating in place) before storing the reference in boundaries.from.owners, so callers always received a sorted array. The refactored computeOwnerBoundaries correctly avoids the mutation by sorting a spread copy — but as a side effect, boundary.from.owners and boundary.to.owners are now returned in the original order from matchOwners rather than lexicographically sorted. For most uses this is the better behavior, but any downstream consumer that relied on the accidentally-sorted order (e.g., deterministic snapshot tests or display logic) will silently see different output. Worth confirming no caller depends on this.
There was a problem hiding this comment.
Fixed in d46198d — computeOwnerBoundaries now creates sortedSrc/sortedTgt via spread+sort before the key comparison, and stores these sorted arrays in boundary.from.owners and boundary.to.owners. This restores the lexicographically-sorted order that the original in-place sort provided, ensuring downstream consumers always receive sorted owner arrays.
…, structure-query, and owners Internal refactor — no public API or behaviour change, so docs check acknowledged. - complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from computeHalsteadMetrics. - structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared buildActiveFilesSet + buildClassifierInput helpers. - graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode, selectFileSeedNodes. - structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern (HOTSPOT_ORDER_BY) and mapNativeHotspotRow/mapJsHotspotRow helpers. - owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles, buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.
…ry owners - complexity.ts: classifyDefinitionForNativeBulk now returns NativeRowDecision (renamed 'has-data' to 'emit') so the return annotation matches the shared union type; eliminates the silent divergence between the type and its callers. - owners.ts: computeOwnerBoundaries now stores sortedSrc/sortedTgt in boundaries rather than the original unsorted arrays, restoring the lexicographic order that the original in-place sort provided.
496a718 to
d46198d
Compare
- cfg.ts: consolidate double dynamic import of ../domain/parser.js into
a single `mod` variable used for both createParsers and getParser
- cfg.ts: type cfgRules as CfgRulesConfig in buildVisitorCfgMap, removing
the `unknown` annotation and the downstream `as { functionNodes: string[] }` cast
- check.ts: replace mutable module-level EMPTY_CHECK constant with a
makeEmptyCheck() factory to prevent shared-state corruption
|
Addressed the three issues from Greptile's summary review (commit e549551):
|
Summary
Commits
Context
Part of the Titan Paradigm cleanup pass (see
.codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #9 of 10 (mergeOrder position: 9).Note: Plan listed PR #3 (builder) and PR #5 (graph helpers) as dependencies — PR #3 was skipped due to merge conflict against main. Cherry-pick applied cleanly here, so review separately.
Caveats
Test plan
codegraph stats)