Skip to content

refactor(features): decompose complexity/structure/owners; reduce cfg/cochange/feature-warnings complexity#1237

Open
carlos-alm wants to merge 11 commits into
mainfrom
refactor/titan-decomposition-features
Open

refactor(features): decompose complexity/structure/owners; reduce cfg/cochange/feature-warnings complexity#1237
carlos-alm wants to merge 11 commits into
mainfrom
refactor/titan-decomposition-features

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Decomposes complexity, structure, graph-enrichment, structure-query, and owners modules
  • Reduces complexity in cfg and cochange
  • Reduces warning-level complexity in feature warnings batch

Commits

  • bdffbd1: refactor(features): decompose complexity, structure, graph-enrichment, structure-query, and owners
  • 9856136: refactor(features): reduce complexity in cfg and cochange
  • e52f18b: refactor(features): reduce warning-level complexity in feature warnings batch

Context

Part of the Titan Paradigm cleanup pass (see .codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #9 of 10 (mergeOrder position: 9).

Note: Plan listed PR #3 (builder) and PR #5 (graph helpers) as dependencies — PR #3 was skipped due to merge conflict against main. Cherry-pick applied cleanly here, so review separately.

Caveats

  • WASM grammars not available in dev worktree — CI will run full test matrix

Test plan

  • CI passes (lint, build, full test matrix)
  • Verify no new cycles introduced (codegraph stats)

…, structure-query, and owners

Internal refactor — no public API or behaviour change, so docs check acknowledged.

- complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from computeHalsteadMetrics.
- structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared buildActiveFilesSet + buildClassifierInput helpers.
- graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode, selectFileSeedNodes.
- structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern (HOTSPOT_ORDER_BY) and mapNative/JsHotspotRow helpers.
- owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles, buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.
@claude
Copy link
Copy Markdown

claude Bot commented May 27, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Titan Decomposition Features ✅

Reviewing this refactoring PR following the 5-phase review process for codegraph.

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for rule relaxations
  • Phase 3: Test integrity - verify no weakened assertions
  • Phase 4: Code quality review - examine refactoring patterns
  • Phase 5: Backlog compliance - rate against foundation principles
  • Provide final assessment and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR is part of the Titan Paradigm cleanup (9/10 in merge order), systematically decomposing large, complex feature modules to improve maintainability and reduce cognitive load. The Titan Paradigm approach uses automated tooling to audit and refactor codebases for "Google-standard shape."

Phase 2: Configuration Change Detection ✅

CLEAN - No configuration changes detected:

  • No changes to biome.json, tsconfig.json, or package.json
  • No lint rule disabling or TypeScript suppressions
  • Only source code changes in src/features/*.ts files

Phase 3: Test Integrity ✅

CLEAN - No test modifications:

  • No test files changed
  • No weakened assertions or removed test cases
  • All test infrastructure remains unchanged

Phase 4: Code Quality Review ✅

Refactoring Pattern Analysis:

Positive patterns observed:

  • Function extraction: Large functions broken into focused helpers (e.g. classifyHalsteadToken, summarizeHalsteadCounts)
  • Clear separation of concerns: Helper functions have single responsibilities with descriptive names
  • Preserved public APIs: All exported functions maintain exact same signatures
  • Consistent style: JSDoc comments added for new internal functions
  • Meaningful names: Functions clearly indicate purpose
  • Type safety: Strong typing maintained throughout

Phase 5: Backlog Compliance ✅

Root Cause Analysis

Problem Being Solved: Systematic complexity reduction as part of Titan Paradigm cleanup - dogfooding codegraph on itself to prove the tool's effectiveness.
Why This Approach: Pure function extraction preserves behavior while reducing cognitive complexity and improving maintainability.
Risk Assessment: Very low risk - structural refactoring with no logic changes and comprehensive test coverage.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Improves maintainability without violating any core principles (P1-P8)
  • Problem-fit: 3 — Indirectly helps by improving codebase quality, but doesn't directly address AI token waste or blast radius detection
  • Breaking: No — All public APIs preserved, purely internal restructuring
  • Tier: 1

Critical Concerns

None identified. This is a clean, well-executed refactoring.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent refactoring execution with clear complexity reduction. The systematic approach demonstrates codegraph's effectiveness on itself. No functional changes, preserved APIs, comprehensive test coverage, and follows established patterns. Minor deduction only because this is internal cleanup rather than user-facing functionality.

Recommended next steps: Merge when CI passes. This refactoring should measurably reduce complexity metrics when codegraph analyzes itself.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Codegraph Impact Analysis

110 functions changed63 callers affected across 33 files

  • collectAllRules in src/features/boundaries.ts:238 (5 transitive callers)
  • loadImportEdges in src/features/boundaries.ts:251 (5 transitive callers)
  • ruleViolated in src/features/boundaries.ts:280 (4 transitive callers)
  • emitEdgeViolations in src/features/boundaries.ts:286 (5 transitive callers)
  • evaluateBoundaries in src/features/boundaries.ts:308 (8 transitive callers)
  • initCfgParsers in src/features/cfg.ts:110 (3 transitive callers)
  • buildVisitorCfgMap in src/features/cfg.ts:193 (3 transitive callers)
  • buildNativeCfgEntry in src/features/cfg.ts:376 (3 transitive callers)
  • tryNativeBulkInsertCfg in src/features/cfg.ts:413 (3 transitive callers)
  • CfgInsertStatements.insertBlock in src/features/cfg.ts:452 (0 transitive callers)
  • CfgInsertStatements.insertEdge in src/features/cfg.ts:453 (0 transitive callers)
  • prepareCfgInsertStatements in src/features/cfg.ts:456 (3 transitive callers)
  • persistAllFileCfgs in src/features/cfg.ts:472 (3 transitive callers)
  • buildCFGData in src/features/cfg.ts:510 (3 transitive callers)
  • pushHunkRanges in src/features/check.ts:28 (4 transitive callers)
  • parseDiffOutput in src/features/check.ts:48 (4 transitive callers)
  • rangesOverlap in src/features/check.ts:119 (3 transitive callers)
  • defEndLine in src/features/check.ts:126 (3 transitive callers)
  • checkMaxBlastRadius in src/features/check.ts:130 (4 transitive callers)
  • makeEmptyCheck in src/features/check.ts:382 (4 transitive callers)

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR is a pure complexity-reduction refactor across 12 feature files — no new behavior, no schema changes. Long functions are decomposed into focused, named helpers, redundant code is deduplicated, and a few latent quality issues are addressed along the way.

  • cfg.ts fixes a redundant double-import of '../domain/parser.js' and replaces the unknown cast with the typed CfgRulesConfig; buildCFGData is split into tryNativeBulkInsertCfg, prepareCfgInsertStatements, and persistAllFileCfgs.
  • check.ts moves the prepared statement for the per-file nodes query outside the loop, and replaces the shared-mutable EMPTY_CHECK constant with a makeEmptyCheck() factory.
  • structure-query.ts / structure.ts eliminate copy-pasted hotspot-row mapping and role-classifier input building.

Confidence Score: 5/5

All 12 files are purely structural refactors with no logic, schema, or public API changes.

Every behavioral path was traced: the allNative/bulkInsertCfg fast path in cfg.ts, the read-before-clear ordering in cochange.ts, the partial-rows-on-fallback handling in complexity.ts, and the BFS frame sharing in flow/dataflow/sequence. All preserve original semantics.

No files require special attention.

Important Files Changed

Filename Overview
src/features/boundaries.ts Extracted collectAllRules, loadImportEdges, ruleViolated, emitEdgeViolations; pure structural decomposition, logic unchanged.
src/features/cfg.ts Extracted buildNativeCfgEntry, tryNativeBulkInsertCfg, prepareCfgInsertStatements, persistAllFileCfgs; fixes redundant double-import and replaces unknown with CfgRulesConfig.
src/features/check.ts Hoisted HUNK_RE/NEW_FILE_RE; extracted pushHunkRanges, rangesOverlap, defEndLine; promoted defsStmt outside loop; EMPTY_CHECK replaced with makeEmptyCheck() factory.
src/features/cochange.ts Extracted loadLastAnalyzedSha, clearCoChangeTables, loadKnownFiles, persistCoChangeResults, recomputeJaccardForAffected, updateCoChangeMeta; read-before-clear ordering preserved.
src/features/complexity.ts Extracted classifyHalsteadToken, summarizeHalsteadCounts, NativeRowDecision, classifyDefinitionForNativeBulk, buildNativeBulkRow, collectFileBulkRows; public API of collectNativeBulkRows unchanged.
src/features/dataflow.ts Extracted DataflowNeighbor, DataflowBfsState, processDataflowNeighbor; BFS semantics preserved.
src/features/flow.ts Extracted FlowBfsFrame and processFlowCallee; shared visited/cycles references preserved across depth iterations.
src/features/graph-enrichment.ts Extracted loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode, selectFileSeedNodes; duplicate seedStrategy branches unified.
src/features/owners.ts Introduced OwnedSymbol, OwnerBoundary, OwnersDataResult types; extracted six helper functions; sorted owner arrays correctly stored.
src/features/sequence.ts Extracted CalleeNode, BfsFrame, processCallee, DataflowStmts, appendReturnMessages, annotateCallParams; message-before-visited-check order preserved.
src/features/structure-query.ts Extracted HotspotEntry, computeHotspotDensity, mapNativeHotspotRow, mapJsHotspotRow, HOTSPOT_ORDER_BY, buildHotspotQuery; all ORDER BY strings are static.
src/features/structure.ts Extracted CallableNodeRow, buildActiveFilesSet, buildClassifierInput; duplicate code in classifyNodeRolesFull and classifyNodeRolesIncremental unified.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["buildCFGData()"] --> B{allNative?}
    B -- "yes" --> C["tryNativeBulkInsertCfg()"]
    C -- "nativeDb.bulkInsertCfg exists" --> D["buildNativeCfgEntry() per def"]
    D --> E["nativeDb.bulkInsertCfg(entries)"]
    E --> F[return early]
    C -- "no bulkInsertCfg" --> G[false]
    B -- "no" --> H["initCfgParsers()"]
    G --> I["prepareCfgInsertStatements()"]
    H --> I
    I --> J["persistAllFileCfgs()"]
    J --> K{allNative and no tree?}
    K -- "yes" --> L["persistNativeFileCfg()"]
    K -- "no" --> M["persistVisitorFileCfg()"]
    L --> N[log analyzed count]
    M --> N
Loading

Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile

Comment on lines +554 to +582
/** Decision outcome for a single definition during native bulk-row collection.
* - 'skip': the definition is legitimately ignorable (non-function, missing line,
* interface stub, unsupported language).
* - 'fallback': a genuine function body is missing precomputed complexity —
* the whole native fast path must abort to JS.
* - 'emit': the definition has complexity data; the row was appended. */
type NativeRowDecision = 'skip' | 'fallback' | 'emit';

/** Classify a definition relative to the native bulk path. Returns
* 'skip' to ignore it, 'fallback' to bail out, or 'emit' if the row was added. */
function classifyDefinitionForNativeBulk(
def: FileSymbols['definitions'][0],
langSupported: boolean,
): 'skip' | 'fallback' | 'has-data' {
if (def.kind !== 'function' && def.kind !== 'method') return 'skip';
if (!def.line) return 'skip';
if (!def.complexity) {
// Interface/type property signatures and single-line stubs are extracted
// as methods but the native engine correctly never assigns complexity.
// Mirror the leniency in initWasmParsersIfNeeded to avoid bailing out
// of the native bulk-insert path for every TypeScript codebase (#846).
if (def.name.includes('.') || !def.endLine || def.endLine <= def.line) return 'skip';
// Languages without complexity rules will never have data — skip them
// rather than bailing out of the entire native bulk path.
if (!langSupported) return 'skip';
return 'fallback'; // genuine function body missing complexity — needs JS fallback
}
return 'has-data';
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent 'has-data' vs 'emit' naming

NativeRowDecision declares the three variants as 'skip' | 'fallback' | 'emit', and its JSDoc explains 'emit' as "the definition has complexity data; the row was appended." However, classifyDefinitionForNativeBulk is typed as returning 'skip' | 'fallback' | 'has-data' and its JSDoc also mis-states the third value as 'emit'. The type NativeRowDecision is never used as the return annotation here, so the two sets of string literals silently diverge. A reader tracing decision === 'has-data' in collectFileBulkRows won't find it in the documented type, making the intent of the abstraction harder to follow.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d46198d — classifyDefinitionForNativeBulk now returns NativeRowDecision directly (the third variant renamed from 'has-data' to 'emit' to match the documented type). The return annotation is now NativeRowDecision instead of the inline 'skip' | 'fallback' | 'has-data' literal, so the type and its callers are fully consistent.

Comment thread src/features/owners.ts
Comment on lines +279 to +295
const srcOwners = matchOwners(e.srcFile, rules);
const tgtOwners = matchOwners(e.tgtFile, rules);
// Cross-boundary: different owner sets (sort for deterministic comparison)
const srcKey = [...srcOwners].sort().join(',');
const tgtKey = [...tgtOwners].sort().join(',');
if (srcKey === tgtKey) continue;
boundaries.push({
from: {
name: e.srcName,
kind: e.srcKind,
file: e.srcFile,
line: e.srcLine,
owners: srcOwners,
},
to: { name: e.tgtName, kind: e.tgtKind, file: e.tgtFile, line: e.tgtLine, owners: tgtOwners },
edgeKind: e.edgeKind,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent change in owners sort order inside returned boundaries

The original code called srcOwners.sort() (mutating in place) before storing the reference in boundaries.from.owners, so callers always received a sorted array. The refactored computeOwnerBoundaries correctly avoids the mutation by sorting a spread copy — but as a side effect, boundary.from.owners and boundary.to.owners are now returned in the original order from matchOwners rather than lexicographically sorted. For most uses this is the better behavior, but any downstream consumer that relied on the accidentally-sorted order (e.g., deterministic snapshot tests or display logic) will silently see different output. Worth confirming no caller depends on this.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d46198d — computeOwnerBoundaries now creates sortedSrc/sortedTgt via spread+sort before the key comparison, and stores these sorted arrays in boundary.from.owners and boundary.to.owners. This restores the lexicographically-sorted order that the original in-place sort provided, ensuring downstream consumers always receive sorted owner arrays.

…, structure-query,

and owners

Internal refactor — no public API or behaviour change, so docs check acknowledged.

- complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file
  helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from
  computeHalsteadMetrics.
- structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared
  buildActiveFilesSet + buildClassifierInput helpers.
- graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into
  loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode,
  selectFileSeedNodes.
- structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern
  (HOTSPOT_ORDER_BY) and mapNativeHotspotRow/mapJsHotspotRow helpers.
- owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles,
  buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.
…ry owners

- complexity.ts: classifyDefinitionForNativeBulk now returns NativeRowDecision
  (renamed 'has-data' to 'emit') so the return annotation matches the shared
  union type; eliminates the silent divergence between the type and its callers.
- owners.ts: computeOwnerBoundaries now stores sortedSrc/sortedTgt in boundaries
  rather than the original unsorted arrays, restoring the lexicographic order
  that the original in-place sort provided.
@carlos-alm carlos-alm force-pushed the refactor/titan-decomposition-features branch from 496a718 to d46198d Compare May 27, 2026 05:21
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

- cfg.ts: consolidate double dynamic import of ../domain/parser.js into
  a single `mod` variable used for both createParsers and getParser
- cfg.ts: type cfgRules as CfgRulesConfig in buildVisitorCfgMap, removing
  the `unknown` annotation and the downstream `as { functionNodes: string[] }` cast
- check.ts: replace mutable module-level EMPTY_CHECK constant with a
  makeEmptyCheck() factory to prevent shared-state corruption
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed the three issues from Greptile's summary review (commit e549551):

  1. Double import in cfg.ts — consolidated the two import('../domain/parser.js') calls in initCfgParsers into a single mod variable used for both createParsers and getParser.

  2. cfgRules typed as unknown in buildVisitorCfgMap — imported CfgRulesConfig from ../types.js and typed the parameter correctly; removed the downstream as { functionNodes: string[] } cast (the field is already Set<string> on CfgRulesConfig).

  3. Mutable module-level EMPTY_CHECK constant — converted to a makeEmptyCheck() factory function so each return path gets a fresh object, preventing shared-state corruption if any caller were to mutate the result.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant