Skip to content

perf(query): fix diffImpact latency regression (#904)#905

Open
carlos-alm wants to merge 4 commits intomainfrom
fix/semver-prerelease
Open

perf(query): fix diffImpact latency regression (#904)#905
carlos-alm wants to merge 4 commits intomainfrom
fix/semver-prerelease

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Short-circuit diffImpactData when no functions are affected — skips BFS traversal, co-change lookup, ownership lookup, and boundary checks. All callers (CLI, MCP, benchmark) already return early on empty affectedFunctions, so these lookups were wasted work. Saves ~2-3ms of fixed overhead per invocation.
  • Pass config to checkBoundaryViolations — eliminates redundant loadConfig(repoRoot) call (was being called twice with different cwd keys, bypassing the cache).
  • Strip pre-release suffix in semverComparesemverCompare('3.9.3-dev.6', '3.9.1') returned -1 because Number('3-dev')NaN0, causing shouldSkipNativeOrchestrator to flag all pre-release builds as buggy.

Investigation findings

The 3.9.2 regression (WASM +94%, native +28%) is primarily CI variance on a low baselinediffImpactData code is identical between 3.9.1 and 3.9.2, and affectedFunctions: 0 in both versions means BFS never runs. The entire measurement is fixed overhead: execFileSync('git diff') (~3-8ms) plus unnecessary lookups (~2-3ms). The WASM +94% (+6.1ms absolute) vs native +28% (+2.1ms) disparity reflects process spawn variance across CI runners.

Closes #904

Test plan

  • tsc --noEmit passes
  • biome check passes on changed file
  • semverCompare unit tests pass (23/23)
  • CI integration tests validate diffImpact output shape unchanged

`semverCompare('3.9.3-dev.6', '3.9.1')` returned -1 (less than) because
`Number('3-dev')` is NaN, which the `|| 0` fallback turned into 0,
making the comparison `0 < 1`. This caused `shouldSkipNativeOrchestrator`
to flag all pre-release builds as "buggy", disabling the native
orchestrator fast path introduced in #897.

Strip `-<prerelease>` before splitting on `.` so the numeric comparison
sees `3.9.3` vs `3.9.1` correctly.
Skip co-change, ownership, and boundary lookups when
findAffectedFunctions returns empty — all callers return early
on this case anyway. Also pass the already-loaded config to
checkBoundaryViolations to avoid a redundant loadConfig call.

Saves ~2-3ms of fixed overhead per diffImpact invocation when
the diff touches no function bodies (the common case for
comment/import/type-only changes and the benchmark probe).

Closes #904
@claude
Copy link
Copy Markdown

claude bot commented Apr 9, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR #905: diffImpact latency regression fix

Todo List:

  • Read and analyze changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance grading
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: Performance regression in diffImpact where fixed overhead operations (co-change lookup, ownership lookup, boundary checks) were being executed even when no functions were affected by changes. Additionally, semverCompare was incorrectly flagging pre-release versions as "buggy" due to NaN comparison issue.

Why This Approach: This is a legitimate optimization addressing real waste. When affectedFunctions.length === 0, the subsequent BFS traversal and expensive lookups provide no value since all callers (CLI, MCP, benchmark) return early on empty results. The semverCompare fix addresses a specific bug where pre-release versions were incorrectly parsed.

Risk Assessment: Low risk. The changes are conservative optimizations that skip work when it would be wasted. The short-circuiting preserves the exact same output structure, just populated with appropriate empty/zero values.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Pure performance optimization, no architectural changes
  • Problem-fit: 4 — Improves agent response time by reducing latency overhead
  • Breaking: No — Output structure identical, purely additive optimization
  • Tier: 1

Code Quality Assessment

diff-impact.ts Changes ✓

The short-circuit logic at lines 314-334 is well-implemented:

  • Correctly identifies when no functions are affected
  • Returns identical structure with appropriate empty values
  • Preserves all expected fields for callers
  • The config passing optimization (line 354) eliminates redundant loadConfig() call

update-check.ts Changes ✓

The semverCompare fix (lines 24-25) is correct:

  • Strips pre-release suffixes before parsing: '3.9.3-dev.6''3.9.3'
  • Prevents NaN from Number('3-dev') causing incorrect comparisons
  • Maintains backward compatibility for normal semver strings

Test Coverage ✓

New test cases adequately cover the semverCompare pre-release stripping:

  • Tests both sides with pre-release suffixes (lines 55-62)
  • Validates the specific bug case: semverCompare('3.9.3-dev.6', '3.9.1') should return 1, not -1

Technical Concerns: None

  • The short-circuit preserves output contract exactly
  • No potential for introducing bugs since the skipped operations would have been ignored by callers anyway
  • Performance improvement is measurable (~2-3ms fixed overhead elimination)

Final Assessment

This is a solid performance optimization that addresses real inefficiencies without changing behavior. The semverCompare fix resolves a legitimate bug affecting pre-release builds. Both changes are technically sound with appropriate test coverage.

The claimed "CI variance" explanation is validated by the code analysis - the regression was indeed fixed overhead being measured on a small baseline, not a functional issue with the core diffImpact logic.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Well-executed performance optimization with clear benefits, appropriate test coverage, and no breaking changes. Minor deduction only because it's purely optimization rather than new functionality.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR eliminates a latency regression in diffImpactData via three targeted fixes: (1) a short-circuit when no functions are affected that skips BFS, co-change, and ownership lookups while preserving file-scoped boundary checks (addressing the prior review concern); (2) passing pre-loaded config through to checkBoundaryViolations to avoid a redundant loadConfig() call; and (3) stripping pre-release suffixes in semverCompare so version comparisons like 3.9.3-dev.6 vs 3.9.1 work correctly.

Confidence Score: 5/5

Safe to merge — all three fixes are correct and the prior boundary-violation concern has been resolved.

The short-circuit is logically sound and the return shape aligns with existing early-exit paths (summary: null). Boundary checks are preserved for file-only changes. The config pass-through fix is straightforward. semverCompare now correctly strips pre-release suffixes. No remaining P0/P1 findings.

No files require special attention.

Vulnerabilities

No security concerns identified. The execFileSync call for git diff already uses an array argument (no shell interpolation), and config is passed through rather than re-loaded, reducing the attack surface slightly.

Important Files Changed

Filename Overview
src/domain/analysis/diff-impact.ts Short-circuit added for empty affectedFunctions case; boundary checks correctly preserved; config pass-through eliminates duplicate loadConfig call. Prior review concern about suppressed boundary violations is resolved.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[diffImpactData called] --> B{Git repo?}
    B -- No --> ERR[return error]
    B -- Yes --> C[runGitDiff]
    C --> D{Output empty?}
    D -- Yes --> E[return changedFiles:0, summary:null]
    D -- No --> F[parseGitDiff]
    F --> G{changedRanges empty?}
    G -- Yes --> E
    G -- No --> H[findAffectedFunctions]
    H --> I{affectedFunctions.length === 0?}
    I -- Yes NEW --> J[checkBoundaryViolations file-scoped]
    J --> K[return with historicallyCoupled:empty, ownership:null, summary:null]
    I -- No --> L[buildFunctionImpactResults BFS]
    L --> M[lookupCoChanges]
    M --> N[lookupOwnership]
    N --> O[checkBoundaryViolations file-scoped]
    O --> P[return full result with summary object]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into fix/semver-prer..." | Re-trigger Greptile

Comment on lines +314 to +334
if (affectedFunctions.length === 0) {
const base = {
changedFiles: changedRanges.size,
newFiles: [...newFiles],
affectedFunctions: [] as unknown[],
affectedFiles: [] as string[],
historicallyCoupled: [] as unknown[],
ownership: null,
boundaryViolations: [] as unknown[],
boundaryViolationCount: 0,
summary: {
functionsChanged: 0,
callersAffected: 0,
filesAffected: 0,
historicallyCoupledCount: 0,
ownersAffected: 0,
boundaryViolationCount: 0,
},
};
return paginateResult(base, 'affectedFunctions', { limit: opts.limit, offset: opts.offset });
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Boundary checks silently skipped for file-only changes

checkBoundaryViolations is scoped to files (scopeFiles: [...changedRanges.keys()]), not functions — so it can surface violations from pure import/type-alias changes even when no function body was touched. The new early return hardcodes boundaryViolations: [] / boundaryViolationCount: 0 for that case, silently suppressing any real violations.

If a caller uses summary.boundaryViolationCount (rather than checking functionsChanged first), it will see 0 even when architectural boundaries are actually violated. The PR description asserts callers return early on empty affectedFunctions, but that claim should be validated before skipping a check that is semantically independent of function-level changes. At minimum, consider preserving the boundary check in this branch:

const { boundaryViolations, boundaryViolationCount } = checkBoundaryViolations(
  db,
  changedRanges,
  noTests,
  { ...opts, config },
  repoRoot,
);
const base = {
  ...
  boundaryViolations,
  boundaryViolationCount,
  summary: {
    ...
    boundaryViolationCount,
  },
};
return paginateResult(base, 'affectedFunctions', { limit: opts.limit, offset: opts.offset });

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 14bf7c4 — the short-circuit path now preserves the boundary check since checkBoundaryViolations is file-scoped (via scopeFiles: [...changedRanges.keys()]) and can surface real violations even when no function bodies were touched. The return shape is also aligned: summary: null now matches the two existing early-exit paths, so callers that branch on summary === null behave consistently.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

Codegraph Impact Analysis

1 functions changed3 callers affected across 3 files

  • diffImpactData in src/domain/analysis/diff-impact.ts:258 (3 transitive callers)

The short-circuit path was hardcoding boundaryViolations: [] when no
functions were affected. Since boundary checks are file-scoped (not
function-scoped), import or type-alias changes can still produce real
violations. Preserve the check and align the return shape (summary: null)
with the two existing early-exit paths.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed both Greptile findings in 14bf7c4:

  1. Boundary checks preserved — the short-circuit path now calls checkBoundaryViolations instead of hardcoding empty arrays, since boundary evaluation is file-scoped and can catch real violations even when no function bodies changed.

  2. Return shape aligned — the new early return now uses summary: null consistent with the two existing early-exit paths (empty diff output and zero changed ranges), so callers that branch on summary === null behave uniformly.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: investigate diffImpact latency regression in 3.9.2 (WASM +94%, native +28%)

1 participant