perf: reduce query latency regression from 3.1.4 to 3.3.0 by carlos-alm · Pull Request #528 · optave/codegraph

carlos-alm · 2026-03-19T08:45:29Z

Summary

Addresses the +28–56% query latency regression identified in #523 (benchmark comparison 3.1.4 → 3.3.0).

Pin benchmark hub target to stable function names (buildGraph, openDb, loadConfig) instead of auto-selecting the most-connected node. The previous auto-selection made version-to-version comparison meaningless when barrel/type files (e.g. src/types.ts) became the hub.
Gate implementors queries in bfsTransitiveCallers — check once per db handle whether any implements edges exist before doing per-node findNodeById + findImplementors lookups. Eliminates dead overhead for codebases without interface/trait hierarchies.
Cache loadConfig() per cwd — avoids re-reading the config file from disk on every fnImpactData and diffImpactData call. Exports clearConfigCache() for test use.

Test plan

tests/unit/config.test.js — 54 tests pass (cache uses unique temp dirs per test)
tests/unit/queries-unit.test.js — 37 tests pass
tests/integration/queries.test.js — 76 tests pass
biome check clean on all changed files

Three targeted fixes for the +28–56% query latency regression: 1. Pin benchmark hub target to stable function names (buildGraph, openDb, loadConfig) instead of auto-selecting the most-connected node. Barrel/type files becoming the hub made version-to-version comparison meaningless. 2. Gate implementors queries in bfsTransitiveCallers — check once whether the graph has any 'implements' edges before doing per-node findNodeById + findImplementors lookups. Skips all implementor overhead for codebases without interface/trait hierarchies. 3. Cache loadConfig() results per cwd. The config file is read from disk on every fnImpactData and diffImpactData call; caching eliminates redundant fs.existsSync + readFileSync + JSON.parse per query invocation. Impact: 5 functions changed, 123 affected

claude · 2026-03-19T08:45:41Z

Claude finished @carlos-alm's task —— View job

Claude Code Review

Todo List:

Phase 1: Root cause analysis - understand the latency regression problem
Phase 2: Configuration change detection - check for risky config changes
Phase 3: Test integrity - verify tests haven't been weakened
Phase 4: Code quality review - analyze implementation details
Phase 5: Backlog compliance assessment
Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: PR addresses a +28-56% query latency regression between versions 3.1.4 → 3.3.0. The regression stems from:

Auto-selecting benchmark hubs made version comparisons meaningless when barrel/type files became the most-connected nodes
Dead overhead from implementors queries in codebases without interface/trait hierarchies
Config file re-reading on every fnImpactData/diffImpactData call

Why This Approach: The fixes are surgical and well-targeted:

Pin benchmark targets to stable function names for consistent measurement
Gate implementors lookups with a once-per-connection check
Cache loadConfig() per cwd to avoid disk I/O

Risk Assessment: Low risk - all changes are performance optimizations with no behavioral changes.

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies, builds entirely on existing tree-sitter/SQLite/Node.js foundation
Foundation-aligned: ✓ — Directly supports P1 (always-current graph) by reducing rebuild latency; aligns with P4 (zero-cost core) by eliminating unnecessary file I/O and DB queries
Problem-fit: 5 — Directly addresses latency regression that would make the tool slower for AI agents and commit hooks, hitting the core "always-current" promise
Breaking: No — Pure performance optimizations with identical output behavior
Tier: 1

Critical Concerns

None identified. This is a high-quality performance optimization that addresses a real regression with surgical fixes.

Code Quality Highlights

WeakMap caching (impact.js:31) prevents memory leaks and provides optimal per-connection caching
Config caching preserves all error handling paths while eliminating redundant file reads
Benchmark pinning uses smart fallback strategy for robust measurement across versions
Test compatibility - proper clearConfigCache() export maintains test isolation

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE
Reasoning: Exemplary performance optimization PR. Addresses a real latency regression with three targeted, well-implemented fixes. No behavioral changes, strong test coverage, zero new dependencies. Aligns perfectly with Foundation principle P1 (always-current graph) by keeping rebuilds fast enough for commit hooks and agent loops.

greptile-apps · 2026-03-19T08:48:50Z

Greptile Summary

This PR addresses a 28–56% query latency regression between versions 3.1.4 and 3.3.0 through three targeted optimizations: pinning benchmark hub targets to stable function names for reproducible comparisons, gating implements edge lookups in bfsTransitiveCallers behind a single per-db-handle probe, and caching loadConfig() results per working directory to avoid repeated disk reads.

src/domain/analysis/impact.js: hasImplementsEdges() uses a module-level WeakMap keyed on the db handle to run the probe query at most once per connection. Both resolveImplementors substitutions in bfsTransitiveCallers are consistent and correct.
src/infrastructure/config.js: Cache miss paths now store structuredClone(result) and return the original; cache hit returns structuredClone(cached) — correctly preventing callers from corrupting the shared cache object (previous review issues addressed). clearConfigCache() is exported with a JSDoc explaining its intended use cases. Minor gap: the JSDoc for loadConfig does not mention that env-var overrides (CODEGRAPH_LLM_*) and apiKeyCommand output are also frozen in the cache snapshot, which is relevant for long-running processes that rotate credentials without restarting.
scripts/query-benchmark.js: Pinned hub candidates tried in order before falling back to most-connected node; db.close() is now guaranteed via try/finally. Minor indentation inconsistency inside the new try block.
scripts/lib/bench-config.js: Explicit @huggingface/transformers install in npm benchmark mode. Unlike the native package install, this block has no retry logic — a transient network failure silently falls through to a runtime import error in embedding workers.
scripts/update-embedding-report.js: Early guard prevents overwriting valid benchmark data when all workers crashed or produced empty results.

Confidence Score: 4/5

Safe to merge; all three performance optimizations are logically correct and the previous cache-correctness issues have been addressed.
The core logic changes (WeakMap gate, config cache with structuredClone, pinned benchmark targets) are all sound. The three remaining comments are style/consistency issues (indentation, missing retry, doc gap) that do not affect correctness or production safety. The score is 4 rather than 5 solely because the undocumented behavioral change around env-var / apiKeyCommand caching could surprise future contributors working on credential rotation in long-running processes.
src/infrastructure/config.js — the doc gap around env-var override caching is worth clarifying before this pattern is relied upon in more callers.

Important Files Changed

Filename	Overview
src/infrastructure/config.js	Adds per-cwd config cache with structuredClone on both cache hits and cache misses (addressing previous review feedback). Exports clearConfigCache() for long-running processes and test isolation. Minor doc gap: the JSDoc doesn't clarify that env-var overrides and apiKeyCommand output are also frozen in the cache snapshot.
src/domain/analysis/impact.js	Adds hasImplementsEdges() with module-level WeakMap cache per db handle to gate all findNodeById + findImplementors lookups. Logic is correct: resolveImplementors short-circuits when includeImplementors=false (no WeakMap write) and when the graph has no implements edges. Both call sites in bfsTransitiveCallers are consistently updated.
scripts/query-benchmark.js	Pins hub target to stable function names (buildGraph, openDb, loadConfig) with fallback to most-connected node, preventing version-to-version benchmark drift. Wraps selectTargets in try/finally to ensure db.close() is always called. Minor indentation inconsistency inside the try block.
scripts/lib/bench-config.js	Adds explicit installation of @huggingface/transformers devDependency in npm benchmark mode so embedding benchmark workers can import it. Correctly reads the version from the local package.json. Unlike the native package install, this block has no retry logic — a transient network failure would silently fall through to a worker import error.
scripts/update-embedding-report.js	Adds an early guard that rejects empty benchmark results (no symbols or no model data) before overwriting the report file. Guard logic is correct and the error message is informative. Non-critical: !entry.symbols is truthy for 0 symbols, which is the intended empty-result case.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["loadConfig(cwd)"] --> B{Cache hit?}
    B -->|yes| C["return structuredClone(cached)"]
    B -->|no| D[Search CONFIG_FILES on disk]
    D -->|found| E["mergeConfig(DEFAULTS, file)"]
    D -->|not found| F["applyEnvOverrides({...DEFAULTS})"]
    E --> G["applyEnvOverrides(merged)"]
    G --> H["resolveSecrets(result)\n(runs apiKeyCommand once)"]
    H --> I["_configCache.set(cwd, structuredClone(result))"]
    I --> J["return result"]
    F --> K["resolveSecrets(defaults)\n(runs apiKeyCommand once)"]
    K --> L["_configCache.set(cwd, structuredClone(defaults))"]
    L --> M["return defaults"]

    N["bfsTransitiveCallers(db, startId, opts)"] --> O{"includeImplementors?"}
    O -->|no| P["resolveImplementors = false\nskip all findNodeById/findImplementors"]
    O -->|yes| Q{"_hasImplementsCache.has(db)?"}
    Q -->|yes| R["return cached bool"]
    Q -->|no| S["SELECT 1 FROM edges WHERE kind='implements' LIMIT 1"]
    S --> T["_hasImplementsCache.set(db, result)"]
    T --> R
    R -->|false| P
    R -->|true| U["resolveImplementors = true\nproceed with implementor lookups"]

_{Last reviewed commit: "fix: clone config on..."}

greptile-apps · 2026-03-19T08:48:54Z

src/infrastructure/config.js

+  const cached = _configCache.get(cwd);
+  if (cached) return cached;


Mutable object returned by reference from cache

_configCache stores and returns the same object reference. If any caller (current or future) mutates a nested field on the returned config — for example config.llm.apiKey = '...', config.query.excludeTests = true, or config.include.push(...) — every subsequent call to loadConfig(cwd) will observe the mutated value. This is especially risky because resolveSecrets and applyEnvOverrides both already mutate the config object in-place before it is cached.

Returning a shallow or deep clone on every cache hit prevents callers from accidentally corrupting the cache:

Suggested change

const cached = _configCache.get(cwd);

if (cached) return cached;

const cached = _configCache.get(cwd);

if (cached) return structuredClone(cached);

Alternatively, Object.freeze the cached value so mutations throw in strict mode.

Fixed — loadConfig() now returns structuredClone(cached) on cache hits, preventing callers from mutating the shared cache object.

greptile-apps · 2026-03-19T08:48:55Z

src/infrastructure/config.js

+export function clearConfigCache() {
+  _configCache.clear();
 }


clearConfigCache exported but never imported anywhere

The PR description says this is exported "for test use," but a search across the whole codebase shows it is defined here and not imported or called anywhere — including in tests/unit/config.test.js, which achieves test isolation via unique os.tmpdir() directories instead.

An unused export is harmless, but it can mislead future contributors into thinking tests or callers rely on it. If the intent is to provide a safety valve for long-running processes that detect config-file changes on disk, a brief inline comment explaining that use case (or a corresponding test) would clarify the intent.

Fixed — updated the JSDoc to clarify the intended use cases: long-running processes that need to detect on-disk config changes, and test isolation when tests share the same cwd.

…handle Prevent callers from mutating the cached config object by returning a deep clone on cache hits. Add try/finally to selectTargets() so the database handle is closed even if a query throws. Impact: 2 functions changed, 1 affected

carlos-alm · 2026-03-19T08:54:10Z

Addressed the outside-diff comment about the DB handle leak in selectTargets() — wrapped in try/finally so db.close() runs unconditionally.

carlos-alm · 2026-03-19T08:54:18Z

@greptileai

greptile-apps · 2026-03-19T08:57:09Z

src/infrastructure/config.js

+        const result = resolveSecrets(applyEnvOverrides(merged));
+        _configCache.set(cwd, result);
+        return result;
      } catch (err) {
        debug(`Failed to parse config ${filePath}: ${err.message}`);
      }
    }
  }
-  return resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
+  const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
+  _configCache.set(cwd, defaults);
+  return defaults;


Cache miss returns un-cloned reference

The structuredClone guard was applied to cache hits (line 144), but both cache-miss return paths hand the caller the exact same object that was just written into _configCache. A caller mutating the returned config on first use will corrupt every subsequent cache hit from that cwd.

The fix from the previous thread — "loadConfig() now returns structuredClone(cached) on cache hits" — is incomplete; the same protection needs to apply at initial population. Either store a clone in the cache and return the original, or return a clone at the end of each miss path:

Suggested change

const result = resolveSecrets(applyEnvOverrides(merged));

_configCache.set(cwd, result);

return result;

} catch (err) {

debug(`Failed to parse config ${filePath}: ${err.message}`);

}

}

}

return resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));

const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));

_configCache.set(cwd, defaults);

return defaults;

const result = resolveSecrets(applyEnvOverrides(merged));

_configCache.set(cwd, structuredClone(result));

return result;

And the defaults path needs the same treatment:

const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS })); _configCache.set(cwd, structuredClone(defaults)); return defaults;

Without this, the invariant that "the cache stores a pristine copy that no external caller can touch" only holds after the second call to loadConfig(cwd), not the first.

Fixed — both cache-miss paths now store structuredClone(result) in the cache and return the original object, ensuring callers can never mutate the cached copy.

The embedding benchmark's npm mode installs codegraph into a temp dir, but @huggingface/transformers is a devDependency and not included. All 6 model workers crash on import, producing symbols: 0, models: {}. Install it explicitly from the local devDependencies version, matching the existing pattern for native platform packages. Also add a guard in update-embedding-report.js to reject empty results and fail loudly instead of silently overwriting valid benchmark data.

carlos-alm · 2026-03-19T09:09:00Z

@greptileai

Impact: 1 functions changed, 118 affected

carlos-alm · 2026-03-19T09:33:16Z

@greptileai

greptile-apps bot reviewed Mar 19, 2026

View reviewed changes

carlos-alm mentioned this pull request Mar 19, 2026

docs: update embedding benchmarks (3.3.0) #525

Closed

greptile-apps bot mentioned this pull request Mar 19, 2026

docs: update query benchmarks (3.3.0) #523

Merged

fix: clone config on cache-miss paths to prevent cache corruption (#528)

1f9e2e5

Impact: 1 functions changed, 118 affected

Merge branch 'main' into perf/query-latency-regression

55a9f36

carlos-alm merged commit f8016c6 into main Mar 19, 2026
14 checks passed

carlos-alm deleted the perf/query-latency-regression branch March 19, 2026 09:38

github-actions bot locked and limited conversation to collaborators Mar 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reduce query latency regression from 3.1.4 to 3.3.0#528

perf: reduce query latency regression from 3.1.4 to 3.3.0#528
carlos-alm merged 5 commits intomainfrom
perf/query-latency-regression

carlos-alm commented Mar 19, 2026

Uh oh!

claude bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 19, 2026

Uh oh!

carlos-alm Mar 19, 2026

Uh oh!

greptile-apps bot Mar 19, 2026

Uh oh!

carlos-alm Mar 19, 2026

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

greptile-apps bot Mar 19, 2026

Uh oh!

carlos-alm Mar 19, 2026

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		const cached = _configCache.get(cwd);
		if (cached) return cached;

Conversation

carlos-alm commented Mar 19, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Root Cause Analysis

Backlog Compliance

Critical Concerns

Code Quality Highlights

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

greptile-apps bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

carlos-alm commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 19, 2026 •

edited

Loading

greptile-apps bot commented Mar 19, 2026 •

edited

Loading