perf: reduce query latency regression from 3.1.4 to 3.3.0 #528
carlos-alm merged 5 commits into main from …
Three targeted fixes for the +28–56% query latency regression:

1. Pin the benchmark hub target to stable function names (buildGraph, openDb, loadConfig) instead of auto-selecting the most-connected node. Once barrel/type files became the hub, version-to-version comparison was meaningless.
2. Gate implementor queries in bfsTransitiveCallers: check once whether the graph has any 'implements' edges before doing per-node findNodeById + findImplementors lookups. This skips all implementor overhead for codebases without interface/trait hierarchies.
3. Cache loadConfig() results per cwd. The config file was read from disk on every fnImpactData and diffImpactData call; caching eliminates the redundant fs.existsSync + readFileSync + JSON.parse per query invocation.

Impact: 5 functions changed, 123 affected
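Fix 1 can be sketched as a priority list with a fallback. This is a minimal illustration, not the benchmark script's code: pickHubTarget, the nodes array with name/kind/degree fields, and the fallback to the most-connected node when no pinned name exists are all assumptions.

```javascript
// Hypothetical sketch: prefer stable, pinned function names as the benchmark hub
// target so every version queries the same symbol. The fallback to the
// most-connected node is an assumption, used only if no pinned name is present.
const PINNED_HUB_NAMES = ["buildGraph", "openDb", "loadConfig"];

function pickHubTarget(nodes) {
  // Try the pinned names in priority order.
  for (const name of PINNED_HUB_NAMES) {
    const match = nodes.find((n) => n.name === name && n.kind === "function");
    if (match) return match;
  }
  // Fallback: the node with the highest degree (the old auto-selection).
  let best = null;
  for (const n of nodes) {
    if (best === null || n.degree > best.degree) best = n;
  }
  return best;
}

const nodes = [
  { name: "types", kind: "file", degree: 90 },
  { name: "openDb", kind: "function", degree: 12 },
  { name: "buildGraph", kind: "function", degree: 30 },
];
console.log(pickHubTarget(nodes).name); // buildGraph
```

Even though the barrel file has the most edges here, the pinned name wins, so two releases benchmarked against the same codebase query the same symbol.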
Claude finished @carlos-alm's task.
Claude Code Review
Root Cause Analysis
Problem Being Solved: This PR addresses a +28–56% query latency regression between versions 3.1.4 → 3.3.0. The regression stems from:
Why This Approach: The fixes are surgical and well-targeted:
Risk Assessment: Low risk: all changes are performance optimizations with no behavioral changes.
Backlog Compliance
Critical Concerns
None identified. This is a high-quality performance optimization that addresses a real regression with surgical fixes.
Code Quality Highlights
Final Recommendation
Greptile Summary
This PR addresses a 28–56% query latency regression between versions 3.1.4 and 3.3.0 through three targeted optimizations: pinning benchmark hub targets to stable function names for reproducible comparisons, gating…
Confidence Score: 4/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["loadConfig(cwd)"] --> B{Cache hit?}
B -->|yes| C["return structuredClone(cached)"]
B -->|no| D[Search CONFIG_FILES on disk]
D -->|found| E["mergeConfig(DEFAULTS, file)"]
D -->|not found| F["applyEnvOverrides({...DEFAULTS})"]
E --> G["applyEnvOverrides(merged)"]
G --> H["resolveSecrets(result)\n(runs apiKeyCommand once)"]
H --> I["_configCache.set(cwd, structuredClone(result))"]
I --> J["return result"]
F --> K["resolveSecrets(defaults)\n(runs apiKeyCommand once)"]
K --> L["_configCache.set(cwd, structuredClone(defaults))"]
L --> M["return defaults"]
N["bfsTransitiveCallers(db, startId, opts)"] --> O{"includeImplementors?"}
O -->|no| P["resolveImplementors = false\nskip all findNodeById/findImplementors"]
O -->|yes| Q{"_hasImplementsCache.has(db)?"}
Q -->|yes| R["return cached bool"]
Q -->|no| S["SELECT 1 FROM edges WHERE kind='implements' LIMIT 1"]
S --> T["_hasImplementsCache.set(db, result)"]
T --> R
R -->|false| P
R -->|true| U["resolveImplementors = true\nproceed with implementor lookups"]
Last reviewed commit: "fix: clone config on..."
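The bfsTransitiveCallers branch of the flowchart can be sketched as below, assuming a better-sqlite3-style handle where db.prepare(sql).get() returns a row object or undefined. Keying the cache with a WeakMap is this sketch's choice (the flowchart only names _hasImplementsCache); shouldResolveImplementors and makeStubDb are hypothetical helpers.

```javascript
// Per-handle memo of "does this graph contain any 'implements' edges?".
// A WeakMap (assumed here) lets discarded handles be garbage-collected.
const _hasImplementsCache = new WeakMap();

function hasImplementsEdges(db) {
  if (_hasImplementsCache.has(db)) return _hasImplementsCache.get(db);
  const row = db.prepare("SELECT 1 FROM edges WHERE kind = 'implements' LIMIT 1").get();
  const result = row !== undefined;
  _hasImplementsCache.set(db, result);
  return result;
}

// Decide once, before the BFS, whether per-node implementor lookups are needed.
function shouldResolveImplementors(db, opts = {}) {
  if (!opts.includeImplementors) return false; // never touch the db
  return hasImplementsEdges(db);
}

// Stub standing in for a real SQLite handle; counts executed queries.
function makeStubDb(hasImplements) {
  const stub = {
    queries: 0,
    prepare() {
      return {
        get: () => {
          stub.queries += 1;
          return hasImplements ? { 1: 1 } : undefined;
        },
      };
    },
  };
  return stub;
}

const db = makeStubDb(false);
shouldResolveImplementors(db, { includeImplementors: true });
shouldResolveImplementors(db, { includeImplementors: true });
console.log(db.queries); // 1
```

Without an index on edges.kind, the probe may scan the whole edges table when no implements edges exist, but the memo means that cost is paid once per handle rather than once per visited node.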
src/infrastructure/config.js
const cached = _configCache.get(cwd);
if (cached) return cached;
Mutable object returned by reference from cache
_configCache stores and returns the same object reference. If any caller (current or future) mutates a nested field on the returned config, for example config.llm.apiKey = '...', config.query.excludeTests = true, or config.include.push(...), every subsequent call to loadConfig(cwd) will observe the mutated value. This is especially risky because resolveSecrets and applyEnvOverrides both already mutate the config object in place before it is cached.
Returning a deep clone on every cache hit prevents callers from accidentally corrupting the cache (a shallow copy would still share nested objects such as config.llm):
Suggested change:
- const cached = _configCache.get(cwd);
- if (cached) return cached;
+ const cached = _configCache.get(cwd);
+ if (cached) return structuredClone(cached);
Alternatively, deep-freeze the cached value so mutations throw in strict mode (Object.freeze alone is shallow, so nested objects must be frozen recursively).
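A minimal sketch of the freeze alternative, showing why a recursive freeze is needed for nested fields like config.llm.apiKey. deepFreeze and tryMutate are hypothetical helpers; the config shape is illustrative only.

```javascript
// Object.freeze only freezes the top level, so recurse into nested objects.
// Freezing before recursing keeps this safe even on cyclic objects.
function deepFreeze(obj) {
  Object.freeze(obj);
  for (const value of Object.values(obj)) {
    if (value !== null && typeof value === "object" && !Object.isFrozen(value)) {
      deepFreeze(value);
    }
  }
  return obj;
}

function tryMutate(config) {
  "use strict"; // assignments to frozen properties throw only in strict mode
  try {
    config.llm.apiKey = "changed";
    return "mutated";
  } catch (err) {
    return err instanceof TypeError ? "blocked" : "other error";
  }
}

const config = deepFreeze({ llm: { apiKey: null }, include: ["src"] });
const shallow = Object.freeze({ llm: { apiKey: null } });
console.log(tryMutate(config));  // blocked
console.log(tryMutate(shallow)); // mutated
```

The trade-off versus cloning: freezing costs nothing per cache hit, but any caller that legitimately tweaks its copy of the config will start throwing, whereas structuredClone keeps callers free to mutate at the price of a copy per call.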
Fixed — loadConfig() now returns structuredClone(cached) on cache hits, preventing callers from mutating the shared cache object.
export function clearConfigCache() {
  _configCache.clear();
}
clearConfigCache exported but never imported anywhere
The PR description says this is exported "for test use," but a search across the whole codebase shows it is defined here and not imported or called anywhere — including in tests/unit/config.test.js, which achieves test isolation via unique os.tmpdir() directories instead.
An unused export is harmless, but it can mislead future contributors into thinking tests or callers rely on it. If the intent is to provide a safety valve for long-running processes that detect config-file changes on disk, a brief inline comment explaining that use case (or a corresponding test) would clarify the intent.
Fixed — updated the JSDoc to clarify the intended use cases: long-running processes that need to detect on-disk config changes, and test isolation when tests share the same cwd.
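A minimal sketch of what clearConfigCache() buys in the two documented use cases, assuming a Map keyed by cwd as in the flowchart. readConfigFromDisk is a hypothetical stand-in for the real file lookup, and the structuredClone copies are deliberately omitted to keep the focus on invalidation.

```javascript
// Hypothetical stand-in for the real lookup (existsSync + readFileSync + JSON.parse).
let diskReads = 0;
function readConfigFromDisk(cwd) {
  diskReads += 1;
  return { cwd, query: { excludeTests: false } };
}

// One cached config per working directory.
const _configCache = new Map();

function loadConfig(cwd) {
  if (_configCache.has(cwd)) return _configCache.get(cwd); // clones omitted for brevity
  const result = readConfigFromDisk(cwd);
  _configCache.set(cwd, result);
  return result;
}

// Drop every cached entry so the next loadConfig() call re-reads from disk,
// e.g. after a file watcher reports a config change, or between tests sharing a cwd.
function clearConfigCache() {
  _configCache.clear();
}

loadConfig("/repo");
loadConfig("/repo"); // cache hit: no disk read
clearConfigCache();
loadConfig("/repo"); // re-read after invalidation
console.log(diskReads); // 2
```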
…handle
Prevent callers from mutating the cached config object by returning a deep clone on cache hits. Add try/finally to selectTargets() so the database handle is closed even if a query throws.
Impact: 2 functions changed, 1 affected
Addressed the outside-diff comment about the DB handle leak in selectTargets(): the query is now wrapped in try/finally so db.close() runs unconditionally.
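The try/finally fix can be sketched as follows. The signature of selectTargets, the injected openDb factory, and the SQL are assumptions for illustration; prepare(...).all() mirrors the better-sqlite3 statement API, and the failing handle is a stub.

```javascript
// Hypothetical shape of selectTargets(): open a handle, query, always close.
function selectTargets(openDb, dbPath) {
  const db = openDb(dbPath);
  try {
    return db
      .prepare("SELECT name FROM nodes WHERE name IN ('buildGraph', 'openDb', 'loadConfig')")
      .all();
  } finally {
    // Runs on both a normal return and a thrown query error.
    db.close();
  }
}

// Stub handle whose query fails, to show the handle is still closed.
function makeFailingDb() {
  return {
    closed: false,
    prepare() {
      throw new Error("no such table: nodes");
    },
    close() {
      this.closed = true;
    },
  };
}

let handle;
try {
  selectTargets(() => (handle = makeFailingDb()), ":memory:");
} catch (err) {
  console.log(err.message, handle.closed); // no such table: nodes true
}
```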
const result = resolveSecrets(applyEnvOverrides(merged));
_configCache.set(cwd, result);
return result;
} catch (err) {
debug(`Failed to parse config ${filePath}: ${err.message}`);
}
}
}
return resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
_configCache.set(cwd, defaults);
return defaults;
Cache miss returns un-cloned reference
The structuredClone guard was applied to cache hits (line 144), but both cache-miss return paths hand the caller the exact same object that was just written into _configCache. A caller mutating the returned config on first use will corrupt every subsequent cache hit from that cwd.
The fix from the previous thread — "loadConfig() now returns structuredClone(cached) on cache hits" — is incomplete; the same protection needs to apply at initial population. Either store a clone in the cache and return the original, or return a clone at the end of each miss path:
Suggested change:
- const result = resolveSecrets(applyEnvOverrides(merged));
- _configCache.set(cwd, result);
- return result;
- } catch (err) {
- debug(`Failed to parse config ${filePath}: ${err.message}`);
- }
- }
- }
- return resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
- const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
- _configCache.set(cwd, defaults);
- return defaults;
+ const result = resolveSecrets(applyEnvOverrides(merged));
+ _configCache.set(cwd, structuredClone(result));
+ return result;
And the defaults path needs the same treatment:
const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS }));
_configCache.set(cwd, structuredClone(defaults));
return defaults;
Without this, the invariant that "the cache stores a pristine copy that no external caller can touch" only holds after the second call to loadConfig(cwd), not the first.
Fixed — both cache-miss paths now store structuredClone(result) in the cache and return the original object, ensuring callers can never mutate the cached copy.
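The final invariant (clone on store, clone on hit) can be checked with a small sketch. buildConfig is a hypothetical stand-in for the real mergeConfig/applyEnvOverrides/resolveSecrets pipeline, and structuredClone assumes Node 17 or later.

```javascript
const DEFAULTS = { query: { excludeTests: false }, include: [] };
const _configCache = new Map();
let builds = 0;

// Hypothetical stand-in for the real config pipeline.
function buildConfig() {
  builds += 1;
  return structuredClone(DEFAULTS);
}

function loadConfig(cwd) {
  const cached = _configCache.get(cwd);
  // Cache hit: hand out a fresh deep copy so callers never touch the cached object.
  if (cached) return structuredClone(cached);
  const result = buildConfig();
  // Cache miss: store a deep copy and return the original, so mutating the
  // first result cannot leak into later hits either.
  _configCache.set(cwd, structuredClone(result));
  return result;
}

const first = loadConfig("/repo");
first.include.push("src/**"); // mutate the cache-miss result
const second = loadConfig("/repo");
second.query.excludeTests = true; // mutate a cache-hit result
const third = loadConfig("/repo");
console.log(third.include.length, third.query.excludeTests, builds); // 0 false 1
```

Neither mutation reaches the third call, and the config is built only once, which is the combination the review thread converged on.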
The embedding benchmark's npm mode installs codegraph into a temp dir,
but @huggingface/transformers is a devDependency and not included.
All 6 model workers crash on import, producing symbols: 0, models: {}.
Install it explicitly from the local devDependencies version, matching
the existing pattern for native platform packages. Also add a guard in
update-embedding-report.js to reject empty results and fail loudly
instead of silently overwriting valid benchmark data.
Impact: 1 functions changed, 118 affected
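The "fail loudly" guard described above can be sketched as a validation step run before the report is written. assertNonEmptyEmbeddingResult is a hypothetical name; the empty shape { symbols: 0, models: {} } comes from the commit message, and the rest of the result layout is assumed.

```javascript
// Hypothetical guard for update-embedding-report.js: an all-crashed run shows up
// as { symbols: 0, models: {} }, so reject it instead of overwriting the report.
function assertNonEmptyEmbeddingResult(result) {
  const modelCount = Object.keys(result?.models ?? {}).length;
  if (!result || !result.symbols || modelCount === 0) {
    throw new Error(
      `Refusing to update embedding report: symbols=${result?.symbols}, models=${modelCount}`,
    );
  }
  return result;
}

try {
  assertNonEmptyEmbeddingResult({ symbols: 0, models: {} });
} catch (err) {
  console.log(err.message); // Refusing to update embedding report: symbols=0, models=0
}
```

Throwing rather than logging and continuing means an uncaught error makes Node exit with a non-zero status, so a CI job running the update script fails instead of committing an empty report.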
Summary
Addresses the +28–56% query latency regression identified in #523 (benchmark comparison 3.1.4 → 3.3.0).
1. Pin benchmark hub target to stable function names (buildGraph, openDb, loadConfig) instead of auto-selecting the most-connected node. The previous auto-selection made version-to-version comparison meaningless when barrel/type files (e.g. src/types.ts) became the hub.
2. Gate implementors queries in bfsTransitiveCallers: check once per db handle whether any implements edges exist before doing per-node findNodeById + findImplementors lookups. This eliminates dead overhead for codebases without interface/trait hierarchies.
3. Cache loadConfig() per cwd: avoids re-reading the config file from disk on every fnImpactData and diffImpactData call. Exports clearConfigCache() for test use.

Test plan
- tests/unit/config.test.js: 54 tests pass (cache uses unique temp dirs per test)
- tests/unit/queries-unit.test.js: 37 tests pass
- tests/integration/queries.test.js: 76 tests pass
- biome check clean on all changed files