fix(bench): resolve query benchmark CI failure and increase embedding timeout#749
fix(bench): resolve query benchmark CI failure and increase embedding timeout#749carlos-alm merged 4 commits intomainfrom
Conversation
The query benchmark's benchDiffImpact reads row.file from the DB and joins it with the repo root, but on CI the npm-installed buildGraph stores paths that include the root directory prefix without a leading slash. This produces doubled paths like /root/root/src/... causing ENOENT. Add resolveDbFile() helper that detects absolute-like paths and falls back to prepending '/' when the joined path doesn't exist. Also increase embedding benchmark per-model timeout from 10→20 minutes and add explicit workflow timeout (180 min) so model downloads on cold CI caches don't cause silent failures.
|
Claude finished @carlos-alm's task —— View job PR Review AnalysisReview Checklist:
Root Cause AnalysisProblem Being Solved: Two distinct CI failures in benchmarking workflows:
Why This Approach:
Risk Assessment: ✅ LOW RISK - Fixes real environmental differences without relaxing correctness standards Configuration Change DetectionChanges Identified:
Assessment: ✅ JUSTIFIED - CI cold cache model downloads legitimately require more time. This addresses infrastructure reality, not code quality issues. Test Integrity AssessmentAssessment: ✅ NO DEGRADATION
Code Quality ReviewresolveDbFile() function analysis (query-benchmark.ts:184-192):
Integration quality:
Backlog ComplianceZero-dep: ✅ - No new runtime dependencies, builds on existing fs/path APIs Foundation-aligned: ✅ - Fully compliant
Problem-fit (1-5): 3 - Improves tool reliability but doesn't directly address core AI navigation problem. Infrastructure fix that enables benchmarking to continue working. Breaking: No - Purely additive error handling, no API/output format changes Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓) Critical ConcernsNone identified. This is a well-executed infrastructure fix. Final Recommendation
Minor suggestion: Consider adding a comment in the workflow explaining why 20min timeout is needed (cold cache model downloads) for future maintainers. |
Greptile SummaryThis PR fixes two benchmark CI reliability issues: a doubled-path ENOENT crash in the query benchmark when Confidence Score: 5/5Safe to merge — all prior P1 concerns resolved, no new issues found. The No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[benchDiffImpact called] --> B[Query DB for hub node file path]
B --> C{row found?}
C -- No --> D[Return zero result]
C -- Yes --> E[resolveDbFile rootDir, dbFile]
E --> F{path.isAbsolute?}
F -- Yes --> G{existsSync absolute?}
G -- Yes --> H[Return absolute path]
G -- No --> I[Return null]
F -- No --> J[joined = path.join rootDir, dbFile]
J --> K{existsSync joined?}
K -- Yes --> L[Return joined]
K -- No --> M[withSlash = '/' + dbFile]
M --> N{existsSync withSlash?}
N -- Yes --> O[Return withSlash]
N -- No --> I
H --> P{hubFile null?}
L --> P
O --> P
I --> P
P -- Yes --> Q[Log error, return zero result]
P -- No --> R[Read file, write probe, git add, run diffImpact]
R --> S[finally: git restore, rewrite original]
S --> T[Return benchmark metrics]
Reviews (2): Last reviewed commit: "fix(bench): return null from resolveDbFi..." | Re-trigger Greptile |
scripts/query-benchmark.ts
Outdated
| function resolveDbFile(rootDir: string, dbFile: string): string { | ||
| if (path.isAbsolute(dbFile)) return dbFile; | ||
| const joined = path.join(rootDir, dbFile); | ||
| if (fs.existsSync(joined)) return joined; | ||
| // DB may store an absolute path without the leading '/' | ||
| const withSlash = '/' + dbFile; | ||
| if (fs.existsSync(withSlash)) return withSlash; | ||
| return joined; | ||
| } |
There was a problem hiding this comment.
Redundant
existsSync in outer caller after resolveDbFile
resolveDbFile already calls fs.existsSync(joined) and fs.existsSync(withSlash) — but it can still return a non-existent path (the final fallback return joined). The follow-up fs.existsSync(hubFile) in benchDiffImpact therefore correctly guards the missing-file case, but resolveDbFile itself performs up to two redundant filesystem stats before the outer check repeats them. Consider having resolveDbFile return null (or a tagged result) when no candidate exists, so the caller can log and bail without issuing a third existsSync.
There was a problem hiding this comment.
Fixed in 55652fb. resolveDbFile now returns null when no candidate file exists, and the caller checks for null directly — no more redundant existsSync calls.
|
|
||
| - name: Run embedding benchmark | ||
| if: steps.existing.outputs.skip != 'true' | ||
| timeout-minutes: 160 |
There was a problem hiding this comment.
Step timeout may be tight with full model downloads
TIMEOUT_MS is now 20 min per model; there are 7 models in MODELS. Worst-case sequential cold-cache run is 7 × 20 = 140 min, plus the "Wait for npm propagation" step (up to 10 min) and other setup. That leaves ≤ 10 min of headroom before the 160-min step timeout fires. If the npm-wait step runs and most models are cold, the step could still time out. Consider reducing to timeout-minutes: 165 at the job level while keeping the step at 160, or document the expected warm-cache baseline so future tuning has a reference.
There was a problem hiding this comment.
Fixed in 55652fb. Bumped job-level timeout from 180 to 195 min and added a comment explaining the budget calculation (7 models x 20 min + ~30 min setup headroom).
…749) - resolveDbFile now returns null when no candidate exists, eliminating redundant existsSync calls in the caller - Bump embedding-benchmark job timeout from 180 to 195 min to provide adequate headroom for 7 cold-cache model downloads at 20 min each
|
CI Status: All 3 test jobs (macos, ubuntu, windows) fail on the same 2 pre-existing test suites that also fail on
Neither test is touched by this PR (which only modifies All reviewer feedback (Greptile P2 comments, Claude suggestions) has been addressed in commit 55652fb. |
Summary
Query benchmark: Fix ENOENT from doubled file paths on CI. The
benchDiffImpactfunction readsrow.filefrom the DB and joins it with the repo root, but on CI the npm-installedbuildGraph(v3.8.0) stores paths that include the root directory prefix without a leading/— producing paths like/root/root/src/.... AddresolveDbFile()helper that detects absolute-like paths and falls back to prepending/when the joined path doesn't exist. Also adds a diagnostic log + graceful fallback instead of crashing the entire worker.Embedding benchmark: Increase per-model timeout from 10→20 minutes and add explicit workflow timeout (180 min). On cold CI caches, model downloads alone can exceed the 10-minute timeout, causing all 7 models to silently fail.
Test plan
node --experimental-strip-types --import ./scripts/ts-resolve-loader.js scripts/query-benchmark.ts --version dev)workflow_dispatchand confirm query-benchmark job succeeds