fix(bench): resolve query benchmark CI failure and increase embedding timeout by carlos-alm · Pull Request #749 · optave/ops-codegraph-tool

carlos-alm · 2026-04-02T06:38:15Z

Summary

Query benchmark: Fix ENOENT from doubled file paths on CI. The benchDiffImpact function reads row.file from the DB and joins it with the repo root, but on CI the npm-installed buildGraph (v3.8.0) stores paths that include the root directory prefix without a leading /, producing paths like /root/root/src/.... Adds a resolveDbFile() helper that detects absolute-like paths and falls back to prepending / when the joined path doesn't exist. Also adds a diagnostic log and a graceful fallback instead of crashing the entire worker.
Embedding benchmark: Increase per-model timeout from 10→20 minutes and add explicit workflow timeout (180 min). On cold CI caches, model downloads alone can exceed the 10-minute timeout, causing all 7 models to silently fail.
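
The doubled path described above can be reproduced with a couple of lines. This is only an illustration: `/root` stands in for the repo root and `root/src/index.ts` is a hypothetical stored value (the real file name is elided in this PR). `path.posix` is used so the result does not depend on the host OS.

```typescript
import * as path from "node:path";

// Hypothetical values: the repo root on CI and a DB row that already
// contains the root prefix but lacks the leading "/".
const rootDir = "/root";
const dbFile = "root/src/index.ts";

// The stored value is not recognised as absolute, so it gets joined...
const looksAbsolute = path.posix.isAbsolute(dbFile); // false

// ...and the root prefix ends up in the path twice.
const joined = path.posix.join(rootDir, dbFile); // "/root/root/src/index.ts"

// Prepending "/" recovers the path that was actually meant.
const withSlash = "/" + dbFile; // "/root/src/index.ts"

console.log({ looksAbsolute, joined, withSlash });
```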

Test plan

Verify query benchmark passes locally (node --experimental-strip-types --import ./scripts/ts-resolve-loader.js scripts/query-benchmark.ts --version dev)
Trigger benchmark workflow via workflow_dispatch and confirm query-benchmark job succeeds
Confirm embedding-benchmark job no longer times out at 10 min per model

The query benchmark's benchDiffImpact reads row.file from the DB and joins it with the repo root, but on CI the npm-installed buildGraph stores paths that include the root directory prefix without a leading slash. This produces doubled paths like /root/root/src/... causing ENOENT. Add resolveDbFile() helper that detects absolute-like paths and falls back to prepending '/' when the joined path doesn't exist. Also increase embedding benchmark per-model timeout from 10→20 minutes and add explicit workflow timeout (180 min) so model downloads on cold CI caches don't cause silent failures.

claude · 2026-04-02T06:38:28Z

Claude finished @carlos-alm's task

PR Review Analysis

Review Checklist:

Read and analyze changed files
Phase 1: Root cause analysis
Phase 2: Configuration change detection
Phase 3: Test integrity assessment
Phase 4: Code quality review
Phase 5: Backlog compliance evaluation
Final assessment and recommendation

Root Cause Analysis

Problem Being Solved: Two distinct CI failures in benchmarking workflows:

Query benchmark ENOENT errors from doubled file paths (/root/root/src/...) when npm-installed buildGraph v3.8.0 stores absolute-like paths without leading /
Embedding benchmark timeouts during model downloads on cold CI caches (models timing out at 10min)

Why This Approach:

File path issue: Adds graceful fallback logic rather than masking symptoms - first tries joining normally, then handles the CI-specific edge case
Timeout issue: Addresses legitimate infrastructure constraint - cold cache model downloads genuinely exceed 10min on CI

Risk Assessment: ✅ LOW RISK - Fixes real environmental differences without relaxing correctness standards

Configuration Change Detection

Changes Identified:

Embedding benchmark timeout: 10min → 20min per model (line 128 in embedding-benchmark.ts)
Workflow timeouts: Added 180min job timeout + 160min step timeout in benchmark.yml

Assessment: ✅ JUSTIFIED - CI cold cache model downloads legitimately require more time. This addresses infrastructure reality, not code quality issues.
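
For readers unfamiliar with per-model timeouts, one common way to enforce them is to race the work against a timer. The thread does not show how embedding-benchmark.ts actually applies its timeout, so everything below (`withTimeout`, `runModel`, the log prefix) is a hypothetical sketch; only the 20-minute value comes from this PR.

```typescript
// 20 minutes per model, the value this PR raises the timeout to (1200 s).
const TIMEOUT_MS = 20 * 60 * 1000;

// Hypothetical helper: rejects if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so a finished run does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Hypothetical per-model wrapper: a failure is logged and reported as null
// so that one slow model does not abort the remaining ones.
async function runModel(
  name: string,
  bench: () => Promise<number>,
): Promise<number | null> {
  try {
    return await withTimeout(bench(), TIMEOUT_MS, name);
  } catch (err) {
    console.error(`[embedding-benchmark] ${name} failed: ${(err as Error).message}`);
    return null;
  }
}
```

Logging the failure explicitly, as in the sketch, is one way to avoid the silent failures the PR description mentions.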

Test Integrity Assessment

Assessment: ✅ NO DEGRADATION

No test expectations lowered
No assertions weakened
Added graceful error handling with diagnostic logging instead of crashing
File resolution fix includes explicit existence checks before proceeding

Code Quality Review

resolveDbFile() function analysis (query-benchmark.ts:184-192):

✅ Good: Handles both normal relative paths and CI edge case systematically
✅ Good: Includes file existence validation
✅ Good: Clear comments explain the CI-specific path format issue
✅ Good: Graceful fallback with diagnostic logging vs silent failures
✅ Good: TypeScript properly typed

Integration quality:

✅ Consistent error handling pattern
✅ Non-breaking - maintains backward compatibility
✅ Adds resilience without compromising correctness

Backlog Compliance

Zero-dep: ✅ - No new runtime dependencies, builds on existing fs/path APIs

Foundation-aligned: ✅ - Fully compliant

Maintains P1 (always-current): Fixes rebuild failures that would break currency
Preserves P4 (zero-cost core): No external APIs added
Respects P5 (embeddable-first): Improves reliability without breaking contracts

Problem-fit (1-5): 3 - Improves tool reliability but doesn't directly address core AI navigation problem. Infrastructure fix that enables benchmarking to continue working.

Breaking: No - Purely additive error handling, no API/output format changes

Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed infrastructure fix.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: Solid technical solution addressing real CI environment differences. Fixes legitimate infrastructure constraints (model download times) and path format discrepancies without compromising code quality or test rigor. The resolveDbFile() implementation demonstrates good defensive programming practices.

Minor suggestion: Consider adding a comment in the workflow explaining why 20min timeout is needed (cold cache model downloads) for future maintainers.

greptile-apps · 2026-04-02T06:40:23Z

Greptile Summary

This PR fixes two benchmark CI reliability issues: a doubled-path ENOENT crash in the query benchmark when buildGraph stores absolute-like paths without a leading /, and silent embedding benchmark timeouts caused by cold-cache model downloads exceeding the old 10-minute limit. Both the resolveDbFile null-return fix and the timeout budget corrections raised in earlier review rounds are correctly addressed in this head commit.

Confidence Score: 5/5

Safe to merge — all prior P1 concerns resolved, no new issues found.

The resolveDbFile helper correctly handles all three path forms (relative, absolute, absolute-without-slash) and returns null cleanly when nothing matches. The timeout arithmetic is sound: 7 models × 20 min = 140 min worst case, which fits within the 160-min step timeout and the 195-min job timeout. Previous review concerns about redundant existsSync and step-vs-job timeout headroom are both fixed in this commit. All remaining findings are at most P2; there are no open correctness or reliability issues.

No files require special attention.

Important Files Changed

Filename	Overview
scripts/query-benchmark.ts	Adds `resolveDbFile()` helper that handles absolute-like DB paths missing a leading `/`, returns `null` when no candidate exists, and updates `benchDiffImpact` to log + bail gracefully instead of crashing.
scripts/embedding-benchmark.ts	Doubles per-model timeout from 600 s to 1200 s (20 min) to accommodate cold-cache model downloads on CI.
.github/workflows/benchmark.yml	Adds job-level `timeout-minutes: 195` (7 models × 20 min + 30 min headroom) and step-level `timeout-minutes: 160` for the embedding benchmark run step.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[benchDiffImpact called] --> B[Query DB for hub node file path]
    B --> C{row found?}
    C -- No --> D[Return zero result]
    C -- Yes --> E[resolveDbFile rootDir, dbFile]
    E --> F{path.isAbsolute?}
    F -- Yes --> G{existsSync absolute?}
    G -- Yes --> H[Return absolute path]
    G -- No --> I[Return null]
    F -- No --> J[joined = path.join rootDir, dbFile]
    J --> K{existsSync joined?}
    K -- Yes --> L[Return joined]
    K -- No --> M[withSlash = '/' + dbFile]
    M --> N{existsSync withSlash?}
    N -- Yes --> O[Return withSlash]
    N -- No --> I
    H --> P{hubFile null?}
    L --> P
    O --> P
    I --> P
    P -- Yes --> Q[Log error, return zero result]
    P -- No --> R[Read file, write probe, git add, run diffImpact]
    R --> S[finally: git restore, rewrite original]
    S --> T[Return benchmark metrics]

Reviews (2): Last reviewed commit: "fix(bench): return null from resolveDbFi..."

greptile-apps · 2026-04-02T06:40:27Z

scripts/query-benchmark.ts

+function resolveDbFile(rootDir: string, dbFile: string): string {
+	if (path.isAbsolute(dbFile)) return dbFile;
+	const joined = path.join(rootDir, dbFile);
+	if (fs.existsSync(joined)) return joined;
+	// DB may store an absolute path without the leading '/'
+	const withSlash = '/' + dbFile;
+	if (fs.existsSync(withSlash)) return withSlash;
+	return joined;
+}


Redundant existsSync in outer caller after resolveDbFile

resolveDbFile already calls fs.existsSync(joined) and fs.existsSync(withSlash) — but it can still return a non-existent path (the final fallback return joined). The follow-up fs.existsSync(hubFile) in benchDiffImpact therefore correctly guards the missing-file case, but resolveDbFile itself performs up to two redundant filesystem stats before the outer check repeats them. Consider having resolveDbFile return null (or a tagged result) when no candidate exists, so the caller can log and bail without issuing a third existsSync.

Fixed in 55652fb. resolveDbFile now returns null when no candidate file exists, and the caller checks for null directly — no more redundant existsSync calls.
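
A minimal sketch of what the null-returning version could look like, reconstructed from the Greptile flowchart and the diff quoted in this thread rather than copied from commit 55652fb; the committed code may differ in detail. The usage lines at the end are illustrative only.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Resolve a file path read from the DB against the repo root.
// Returns null when no candidate exists, so the caller can log and bail
// without issuing another existsSync.
function resolveDbFile(rootDir: string, dbFile: string): string | null {
  // Truly absolute paths are used as-is, but only if they exist.
  if (path.isAbsolute(dbFile)) {
    return fs.existsSync(dbFile) ? dbFile : null;
  }
  // Normal case: a path relative to the repo root.
  const joined = path.join(rootDir, dbFile);
  if (fs.existsSync(joined)) return joined;
  // DB may store an absolute path without the leading '/'.
  const withSlash = "/" + dbFile;
  if (fs.existsSync(withSlash)) return withSlash;
  return null;
}

// Illustrative usage: a path that exists under neither candidate location
// resolves to null, and the caller skips the benchmark instead of crashing.
const hubFile = resolveDbFile(os.tmpdir(), "no-such-dir-for-sketch/file.ts");
if (hubFile === null) {
  console.error("hub file not found on disk; skipping diff-impact benchmark");
}
```

Returning `string | null` pushes the missing-file decision into the type signature, so a caller compiled under strict null checks cannot forget to handle it.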

greptile-apps · 2026-04-02T06:40:28Z

.github/workflows/benchmark.yml


      - name: Run embedding benchmark
        if: steps.existing.outputs.skip != 'true'
+        timeout-minutes: 160


Step timeout may be tight with full model downloads

TIMEOUT_MS is now 20 min per model; there are 7 models in MODELS. Worst-case sequential cold-cache run is 7 × 20 = 140 min, plus the "Wait for npm propagation" step (up to 10 min) and other setup. That leaves ≤ 10 min of headroom before the 160-min step timeout fires. If the npm-wait step runs and most models are cold, the step could still time out. Consider reducing to timeout-minutes: 165 at the job level while keeping the step at 160, or document the expected warm-cache baseline so future tuning has a reference.

Fixed in 55652fb. Bumped job-level timeout from 180 to 195 min and added a comment explaining the budget calculation (7 models x 20 min + ~30 min setup headroom).
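
The numbers in this exchange can be checked with a few lines of arithmetic. The constant names are illustrative and not taken from the repository; the values are the ones quoted in the review (7 models, 20 min each, a 160-min step timeout, a 195-min job timeout, and up to 10 min for the npm propagation wait).

```typescript
// Values quoted in this review thread; names are illustrative.
const MODEL_COUNT = 7;
const PER_MODEL_TIMEOUT_MIN = 20;
const STEP_TIMEOUT_MIN = 160;
const JOB_TIMEOUT_MIN = 195;
const NPM_WAIT_MAX_MIN = 10;

// Worst case: every model is cold and uses its full timeout, sequentially.
const worstCaseModelMin = MODEL_COUNT * PER_MODEL_TIMEOUT_MIN; // 140

// Slack left inside the benchmark step itself.
const stepHeadroomMin = STEP_TIMEOUT_MIN - worstCaseModelMin; // 20

// Slack left at the job level after the models and the npm wait.
const jobHeadroomMin = JOB_TIMEOUT_MIN - worstCaseModelMin - NPM_WAIT_MAX_MIN; // 45

console.log({ worstCaseModelMin, stepHeadroomMin, jobHeadroomMin });
```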

…749)
- resolveDbFile now returns null when no candidate exists, eliminating redundant existsSync calls in the caller
- Bump embedding-benchmark job timeout from 180 to 195 min to provide adequate headroom for 7 cold-cache model downloads at 20 min each

carlos-alm · 2026-04-02T21:47:58Z

@greptileai

carlos-alm · 2026-04-02T22:58:38Z

CI Status: All 3 test jobs (macos, ubuntu, windows) fail on the same 2 pre-existing test suites that also fail on main:

tests/integration/watcher-rebuild.test.ts: incremental rebuildFile produces extra nodes/edges vs full build (40 vs 33 nodes, 54 vs 45 edges). Tracked in #759 (fix(test): watcher-rebuild parity test produces extra nodes/edges after Rust pipeline changes).
tests/search/embedding-regression.test.ts: buildEmbeddings produces 0 embeddings, so all search assertions fail. Tracked in #760 (fix(test): embedding-regression test fails — buildEmbeddings produces 0 embeddings).

Neither test is touched by this PR (which only modifies scripts/query-benchmark.ts, scripts/embedding-benchmark.ts, and .github/workflows/benchmark.yml). These failures are identical on main at commit a058615.

All reviewer feedback (Greptile P2 comments, Claude suggestions) has been addressed in commit 55652fb.

greptile-apps bot reviewed Apr 2, 2026

This was referenced Apr 2, 2026

fix(test): watcher-rebuild parity test produces extra nodes/edges after Rust pipeline changes #759

Closed

fix(test): embedding-regression test fails — buildEmbeddings produces 0 embeddings #760

Closed

carlos-alm added 2 commits April 2, 2026 20:00

Merge branch 'main' into fix/benchmark-ci-failures

ba250fd

Merge branch 'main' into fix/benchmark-ci-failures

cbb5503

carlos-alm merged commit ea1c5cc into main Apr 3, 2026
11 of 12 checks passed

carlos-alm deleted the fix/benchmark-ci-failures branch April 3, 2026 02:40

github-actions bot locked and limited conversation to collaborators Apr 3, 2026

carlos-alm merged 4 commits into main from fix/benchmark-ci-failures
