
fix(bench): resolve query benchmark CI failure and increase embedding timeout #749

Merged
carlos-alm merged 4 commits into main from fix/benchmark-ci-failures on Apr 3, 2026

@carlos-alm
Contributor

Summary

  • Query benchmark: Fix ENOENT from doubled file paths on CI. The benchDiffImpact function reads row.file from the DB and joins it with the repo root, but on CI the npm-installed buildGraph (v3.8.0) stores paths that include the root directory prefix without a leading /, which produces paths like /root/root/src/.... Add a resolveDbFile() helper that detects absolute-like paths and falls back to prepending / when the joined path doesn't exist. Also add a diagnostic log and a graceful fallback instead of crashing the entire worker.

  • Embedding benchmark: Increase per-model timeout from 10→20 minutes and add explicit workflow timeout (180 min). On cold CI caches, model downloads alone can exceed the 10-minute timeout, causing all 7 models to silently fail.
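
The embedding script's actual timeout mechanism is not shown in this thread. As a rough sketch of the idea in the second item, assuming each model runs in a child process, a per-run timeout can be enforced and reported explicitly rather than failing silently. The `runWithTimeout` helper and `RunResult` type are hypothetical names, not code from this PR:

```typescript
import { spawnSync } from 'node:child_process';

type RunResult = { ok: true; stdout: string } | { ok: false; reason: string };

// Hypothetical helper: run a Node child process with a hard time limit and
// return an explicit failure reason instead of swallowing the timeout.
function runWithTimeout(args: string[], timeoutMs: number): RunResult {
  const res = spawnSync(process.execPath, args, { timeout: timeoutMs, encoding: 'utf8' });
  const err = res.error as NodeJS.ErrnoException | undefined;
  if (err?.code === 'ETIMEDOUT') return { ok: false, reason: `timed out after ${timeoutMs} ms` };
  if (err) return { ok: false, reason: err.message };
  if (res.status !== 0) return { ok: false, reason: `exited with status ${res.status}` };
  return { ok: true, stdout: res.stdout };
}

// A child that outlives its budget is reported as a failure...
const slow = runWithTimeout(['-e', 'setTimeout(() => {}, 10000)'], 300);
// ...while a fast child succeeds well inside a generous budget.
const fast = runWithTimeout(['-e', 'console.log("done")'], 10000);

console.log(slow, fast);
```

With a 20-minute budget, `timeoutMs` would be `20 * 60 * 1000`; the short values above only keep the demonstration quick.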

Test plan

  • Verify query benchmark passes locally (node --experimental-strip-types --import ./scripts/ts-resolve-loader.js scripts/query-benchmark.ts --version dev)
  • Trigger benchmark workflow via workflow_dispatch and confirm query-benchmark job succeeds
  • Confirm embedding-benchmark job no longer times out at 10 min per model

The query benchmark's benchDiffImpact reads row.file from the DB and
joins it with the repo root, but on CI the npm-installed buildGraph
stores paths that include the root directory prefix without a leading
slash.  This produces doubled paths like /root/root/src/... causing
ENOENT.
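
The doubling can be reproduced with nothing but `path`. The values below are illustrative; only the `/root/root/src/...` shape comes from the report, and `path.posix` is used so the result does not depend on the host OS:

```typescript
import * as path from 'node:path';

// Illustrative values: a repo root and a DB entry that is really an
// absolute path stored without its leading '/'.
const rootDir = '/root';
const dbFile = 'root/src/index.ts';

// With no leading slash, the stored value is not recognised as absolute...
const looksAbsolute = path.posix.isAbsolute(dbFile);

// ...so joining it with the repo root repeats the root prefix.
const joined = path.posix.join(rootDir, dbFile);

// Prepending '/' recovers the path that was meant.
const withSlash = '/' + dbFile;

console.log({ looksAbsolute, joined, withSlash });
```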

Add resolveDbFile() helper that detects absolute-like paths and falls
back to prepending '/' when the joined path doesn't exist.

Also increase embedding benchmark per-model timeout from 10→20 minutes
and add explicit workflow timeout (180 min) so model downloads on cold
CI caches don't cause silent failures.
@claude

claude bot commented Apr 2, 2026

Claude finished @carlos-alm's task


PR Review Analysis

Review Checklist:

  • Read and analyze changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance evaluation
  • Final assessment and recommendation

Root Cause Analysis

Problem Being Solved: Two distinct CI failures in benchmarking workflows:

  1. Query benchmark ENOENT errors from doubled file paths (/root/root/src/...) when npm-installed buildGraph v3.8.0 stores absolute-like paths without leading /
  2. Embedding benchmark timeouts during model downloads on cold CI caches (models timing out at 10min)

Why This Approach:

  • File path issue: Adds graceful fallback logic rather than masking symptoms - first tries joining normally, then handles the CI-specific edge case
  • Timeout issue: Addresses legitimate infrastructure constraint - cold cache model downloads genuinely exceed 10min on CI

Risk Assessment: ✅ LOW RISK - Fixes real environmental differences without relaxing correctness standards

Configuration Change Detection

Changes Identified:

  1. Embedding benchmark timeout: 10min → 20min per model (line 128 in embedding-benchmark.ts)
  2. Workflow timeouts: Added 180min job timeout + 160min step timeout in benchmark.yml

Assessment: ✅ JUSTIFIED - CI cold cache model downloads legitimately require more time. This addresses infrastructure reality, not code quality issues.

Test Integrity Assessment

Assessment: ✅ NO DEGRADATION

  • No test expectations lowered
  • No assertions weakened
  • Added graceful error handling with diagnostic logging instead of crashing
  • File resolution fix includes explicit existence checks before proceeding

Code Quality Review

resolveDbFile() function analysis (query-benchmark.ts:184-192):

  • ✅ Good: Handles both normal relative paths and CI edge case systematically
  • ✅ Good: Includes file existence validation
  • ✅ Good: Clear comments explain the CI-specific path format issue
  • ✅ Good: Graceful fallback with diagnostic logging vs silent failures
  • ✅ Good: TypeScript properly typed

Integration quality:

  • ✅ Consistent error handling pattern
  • ✅ Non-breaking - maintains backward compatibility
  • ✅ Adds resilience without compromising correctness

Backlog Compliance

Zero-dep: ✅ - No new runtime dependencies, builds on existing fs/path APIs

Foundation-aligned: ✅ - Fully compliant

  • Maintains P1 (always-current): Fixes rebuild failures that would break currency
  • Preserves P4 (zero-cost core): No external APIs added
  • Respects P5 (embeddable-first): Improves reliability without breaking contracts

Problem-fit (1-5): 3 - Improves tool reliability but doesn't directly address core AI navigation problem. Infrastructure fix that enables benchmarking to continue working.

Breaking: No - Purely additive error handling, no API/output format changes

Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed infrastructure fix.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid technical solution addressing real CI environment differences. Fixes legitimate infrastructure constraints (model download times) and path format discrepancies without compromising code quality or test rigor. The resolveDbFile() implementation demonstrates good defensive programming practices.

Minor suggestion: Consider adding a comment in the workflow explaining why 20min timeout is needed (cold cache model downloads) for future maintainers.

@greptile-apps
Contributor

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR fixes two benchmark CI reliability issues: a doubled-path ENOENT crash in the query benchmark when buildGraph stores absolute-like paths without a leading /, and silent embedding benchmark timeouts caused by cold-cache model downloads exceeding the old 10-minute limit. Both the resolveDbFile null-return fix and the timeout budget corrections raised in earlier review rounds are correctly addressed in this head commit.

Confidence Score: 5/5

Safe to merge — all prior P1 concerns resolved, no new issues found.

The resolveDbFile helper correctly handles all three path forms (relative, absolute, absolute-without-slash) and returns null cleanly when nothing matches. The timeout arithmetic is sound: 7 × 20 min = 140 min worst case, which fits inside the 160 min step timeout and the 195 min job timeout. Previous review concerns about redundant existsSync and step-vs-job timeout headroom are both fixed in this commit. All remaining findings are at most P2; there are no open correctness or reliability issues.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/query-benchmark.ts | Adds resolveDbFile() helper that handles absolute-like DB paths missing a leading /, returns null when no candidate exists, and updates benchDiffImpact to log + bail gracefully instead of crashing. |
| scripts/embedding-benchmark.ts | Doubles per-model timeout from 600 s to 1200 s (20 min) to accommodate cold-cache model downloads on CI. |
| .github/workflows/benchmark.yml | Adds job-level timeout-minutes: 195 (7 models × 20 min + 30 min headroom) and step-level timeout-minutes: 160 for the embedding benchmark run step. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[benchDiffImpact called] --> B[Query DB for hub node file path]
    B --> C{row found?}
    C -- No --> D[Return zero result]
    C -- Yes --> E[resolveDbFile rootDir, dbFile]
    E --> F{path.isAbsolute?}
    F -- Yes --> G{existsSync absolute?}
    G -- Yes --> H[Return absolute path]
    G -- No --> I[Return null]
    F -- No --> J[joined = path.join rootDir, dbFile]
    J --> K{existsSync joined?}
    K -- Yes --> L[Return joined]
    K -- No --> M[withSlash = '/' + dbFile]
    M --> N{existsSync withSlash?}
    N -- Yes --> O[Return withSlash]
    N -- No --> I
    H --> P{hubFile null?}
    L --> P
    O --> P
    I --> P
    P -- Yes --> Q[Log error, return zero result]
    P -- No --> R[Read file, write probe, git add, run diffImpact]
    R --> S[finally: git restore, rewrite original]
    S --> T[Return benchmark metrics]
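
The last steps of the flowchart (write a probe, measure, restore in `finally`) follow a common pattern. Below is a minimal sketch of that pattern using only `fs`. The `withProbe` helper is a hypothetical name, and the git add / git restore steps from the flowchart are deliberately left out:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: append a probe to a file, run `measure`, and always
// write the original contents back, even if `measure` throws.
function withProbe<T>(file: string, probe: string, measure: () => T): T {
  const original = fs.readFileSync(file, 'utf8');
  fs.writeFileSync(file, original + probe);
  try {
    return measure();
  } finally {
    fs.writeFileSync(file, original);
  }
}

// Demonstration against a throwaway file.
const probeDir = fs.mkdtempSync(path.join(os.tmpdir(), 'probe-'));
const target = path.join(probeDir, 'hub.ts');
const originalText = 'export const x = 1;\n';
fs.writeFileSync(target, originalText);

// While measuring, the probe is visible; afterwards the file is restored.
const seenDuring = withProbe(target, '// probe\n', () => fs.readFileSync(target, 'utf8'));
const seenAfter = fs.readFileSync(target, 'utf8');

// A throwing measurement still leaves the file in its original state.
let restoredAfterThrow = false;
try {
  withProbe(target, '// probe\n', () => { throw new Error('boom'); });
} catch {
  restoredAfterThrow = fs.readFileSync(target, 'utf8') === originalText;
}

fs.rmSync(probeDir, { recursive: true, force: true });
```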

Reviews (2): Last reviewed commit: "fix(bench): return null from resolveDbFi..."

Comment on lines +184 to +192
function resolveDbFile(rootDir: string, dbFile: string): string {
  if (path.isAbsolute(dbFile)) return dbFile;
  const joined = path.join(rootDir, dbFile);
  if (fs.existsSync(joined)) return joined;
  // DB may store an absolute path without the leading '/'
  const withSlash = '/' + dbFile;
  if (fs.existsSync(withSlash)) return withSlash;
  return joined;
}
Contributor

P2 Redundant existsSync in outer caller after resolveDbFile

resolveDbFile already calls fs.existsSync(joined) and fs.existsSync(withSlash) — but it can still return a non-existent path (the final fallback return joined). The follow-up fs.existsSync(hubFile) in benchDiffImpact therefore correctly guards the missing-file case, but resolveDbFile itself performs up to two redundant filesystem stats before the outer check repeats them. Consider having resolveDbFile return null (or a tagged result) when no candidate exists, so the caller can log and bail without issuing a third existsSync.

Contributor Author

Fixed in 55652fb. resolveDbFile now returns null when no candidate file exists, and the caller checks for null directly — no more redundant existsSync calls.
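
The commit itself is not quoted in this thread. The sketch below reconstructs the null-returning helper from the review flowchart, so it should be read as an approximation of 55652fb rather than the exact code. The demo at the bottom assumes POSIX paths:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Sketch of the post-fix helper: returns a path that exists on disk,
// or null when no candidate exists.
function resolveDbFile(rootDir: string, dbFile: string): string | null {
  // Already absolute: accept it only if it exists.
  if (path.isAbsolute(dbFile)) return fs.existsSync(dbFile) ? dbFile : null;

  // Normal case: the DB stores a path relative to the repo root.
  const joined = path.join(rootDir, dbFile);
  if (fs.existsSync(joined)) return joined;

  // CI case: the DB stores an absolute path without the leading '/'.
  const withSlash = '/' + dbFile;
  if (fs.existsSync(withSlash)) return withSlash;

  return null;
}

// Demonstration against a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'resolve-db-file-'));
const demoFile = path.join(demoRoot, 'src', 'a.ts');
fs.mkdirSync(path.join(demoRoot, 'src'));
fs.writeFileSync(demoFile, '// demo\n');

const relativeHit = resolveDbFile(demoRoot, 'src/a.ts');         // resolved via join
const slashlessHit = resolveDbFile(demoRoot, demoFile.slice(1)); // resolved via '/' + dbFile
const miss = resolveDbFile(demoRoot, 'src/missing.ts');          // no candidate exists

fs.rmSync(demoRoot, { recursive: true, force: true });
```

The caller can then treat `null` as "log a diagnostic and return a zero result" without issuing another existsSync.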


- name: Run embedding benchmark
  if: steps.existing.outputs.skip != 'true'
  timeout-minutes: 160
Contributor

P2 Step timeout may be tight with full model downloads

TIMEOUT_MS is now 20 min per model; there are 7 models in MODELS. Worst-case sequential cold-cache run is 7 × 20 = 140 min, plus the "Wait for npm propagation" step (up to 10 min) and other setup. That leaves ≤ 10 min of headroom before the 160-min step timeout fires. If the npm-wait step runs and most models are cold, the step could still time out. Consider reducing to timeout-minutes: 165 at the job level while keeping the step at 160, or document the expected warm-cache baseline so future tuning has a reference.

Contributor Author

Fixed in 55652fb. Bumped job-level timeout from 180 to 195 min and added a comment explaining the budget calculation (7 models x 20 min + ~30 min setup headroom).
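
For reference, the budget can be checked with a few lines of arithmetic. The figures are the ones quoted in this thread; the constant names are illustrative:

```typescript
// Figures quoted in the review thread.
const MODEL_COUNT = 7;
const PER_MODEL_TIMEOUT_MIN = 20;
const STEP_TIMEOUT_MIN = 160;
const JOB_TIMEOUT_MIN = 195;

// Worst case: every model is cold and runs to its full per-model timeout.
const worstCaseMin = MODEL_COUNT * PER_MODEL_TIMEOUT_MIN;

// Slack inside the benchmark step itself.
const stepHeadroomMin = STEP_TIMEOUT_MIN - worstCaseMin;

// Time left in the job for everything outside the benchmark step.
const outsideStepMin = JOB_TIMEOUT_MIN - STEP_TIMEOUT_MIN;

console.log({ worstCaseMin, stepHeadroomMin, outsideStepMin });
// prints { worstCaseMin: 140, stepHeadroomMin: 20, outsideStepMin: 35 }
```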

…749)

- resolveDbFile now returns null when no candidate exists, eliminating
  redundant existsSync calls in the caller
- Bump embedding-benchmark job timeout from 180 to 195 min to provide
  adequate headroom for 7 cold-cache model downloads at 20 min each
@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

CI Status: All 3 test jobs (macos, ubuntu, windows) fail on the same 2 pre-existing test suites that also fail on main:

  1. tests/integration/watcher-rebuild.test.ts: incremental rebuildFile produces extra nodes/edges vs full build (40 vs 33 nodes, 54 vs 45 edges). Tracked in fix(test): watcher-rebuild parity test produces extra nodes/edges after Rust pipeline changes #759.
  2. tests/search/embedding-regression.test.ts: buildEmbeddings produces 0 embeddings, all search assertions fail. Tracked in fix(test): embedding-regression test fails — buildEmbeddings produces 0 embeddings #760.

Neither test is touched by this PR (which only modifies scripts/query-benchmark.ts, scripts/embedding-benchmark.ts, and .github/workflows/benchmark.yml). These failures are identical on main at commit a058615.

All reviewer feedback (Greptile P2 comments, Claude suggestions) has been addressed in commit 55652fb.

@carlos-alm carlos-alm merged commit ea1c5cc into main Apr 3, 2026
11 of 12 checks passed
@carlos-alm carlos-alm deleted the fix/benchmark-ci-failures branch April 3, 2026 02:40
@github-actions github-actions bot locked and limited conversation to collaborators Apr 3, 2026
