ci(bench): bump bench shard timeout_minutes to 120 — fixes stuck publish #161

Merged
polaz merged 1 commit into main from fix/#160-bench-timeouts-2h
May 17, 2026

@polaz polaz commented May 17, 2026

Summary

Raises bench shard timeout_minutes from 45/60/45 (per-target) to a uniform 120 min in .github/workflows/ci.yml (bench-matrix job, inside targets.json).

Why

PR #153 already bumped timeouts from 25/30 to 45/60/45, but slow shards (lazy, fast) STILL cancel at the cap. Latest example — run 26003106160 (sha bbc9db4, PR #159 merge):

  • Bench x86_64-gnu / lazy: started 21:54:12Z, completed 22:39:29Z → 45m17s ⇒ hit cap, cancelled
  • Bench x86_64-gnu / fast and Bench x86_64-musl / lazy cancelled same way
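
As a quick check of the timing above, the elapsed time of the lazy shard can be recomputed from the two quoted timestamps and compared against the old cap. This is a minimal sketch: the helper `elapsed` and the constant `CAP` are illustrative names, and the timestamps are treated as same-day times of day because only those are quoted.

```python
from datetime import datetime, timedelta

CAP = timedelta(minutes=45)  # per-target cap in effect before this PR


def elapsed(start: str, end: str) -> timedelta:
    """Elapsed time between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Start and completion times quoted for Bench x86_64-gnu / lazy.
run_time = elapsed("21:54:12", "22:39:29")
overshoot = run_time - CAP

print(run_time, overshoot)  # 0:45:17 0:00:17
```

An overshoot of a few seconds past the cap is consistent with the runner cancelling the job at the timeout rather than the benchmark finishing on its own.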

Cancelled shards cascade: benchmark-aggregate needs all shards → benchmark-pages skipped → dashboard at dev/bench/ stays stale.
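
The cascade can be illustrated with a small model of the dependency chain. This is only a sketch under assumptions: `bench-shard` is a stand-in name for the whole shard matrix, the graph is reduced to the three stages named above, and it assumes the default GitHub Actions behaviour where a job is skipped when a job it needs did not succeed (that is, no `if: always()` style override). Whether benchmark-aggregate itself shows as skipped or failed in the real run is not stated here.

```python
# Hypothetical, simplified dependency graph for the publish chain.
# "bench-shard" stands in for the full matrix of bench shards.
NEEDS = {
    "bench-shard": [],
    "benchmark-aggregate": ["bench-shard"],
    "benchmark-pages": ["benchmark-aggregate"],
}


def resolve(own_results: dict[str, str]) -> dict[str, str]:
    """Propagate results: a job runs only if everything it needs succeeded."""
    final: dict[str, str] = {}
    for job, deps in NEEDS.items():  # insertion order is topological here
        if all(final[dep] == "success" for dep in deps):
            final[job] = own_results.get(job, "success")
        else:
            final[job] = "skipped"
    return final


before = resolve({"bench-shard": "cancelled"})  # a shard hits the cap
after = resolve({})                             # every shard finishes

print(before["benchmark-pages"], after["benchmark-pages"])  # skipped success
```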

Tests aren't cycling — they're just slow

Verified from log of a previously cancelled shard (job 76419168123, sha f3a6dad):

  • Iterations make linear progress through (level, scenario, codec_side, stream_variant) combinations
  • No gaps in timestamps — each criterion iteration ≈ 9s
  • Worst-case suite math: 11 lazy levels × ~5 scenarios × 2 sides × 2 stream variants × 2 ops ≈ 440 iterations × 9s ≈ 66 min

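So 45 min was structurally insufficient: the worst-case shard needs about 66 min. A 120 min cap leaves roughly 54 min of slack on the slowest shards (the ~50% headroom, measured against the cap). The GH-hosted runner limit is 360 min per job, so we're well within limits.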
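
The worst-case estimate can be reproduced in a few lines. This sketch only uses the factors and the ~9 s per iteration quoted above; the variable names are illustrative, and the scenario count is the approximate value from the estimate, not a number read from the bench suite.

```python
import math

# Factors from the worst-case estimate; "scenarios" is approximate (~5).
factors = {
    "lazy levels": 11,
    "scenarios": 5,
    "codec sides": 2,
    "stream variants": 2,
    "ops": 2,
}
SECONDS_PER_ITERATION = 9  # observed criterion iteration time, approximate

iterations = math.prod(factors.values())                   # 440
worst_case_min = iterations * SECONDS_PER_ITERATION / 60   # 66.0

# Compare the old and new caps against the estimate.
for cap in (45, 120):
    slack = cap - worst_case_min
    print(f"cap={cap} min: slack={slack:+.0f} min ({slack / cap:+.0%} of cap)")
```

With these inputs the 45 min cap comes out about 21 min short, while the 120 min cap leaves about 54 min spare.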

Acceptance

  • All three target entries (x86_64-gnu, i686-gnu, x86_64-musl) carry timeout_minutes: 120
  • Next main push completes all bench shards within the new cap
  • benchmark-aggregate succeeds → benchmark-pages runs → dashboard publishes
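
The first acceptance item could be checked mechanically along these lines. This is a hedged sketch, not the repository's actual tooling: the JSON layout and the `target` field name are assumptions made for illustration, and only the three target names and the `timeout_minutes` key come from this PR.

```python
import json

# Hypothetical shape of the target inventory; the "target" field name is
# assumed. Only the target names and timeout_minutes come from the PR.
TARGETS_JSON = """
[
  {"target": "x86_64-gnu",  "timeout_minutes": 120},
  {"target": "i686-gnu",    "timeout_minutes": 120},
  {"target": "x86_64-musl", "timeout_minutes": 120}
]
"""


def targets_off_cap(raw: str, expected: int = 120) -> list[str]:
    """Return the names of targets whose timeout_minutes differs from expected."""
    return [
        entry["target"]
        for entry in json.loads(raw)
        if entry.get("timeout_minutes") != expected
    ]


offenders = targets_off_cap(TARGETS_JSON)
print(offenders)  # []
```

An empty list means every target entry carries the uniform cap; any name in the list points at an entry that was missed.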

Test plan

  • YAML syntax validated
  • 5-min change; effect to be verified on the next main run after merge

Closes #160

Summary by CodeRabbit

  • Chores
    • Updated CI/CD benchmark job timeout configurations to improve pipeline stability during extended benchmark runs.

Review Change Stack

The 45/60/45 caps from PR #153 still aren't enough for the slowest
shards (lazy, fast). Log inspection of cancelled jobs (job
76419168123, sha f3a6dad) shows linear progress with ~9s per
criterion iteration and no stalls — the bench suite simply has more
combinations (level × scenario × codec_side × stream_variant ×
{compress, decompress}) than fit in 45 min:

  ~11 levels × ~5 scenarios × 2 sides × 2 stream variants × 2 ops
    ≈ 440 iterations × 9s ≈ 66 min worst-case shard

Uniform 120 min cap (GH-hosted limit is 360 min) gives ~50% headroom
on the slowest shards and unblocks the publish chain so the
dev/bench dashboard stays current.
Copilot AI review requested due to automatic review settings May 17, 2026 22:45

coderabbitai Bot commented May 17, 2026

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between bbc9db4 and f1ca42c.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml

Walkthrough

The pull request increases GitHub Actions CI timeout for three benchmark targets in the bench-matrix job from 45/60/45 minutes to 120 minutes to prevent timeout cancellation of slower-running benchmark shards.

Changes

Benchmark job timeout configuration

| Layer / File(s) | Summary |
| --- | --- |
| Benchmark target timeout_minutes updates (.github/workflows/ci.yml) | bench-matrix entries for x86_64-gnu, i686-gnu, and x86_64-musl have timeout_minutes increased to 120 (from 45, 60, and 45 respectively) to accommodate longer-running lazy and fast benchmark shards. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

  • structured-world/structured-zstd#152: This PR directly addresses the acceptance criteria from issue #160 by raising all three per-target timeout_minutes values to 120 to unblock benchmark shards that were being cancelled before completion.

Possibly related PRs

  • structured-world/structured-zstd#153: Both PRs modify .github/workflows/ci.yml's bench-matrix target matrix by changing the same per-target timeout_minutes values for x86_64-gnu, i686-gnu, and x86_64-musl (this PR further increases them to 120 after the retrieved PR's 45/60/45 bump).

Poem

🐰 Time flies when benchmarks run slow,
One twenty minutes—now we'll know!
Lazy shards and fast ones too,
Two hours to finish what they do. ✨

@polaz polaz merged commit f5a56dc into main May 17, 2026
18 of 19 checks passed
@polaz polaz deleted the fix/#160-bench-timeouts-2h branch May 17, 2026 22:46

Copilot AI left a comment

Pull request overview

Updates the CI benchmark workflow to prevent benchmark shard cancellations that block benchmark-aggregate and benchmark-pages, keeping the dev/bench/ dashboard up to date.

Changes:

  • Increased per-target benchmark shard timeout configuration to a uniform 120 minutes in the bench-matrix target inventory.
  • Ensures the benchmark job’s timeout-minutes: ${{ matrix.bench.timeout_minutes }} has sufficient headroom for slow shards (lazy, fast) across all targets.

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Development

Successfully merging this pull request may close these issues.

ci(bench): bump per-target timeout_minutes to 120 — current 45/60 caps still hit by lazy/fast shards