fix: lake: race condition in monitor queue #13559

Merged
tydeu merged 1 commit into leanprover:master from tydeu:lake/queue-race on Apr 28, 2026

tydeu (Member) commented Apr 28, 2026

This PR fixes a race condition in the Lake build monitor's draining of the job queue.

Previously, the monitor drained the registered-jobs queue before checking task states. A job that the scan found finished may have registered new jobs in its body just before terminating. Those jobs landed in the queue after the drain, so when no other unfinished jobs remained, the monitor exited without ever observing them. If one of the missed jobs errored, this could produce an "uncaught top-level build failure" message alongside "Build completed successfully".
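The window described above can be reproduced with a small deterministic simulation. This is only a sketch under stated assumptions, not Lake's code: `Job`, `buggy_monitor`, and the `interleave` hook (which stands in for worker threads making progress between the drain and the scan) are hypothetical names invented for illustration.

```python
from collections import deque


class Job:
    """A toy job; `finished` flips to True when its body completes."""

    def __init__(self, name):
        self.name = name
        self.finished = False


def buggy_monitor(queue, interleave):
    """Old ordering (hypothetical model): drain the queue, then scan states."""
    tracked, observed = [], []
    while True:
        # 1. Drain: move newly registered jobs into the tracked list.
        while queue:
            job = queue.popleft()
            tracked.append(job)
            observed.append(job.name)
        # Workers make progress here, between the drain and the scan.
        interleave(queue)
        # 2. Scan: keep only the jobs that are still running.
        tracked = [job for job in tracked if not job.finished]
        # 3. Exit when nothing is running; the queue is NOT rechecked.
        if not tracked:
            return observed


queue = deque()
parent = Job("parent")
queue.append(parent)


def parent_finishes(q):
    # The parent registers a child just before terminating.
    if not parent.finished:
        q.append(Job("child"))
        parent.finished = True


observed = buggy_monitor(queue, parent_finishes)
missed = [job.name for job in queue]
print(observed, missed)  # prints ['parent'] ['child']
```

The monitor returns having seen only `parent`; `child` is still sitting in the queue, which corresponds to the job whose failure would go unreported.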

To fix this, the monitor loop is restructured so the queue is drained after task states are scanned, with an additional drain before the loop terminates. `drainQueue` and `scanJobs` are split out from `poll`, and `Monitor.main` performs an initial drain to capture any jobs registered before the monitor starts iterating. Finishing jobs cannot register further work once their state transitions, so a single post-scan drain is sufficient to close the window.
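The reordered loop can be sketched in the same toy model. The helper names `drain_queue` and `scan_jobs` mirror `drainQueue` and `scanJobs` from the description, but their bodies, the `Job` class, and the `interleave` hook are assumptions made for illustration; the actual Lake implementation may be organized differently. In this sketch the post-scan drain also serves as the drain before the loop exits.

```python
from collections import deque


class Job:
    """A toy job; `finished` flips to True when its body completes."""

    def __init__(self, name):
        self.name = name
        self.finished = False


def drain_queue(queue, tracked, observed):
    # Move every registered job into the tracked list.
    while queue:
        job = queue.popleft()
        tracked.append(job)
        observed.append(job.name)


def scan_jobs(tracked):
    # Keep only the jobs that are still running.
    return [job for job in tracked if not job.finished]


def fixed_monitor(queue, interleave):
    """New ordering (hypothetical model): scan states, then drain the queue."""
    tracked, observed = [], []
    drain_queue(queue, tracked, observed)  # initial drain before iterating
    while True:
        interleave(queue)  # workers make progress
        tracked = scan_jobs(tracked)
        # Any job seen as finished registered its children before its state
        # changed, so a drain placed after the scan is guaranteed to see them.
        drain_queue(queue, tracked, observed)
        if not tracked:
            return observed


queue = deque()
parent, child = Job("parent"), Job("child")
queue.append(parent)


def workers(q):
    # Step 1: the parent registers the child, then finishes.
    # Step 2: the child finishes.
    if not parent.finished:
        q.append(child)
        parent.finished = True
    elif not child.finished:
        child.finished = True


observed = fixed_monitor(queue, workers)
print(observed, list(queue))  # prints ['parent', 'child'] []
```

With the same interleaving that lost `child` under the old ordering, the monitor now observes both jobs and leaves the queue empty.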

🤖 Prepared with Claude Code

tydeu added the changelog-lake (Lake) and lake-ci (Run all Lake tests) labels Apr 28, 2026
github-actions (bot) added the toolchain-available label (A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN) Apr 28, 2026
leanprover-bot (Collaborator) commented

Reference manual CI status:

  • ❗ Reference manual CI cannot be attempted yet, as the nightly-testing-2026-04-27 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-manual, reference manual CI should run now. You can force reference manual CI using the force-manual-ci label. (2026-04-28 22:13:01)

github-actions (bot) added the mathlib4-nightly-available label (A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN) Apr 28, 2026
tydeu marked this pull request as ready for review April 28, 2026 22:38
tydeu added this pull request to the merge queue Apr 28, 2026
mathlib-lean-pr-testing (bot) added the builds-mathlib label (CI has verified that Mathlib builds against this PR) Apr 28, 2026
@mathlib-lean-pr-testing

Mathlib CI status (docs):

Merged via the queue into leanprover:master with commit 9df737d Apr 28, 2026
47 of 51 checks passed
tydeu deleted the lake/queue-race branch April 29, 2026 02:20
pull (bot) pushed a commit to VitalyAnkh/lean4 that referenced this pull request May 11, 2026
This PR re-enables all of the Lake tests by default. The previous
flakiness appears to have been fixed by leanprover#13559, as multiple runs of
leanprover#8580 demonstrate. The `LAKE_CI` CMake setting and the `lake-ci` label
are kept to potentially enable expensive Lake tests in the future (e.g.,
the online tests that are not currently run in CI).