
Support Optimistic Execution in Scheduler #70

Merged
liobrasil merged 1 commit into main from navid/scheduler-optimistic-execution-fix
Mar 13, 2026

Conversation

@nvdtf nvdtf commented Mar 13, 2026

Summary

  • Check for failed requests on every scheduler run; previously this check only ran when pending > 0.
  • Fixes a crash recovery race condition caused by FlowTransactionScheduler's optimistic execution status updates.

Problem

The FlowTransactionScheduler uses an optimistic status update pattern where scheduled transactions are marked as Executed before the handler code actually runs. This is done to enable concurrent execution (see FlowTransactionScheduler.cdc lines 1315-1321):

// after pending execution event is emitted we set the transaction as executed because we
// must rely on execution node to actually execute it. Execution of the transaction is
// done in a separate transaction that calls executeTransaction(id) function.
// Executing the transaction can not update the status of transaction or any other shared state,
// since that blocks concurrent transaction execution.
// Therefore an optimistic update to executed is made here to avoid race condition.
tx.setStatus(newStatus: Status.Executed)

This caused _checkForFailedWorkerRequests to incorrectly mark workers as failed when:

  1. Worker transactions were picked up for execution (status = Executed)
  2. But the actual handler code hadn't finished running yet
  3. The scheduler saw Executed status + entry still in scheduledRequests = assumed panic
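The misfire can be modeled in a few lines of Python (an illustrative sketch, not the Cadence source; `naive_failed_check` and its parameters are hypothetical names standing in for the old `_checkForFailedWorkerRequests` condition):

```python
# Illustrative model of why the old check misfired: the scheduler flips a
# transaction's status to Executed *before* the handler actually runs.

def naive_failed_check(status: str, in_scheduled_requests: bool) -> bool:
    # Old logic: "Executed but still tracked in scheduledRequests" was
    # read as evidence that the handler panicked before cleaning up.
    return status == "Executed" and in_scheduled_requests

# Healthy worker right after the optimistic flip: the handler has not
# finished yet, so its entry is still tracked -- the check misfires.
assert naive_failed_check("Executed", True)        # false positive
# Once the handler completes and removes its entry, the check passes.
assert not naive_failed_check("Executed", False)
```

The false positive exists purely because status and cleanup happen in separate transactions; no amount of reordering inside the check fixes it without a time component.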

Solution

  1. Added configurable grace period (crashRecoveryGracePeriod, default 10 seconds) before checking for failed workers
  2. Moved _checkForFailedWorkerRequests to the beginning of execute() for clearer execution flow
  3. Added admin function setCrashRecoveryGracePeriod() to adjust the grace period if needed

The crash recovery logic now waits until scheduledTimestamp + gracePeriod before checking whether a worker failed, giving the handler enough time to complete execution and remove itself from scheduledRequests.
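A minimal Python sketch of the fixed check, assuming a helper with these hypothetical parameter names (the real logic is in Cadence); the default mirrors the PR's 10-second grace period:

```python
# Sketch of the grace-period guard: within the window, an
# Executed-but-tracked entry is assumed to be a handler still running.

CRASH_RECOVERY_GRACE_PERIOD = 10.0  # seconds, matching the new default

def looks_failed(status: str, in_scheduled_requests: bool,
                 scheduled_timestamp: float, now: float,
                 grace_period: float = CRASH_RECOVERY_GRACE_PERIOD) -> bool:
    # Inside the grace window: never flag, regardless of status.
    if now <= scheduled_timestamp + grace_period:
        return False
    return status == "Executed" and in_scheduled_requests

# Just after the optimistic status flip: inside the window, not failed.
assert not looks_failed("Executed", True, scheduled_timestamp=100.0, now=105.0)
# Entry still present well past the window: treated as a failed worker.
assert looks_failed("Executed", True, scheduled_timestamp=100.0, now=120.0)
```

The guard turns the ambiguous "Executed but still tracked" state into a timed one: only entries that outlive the window are treated as crashes.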

Changes

  • Added crashRecoveryGracePeriod configuration variable (default: 10.0 seconds)
  • Added Admin.setCrashRecoveryGracePeriod() function
  • Moved _checkForFailedWorkerRequests call from _runScheduler to execute()
  • Updated grace period check: currentTimestamp <= scheduledTimestamp + gracePeriod
  • Increased SCHEDULER_BASE_EFFORT from 700 to 1200
  • Updated test config SCHEDULER_WAKEUP_INTERVAL from 2 to 1
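One review comment notes that setCrashRecoveryGracePeriod() has no upper bound, so a misconfigured value could silently disable recovery. A hypothetical bounded setter, sketched in Python (the cap, class, and snake_case names are illustrative assumptions, not part of the PR):

```python
# Hypothetical bounded setter addressing the review concern that an
# oversized grace period (e.g. 86400s) would effectively disable
# crash recovery. MAX_GRACE_PERIOD is an assumed cap, not from the PR.

MAX_GRACE_PERIOD = 60.0  # assumed; the PR itself sets no upper bound

class SchedulerConfig:
    def __init__(self) -> None:
        self.crash_recovery_grace_period = 10.0  # default from this PR

    def set_crash_recovery_grace_period(self, seconds: float) -> None:
        # Reject non-positive or oversized values up front.
        if not 0.0 < seconds <= MAX_GRACE_PERIOD:
            raise ValueError(
                f"grace period must be in (0, {MAX_GRACE_PERIOD}] seconds")
        self.crash_recovery_grace_period = seconds

cfg = SchedulerConfig()
cfg.set_crash_recovery_grace_period(15.0)
assert cfg.crash_recovery_grace_period == 15.0
```

In Cadence the same guard would typically live in a pre-condition on the Admin function.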

claude bot commented Mar 13, 2026

PR Review: Support Optimistic Execution in Scheduler

The core fix moves _checkForFailedWorkerRequests() to run unconditionally at the top of every scheduler tick, instead of only inside _runScheduler when fetchCount > 0. This is correct: failed requests were previously stuck in PROCESSING indefinitely when no new work arrived.

Issues:

  1. Stale docstring on _runScheduler still lists crash recovery as step 1; it should be removed or renumbered.
  2. No upper bound on crashRecoveryGracePeriod: a misconfiguration like 86400s would silently disable recovery.
  3. The SCHEDULER_BASE_EFFORT bump from 700 to 1200 is not justified in comments.

The grace period logic is correct: checkAfterTimestamp = workerScheduledTimestamp + gracePeriod properly accounts for optimistic execution, and the 10s default is reasonable against the 1s wakeup interval. The test coverage gap for _checkForFailedWorkerRequests is pre-existing.

Overall: a correct and well-motivated fix.

@nvdtf nvdtf requested a review from liobrasil March 13, 2026 20:29
@liobrasil liobrasil left a comment

LGTM

@liobrasil liobrasil merged commit 38e5cc4 into main Mar 13, 2026
6 checks passed