Support Optimistic Execution in Scheduler#70
Merged
Conversation
| PR Review: Support Optimistic Execution in Scheduler. The core fix moves _checkForFailedWorkerRequests() to run unconditionally at the top of every scheduler tick (instead of only inside _runScheduler when fetchCount > 0). This is correct - failed requests were previously stuck in PROCESSING indefinitely when no new work arrived. Issues: (1) Stale docstring on _runScheduler still lists crash recovery as step 1 - should be removed/renumbered. (2) No upper bound on crashRecoveryGracePeriod - a misconfiguration like 86400s would silently disable recovery. (3) SCHEDULER_BASE_EFFORT bump from 700 to 1200 is unjustified in comments. Grace period logic is correct: checkAfterTimestamp = workerScheduledTimestamp + gracePeriod accounts properly for optimistic execution. Default of 10s is reasonable vs 1s wakeup interval. Test coverage gap for _checkForFailedWorkerRequests is pre-existing. Overall: correct and well-motivated fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
pending > 0.Problem
The FlowTransactionScheduler uses an optimistic status update pattern where scheduled transactions are marked as
Executedbefore the handler code actually runs. This is done to enable concurrent execution (see FlowTransactionScheduler.cdc lines 1315-1321):This caused
_checkForFailedWorkerRequeststo incorrectly mark workers as failed when:Executed)Executedstatus + entry still inscheduledRequests= assumed panicSolution
crashRecoveryGracePeriod, default 10 seconds) before checking for failed workers_checkForFailedWorkerRequeststo the beginning ofexecute()for clearer execution flowsetCrashRecoveryGracePeriod()to adjust the grace period if neededThe crash recovery logic now waits
scheduledTimestamp + gracePeriodbefore checking if a worker failed, giving the handler enough time to complete execution and remove itself fromscheduledRequests.Changes
crashRecoveryGracePeriodconfiguration variable (default: 10.0 seconds)Admin.setCrashRecoveryGracePeriod()function_checkForFailedWorkerRequestscall from_runSchedulertoexecute()currentTimestamp <= scheduledTimestamp + gracePeriodSCHEDULER_BASE_EFFORTfrom 700 to 1200SCHEDULER_WAKEUP_INTERVALfrom 2 to 1