Skip to content

refactor(pb): fix actor overdue alarm handling#4770

Open
MasterPtato wants to merge 1 commit into04-23-fix_update_railway_docs_to_include_graceful_shutdownfrom
04-24-refactor_pb_fix_actor_overdue_alarm_handling
Open

refactor(pb): fix actor overdue alarm handling#4770
MasterPtato wants to merge 1 commit into04-23-fix_update_railway_docs_to_include_graceful_shutdownfrom
04-24-refactor_pb_fix_actor_overdue_alarm_handling

Conversation

@MasterPtato
Copy link
Copy Markdown
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

PR Review: refactor(pb): fix actor overdue alarm handling (#4770)

File: engine/packages/pegboard/src/workflows/actor2/runtime.rs

Summary

This PR refactors the overdue alarm check in handle_stopped by hoisting it out of the Decision::Sleep match arm and evaluating it before the stopped_res dispatch. The logic is correct and the refactor significantly reduces duplication.

Correctness

The logic change is correct. The previous Decision::Sleep arm duplicated the full Decision::Reallocate dispatch inline. The new approach converts decision from Sleep to Reallocate before the dispatch match, so the overdue-alarm path falls through to the single shared Decision::Reallocate arm. This avoids code duplication while preserving identical observable behavior.

state.alarm_ts is correctly cleared when the alarm has fired, consistent with the existing pattern elsewhere in the file.

The check is correctly scoped to Decision::Sleep only, so alarms do not interfere with Reallocate, Backoff, or Destroy decisions.

Potential Gap: Decision::Backoff with an Overdue Alarm

When an actor stops due to a failed allocation while an alarm is already overdue, both old and new code leave state.alarm_ts set and transition to Reallocating. This is not a regression introduced by this PR, but the intent is undocumented. Consider adding a comment above the alarm check explaining why Backoff is excluded.

Code Quality

The comment // Check alarm (line 564) is a sentence fragment. Per CLAUDE.md conventions, comments should be complete sentences. Suggested replacement: // Check whether an alarm fired while the actor was stopping and wake it immediately.

Workflow Replay Safety

Since GetTsInput {} is an existing #[activity] used in this workflow, hoisting it out of the Decision::Sleep arm changes the activity call order for workflows that previously hit this code path. Any in-flight workflow interrupted after computing Decision::Sleep but before or during the old overdue-alarm allocation block will replay with the new code and encounter GetTsInput at a different position in the event log.

Whether this is safe depends on how Gasoline handles activity sequencing during replay (strict position vs. type+input matching). Please confirm replay safety, or note that this is deployed only after draining any in-flight workflows in the affected state.

PR Description

The description is the default template with no context. For a deterministic workflow change it would be helpful to document: what the original bug was, how this was tested, and whether workflow replay safety was considered.

Overall

The refactor is clean and reduces real duplication. Main items to address: (1) confirm workflow replay safety, (2) optionally add a comment clarifying why Backoff is excluded from the alarm check, (3) expand the sentence-fragment comment on line 564 to a complete sentence.

@MasterPtato MasterPtato force-pushed the 04-24-refactor_pb_fix_actor_overdue_alarm_handling branch from 6851547 to 3cfcb16 Compare April 25, 2026 01:07
@MasterPtato MasterPtato force-pushed the 04-23-fix_update_railway_docs_to_include_graceful_shutdown branch from df97819 to 3ff43a4 Compare April 25, 2026 01:07
@MasterPtato MasterPtato force-pushed the 04-24-refactor_pb_fix_actor_overdue_alarm_handling branch from 3cfcb16 to 31253f7 Compare April 25, 2026 01:14
@MasterPtato MasterPtato force-pushed the 04-23-fix_update_railway_docs_to_include_graceful_shutdown branch from 3ff43a4 to 0a19554 Compare April 25, 2026 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant