Skip to content

Conversation

@dulinriley
Copy link
Contributor

Summary:
Minor change: before the "Instance::run" and "Instance::run_actor_tree" would both
try to change the ActorStatus to Stopping and Stopped. Going back and forth from a terminal
state could introduce timing errors on status watchers, which watch for Stopped or Failed.

Just do this once in run_actor_tree once all child actors are stopped and cleanup has run.

Reviewed By: mariusae

Differential Revision: D87082404

@meta-codesync
Copy link

meta-codesync bot commented Nov 14, 2025

@dulinriley has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87082404.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 14, 2025
dulinriley added a commit to dulinriley/monarch that referenced this pull request Nov 14, 2025
Summary:

Minor change: before the "Instance::run" and "Instance::run_actor_tree" would both
try to change the ActorStatus to Stopping and Stopped. Going back and forth from a terminal
state could introduce timing errors on status watchers, which watch for Stopped or Failed.

Just do this once in run_actor_tree once all child actors are stopped and cleanup has run.
Added an assert to `change_status` to ensure this is the case.

Reviewed By: mariusae

Differential Revision: D87082404
@dulinriley dulinriley force-pushed the export-D87082404 branch 2 times, most recently from 0a63c3d to f2329e6 Compare November 14, 2025 19:42
dulinriley added a commit to dulinriley/monarch that referenced this pull request Nov 14, 2025
Summary:

Minor change: before the "Instance::run" and "Instance::run_actor_tree" would both
try to change the ActorStatus to Stopping and Stopped. Going back and forth from a terminal
state could introduce timing errors on status watchers, which watch for Stopped or Failed.

Just do this once in run_actor_tree once all child actors are stopped and cleanup has run.
Added an assert to `change_status` to ensure this is the case.

Differential Revision: D87082404
dulinriley added a commit to dulinriley/monarch that referenced this pull request Nov 17, 2025
…ytorch#1889)

Summary:

Before the "Instance::run" and "Instance::run_actor_tree" would both
try to change the ActorStatus to Stopping and Stopped. Going back and forth from a terminal
state could introduce timing errors on status watchers, which watch for Stopped or Failed.

Just do this once in run_actor_tree once all child actors are stopped and cleanup has run.
Added an assert to `change_status` to ensure this is the case.
Also disallow changing from Stopped -> Failed. Actors should only have a single terminal state.

Reviewed By: mariusae

Differential Revision: D87082404
dulinriley added a commit to dulinriley/monarch that referenced this pull request Nov 18, 2025
…ytorch#1889)

Summary:

Before the "Instance::run" and "Instance::run_actor_tree" would both
try to change the ActorStatus to Stopping and Stopped. Going back and forth from a terminal
state could introduce timing errors on status watchers, which watch for Stopped or Failed.

Just do this once in run_actor_tree once all child actors are stopped and cleanup has run.
Added an assert to `change_status` to ensure this is the case.
Also disallow changing from Stopped -> Failed. Actors should only have a single terminal state.

Reviewed By: mariusae

Differential Revision: D87082404
…ytorch#1889)

Summary:

Before the "Instance::run" and "Instance::run_actor_tree" would both
try to change the ActorStatus to Stopping and Stopped. Going back and forth from a terminal
state could introduce timing errors on status watchers, which watch for Stopped or Failed.

Just do this once in run_actor_tree once all child actors are stopped and cleanup has run.
Added an assert to `change_status` to ensure this is the case.
Also disallow changing from Stopped -> Failed. Actors should only have a single terminal state.

Reviewed By: mariusae

Differential Revision: D87082404
@meta-codesync
Copy link

meta-codesync bot commented Nov 19, 2025

This pull request has been merged in 2760b20.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported Merged meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants