Worker processing comm#3
Merged
Merged
Conversation
…pending from dict to list[EventAndPointers], added request_id to EventAndPointers, made NameAndRequestId frozen/hashable, guarded mooncake import
…t when waiting is None
…f.subgraph.section) instead of deepcopy(self.subgraph) (Subgraph vs GraphSection); (2) SubgraphQueues.is_done: now checks waiting is None AND ready is empty (was prematurely marking subgraphs complete); (3) SubgraphsManager.process_stage_outputs: skip back_to_conductor pointers and unknown stages when routing to workers
…er, and MooncakeCommunicationManager
…n loop, SubgraphsManager integration, MicroScheduler, and prefill graph
…ll/decode, pause/resume
…approach would be to raise an error or log a warning instead of silently skipping
Collaborator
|
What's the distinction of engine vs worker? |
shg8
added a commit
that referenced
this pull request
May 5, 2026
PR #78 review issue #3. _apply_pending_stops_to_batch built the "loop-back inputs to drop from this iter's routing" set as every self-loop edge on the running node: stopped_loop_backs[rid] = { edge.name for edge in node.outputs if edge.next_node == node.name } That works for in-tree models (each AR node owns exactly one decode loop, so {self-loop edges} == {this loop's loop-back edges}), but contradicts the docstring's "the stopped loop's loop-back" promise. A node that participates in two distinct loops with disjoint loop- back edges would lose the SURVIVING loop's loop-back tensor when the OTHER loop stopped, since the broad self-loop filter doesn't know which edge belongs to which loop. Fix: thread per-loop attribution through stop_loops. - WorkerGraphQueues.stop_loops now returns the (edge.name, edge.next_node) pairs of the loop-back signals belonging to the loops that actually matched loop_names. The walker already finds matching DynamicLoops to call register_finished on; collect their _loop_back_signals at the same point. - WorkerGraphsManager.stop_loops aggregates and returns the union across worker graphs. Existing void callers (_drain_orphan_pending_stops) ignore the return — backward- compatible. - _apply_pending_stops_to_batch intersects node.outputs against the returned set: now drops only edges that are BOTH self-loops AND loop-back of a stopped loop. The self-loop guard preserves the existing consumer's filter shape (kept_for_routing only drops e.next_node == node.name && e.name in consumed_names). No in-tree model has the multi-loop-per-node shape today, so the visible behavior on Orpheus / BAGEL / Q3-Omni is unchanged. The fix tightens the docstring contract to match the code so the next model with that shape doesn't silently drop loop-back tensors. test/modular/test_stop_loops_filter.py drives WorkerGraphQueues .stop_loops directly with a hand-built section to verify the attribution. Five cases: - returns only stopped loops' loop-back signals - returns union when multiple loops stopped - empty when no loops match - register_finished only fires on matched loops - shared-node-name case from the PR comment: two loops, same GraphNode name "n", disjoint self-loop edges {a} and {b}; stopping loop_a returns {("a","n")} and never {("b","n")}
shg8
added a commit
that referenced
this pull request
May 5, 2026
PR #78 review issue #3. _apply_pending_stops_to_batch built the "loop-back inputs to drop from this iter's routing" set as every self-loop edge on the running node: stopped_loop_backs[rid] = { edge.name for edge in node.outputs if edge.next_node == node.name } That works for in-tree models (each AR node owns exactly one decode loop, so {self-loop edges} == {this loop's loop-back edges}), but contradicts the docstring's "the stopped loop's loop-back" promise. A node that participates in two distinct loops with disjoint loop- back edges would lose the SURVIVING loop's loop-back tensor when the OTHER loop stopped, since the broad self-loop filter doesn't know which edge belongs to which loop. Fix: thread per-loop attribution through stop_loops. - WorkerGraphQueues.stop_loops now returns the (edge.name, edge.next_node) pairs of the loop-back signals belonging to the loops that actually matched loop_names. The walker already finds matching DynamicLoops to call register_finished on; collect their _loop_back_signals at the same point. - WorkerGraphsManager.stop_loops aggregates and returns the union across worker graphs. Existing void callers (_drain_orphan_pending_stops) ignore the return — backward- compatible. - _apply_pending_stops_to_batch intersects node.outputs against the returned set: now drops only edges that are BOTH self-loops AND loop-back of a stopped loop. The self-loop guard preserves the existing consumer's filter shape (kept_for_routing only drops e.next_node == node.name && e.name in consumed_names). No in-tree model has the multi-loop-per-node shape today, so the visible behavior on Orpheus / BAGEL / Q3-Omni is unchanged. The fix tightens the docstring contract to match the code so the next model with that shape doesn't silently drop loop-back tensors. test/modular/test_stop_loops_filter.py drives WorkerGraphQueues .stop_loops directly with a hand-built section to verify the attribution. Five cases: - returns only stopped loops' loop-back signals - returns union when multiple loops stopped - empty when no loops match - register_finished only fires on matched loops - shared-node-name case from the PR comment: two loops, same GraphNode name "n", disjoint self-loop edges {a} and {b}; stopping loop_a returns {("a","n")} and never {("b","n")}
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Here's a summary of everything implemented:
Files Created
Files Modified
mooncake import
- SubgraphQueues.add_request/reset: used deepcopy(self.subgraph.section) instead of deepcopy(self.subgraph) (Subgraph vs GraphSection)
- SubgraphQueues.is_done: now checks waiting is None AND ready is empty (was prematurely marking subgraphs complete)
- SubgraphsManager.process_stage_outputs: skip back_to_conductor pointers and unknown stages when routing to workers