Worker processing comm by merceod · Pull Request #3 · mstar-project/mstar

merceod · 2026-02-27T21:28:35Z

Here's a summary of everything implemented:

Files Created

mminf/engine/__init__.py — Empty package init
mminf/engine/base.py — BaseEngine ABC, EngineType enum, StageBatch and StageOutput data classes
mminf/engine/enc_dec_engine.py — EncoderDecoderEngine for stateless forward passes (ViT, text emb, VAE)
mminf/engine/flow_engine.py — FlowEngine for single denoising step execution
mminf/engine/ar_engine.py — AREngine with PageAllocator, KVRequestState, FlashInfer-guarded prefill/decode, pause/resume
mminf/worker/engine_manager.py — EngineManager mapping stage names to engine instances
mminf/worker/micro_scheduler.py — MicroScheduler with priority-based stage selection (AR > Flow > EncDec)
mminf/worker/worker.py — Real Worker integrating SubgraphsManager, EngineManager, MicroScheduler, and MooncakeCommunicationManager
test/test_phase1.py — 20 tests covering PageAllocator, all engines, EngineManager, image_gen loop, SubgraphsManager integration, MicroScheduler, and prefill graph
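
The priority rule listed for MicroScheduler (AR > Flow > EncDec) can be sketched as follows. This is only an illustration of the ordering, not the code in mminf/worker/micro_scheduler.py: the numeric enum values, the `select_stage` helper, and the stage names are all assumptions made for the example.

```python
from enum import IntEnum
from typing import Optional


class EngineType(IntEnum):
    # Hypothetical values: a lower number means a higher scheduling priority.
    AR = 0
    FLOW = 1
    ENC_DEC = 2


def select_stage(ready: dict[str, EngineType]) -> Optional[str]:
    """Return the ready stage whose engine type has the highest priority.

    `ready` maps a stage name to the type of the engine that runs it.
    Returns None when no stage is ready.
    """
    if not ready:
        return None
    # min() keeps the first entry on ties, so insertion order breaks ties.
    return min(ready, key=lambda name: ready[name])
```

With this ordering, an AR stage that is ready always runs before a pending flow or encoder/decoder stage, presumably because AR decode steps are the most latency-sensitive.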

Files Modified

mminf/communication/tensors.py — Completed start_read_tensors() and get_ready_tensors(), changed self.pending from dict to list[EventAndPointers], added request_id to EventAndPointers, made NameAndRequestId frozen/hashable, guarded mooncake import
mminf/worker/dummy_worker.py — Fixed 3 bugs:
- SubgraphQueues.add_request/reset: used deepcopy(self.subgraph.section) instead of deepcopy(self.subgraph) (Subgraph vs GraphSection)
- SubgraphQueues.is_done: now checks waiting is None AND ready is empty (was prematurely marking subgraphs complete)
- SubgraphsManager.process_stage_outputs: skip back_to_conductor pointers and unknown stages when routing to workers
mminf/graph/request_queues.py — Fixed process_new_inputs to return ProcessedInputs instead of raw list when waiting is None
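
The SubgraphQueues.is_done fix above comes down to requiring both conditions at once. A minimal sketch, assuming a simplified stand-in class (`QueueState` and its fields are hypothetical and do not mirror the real SubgraphQueues):

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class QueueState:
    # Hypothetical stand-in: `waiting` holds the section still awaiting
    # inputs (None once nothing is left), `ready` holds runnable stages.
    waiting: Optional[Any] = None
    ready: deque = field(default_factory=deque)

    def is_done(self) -> bool:
        # Done only when nothing is waiting AND nothing is left to run.
        # Checking `waiting` alone would report completion while ready
        # stages are still queued.
        return self.waiting is None and not self.ready
```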


kamahori · 2026-02-27T22:22:53Z

What's the distinction of engine vs worker?

PR #78 review issue #3.

_apply_pending_stops_to_batch built the "loop-back inputs to drop from this iter's routing" set as every self-loop edge on the running node:

    stopped_loop_backs[rid] = { edge.name for edge in node.outputs if edge.next_node == node.name }

That works for in-tree models (each AR node owns exactly one decode loop, so {self-loop edges} == {this loop's loop-back edges}), but contradicts the docstring's "the stopped loop's loop-back" promise. A node that participates in two distinct loops with disjoint loop-back edges would lose the SURVIVING loop's loop-back tensor when the OTHER loop stopped, since the broad self-loop filter doesn't know which edge belongs to which loop.

Fix: thread per-loop attribution through stop_loops.

- WorkerGraphQueues.stop_loops now returns the (edge.name, edge.next_node) pairs of the loop-back signals belonging to the loops that actually matched loop_names. The walker already finds matching DynamicLoops to call register_finished on; collect their _loop_back_signals at the same point.
- WorkerGraphsManager.stop_loops aggregates and returns the union across worker graphs. Existing void callers (_drain_orphan_pending_stops) ignore the return, so the change is backward-compatible.
- _apply_pending_stops_to_batch intersects node.outputs against the returned set: now drops only edges that are BOTH self-loops AND loop-back of a stopped loop. The self-loop guard preserves the existing consumer's filter shape (kept_for_routing only drops e.next_node == node.name && e.name in consumed_names).

No in-tree model has the multi-loop-per-node shape today, so the visible behavior on Orpheus / BAGEL / Q3-Omni is unchanged. The fix tightens the docstring contract to match the code so the next model with that shape doesn't silently drop loop-back tensors.

test/modular/test_stop_loops_filter.py drives WorkerGraphQueues.stop_loops directly with a hand-built section to verify the attribution. Five cases:

- returns only stopped loops' loop-back signals
- returns union when multiple loops stopped
- empty when no loops match
- register_finished only fires on matched loops
- shared-node-name case from the PR comment: two loops, same GraphNode name "n", disjoint self-loop edges {a} and {b}; stopping loop_a returns {("a","n")} and never {("b","n")}
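
The per-loop attribution described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the project's code: `Edge`, `Loop`, the free function `stop_loops`, and `edges_to_drop` are hypothetical stand-ins for the graph edge type, DynamicLoop, WorkerGraphQueues.stop_loops, and the filter inside _apply_pending_stops_to_batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for an output edge of a graph node.
    name: str
    next_node: str


@dataclass
class Loop:
    # Hypothetical stand-in for DynamicLoop.
    name: str
    loop_back_signals: frozenset  # (edge name, next node) pairs
    finished: bool = False


def stop_loops(loops, loop_names):
    """Mark matching loops finished; return their loop-back signal pairs."""
    stopped = set()
    for loop in loops:
        if loop.name in loop_names:
            loop.finished = True  # stands in for register_finished
            stopped |= loop.loop_back_signals
    return stopped


def edges_to_drop(node_name, outputs, stopped):
    """Drop only edges that are both self-loops and loop-backs of a stopped loop."""
    return {
        e.name
        for e in outputs
        if e.next_node == node_name and (e.name, e.next_node) in stopped
    }
```

In the shared-node case, with two loops on node "n" whose loop-back edges are "a" and "b", stopping only the first loop yields {("a", "n")}, so edge "b" is still routed.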

merceod added 12 commits February 27, 2026 13:18

Completed start_read_tensors() and get_ready_tensors(), changed self.…

13d1c8a

…pending from dict to list[EventAndPointers], added request_id to EventAndPointers, made NameAndRequestId frozen/hashable, guarded mooncake import

Fixed process_new_inputs to return ProcessedInputs instead of raw lis…

5afa98c

…t when waiting is None

EngineManager mapping stage names to engine instances

d235901

MicroScheduler with priority-based stage selection (AR > Flow > EncDec)

215c38e

Real Worker integrating SubgraphsManager, EngineManager, MicroSchedul…

0580588

…er, and MooncakeCommunicationManager

20 tests covering PageAllocator, all engines, EngineManager, image_ge…

8728b26

…n loop, SubgraphsManager integration, MicroScheduler, and prefill graph

Empty package init

2d7c4a3

AREngine with PageAllocator, KVRequestState, FlashInfer-guarded prefi…

6514a4a

…ll/decode, pause/resume

BaseEngine ABC, EngineType enum, StageBatch and StageOutput data classes

49a4f60

EncoderDecoderEngine for stateless forward passes (ViT, text emb, VAE)

e5e7f62

FlowEngine for single denoising step execution

4f806a1

merceod requested review from NSagan271, kamahori and sivginirmak February 27, 2026 21:28

unknown stages should probably be handled more explicitly. A better …

6f624c5

…approach would be to raise an error or log a warning instead of silently skipping

kamahori mentioned this pull request Feb 27, 2026

[Feat] initial implementation for API server #4

Merged

merceod merged commit 3d788ca into main Feb 28, 2026

kamahori deleted the worker_processing_comm branch March 14, 2026 00:38

NSagan271 mentioned this pull request Jun 17, 2026

Port the Qwen3-Omni multimodal encoders into M* and optimize them #131

Open

4 tasks

merceod merged 13 commits into main from worker_processing_comm
