Runloop: extracts parts and atomize state access by MOZGIII · Pull Request #236 · piercefreeman/waymark

MOZGIII · 2026-03-09T15:57:57Z

This PR refactors the main runloop body into smaller parts, and atomizes access to the variables by effectively hand-rolling the explicit closures as params.

This change helps by moving the runloop implementation from a waterfall of code to a more structured sequence of fn calls with explicit parameter captures, thus enabling high-level reasoning and overview.

The code (for now) is organized into two major pieces: parts and ops.

- parts hold the chunks of the runloop as they were, pretty much following the linear logic and structure of the extracted code;
- ops are the reusable operations that previously resided in impl Runloop { ... } and were accessible via self.

Access to both the runloop local variables and the self variables is now explicitly provided via params. This can be a bit verbose, but the verbosity also gives an intuitive sense of how complex a block of code really is, and an explicit overview of what data it can access. We will rely on this later when analyzing usage patterns and encapsulation designs for the data used by the runloop and shards.
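To illustrate the idea, here is a minimal sketch (not code from this PR) of a part written as a free function whose only access to state goes through a params struct. The `Id` alias, the field names and the body of `handle` are assumptions made up for the example; the real runloop uses `Uuid` keys and a larger set of fields.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the identifier type (the PR uses `Uuid`).
pub type Id = u64;

/// Everything this part may touch, spelled out field by field.
pub struct Params<'a> {
    /// Mutable access: the part updates the per-instance in-flight count.
    pub inflight_actions: &'a mut HashMap<Id, usize>,
    /// Owned input: the instance whose action just completed.
    pub completed_for: Id,
}

/// A "part" as a free function: the signature is the full list of state it can reach.
/// Returns `false` when the instance had no in-flight actions recorded.
pub fn handle(params: Params<'_>) -> bool {
    let Params {
        inflight_actions,
        completed_for,
    } = params;

    let Some(count) = inflight_actions.get_mut(&completed_for) else {
        return false;
    };

    // One action finished; drop the entry once nothing is in flight.
    *count = count.saturating_sub(1);
    if *count == 0 {
        inflight_actions.remove(&completed_for);
    }
    true
}
```

Since the struct lists a single mutable borrow, a reader can tell from the call site alone that this part cannot touch any other runloop state, which is the property the paragraph above relies on.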

Further refactors are to be added on top of this.
Notable things to do next include:

- eliminate the RunLoop type altogether, and move to running the loop via a standalone fn;
- before going into the actual loop, the runloop does a lot of extra setup work, spawns other loops and threads, and generally does things that are more fitting for setup code than for the actual runloop; these parts should be separated into their own composable setup routines, and sometimes into whole separate subsystems / loops; this work should be accompanied by additional unit-testing of the newly separated and testable components;
- switch from raw types (like Uuid) to newtypes (like LockId, InstanceId, ExecutionId, NodeId, etc) for added type-safety; together with this, give the variables more contextually-specific names - e.g. lock_uuid should become this_execution_lock_id;
- use nonempty-collections to straighten out the internal logic.
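As a rough sketch of the newtype item above (this is a planned follow-up, not something in this PR), wrapping the raw identifier in distinct types lets the compiler reject a lock ID passed where an instance ID is expected. `RawId` is a `u128` stand-in for `Uuid` so the example needs no external crates, and `is_locked_by` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Stand-in for `uuid::Uuid` (assumption, to keep the sketch dependency-free).
pub type RawId = u128;

/// Identifies a lock held by one execution.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LockId(pub RawId);

/// Identifies a workflow instance.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct InstanceId(pub RawId);

/// Hypothetical helper: does this execution's lock own the given instance?
/// Swapping the two ID arguments is now a compile error instead of a silent bug.
pub fn is_locked_by(
    locks: &HashMap<InstanceId, LockId>,
    instance_id: InstanceId,
    this_execution_lock_id: LockId,
) -> bool {
    locks.get(&instance_id) == Some(&this_execution_lock_id)
}
```

With plain `Uuid` parameters the same function would accept its arguments in either order; the wrappers cost nothing at runtime and make the more specific variable names enforceable by type.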

Why switch from passing vars via `self` to via `Params`?

Before, it was unclear what state each fn call involved when reading the code; now it is immediately clear.
This setup is a prerequisite for expressing the execution flow in terms of continuations, where each part depends on the results of the previous part. I haven't yet done the semantic analysis to confirm this is something we'd want here, but intuitively it makes sense that it would be.
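A small sketch of what "each part depends on the results of the previous part" could look like, under the assumption that parts return typed outputs which the next part consumes. Whether the runloop should be shaped this way is still unconfirmed, as noted above; `Woken`, `collect_wakes` and `schedule` are hypothetical names and do not appear in the PR.

```rust
/// Output of the first hypothetical part: the nodes whose sleep has elapsed.
pub struct Woken {
    pub node_ids: Vec<u64>,
}

/// Part 1: remove every `(node_id, wake_at)` entry that is due and report it.
pub fn collect_wakes(sleeping: &mut Vec<(u64, u64)>, now: u64) -> Woken {
    let mut node_ids = Vec::new();
    sleeping.retain(|&(node_id, wake_at)| {
        if wake_at <= now {
            node_ids.push(node_id);
            false // due: drop from the sleeping set
        } else {
            true // still sleeping: keep
        }
    });
    Woken { node_ids }
}

/// Part 2: can only run once part 1 has produced a `Woken` value.
/// Returns how many nodes were appended to the queue.
pub fn schedule(woken: Woken, queue: &mut Vec<u64>) -> usize {
    let scheduled = woken.node_ids.len();
    queue.extend(woken.node_ids);
    scheduled
}
```

The ordering between the two parts is carried by the `Woken` value rather than by their position in the loop body, so reordering them incorrectly would fail to compile.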

To do:

- extract parts and ops
- clean up dead code
- review the results, optimize the internal APIs, naming patterns and code placement
- fix issues discovered by python tests
- add some unit-tests
- document the purpose of each part and what it does
- collapse the contexts into a single parameter for non-owned variables (params)
- self code-review

Note this PR adds a respectable 1.5% of test coverage; in actuality, the new tests catch certain things that were already covered by the python tests, but that coverage was more accidental than intentional; we explicitly add tests (at the unit level) to ensure the small logic bits we now have work as intended.

github-actions · 2026-03-12T12:16:24Z

Coverage Report

Python Coverage

| Metric | Coverage |
| --- | --- |
| Lines | 75.8% |
| Branches | 57.8% |

Rust Coverage

| Metric | Coverage |
| --- | --- |
| Lines | 61.6% 🟢 (+1.6%) |
| Branches | N/A |

Compared to main branch


piercefreeman

This refactor is shaping up really nicely. Leaving my initial comments below.

piercefreeman · 2026-03-12T16:02:05Z

+    let mut executor_shards = HashMap::from([(executor_id, 0usize)]);
+    let lock_tracker = InstanceLockTracker::new(Uuid::new_v4());
+    let mut inflight_actions: HashMap<Uuid, usize> = HashMap::new();
+    let mut inflight_dispatches: HashMap<Uuid, InflightActionDispatch> = HashMap::new();
+    let mut sleeping_nodes: HashMap<Uuid, SleepRequest> = HashMap::new();
+    let mut sleeping_by_instance: HashMap<Uuid, HashSet<Uuid>> = HashMap::new();
+    let mut blocked_until: HashMap<Uuid, DateTime<Utc>> = HashMap::new();
+    let mut barrier: CommitBarrier<ShardStep> = CommitBarrier::new();
+    let mut instances_done_pending: Vec<InstanceDone> = Vec::new();
+    let (sleep_tx, _sleep_rx) = tokio::sync::mpsc::unbounded_channel::<SleepWake>();
+
+    let step = ShardStep {
+        executor_id,
+        actions: vec![],
+        sleep_requests: vec![SleepRequest {
+            node_id,
+            wake_at: requested_wake_at,
+        }],
+        updates: None,
+        instance_done: None,
+    };
+
+    let mut worker_pool = MockWorkerPool::new();
+    worker_pool.expect_queue().never();
+
+    let before = Utc::now();
+    let params = super::Params {
+        executor_shards: &mut executor_shards,
+        lock_tracker: &lock_tracker,
+        inflight_actions: &mut inflight_actions,
+        inflight_dispatches: &mut inflight_dispatches,
+        sleeping_nodes: &mut sleeping_nodes,
+        sleeping_by_instance: &mut sleeping_by_instance,
+        blocked_until_by_instance: &mut blocked_until,
+        commit_barrier: &mut barrier,
+        instances_done_pending: &mut instances_done_pending,
+        sleep_tx: &sleep_tx,
+        worker_pool: &worker_pool,
+        skip_sleep: true,
+        step,
+    };


Seems like we're sharing the same basic constructors across all these test functions - a helper function that returns the default params (or a test-scoped extension to Params that allows for construction without these values) might make this code cleaner.

6283c11 (this PR): how is this? If this looks good enough, I'll apply the same approach to the rest of the tests.

I agree the harness approach is cleaner (at least for these cases where we have to initialize a pretty large payload of params that are redundant across test functions).

I'm unclear on the separation of concerns between directly mutating the harness properties via harness.executor_shards = xyz (which you do to override the dicts) and the arguments that are intended to go in the params() signature. Would it make sense to just have "default" values for all of the parameters and go with harness assignment for them all? Or are we only expecting to house empty collections within the harness, with everything else intended to be a params() param?

The separation of concerns is arbitrary; I think it is a mistake. Let me iterate a bit more on this so we can simplify; we can really reduce the harness to just a bag of values.

We probably don't want to generalize it and reuse it among the tests for different parts, as that would needlessly broaden the scope of the tests. So each part would get a test harness that only accepts overrides via fields, plus a direct params conversion.

On the separation of concerns, there is one practical consideration though: the params instance holds some non-Copy fields by value by design, in this case step; those would still have to be passed explicitly via the params argument. Also, the mocks just have shorter access paths if they're provided from the outside, but there's actually no practical need to ever construct them separately, so I'm inlining them as well for now; if the need arises, we can move them back to the .params args.
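The shape discussed in this thread can be sketched roughly as follows: a per-part harness that is just a bag of owned values with defaults, overridden via plain field assignment, plus a `params` conversion that lends them out and takes the non-Copy `step` by value. This is an illustrative reduction, not the harness from 6283c11; the field set, the `Step` layout and the behaviour of `handle` (including what `skip_sleep` does here) are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Uuid`.
pub type Id = u64;

/// Non-Copy input that the part consumes, so it is passed by value.
pub struct Step {
    /// `(node_id, wake_at)` pairs.
    pub sleep_requests: Vec<(Id, u64)>,
}

/// What the part under test receives.
pub struct Params<'a> {
    pub sleeping_nodes: &'a mut HashMap<Id, u64>,
    pub skip_sleep: bool,
    pub step: Step,
}

/// A bag of owned values; tests override fields directly.
#[derive(Default)]
pub struct TestHarness {
    pub sleeping_nodes: HashMap<Id, u64>,
    pub skip_sleep: bool,
}

impl TestHarness {
    /// Lend the harness state to the part; only `step` is supplied by the caller.
    pub fn params(&mut self, step: Step) -> Params<'_> {
        Params {
            sleeping_nodes: &mut self.sleeping_nodes,
            skip_sleep: self.skip_sleep,
            step,
        }
    }
}

/// Illustrative part: records sleep requests unless `skip_sleep` is set.
pub fn handle(params: Params<'_>) {
    let Params {
        sleeping_nodes,
        skip_sleep,
        step,
    } = params;
    if skip_sleep {
        return;
    }
    for (node_id, wake_at) in step.sleep_requests {
        sleeping_nodes.insert(node_id, wake_at);
    }
}
```

A test then reads as `let mut harness = TestHarness::default(); harness.skip_sleep = true; handle(harness.params(step));` followed by assertions on the harness fields, which stay accessible once the borrow held by `Params` ends.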

Goes in after #236. This PR continues the runloop refactors, now focusing on removing more logic from the `impl Runloop { ... }` blocks. We are introducing a purpose-built primitive with a strict API for tracking the available instance slots.

MOZGIII changed the title ~~Runloop: extracts parts and eliminate self use~~ Runloop: extracts parts and atomize state access Mar 9, 2026

MOZGIII changed the title ~~Runloop: extracts parts and atomize state access~~ Runloop: extracts parts and atomize state access [wip] Mar 9, 2026

MOZGIII requested a review from piercefreeman March 9, 2026 16:28

MOZGIII changed the base branch from main to mzg/r9 March 9, 2026 16:29

MOZGIII force-pushed the mzg/ws-2026-03-09 branch from 4107853 to 7c80534 Compare March 9, 2026 16:34

Base automatically changed from mzg/r9 to main March 10, 2026 00:50

MOZGIII force-pushed the mzg/ws-2026-03-09 branch 3 times, most recently from e9e47a3 to e8b0022 Compare March 12, 2026 12:11

MOZGIII force-pushed the mzg/ws-2026-03-09 branch from 017e486 to f7da748 Compare March 12, 2026 15:06

MOZGIII added 17 commits March 12, 2026 19:07

Extract prepend_timeout_completions_from_inflight_dispatches out of main runloop fn

6935bf0

Refactor to extract step_pending_acks handling

dee5aa1

Extract handling of wakes and completions from runloop

022d7ab

Extract instances handling from the runloop

cb3d195

Extract failed instances

9f24a20

Extract steps handling logic

8e2c299

Extract blocked_until_by_instance from runloop

1d3a344

Refactor parts to use mod-aware naming pattern

090addf

Drop unused CoordinatorState

8bc38f8

Inline value_utils

2cbc3dd

Refactor the ops to match the parts pattern

a01c1fc

Switch to mod-aware naming scheme for ops

3bcedd5

Add unit tests

8e1ffcb

Use mockall

5d0b7fd

More tests

ced03b8

Add documentation

3a98a19

Simplify the API by merging all the fn arguments in the params arg

323396e

MOZGIII force-pushed the mzg/ws-2026-03-09 branch from f7da748 to 323396e Compare March 12, 2026 15:07

Add field docs

491a721

Use mockall correctly

ea74b4f

MOZGIII added the full-build label Mar 12, 2026

More explicit and readable tests

e52771e

MOZGIII changed the title ~~Runloop: extracts parts and atomize state access [wip]~~ Runloop: extracts parts and atomize state access Mar 12, 2026

MOZGIII changed the title ~~Runloop: extracts parts and atomize state access~~ Runloop: extracts parts and atomize state access [wip] Mar 12, 2026

MOZGIII added 3 commits March 12, 2026 20:35

Add step persist acks tests

1e33b94

Add more tests

5d551c2

Add sleep skip integration test

ace7fba

MOZGIII marked this pull request as ready for review March 12, 2026 16:47

MOZGIII changed the title ~~Runloop: extracts parts and atomize state access [wip]~~ Runloop: extracts parts and atomize state access Mar 12, 2026

MOZGIII added 5 commits March 12, 2026 21:02

Move integration tests to external lib tests

9dc5248

Internalize the leftover tests as utility tests and adjust accordingly

6560f58

Add lock utils tests

f098520

Add channel utils tests

d1c2cdb

Drop some trivial tests for the utils

9f61f8b

MOZGIII mentioned this pull request Mar 12, 2026

Runloop: refactor the available instance slots logic #240

Merged

piercefreeman reviewed Mar 12, 2026


MOZGIII added 7 commits March 13, 2026 01:43

Eliminate odd useless struct

c924b8c

Document what shard step is at apply confirmed step

5be7c19

Rename prepend_timeout_completions_from_inflight_dispatches to handle

37626b9

Add test harness to apply_confirmed_step

6283c11

Fix docs at blocked_until_by_instance

3e9fc0b

Separate code blocks with newlines at instances

1215ff5

Rename parts

5cea114

MOZGIII mentioned this pull request Mar 13, 2026

Simplification of the step persist ack part of the runloop #242

Open

Push more state into the TestHarness

5101065

piercefreeman merged commit 102d322 into main Mar 14, 2026
18 checks passed

piercefreeman deleted the mzg/ws-2026-03-09 branch March 14, 2026 01:15


2 participants
