
Runloop: extracts parts and atomize state access#236

Merged
piercefreeman merged 36 commits into main from mzg/ws-2026-03-09 on Mar 14, 2026


Collaborator

@MOZGIII MOZGIII commented Mar 9, 2026

This PR refactors the main runloop body into smaller parts and atomizes access to the variables by effectively hand-rolling explicit closures, with the captured variables passed in as params.

This moves the runloop implementation from a waterfall of code to a more structured sequence of fn calls with explicit parameter captures, which enables high-level reasoning and overview.

The code (for now) is organized into two major pieces: parts and ops.

  • parts hold the chunks of the runloop as they were, closely following the linear logic and structure of the extracted code;
  • ops correspond to the reusable operations that previously resided in impl Runloop { ... } and were accessed via self.

Access to both the runloop's local variables and the fields of self is now provided explicitly via params. This can be a bit verbose, but the verbosity also gives an intuitive sense of how complex a block of code really is, and an explicit overview of what data it can access. We will rely on this later when analyzing usage patterns and encapsulation designs for the data used by the runloop and shards.
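To make the params style concrete, here is a minimal sketch of one op and one part. All names (`ForgetSleeperParams`, `forget_sleeper`, `WakeParams`, `wake_part`, `ready_nodes`) are hypothetical, and ids are plain `u64` rather than `Uuid`. Each fn's signature lists exactly the state it may touch, much like the environment of a hand-written closure; the part destructures its params and reborrows only the map the op needs.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real Uuid-based ids.
type InstanceId = u64;
type NodeId = u64;

/// Op: a small reusable operation. Its params name the only state it may touch.
pub struct ForgetSleeperParams<'a> {
    pub sleeping_by_instance: &'a mut HashMap<InstanceId, HashSet<NodeId>>,
}

/// Removes `node` from the sleepers of `instance`; returns whether it was sleeping.
pub fn forget_sleeper(params: ForgetSleeperParams<'_>, instance: InstanceId, node: NodeId) -> bool {
    let ForgetSleeperParams { sleeping_by_instance } = params;
    let Some(nodes) = sleeping_by_instance.get_mut(&instance) else {
        return false;
    };
    let removed = nodes.remove(&node);
    // Drop the per-instance entry once no nodes of that instance are sleeping.
    if nodes.is_empty() {
        sleeping_by_instance.remove(&instance);
    }
    removed
}

/// Part: a chunk of the loop body. It captures more state than the op
/// and lends the op only the piece it needs.
pub struct WakeParams<'a> {
    pub sleeping_by_instance: &'a mut HashMap<InstanceId, HashSet<NodeId>>,
    pub ready_nodes: &'a mut Vec<NodeId>,
    // Owned input, moved into the part by value.
    pub woken: Vec<(InstanceId, NodeId)>,
}

pub fn wake_part(params: WakeParams<'_>) {
    let WakeParams { sleeping_by_instance, ready_nodes, woken } = params;
    for (instance, node) in woken {
        // Reborrow for the duration of the op call only.
        let op_params = ForgetSleeperParams { sleeping_by_instance: &mut *sleeping_by_instance };
        if forget_sleeper(op_params, instance, node) {
            ready_nodes.push(node);
        }
    }
}
```

The borrow checker then enforces what the signature advertises: `wake_part` cannot reach any runloop state that was not handed to it.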

Further refactors will be added on top of this.
Among other things, notable next steps are:

  • eliminate the RunLoop type altogether, and move to running the loop via a standalone fn;
  • before entering the actual loop, the runloop does a lot of extra setup work, spawns other loops and threads, and generally does things that belong in setup code rather than in the runloop itself; these parts should be separated into their own composable setup routines, and sometimes into whole separate subsystems / loops; this work should be accompanied by additional unit tests for the newly separated, testable components;
  • switch from raw types (like Uuid) to newtypes (like LockId, InstanceId, ExecutionId, NodeId, etc.) for added type safety; along with this, give variables more contextually specific names, e.g. lock_uuid should become this_execution_lock_id;
  • use nonempty-collections to straighten out the internal logic.
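The newtype and non-empty items in the list above can be illustrated with a small sketch. It assumes `u128` as a dependency-free stand-in for `Uuid`, and `NonEmptyVec` is a hand-rolled illustration of the idea, not the API of the `nonempty-collections` crate. With distinct `LockId` and `InstanceId` types, passing one where the other is expected is a compile error, and storing the head separately makes `first()` infallible.

```rust
use std::collections::HashMap;

// Hypothetical newtypes; real ones would presumably wrap `uuid::Uuid`,
// replaced by `u128` here so the sketch has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct InstanceId(pub u128);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LockId(pub u128);

/// Passing an `InstanceId` as `lock` no longer compiles.
pub fn holder_of(locks: &HashMap<LockId, InstanceId>, lock: LockId) -> Option<InstanceId> {
    locks.get(&lock).copied()
}

/// Hand-rolled non-empty vector (illustrative only, not the crate's API):
/// the head is stored separately, so `first()` needs no `Option`.
#[derive(Debug, Clone, PartialEq)]
pub struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmptyVec<T> {
    pub fn new(head: T) -> Self {
        Self { head, tail: Vec::new() }
    }

    /// The emptiness check happens once, at the boundary.
    pub fn from_vec(mut items: Vec<T>) -> Option<Self> {
        if items.is_empty() {
            return None;
        }
        let head = items.remove(0);
        Some(Self { head, tail: items })
    }

    pub fn push(&mut self, item: T) {
        self.tail.push(item);
    }

    pub fn first(&self) -> &T {
        &self.head
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }
}
```

Code that receives a `NonEmptyVec` can drop its own "what if this is empty?" branches, which is the kind of straightening-out the bullet refers to.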

Why switch from passing vars via self to via Params?

  1. Before, it was unclear what state each fn call involved when reading the code; now it is immediately clear.
  2. This setup is a prerequisite for expressing the execution flow in terms of continuations, where each part depends on the results of the previous part. I haven't yet done the semantic analysis to confirm this is what we want here, but intuitively it makes sense.

To do:

  • extract parts and ops
  • clean up dead code
  • review the results; optimize the internal APIs, naming patterns and code placement
  • fix issues discovered by Python tests
  • add some unit tests
  • document the purpose of each part and what it does
  • collapse the contexts into a single parameter for non-owned variables (params)
  • self code-review

Note that this PR adds a respectable 1.5% of test coverage. Some of the new tests catch things that were already covered by the Python tests, but that coverage was more accidental than intentional; here we explicitly add unit-level tests to ensure the small logic bits we now have work as intended.

@MOZGIII MOZGIII changed the title from "Runloop: extracts parts and eliminate self use" to "Runloop: extracts parts and atomize state access" Mar 9, 2026
@MOZGIII MOZGIII changed the title from "Runloop: extracts parts and atomize state access" to "Runloop: extracts parts and atomize state access [wip]" Mar 9, 2026
@MOZGIII MOZGIII requested a review from piercefreeman March 9, 2026 16:28
@MOZGIII MOZGIII changed the base branch from main to mzg/r9 March 9, 2026 16:29
@MOZGIII MOZGIII force-pushed the mzg/ws-2026-03-09 branch from 4107853 to 7c80534 March 9, 2026 16:34
Base automatically changed from mzg/r9 to main March 10, 2026 00:50
@MOZGIII MOZGIII force-pushed the mzg/ws-2026-03-09 branch 3 times, most recently from e9e47a3 to e8b0022 March 12, 2026 12:11

github-actions Bot commented Mar 12, 2026

Coverage Report

Python Coverage

| Metric | Coverage |
| --- | --- |
| Lines | 75.8% |
| Branches | 57.8% |


Rust Coverage

| Metric | Coverage |
| --- | --- |
| Lines | 61.6% 🟢 (+1.6%) |
| Branches | N/A |


Compared to main branch

@MOZGIII MOZGIII force-pushed the mzg/ws-2026-03-09 branch from 017e486 to f7da748 March 12, 2026 15:06
@MOZGIII MOZGIII force-pushed the mzg/ws-2026-03-09 branch from f7da748 to 323396e March 12, 2026 15:07
@MOZGIII MOZGIII changed the title from "Runloop: extracts parts and atomize state access [wip]" to "Runloop: extracts parts and atomize state access" Mar 12, 2026
@MOZGIII MOZGIII changed the title from "Runloop: extracts parts and atomize state access" to "Runloop: extracts parts and atomize state access [wip]" Mar 12, 2026
@MOZGIII MOZGIII marked this pull request as ready for review March 12, 2026 16:47
@MOZGIII MOZGIII changed the title from "Runloop: extracts parts and atomize state access [wip]" to "Runloop: extracts parts and atomize state access" Mar 12, 2026
Owner

@piercefreeman piercefreeman left a comment


This refactor is shaping up really nicely. Leaving my initial comments below.

Comment on lines +380 to +420
```rust
let mut executor_shards = HashMap::from([(executor_id, 0usize)]);
let lock_tracker = InstanceLockTracker::new(Uuid::new_v4());
let mut inflight_actions: HashMap<Uuid, usize> = HashMap::new();
let mut inflight_dispatches: HashMap<Uuid, InflightActionDispatch> = HashMap::new();
let mut sleeping_nodes: HashMap<Uuid, SleepRequest> = HashMap::new();
let mut sleeping_by_instance: HashMap<Uuid, HashSet<Uuid>> = HashMap::new();
let mut blocked_until: HashMap<Uuid, DateTime<Utc>> = HashMap::new();
let mut barrier: CommitBarrier<ShardStep> = CommitBarrier::new();
let mut instances_done_pending: Vec<InstanceDone> = Vec::new();
let (sleep_tx, _sleep_rx) = tokio::sync::mpsc::unbounded_channel::<SleepWake>();

let step = ShardStep {
    executor_id,
    actions: vec![],
    sleep_requests: vec![SleepRequest {
        node_id,
        wake_at: requested_wake_at,
    }],
    updates: None,
    instance_done: None,
};

let mut worker_pool = MockWorkerPool::new();
worker_pool.expect_queue().never();

let before = Utc::now();
let params = super::Params {
    executor_shards: &mut executor_shards,
    lock_tracker: &lock_tracker,
    inflight_actions: &mut inflight_actions,
    inflight_dispatches: &mut inflight_dispatches,
    sleeping_nodes: &mut sleeping_nodes,
    sleeping_by_instance: &mut sleeping_by_instance,
    blocked_until_by_instance: &mut blocked_until,
    commit_barrier: &mut barrier,
    instances_done_pending: &mut instances_done_pending,
    sleep_tx: &sleep_tx,
    worker_pool: &worker_pool,
    skip_sleep: true,
    step,
};
```
Owner


Seems like we're sharing the same basic constructors across all these test functions - a helper function that returns the default params (or a test-scoped extension to Params that allows for construction without these values) might make this code cleaner.

Collaborator Author


6283c11 (this PR): how is this? If it looks good enough, I'll apply the same approach to the rest of the tests.

Owner


I agree the harness approach is cleaner (at least for these cases where we have to initialize a pretty large payload of params that are redundant across test functions).

I'm unclear on the separation of concerns between just directly mutating the harness properties via a harness.executor_shards = xyz (which you do to override the dicts) and the arguments that are intended to go in the params() signature. Would it make sense to just have "default" values for all of the parameters and go with harness assignment for them all? Or are we only expecting to house empty-collections within the harness and everything else is intended to be a params() param?

Collaborator Author


The separation of concerns is arbitrary; I think it is a mistake. Let me iterate a bit more on this so we can simplify; we can reduce the harness to just a bag of values, really.

We probably don't want to generalize and reuse it among the tests for different parts - as it would needlessly broaden the scope of the tests. So each part would get a test harness that only accepts overrides via fields, and a direct params conversion.

Collaborator Author


On the separation of concerns, there is one practical consideration though: by design, the params instance holds some non-Copy fields by value (in this case, step), so those still have to be passed explicitly as .params arguments. Also, the mocks have shorter access paths when they're provided from the outside, but there's no practical need to ever construct them separately, so I'm inlining them as well for now; if the need arises, we can move them back to .params args.
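The harness shape discussed in this thread could look roughly like the following. This is a minimal sketch, not the code from 6283c11: `Harness`, `Params`, `Step` and `handle_step` are hypothetical stand-ins with simplified `u64` keys. The harness is a plain bag of owned values with defaults that tests override via fields, while `params()` lends those values out and takes the owned, non-Copy `step` as its only argument.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for ShardStep: owned and non-Copy.
#[derive(Debug)]
pub struct Step {
    pub executor_id: u64,
    pub sleep_requests: Vec<u64>,
}

/// Params for a hypothetical part: borrowed state plus one owned input.
pub struct Params<'a> {
    pub executor_shards: &'a mut HashMap<u64, usize>,
    pub sleeping_nodes: &'a mut HashMap<u64, u64>,
    pub skip_sleep: bool,
    pub step: Step,
}

/// Part under test: returns the executor's shard and records sleep
/// requests (node -> executor) unless sleeping is skipped.
pub fn handle_step(params: Params<'_>) -> usize {
    let Params { executor_shards, sleeping_nodes, skip_sleep, step } = params;
    let shard = executor_shards.get(&step.executor_id).copied().unwrap_or(0);
    if !skip_sleep {
        for node in step.sleep_requests {
            sleeping_nodes.insert(node, step.executor_id);
        }
    }
    shard
}

/// Test harness: a bag of owned values. Tests override fields directly;
/// `params` converts the harness into borrowed params plus the owned step.
#[derive(Default)]
pub struct Harness {
    pub executor_shards: HashMap<u64, usize>,
    pub sleeping_nodes: HashMap<u64, u64>,
    pub skip_sleep: bool,
}

impl Harness {
    pub fn params(&mut self, step: Step) -> Params<'_> {
        Params {
            executor_shards: &mut self.executor_shards,
            sleeping_nodes: &mut self.sleeping_nodes,
            skip_sleep: self.skip_sleep,
            step,
        }
    }
}
```

Because the borrow from `params()` ends when the part returns, a test can inspect the harness fields right afterwards, e.g. `handle_step(harness.params(step))` followed by assertions on `harness.sleeping_nodes`. Keeping one such harness per part, as suggested above, avoids broadening each test's scope.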

Comment thread crates/lib/runloop/src/runloop/ops/apply_confirmed_step.rs
Comment thread crates/lib/runloop/src/runloop/ops/hydrate_instances.rs
Comment thread crates/lib/runloop/src/runloop/parts/completions/tests.rs
Comment thread crates/lib/runloop/src/runloop/parts/new_instances.rs
Comment thread crates/lib/runloop/src/runloop/parts/deferred_instances.rs
Comment thread crates/lib/runloop/src/runloop/parts/blocked_until_by_instance.rs Outdated
Comment thread crates/lib/runloop/src/runloop/parts/blocked_until_by_instance.rs Outdated
Comment thread crates/lib/runloop/src/runloop/parts/blocked_until_by_instance.rs Outdated
Comment thread crates/lib/runloop/src/runloop/parts/inflight_dispatches.rs Outdated
@piercefreeman piercefreeman merged commit 102d322 into main Mar 14, 2026
18 checks passed
@piercefreeman piercefreeman deleted the mzg/ws-2026-03-09 branch March 14, 2026 01:15
piercefreeman added a commit that referenced this pull request Mar 14, 2026
Goes in after #236 

This PR continues the runloop refactors, now focusing on removing more
logic from the `impl Runloop { ... }` blocks. We are introducing a
purpose-built primitive with a strict API for tracking the available
instance slots.
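The commit message doesn't show the primitive's API, so the following is only a hypothetical sketch of what a strict slot-tracking API might look like; `InstanceSlots`, `SlotToken`, `try_acquire` and `release` are invented names. A slot can only be taken through `try_acquire`, which hands out a non-Copy token, and `release` consumes that token, so releasing the same acquisition twice does not compile.

```rust
/// Hypothetical slot tracker: the only way to occupy a slot is `try_acquire`,
/// and the only way to free one is `release` with the token it handed out.
#[derive(Debug)]
pub struct InstanceSlots {
    capacity: usize,
    in_use: usize,
}

/// Proof of an acquired slot. Not Clone/Copy; the private field keeps code
/// outside this module from forging one.
#[derive(Debug)]
#[must_use]
pub struct SlotToken(());

impl InstanceSlots {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, in_use: 0 }
    }

    /// Number of slots that can still be acquired.
    pub fn available(&self) -> usize {
        self.capacity - self.in_use
    }

    /// Returns `None` when every slot is taken, instead of over-committing.
    pub fn try_acquire(&mut self) -> Option<SlotToken> {
        if self.in_use < self.capacity {
            self.in_use += 1;
            Some(SlotToken(()))
        } else {
            None
        }
    }

    /// Consumes the token, so a double release is a compile error.
    pub fn release(&mut self, token: SlotToken) {
        let SlotToken(()) = token;
        self.in_use -= 1;
    }
}
```

One limitation of this sketch: tokens are not tied to a particular tracker, so a token from one `InstanceSlots` could be released into another; a fuller design might embed a tracker id in the token.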

2 participants