mitosis: bugfixes #2904

likewhatevs · 2025-10-17T08:08:49Z

Fix bugs in mitosis.

The test script added here passes on the last commit in this PR and fails, for various reasons, on the earlier commits.

Without this PR at all, mitosis gets kicked with "missing cell in cpu map".

Without the drain commit, mitosis gets kicked with "No available cells to allocate".

likewhatevs · 2025-10-17T08:23:47Z

Reasonably confident this also fixes:

WARNING: CPU: 42 PID: 3899049 at kernel/sched/ext.c:2177 scx_dsq_insert_commit+0xf1/0x120

which I encountered while testing.

dschatzberg · 2025-10-17T13:45:46Z

Implement cell draining when cpuset removed

Let's drop this commit from the PR - I don't think we've hit this issue except in synthetic cases. It also won't work because we reassign the cpus to other cells so there's no guarantee that those cells will get drained.

dschatzberg

See my comments inline - some of these changes don't make sense (I indicated where) but a bunch of good stuff in here too. I love the test script

scheds/rust/scx_mitosis/src/bpf/mitosis.bpf.c

scheds/rust/scx_mitosis/src/main.rs

In mitosis_stopping, the scheduler calculates vtime advancement by dividing by the task's weight. If a task somehow has a zero weight, this would cause a division by zero error. Add a defensive check to catch this case and report an error rather than crashing.
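The guard described above can be illustrated with a minimal Rust sketch. The real check lives in the BPF C code; the function name `vtime_delta`, the error type, and the `* 100` weight scaling are assumptions made for illustration, not the PR's actual code.

```rust
/// Hypothetical helper: convert the time a task ran into a vtime advance,
/// scaled inversely by the task's weight. The `* 100` scaling is an
/// assumption (a weight of 100 leaves the time unchanged).
fn vtime_delta(used_ns: u64, weight: u64) -> Result<u64, String> {
    // Defensive check: report an error instead of dividing by zero.
    if weight == 0 {
        return Err(format!("task has zero weight (used_ns={used_ns})"));
    }
    // saturating_mul avoids overflow on very large runtimes.
    Ok(used_ns.saturating_mul(100) / weight)
}
```

A caller would treat the `Err` case as a scheduler error to report, rather than letting the division fault.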

When allocating a new cell, the vtime_now field is not initialized, which means it contains garbage data. This can cause scheduling anomalies as tasks inherit this uninitialized vtime. Initialize vtime_now to 0 when allocating a cell to ensure consistent vtime tracking from the start.

Replace std::ptr::read_volatile with proper atomic load operations with Acquire ordering when reading fields shared between BPF and Rust code. This ensures proper memory ordering guarantees required for multi-threaded communication:

- applied_configuration_seq: Used to synchronize configuration updates
- in_use: Used to determine cell allocation status

The Acquire ordering ensures that all writes made by the BPF side before releasing the data are visible to the Rust side after the load.
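A rough sketch of the Release/Acquire pairing this commit message relies on. The `Shared` struct, its `payload` field, and both functions are hypothetical; in the real scheduler the fields live in memory shared with the BPF program rather than in a Rust struct of atomics.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for a region shared between a writer and a reader.
struct Shared {
    payload: AtomicU32,                   // data published by the writer
    applied_configuration_seq: AtomicU32, // bumped after the payload is written
}

impl Shared {
    fn new() -> Self {
        Shared {
            payload: AtomicU32::new(0),
            applied_configuration_seq: AtomicU32::new(0),
        }
    }
}

/// Writer side: write the payload, then publish the sequence with Release.
fn publish(s: &Shared, value: u32, seq: u32) {
    s.payload.store(value, Ordering::Relaxed);
    s.applied_configuration_seq.store(seq, Ordering::Release);
}

/// Reader side: an Acquire load that observes `seq` also makes the
/// payload write that preceded the Release store visible.
fn read_if_applied(s: &Shared, seq: u32) -> Option<u32> {
    if s.applied_configuration_seq.load(Ordering::Acquire) == seq {
        Some(s.payload.load(Ordering::Relaxed))
    } else {
        None
    }
}
```

A volatile read only prevents the compiler from eliding the access; it gives no such ordering guarantee, which is the motivation for the change.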

Replace expect() with unwrap_or_else() when looking up cell CPU assignments. If a cell is marked as in_use but has no CPUs assigned (which can happen during race conditions or transient states), log a warning and use an empty cpumask rather than panicking. This makes the scheduler more resilient to transient inconsistencies.
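The fallback pattern might look roughly like this. The map type, the use of `BTreeSet<usize>` in place of a real cpumask type, and the function name are assumptions for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical lookup: return the CPUs assigned to `cell_id`, or log a
/// warning and return an empty set if the cell has none yet.
fn cpus_for_cell(
    cell_to_cpus: &HashMap<u32, BTreeSet<usize>>,
    cell_id: u32,
) -> BTreeSet<usize> {
    cell_to_cpus.get(&cell_id).cloned().unwrap_or_else(|| {
        // Previously an expect() here would have panicked.
        eprintln!("warning: cell {cell_id} is in use but has no CPUs assigned");
        BTreeSet::new()
    })
}
```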

Add a check to ensure global_queue_decisions is non-zero before taking log10. If it's zero (which can happen during initialization or in periods of no activity), use MIN_DECISIONS_WIDTH directly instead of attempting to calculate log10(0), which returns negative infinity and causes formatting issues.
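A small sketch of the zero guard, under stated assumptions: the value of `MIN_DECISIONS_WIDTH` is made up, the function name is hypothetical, and the integer `ilog10` is swapped in for the floating-point log10 the commit message mentions.

```rust
// Assumed value; the real constant's value is not shown in this PR page.
const MIN_DECISIONS_WIDTH: usize = 5;

/// Hypothetical helper: column width needed to print the decision count.
fn decisions_width(global_queue_decisions: u64) -> usize {
    // log10(0) is undefined (negative infinity for floats), so bail out early.
    if global_queue_decisions == 0 {
        return MIN_DECISIONS_WIDTH;
    }
    // floor(log10(n)) + 1 is the number of decimal digits in n.
    let digits = global_queue_decisions.ilog10() as usize + 1;
    digits.max(MIN_DECISIONS_WIDTH)
}
```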

Add validation that the percpu map lookup returns at least NR_CPUS_POSSIBLE entries before indexing into it. If the map returns fewer entries than expected, fail with a clear error message rather than panicking with an out-of-bounds access.
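The length check could be sketched as follows. The function name, the `Vec<Vec<u8>>` shape of the per-CPU values, and passing the CPU count as a parameter are assumptions, not the PR's actual code.

```rust
/// Hypothetical validation: make sure a per-CPU lookup result has one
/// entry per possible CPU before anything indexes into it.
fn check_percpu_len(
    values: &[Vec<u8>],
    nr_cpus_possible: usize,
) -> Result<&[Vec<u8>], String> {
    if values.len() < nr_cpus_possible {
        return Err(format!(
            "percpu lookup returned {} entries, expected at least {}",
            values.len(),
            nr_cpus_possible
        ));
    }
    // Safe to slice now: the length was checked above.
    Ok(&values[..nr_cpus_possible])
}
```

Returning the checked slice means later code can index `0..nr_cpus_possible` without a risk of an out-of-bounds panic.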

Address several race conditions and lifecycle issues that were causing errors under load:

BPF side (mitosis.bpf.c):
- Fix cgroup_exit incorrectly using BPF_LOCAL_STORAGE_GET_F_CREATE flag. The exit handler should not create storage; if storage doesn't exist, the cgroup was never a cell owner and there's nothing to free. This was causing "cgrp_ctx creation failed" errors.

Rust side (main.rs):
- Fix TOCTOU race in cell discovery by inferring cell existence from CPU assignments rather than reading in_use flag separately. The in_use flag is set early in allocate_cell() but CPUs aren't assigned until later in update_timer_cb. Reading in_use with Acquire doesn't synchronize with CPU assignments (which are synchronized via applied_configuration_seq). This was causing "missing cell in cpu map" warnings.

Known remaining issue:
- Cells are not freed when a cgroup's cpuset is removed (only when cgroup exits). See TODO at line 870 in mitosis.bpf.c. With MAX_CELLS=16, systems with many dynamic cpuset changes can exhaust the cell pool, leading to "No available cells to allocate" errors. This may be an issue only present in synthetic tests.
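The Rust-side idea of inferring cells from CPU assignments, instead of consulting a separate in_use flag, can be sketched like this. The function name and the flat `cpu_to_cell` slice are assumptions for illustration; the real code reads this information from BPF maps.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical discovery step: a cell "exists" exactly when at least one
/// CPU is assigned to it, so there is no second flag to race against.
fn cells_from_cpu_map(cpu_to_cell: &[u32]) -> BTreeMap<u32, BTreeSet<usize>> {
    let mut cells: BTreeMap<u32, BTreeSet<usize>> = BTreeMap::new();
    for (cpu, &cell) in cpu_to_cell.iter().enumerate() {
        // Index in the slice is the CPU id; the value is its cell id.
        cells.entry(cell).or_default().insert(cpu);
    }
    cells
}
```

Because the set of cells and their CPUs come from a single snapshot, a cell that was allocated but has no CPUs yet simply does not appear, rather than triggering a "missing cell in cpu map" warning.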

Add a script to stress mitosis by creating and destroying cells. This has proven useful for identifying bugs.

likewhatevs requested review from dforsyth, dschatzberg, kkdwvd and tommy-u October 17, 2025 08:08

likewhatevs force-pushed the mitosis-bugfixes branch 2 times, most recently from 418df1f to 5553ed7 Compare October 17, 2025 08:13

likewhatevs force-pushed the mitosis-bugfixes branch from 5553ed7 to 2d80d63 Compare October 17, 2025 14:02

dschatzberg requested changes Oct 17, 2025

likewhatevs force-pushed the mitosis-bugfixes branch from 2d80d63 to fd9dec7 Compare October 17, 2025 18:14

likewhatevs added 8 commits October 17, 2025 15:11

mitosis: Add a test and cleanup script

67be89c


likewhatevs force-pushed the mitosis-bugfixes branch from fd9dec7 to 67be89c Compare October 17, 2025 19:11

likewhatevs requested a review from dschatzberg October 17, 2025 19:16

dschatzberg approved these changes Oct 20, 2025

likewhatevs added this pull request to the merge queue Oct 20, 2025

Merged via the queue into sched-ext:main with commit b39d943 Oct 20, 2025
21 checks passed

likewhatevs deleted the mitosis-bugfixes branch October 20, 2025 15:26
