Skip to content

Conversation

@likewhatevs
Copy link
Contributor

Fix bugs in mitosis.

The test script added passes on the last commit in this PR.

It fails due to various issues on commits in this PR.

W/o this PR at all, mitosis will get kicked w/ missing cell in cpu map.

W/o drain commit, mitosis will get kicked w/ No available cells to allocate.

@likewhatevs likewhatevs force-pushed the mitosis-bugfixes branch 2 times, most recently from 418df1f to 5553ed7 Compare October 17, 2025 08:13
@likewhatevs
Copy link
Contributor Author

Reasonably confident this also fixes:

WARNING: CPU: 42 PID: 3899049 at kernel/sched/ext.c:2177 scx_dsq_insert_commit+0xf1/0x120

which i encountered while testing.

@dschatzberg
Copy link
Contributor

Implement cell draining when cpuset removed

Let's drop this commit from the PR - I don't think we've hit this issue except in synthetic cases. It also won't work because we reassign the cpus to other cells so there's no guarantee that those cells will get drained.

Copy link
Contributor

@dschatzberg dschatzberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments inline - some of these changes don't make sense (I indicated where) but a bunch of good stuff in here too. I love the test script

In mitosis_stopping, the scheduler calculates vtime advancement by
dividing by the task's weight. If a task somehow has a zero weight,
this would cause a division by zero error. Add a defensive check to
catch this case and report an error rather than crashing.
When allocating a new cell, the vtime_now field is not initialized,
which means it contains garbage data. This can cause scheduling
anomalies as tasks inherit this uninitialized vtime. Initialize
vtime_now to 0 when allocating a cell to ensure consistent vtime
tracking from the start.
Replace std::ptr::read_volatile with proper atomic load operations
with Acquire ordering when reading fields shared between BPF and Rust
code. This ensures proper memory ordering guarantees required for
multi-threaded communication:

- applied_configuration_seq: Used to synchronize configuration updates
- in_use: Used to determine cell allocation status

The Acquire ordering ensures that all writes made by the BPF side
before releasing the data are visible to the Rust side after the
load.
Replace expect() with unwrap_or_else() when looking up cell CPU
assignments. If a cell is marked as in_use but has no CPUs assigned
(which can happen during race conditions or transient states), log a
warning and use an empty cpumask rather than panicking. This makes
the scheduler more resilient to transient inconsistencies.
Add a check to ensure global_queue_decisions is non-zero before
taking log10. If it's zero (which can happen during initialization
or in periods of no activity), use MIN_DECISIONS_WIDTH directly
instead of attempting to calculate log10(0), which returns negative
infinity and causes formatting issues.
Add validation that the percpu map lookup returns at least
NR_CPUS_POSSIBLE entries before indexing into it. If the map
returns fewer entries than expected, fail with a clear error
message rather than panicking with an out-of-bounds access.
Address several race conditions and lifecycle issues that were causing
errors under load:

BPF side (mitosis.bpf.c):
- Fix cgroup_exit incorrectly using BPF_LOCAL_STORAGE_GET_F_CREATE
  flag. The exit handler should not create storage; if storage doesn't
  exist, the cgroup was never a cell owner and there's nothing to free.
  This was causing "cgrp_ctx creation failed" errors.

Rust side (main.rs):
- Fix TOCTOU race in cell discovery by inferring cell existence from
  CPU assignments rather than reading in_use flag separately. The
  in_use flag is set early in allocate_cell() but CPUs aren't assigned
  until later in update_timer_cb. Reading in_use with Acquire doesn't
  synchronize with CPU assignments (which are synchronized via
  applied_configuration_seq). This was causing "missing cell in cpu
  map" warnings.

Known remaining issue:
- Cells are not freed when a cgroup's cpuset is removed (only when
  cgroup exits). See TODO at line 870 in mitosis.bpf.c. With
  MAX_CELLS=16, systems with many dynamic cpuset changes can exhaust
  the cell pool, leading to "No available cells to allocate" errors.
  This may be an issue only present in synthetic tests.
Add a script to stress mitosis by creating and destroying cells. This
has proven useful for identifying bugs.
@likewhatevs likewhatevs added this pull request to the merge queue Oct 20, 2025
Merged via the queue into sched-ext:main with commit b39d943 Oct 20, 2025
21 checks passed
@likewhatevs likewhatevs deleted the mitosis-bugfixes branch October 20, 2025 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants