Skip to content

implement engine abstraction for mlx and mflux#2000

Merged
Evanev7 merged 1 commit into
mainfrom
runner-refactor-post-prefill
Apr 28, 2026
Merged

implement engine abstraction for mlx and mflux#2000
Evanev7 merged 1 commit into
mainfrom
runner-refactor-post-prefill

Conversation

@Evanev7
Copy link
Copy Markdown
Member

@Evanev7 Evanev7 commented Apr 28, 2026

refactor for future versions.

Comment thread src/exo/worker/engines/mlx/builder.py Outdated

def close(self) -> None:
with contextlib.suppress(NameError, AttributeError):
del self.inference_model, self.tokenizer, self.group
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this doesn't clean up all of them if it fails

if resp is None and len(self.queue) > 0:
task = self.queue.popleft()
self.current_gen = self._run_image_task(task.task_id, task.task_params)
resp = next(self.current_gen, None)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah talked in person, just check if this needs a finishedresponse/cancelledresponse?

@Evanev7 Evanev7 force-pushed the runner-refactor-post-prefill branch from 0272df2 to f55ce12 Compare April 28, 2026 00:50
@Evanev7 Evanev7 force-pushed the runner-refactor-post-prefill branch from f55ce12 to 03f0808 Compare April 28, 2026 00:53
@Evanev7 Evanev7 enabled auto-merge (squash) April 28, 2026 00:54
@Evanev7 Evanev7 merged commit c80b10c into main Apr 28, 2026
6 checks passed
@Evanev7 Evanev7 deleted the runner-refactor-post-prefill branch April 28, 2026 00:58
adurham pushed a commit to adurham/exo that referenced this pull request May 1, 2026
… abstraction exo-explore#2000

Resolutions:

- src/exo/worker/runner/runner.py: took theirs entirely. PR exo-explore#2000's engine
  abstraction refactor (Builder + Engine ABC, MlxBuilder, MfluxBuilder)
  fundamentally changes how config flows: runner.py now takes a pre-built
  builder param instead of constructing inline with 14 kwargs. Our
  per-instance config (max_kv_tokens, prefill_step_size, kv_cache_bits,
  default_*) now needs to be plumbed via MlxBuilder constructor in
  bootstrap.py — TODO follow-up.

- src/exo/worker/engines/mlx/auto_parallel.py: took ours.
  DeepseekV4ShardingStrategy keeps MoE-only sharding + our
  _install_dsv4_fused_gate_up. Upstream switches to attention+MoE sharding
  via _shard_v4_attention_heads + ShardedMoEV4 wrapper, but per
  dsv4_flash_6bit_deployment memory the MoE-only approach is what's
  validated (34.2 tok/s on Blaizzy mlx-lm).

- src/exo/worker/engines/mlx/cache.py: combined ours + upstream rename.
  Kept TURBOQUANT_* imports; followed upstream's
  exo.shared.types.mlx -> exo.worker.engines.mlx.types module rename
  (PR-1192-related KVCacheType union with PoolingCache + Batch* types
  came along cleanly).

- src/exo/worker/engines/mlx/generator/generate.py + batch_generate.py:
  combined. Took upstream's remote_prefill scaffolding (MLX P/D exo-explore#1993)
  AND kept our prefill_step_size kwarg for DSv4 (DSV4_PREFILL_STEP_SIZE=256
  per dsv4_prefill_chunk_size_curve memory). uuid + deepcopy imports both.

- src/exo/worker/runner/llm_inference/batch_generator.py: combined.
  Took upstream's CancelledResponse/FinishedResponse rename + _gen
  attribute rename. Kept our trace import + with T(...) wrap for
  start_task.mlx_gen_submit.

- src/exo/master/main.py: combined. Both the prefill_only filter
  (MLX P/D-aware routing) and our hoisted in_flight set declaration.

- src/exo/download/download_utils.py: combined our session-injection
  pattern (_do_hf_download inner function for peer-share download)
  with upstream's 429 rate-limit handling.

Bumps base to PR exo-explore#1993 (MLX P/D), exo-explore#2000 (engine abstraction),
exo-explore#2009 (HF rate limits), and a few app/uninstall fixes.

Cluster validation pending — engine abstraction may have config plumbing
gaps for our DSv4 tunables. Will surface on next start_cluster run.
adurham pushed a commit to adurham/exo that referenced this pull request May 1, 2026
…-explore#1192

Three fixes needed after the PR exo-explore#2000 + exo-explore#1993 + PR-1192 merges:

1. mlx pinned back to f8bac642 (drop the JACCL barrier merge upstream/3835dbab).
   The barrier addition uses Dtype::UInt8 + non-virtual `override` markers
   that aren't compatible with our fork's Dtype/base class state. We don't
   use barrier() anywhere yet — revert is safe.

2. src/exo/worker/engines/mlx/disaggregated/adapter.py: drop
   `from mlx_lm.models.deepseek_v4 import DeepseekV4Cache` (removed in
   PR exo-explore#1192). The match arm at line 104 raised NotImplementedError for it
   anyway, so swap to PoolingCache (DSv4's new cache type that's also
   not yet wire-protocol-supported).

3. src/exo/worker/engines/mlx/patches/opt_batch_gen.py: replaced our
   old version (precompute_topk profiling stubs) with upstream's new
   API (set_needs_topk / take_ready_topk / BatchTopKLogprobs dataclass).
   Auto-merge took ours by default; the new batch_generate.py from
   upstream expects the new API.

Verified:
- uv sync builds clean
- exo imports clean (Runner, MlxBuilder, ExoBatchGenerator, prefill, sharding,
  disaggregated adapter, types, opt_batch_gen)
- exo starts + shuts down clean (master elected, API on :52415, file
  server on :52416, no errors).
team-wcv added a commit to team-wcv/exo that referenced this pull request May 7, 2026
* fix: map presence_penalty and frequency_penalty from ChatCompletionRequest (exo-explore#1991)

Upstream PR exo-explore#1947 added `presence_penalty` and `frequency_penalty` to
`TextGenerationTaskParams` and the mlx-lm generator call sites, but
missed wiring them up in the API adapter so they were silently dropped
from incoming requests. This fixes the API mapping.

Co-authored-by: Adam Durham <adam@example.com>

* Add DeepSeek V4 Flash/Pro (exo-explore#1978)

Wait for upstream merge.

---------

Co-authored-by: Evan <evanev7@gmail.com>

* Extend bench/eval tooling (exo-explore#1905)

## Motivation

Extend bench/eval tooling with robustness features, streaming support,
and align model configs with vllm eval for reproducible comparisons.

## Changes

- **exo_eval**: Checkpoint/resume (JSONL), instance health monitoring +
early abort, `top_k`/`min_p`/`enable_thinking` params, LCB
`--release-version`/`--offset`
- **exo_bench**: Streaming SSE (`--stream`), Kimi tokenizer fix for
transformers 5.x
- **Both tools**: Auto-detect running instances instead of requiring
`--skip-instance-setup`; `--fresh-instance` to override
- **harness**: SSE streaming client, `find_existing_instance()` shared
helper, removed download timeout, settle-timeout default 0→7200s
- **models.toml**: Added `enable_thinking`, aligned `max_tokens`/temps
with vllm, added new models
- **API**: Streaming SSE for `/bench/chat/completions`

## Why It Works

- Checkpoint/resume uses append-only JSONL + skip-on-load so interrupted
evals resume without re-running completed questions
- Health monitoring races an `asyncio.Event` against API calls for fast
abort when the instance dies
- Auto-detection queries `/state` for existing instances matching the
model ID before attempting placement
- Streaming reuses the existing `generate_chat_stream` infrastructure
from the regular chat endpoint

* fix: route by in-flight tasks only — completed tasks were skewing load balance (exo-explore#1989)

The load balancer counted ALL tasks (Complete, Cancelled, TimedOut,
Failed) instead of only Pending/Running ones. With 138 accumulated tasks
and only 7 active, routing decisions were based on historical
distribution, causing one node to appear permanently 'busier' and
starving the other of work.

Co-authored-by: Adam Durham <adam@example.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* MLX P/D (exo-explore#1993)

## Motivation

MLX only prefill server for Apple Silicon

* fix: uninstall-exo.sh removes both current and legacy bridge scripts (exo-explore#1998)

## Summary

The standalone `app/EXO/uninstall-exo.sh` only knew about the legacy
filename `disable_bridge_enable_dhcp.sh`. On machines installed with
newer EXO versions, the current `/Library/Application
Support/EXO/disable_bridge.sh` was left behind, and the script then
reported `EXO support directory not empty, leaving in place`.

This PR makes the script try both filenames, removing whichever ones
exist. Tolerates **either**, **both**, or **neither** being present
without erroring.

The Swift `NetworkSetupHelper.makeUninstallScript()` already handles
both paths correctly, so the GUI uninstall flow is unaffected — this is
a script-only fix.

Caught while running an end-to-end uninstall on a real machine for
exo-explore#1997.

## Test plan

Verified the new block in isolation against all four states:

- [x] both `disable_bridge.sh` and `disable_bridge_enable_dhcp.sh`
present → both removed
- [x] only `disable_bridge.sh` present → removed cleanly
- [x] only `disable_bridge_enable_dhcp.sh` present → removed cleanly
(legacy install)
- [x] neither present → prints the existing "already removed?" warning,
exits 0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* implement engine abstraction for mlx and mflux (exo-explore#2000)

refactor for future versions.

* feat: keep-models option when uninstalling EXO (exo-explore#1997)

## Summary

- Adds a **Keep downloaded models (~/.exo/models)** checkbox to the
macOS uninstall confirmation dialog (Settings → Advanced → Danger Zone).
The full `~/.exo` directory is now removed on uninstall by default; if
the checkbox is checked, `~/.exo/models` is preserved.
- The standalone `app/EXO/uninstall-exo.sh` gains a matching
`--keep-models` flag and the same `~/.exo` cleanup so GUI and CLI flows
stay in sync. Resolves the user home via `$SUDO_USER` since the script
runs under `sudo`.

Previously, "Uninstall EXO" only cleaned up system-level components
(LaunchDaemon, network location, logs, app bundle) and left the entire
`~/.exo` directory behind. Now uninstalling actually removes EXO's user
data, with a one-click opt-out for the (potentially many GB) of
downloaded models.

![Uninstall dialog with new
checkbox](https://raw.githubusercontent.com/exo-explore/exo/703b7fbbf13441217ad2903bb199f07e92af4490/uninstall-dialog.png)

> Note: the rendered icon in the screenshot above is the generic system
folder icon because it was captured from a small standalone Swift binary
(no app bundle / icon resource). When triggered from the actual EXO.app,
the EXO app icon is shown.

## Test plan

- [ ] Build EXO.app locally; open Settings → Advanced → Danger Zone →
Uninstall EXO; confirm the new "Keep downloaded models (~/.exo/models)"
checkbox is present and unchecked by default.
- [ ] Uninstall with the checkbox **checked** → `~/.exo/models/`
survives, all other entries under `~/.exo` are gone, system components
removed, app moved to Trash.
- [ ] Uninstall with the checkbox **unchecked** → `~/.exo` is fully
removed.
- [ ] `sudo app/EXO/uninstall-exo.sh --keep-models` → `~/.exo/models/`
is preserved, the rest of `~/.exo` is removed.
- [ ] `sudo app/EXO/uninstall-exo.sh` (no flag) → `~/.exo` is fully
removed.
- [ ] `app/EXO/uninstall-exo.sh --help` prints usage and exits 0;
unknown args exit 2 with a usage hint.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Evan <evanev7@gmail.com>

* feat(app): open Share Bug Report in a dedicated window (exo-explore#2003)

## Summary

- Adds a top-level **Share Bug Report…** menu item to the macOS popover
(between *Check for Updates* and *Quit*) with SF Symbol `ladybug`.
- Clicking it opens a dedicated resizable `NSWindow` ("Send a Bug
Report") that hosts the prompting / sending / success / failure flow.
- Removes the description-less duplicate from Settings → Debug Info, and
the dead `debugSection` it nominally lived behind.

## Why

PR exo-explore#1959 added a user-description prompt to the bug-report flow, but its
trigger lived inside `ContentView.debugSection` — a view that's defined
but never rendered in the body. The path users actually hit was
`SettingsView.sendBugReportButton`, which called
`BugReportService.sendReport(isManual: true)` without ever passing
`userDescription`. So the description prompt was unreachable in the
built app.

## Approach

Per Apple HIG, an action that requires further input before completing
should open a dialog, not transform the menu inline. So:

- Add a top-level menu entry that ends in `…` (HIG: ellipsis indicates
"further input required").
- Move the prompting/sending/success/failure state machine into a
standalone `BugReportWindowController` modeled after the existing
`SettingsWindowController`.
- Single-instance window with frame-autosave name, sensible
`contentMinSize`, resizable, native button layout (`.cancelAction` /
`.defaultAction` keyboard shortcuts), light/dark-mode-correct
`.textBackgroundColor` and `.separatorColor`.
- Auto-focus the description field on open. `Try Again` from failure,
`Open GitHub Issue` + `Done` from success.

## Files

- `app/EXO/EXO/Views/BugReportWindowController.swift` (new) — controller
+ view.
- `app/EXO/EXO/EXOApp.swift` — wire `BugReportWindowController` as a
`@StateObject` and inject as environment object.
- `app/EXO/EXO/ContentView.swift` — replace inline state machine with
menu item that calls `bugReportWindowController.open()`. Remove
now-unused state, helpers, and dead `debugSection`.
- `app/EXO/EXO/Views/SettingsView.swift` — remove duplicate
`sendBugReportButton`, `sendBugReport()`, and related `@State`. Section
"Debug Info" keeps Thunderbolt / interface / RDMA info.

`BugReportService` is unchanged.

## Test plan

- [ ] Open the menu-bar popover → confirm **Share Bug Report…** appears
between *Check for Updates* and *Quit*, with a ladybug icon.
- [ ] Click it → a window titled "Send a Bug Report" appears, centered,
with the description editor focused.
- [ ] Resize the window → size persists across re-opens (frame
autosave).
- [ ] Type a description, press Return → upload succeeds, success card
with **Open GitHub Issue** + **Done** appears.
- [ ] Click **Open GitHub Issue** → browser opens with the description
pre-filled into the issue template.
- [ ] Send with empty description → upload still succeeds.
- [ ] Press Esc from the prompting state → window closes.
- [ ] On failure (e.g., offline) → error card with **Try Again** +
**Close** appears; Try Again returns to the editor with the description
preserved.
- [ ] Open the Settings window → Debug Info section is unchanged except
the Send Bug Report button is gone.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(app): tighten Share Bug Report prompt layout (exo-explore#2008)

## Summary

Follow-ups to exo-explore#2003 based on feedback that the Share Bug Report window
felt visually weighty: too much padding above and below, and a
description editor that invited an essay rather than a one-liner.

## Changes (one file)

`app/EXO/EXO/Views/BugReportWindowController.swift`:

- **Auto-size the window to its content.** Switched from `NSHostingView`
+ fixed `contentRect: 480x380` + SwiftUI `frame(minHeight: 320)` to
`NSHostingController` with `sizingOptions = [.preferredContentSize,
.minSize]`. The fixed-min combo was centering the form in dead vertical
space.
- **Smaller, lower-pressure editor.** Field is now labeled `Description
(optional)` with a placeholder hint (`What were you doing when it
broke?`) inside the editor. Editor height fixed at 72pt (was 120pt min).
Replaced the long lead-in paragraph and headline with a single one-line
caption between field and buttons: `Diagnostic logs will be uploaded
with your report.`
- **Tighter spacing.** Outer padding 20 -> 16, root spacing 16 -> 12,
prompting-section spacing 12 -> 8.
- **Remove em dash from copy.**

`BugReportService` and the menu wiring are unchanged.

## Test plan

- [ ] Click `Share Bug Report...` from the menu bar.
- [ ] The window opens centered and sized to its content (no big empty
bands top/bottom).
- [ ] Description editor is visibly compact, with the placeholder hint
showing when empty.
- [ ] The optional-ness is conveyed by the field label (no separate help
paragraph).
- [ ] Caption `Diagnostic logs will be uploaded with your report.`
appears in `.caption` style under the editor, above the buttons.
- [ ] Resize the window: persists across re-opens (frame autosave still
works).
- [ ] Send/Cancel/Try Again/Done flows behave the same as before.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* A few targeted tweaks to address HF rate limits (exo-explore#2009)

## Motivation

- exo bursts ~200 HF Hub-API requests on every cold start, blowing past
the anonymous 500-req/5-min budget.
- The existing retry loop catches 429 generically and gives up in ~3s —
well before HF's reset window.
- `file_meta` and `_download_file` had no 429 handling at all (became
`AssertionError`).
- Disk file-list cache was bypassed on every process restart.

## Changes

All in `src/exo/download/download_utils.py` + tests.

- Parse `t=` from HF's `RateLimit` header on 429; sleep `min(t, 300s) +
jitter`.
- Handle 429 at all three call sites (`_fetch_file_list`, `file_meta`,
`_download_file`).
- `n_attempts`: 3 → 5.
- Disk cache now primary across restarts (24h mtime TTL).
- `?recursive=true` instead of N+1 subdir walks.

## Why It Works

`t=<seconds>` is HF's "wait this long and you'll be unblocked" —
sleeping that long lets the window reset. Disk-cache-as-primary plus
recursive listing cuts cold-start Hub-API traffic by ~10×.

## Test Plan

### Manual Testing

MacBook Pro M1 Max. Tripped the real HF 429. Pre-fix: failed in 3.4s.
Post-fix: slept (HF returned `t=158`) and recovered.

### Automated Testing

- New `test_rate_limit_handling.py` (19 tests) — header parsing,
retry-loop behaviour, plus HTTP-level coverage that mocks aiohttp to
return a 429 and asserts each call site raises
`HuggingFaceRateLimitError(retry_after=52.0)`.
- New `TestFileListCacheTTL` in `test_offline_mode.py` — fresh cache
hits, stale cache refetches.
- 421 tests pass; basedpyright / ruff / nix fmt clean.

* fix(macos-app): disable URL response caching for cluster-state polling (exo-explore#2005)

Fixes exo-explore#2004.

`ClusterStateService` polls `/state` at 2 Hz via `URLSession.shared`,
which keeps an on-disk `URLCache` attached by default. Every polled
response body gets persisted under `~/Library/Caches/exolabs.EXO/`,
sustaining ~500–620 KB/sec of file-backed memory dirtied — far above
macOS's ~25 KB/sec per-process daily-average baseline. Six
microstackshot reports observed on a single Mac Studio M3 Ultra over
eight days, with one 15-hour run accumulating 34.36 GB of cache writes.

Heaviest stack on every diagnostic report (96–98% of samples):

```
_dispatch_workloop_worker_thread → _dispatch_block_async_invoke2 →
  __CFURLCache::CreateAndStoreCacheNode → write
```

Full diagnostic data and analysis in exo-explore#2004.

## What changed

`ClusterStateService` now defaults to an ephemeral, non-caching
`URLSession` instead of `URLSession.shared`. Cluster-state responses are
time-sensitive and small; nothing benefits from being cached on disk.

```swift
private static func makeNonCachingSession() -> URLSession {
    let config = URLSessionConfiguration.ephemeral
    config.urlCache = nil
    config.requestCachePolicy = .reloadIgnoringLocalCacheData
    return URLSession(configuration: config)
}
```

The existing per-request `request.cachePolicy =
.reloadIgnoringLocalCacheData` calls are kept as defense in depth — they
only affect read behavior, but harmless to leave alongside the
session-level config.

## Scope

- **Behavioral**: none. Polled requests still go out at the same
cadence; responses still parse the same; no semantic change to any API
surface.
- **Test injection**: the `session:` parameter remains in `init`, so
tests can still inject a custom mock session unchanged.
- **`BugReportService` and other `URLSession.shared` callers**:
untouched. If maintainers prefer an app-wide URLCache disable instead,
happy to switch the approach (issue body has the alternative spelled
out).

## Verification

Verified locally that compiling EXO with this change produces a working
menubar app and `ClusterStateService` continues to fetch state
correctly. After ~30 min of idle polling, no new entries in
`/Library/Logs/DiagnosticReports/EXO_*.diag` and no growth in
`~/Library/Caches/exolabs.EXO/`.

## Test plan
- [ ] Build EXO from this branch on macOS 26.4
- [ ] Launch, let cluster state polling run for 30+ min
- [ ] Confirm no new microstackshot diagnostic reports
- [ ] Confirm `~/Library/Caches/exolabs.EXO/Cache.db*` does not grow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Jordan Miller <jordan.d.miller@gmail.com>

* feat: update rdma_ctl instructions (exo-explore#1977)

## Motivation

The RDMA setup instructions were missing a step: after booting to
Recovery mode, users need to open Terminal from the Utilities menu
before they can run the `rdma_ctl` command. Without this step, users
following the instructions wouldn't know how to access a terminal in
Recovery mode. This step was already in the README just not in the UI
notifications.

## Changes

Added a missing instruction step — "Open Terminal from the Utilities
menu" — to three instances of the RDMA setup flow in
`dashboard/src/routes/+page.svelte`.

## Why It Works

N/A copy change only. 

## Test Plan

### Manual Testing
Hardware: MacBook Pro M4 Max 48GB

### Automated Testing
No automated tests affected; this is a UI copy change only.

Co-authored-by: Sam Bradbury <sam@consultbradbury.com>

* Initialise _cancelled_tasks in ImageEngine (exo-explore#2051)

we yielded nonsense chunks from engines; we didn't initialize the image
engine correctly. mostly rewrite of exo-explore#2049

---------

Co-authored-by: ciaranbor <ciaranborourke-dev@proton.me>

* fix(inference): prevent TP collective deadlock via agree_on_tasks order (exo-explore#2048)

If you have two machines and make two requests at the same time, it can
crash. This is because the tasks can sometimes end up in different
orders on different machines. We need to sort the tasks and
mx_all_gather_tasks already sorts the tasks but the code ignores that
ordering. The fix is to make sure the sort order is preserved.

The rest is written by Sonnet (reviewed by me):

Tensor-parallel inference requires that every rank enqueues tasks in the
same order before running agree_on_tasks collectives. The old
implementation filtered from _maybe_queue:

self._queue.extend(task for task in self._maybe_queue if task in agreed)
self._maybe_queue = [task for task in self._maybe_queue if task in
different]

Because _maybe_queue is independently ordered per-rank (tasks arrive via
gRPC in whatever order the API server sends them), two concurrent
requests could produce different _maybe_queue orderings on rank 0 vs
rank 1. The filter then preserved those different orders into _queue, so
each rank started processing tasks in a different sequence. The next mlx
collective (all_reduce, all_gather, etc.) on rank 0 corresponded to a
different task than on rank 1 → permanent deadlock.

Fix: extend from agreed directly. mx_all_gather_tasks returns agreed as
a list sorted by task_id on all ranks, so every rank appends the same
sequence regardless of local arrival order.

Applies to both SequentialGenerator and BatchGenerator.

## Motivation

`agree_on_tasks` is called on every rank after accumulating new requests
in
`_maybe_queue`. Its job is to run an `all_gather` collective so all
ranks agree
on which tasks to promote to `_queue` before the next inference step.

The old implementation re-imposed **local arrival order** when extending
`_queue`:

```python
self._queue.extend(task for task in self._maybe_queue if task in agreed)
```

`mx_all_gather_tasks` already returns `agreed` sorted by `task_id` — the
same
deterministic order on every rank. But iterating `self._maybe_queue`
instead of
`agreed` discarded that sort and substituted the local gRPC arrival
order, which
differs per rank under concurrent load. Two concurrent requests arriving
in
`[A, B]` order on rank 0 and `[B, A]` on rank 1 caused the first MLX
collective
in the next step to hang permanently: each rank was executing a
different task's
collective and would never match.

## Changes

`SequentialGenerator.agree_on_tasks` and
`BatchGenerator.agree_on_tasks`:

```python
# Before
self._queue.extend(task for task in self._maybe_queue if task in agreed)
self._maybe_queue = [task for task in self._maybe_queue if task in different]

# After
self._queue.extend(agreed)          # preserves mx_all_gather_tasks sort order
self._maybe_queue = list(different) # already in local order; filter was redundant
```

## Why It Works

`mx_all_gather_tasks` (in `utils_mlx.py`) computes the agreed set then
sorts by
`task_id`:

```python
agreed = [local_tasks[tid] for tid in sorted(agreed_ids)]
```

Because `task_id` is a UUID and the sort is lexicographic, every rank
produces
the same `agreed` list regardless of local arrival order. Using `agreed`
directly
preserves this guarantee. The `different` list (tasks not yet seen on
all ranks)
is built by iterating `tasks` in local order, which is already correct.

## Test Plan

### Manual Testing

**Hardware:** 2× Mac Studio M3 Ultra 512 GB, Thunderbolt 5 direct
bridge,
`MlxJaccl` RDMA tensor-parallel (`moonshotai/Kimi-K2.6`, 595 GB INT4, 61
layers).

- Sent concurrent streaming requests; confirmed all complete without
deadlock.
- This hardware configuration (sub-millisecond inter-node latency) is
the most
likely to trigger the race, as requests from separate HTTP connections
can
reach rank 0 and rank 1 in opposite order before `agree_on_tasks` runs.

### Automated Testing

All existing tests pass: `pytest src -m "not slow"
--import-mode=importlib`
— 422/422 passed. The existing `test_event_ordering.py` covers the
`agree_on_tasks` call path with a mock that returns tasks in consistent
order;
the race requires real distributed hardware to reproduce
deterministically.

---------

Co-authored-by: Adam Durham <amdnative@gmail.com>
Co-authored-by: Adam Durham <adam@example.com>
Co-authored-by: rltakashige <rl.takashige@gmail.com>
Co-authored-by: Evan <evanev7@gmail.com>
Co-authored-by: ciaranbor <81697641+ciaranbor@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>
Co-authored-by: ecohash-co <team@ecohash.co>
Co-authored-by: Jordan Miller <jordan.d.miller@gmail.com>
Co-authored-by: Sam Bradbury <31943456+sambradbury@users.noreply.github.com>
Co-authored-by: Sam Bradbury <sam@consultbradbury.com>
Co-authored-by: ciaranbor <ciaranborourke-dev@proton.me>
Co-authored-by: Drifter4242 <davehind@yahoo.co.uk>
Co-authored-by: jw-wcv <101585096+jw-wcv@users.noreply.github.com>
adurham pushed a commit to adurham/exo that referenced this pull request May 28, 2026
Earlier note said exo-explore was 'severely backlogged' — the merge data
contradicts that. exo-explore merged 10 PRs in the 2 weeks before
2026-05-27 (exo-explore#2087-exo-explore#2114), but every one was by a core/frequent
contributor (AndreiCravtov, Evanev7, Heidar-An). Zero external-fork PRs
merged. Accurate read: 'merges insider work fast, ignores external PRs'
— so waiting does nothing for our 6 untouched exo PRs; they need a
maintainer to actively adopt. Low ROI to chase.

Also added mlx-lm read: not hostile, just paused ~3 weeks (was actively
merging external work 36-38d ago). exo-explore#1216 has a community +1.

Nudged exo#2121 framing it as a regression from exo-explore#2000 (engine
abstraction), @-mentioned Evanev7 who authored that refactor and is an
active merger.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants