implement engine abstraction for mlx and mflux by Evanev7 · Pull Request #2000 · exo-explore/exo

Evanev7 · 2026-04-28T00:21:22Z

refactor for future versions.

rltakashige · 2026-04-28T00:27:56Z

+
+    def close(self) -> None:
+        with contextlib.suppress(NameError, AttributeError):
+            del self.inference_model, self.tokenizer, self.group


Pretty sure this doesn't clean up all of them if it fails

rltakashige · 2026-04-28T00:44:43Z

+        if resp is None and len(self.queue) > 0:
+            task = self.queue.popleft()
+            self.current_gen = self._run_image_task(task.task_id, task.task_params)
+            resp = next(self.current_gen, None)


yeah talked in person, just check if this needs a finishedresponse/cancelledresponse?

… abstraction exo-explore#2000 Resolutions: - src/exo/worker/runner/runner.py: took theirs entirely. PR exo-explore#2000's engine abstraction refactor (Builder + Engine ABC, MlxBuilder, MfluxBuilder) fundamentally changes how config flows: runner.py now takes a pre-built builder param instead of constructing inline with 14 kwargs. Our per-instance config (max_kv_tokens, prefill_step_size, kv_cache_bits, default_*) now needs to be plumbed via MlxBuilder constructor in bootstrap.py — TODO follow-up. - src/exo/worker/engines/mlx/auto_parallel.py: took ours. DeepseekV4ShardingStrategy keeps MoE-only sharding + our _install_dsv4_fused_gate_up. Upstream switches to attention+MoE sharding via _shard_v4_attention_heads + ShardedMoEV4 wrapper, but per dsv4_flash_6bit_deployment memory the MoE-only approach is what's validated (34.2 tok/s on Blaizzy mlx-lm). - src/exo/worker/engines/mlx/cache.py: combined ours + upstream rename. Kept TURBOQUANT_* imports; followed upstream's exo.shared.types.mlx -> exo.worker.engines.mlx.types module rename (PR-1192-related KVCacheType union with PoolingCache + Batch* types came along cleanly). - src/exo/worker/engines/mlx/generator/generate.py + batch_generate.py: combined. Took upstream's remote_prefill scaffolding (MLX P/D exo-explore#1993) AND kept our prefill_step_size kwarg for DSv4 (DSV4_PREFILL_STEP_SIZE=256 per dsv4_prefill_chunk_size_curve memory). uuid + deepcopy imports both. - src/exo/worker/runner/llm_inference/batch_generator.py: combined. Took upstream's CancelledResponse/FinishedResponse rename + _gen attribute rename. Kept our trace import + with T(...) wrap for start_task.mlx_gen_submit. - src/exo/master/main.py: combined. Both the prefill_only filter (MLX P/D-aware routing) and our hoisted in_flight set declaration. - src/exo/download/download_utils.py: combined our session-injection pattern (_do_hf_download inner function for peer-share download) with upstream's 429 rate-limit handling. Bumps base to PR exo-explore#1993 (MLX P/D), exo-explore#2000 (engine abstraction), exo-explore#2009 (HF rate limits), and a few app/uninstall fixes. Cluster validation pending — engine abstraction may have config plumbing gaps for our DSv4 tunables. Will surface on next start_cluster run.

…-explore#1192 Three fixes needed after the PR exo-explore#2000 + exo-explore#1993 + PR-1192 merges: 1. mlx pinned back to f8bac642 (drop the JACCL barrier merge upstream/3835dbab). The barrier addition uses Dtype::UInt8 + non-virtual `override` markers that aren't compatible with our fork's Dtype/base class state. We don't use barrier() anywhere yet — revert is safe. 2. src/exo/worker/engines/mlx/disaggregated/adapter.py: drop `from mlx_lm.models.deepseek_v4 import DeepseekV4Cache` (removed in PR exo-explore#1192). The match arm at line 104 raised NotImplementedError for it anyway, so swap to PoolingCache (DSv4's new cache type that's also not yet wire-protocol-supported). 3. src/exo/worker/engines/mlx/patches/opt_batch_gen.py: replaced our old version (precompute_topk profiling stubs) with upstream's new API (set_needs_topk / take_ready_topk / BatchTopKLogprobs dataclass). Auto-merge took ours by default; the new batch_generate.py from upstream expects the new API. Verified: - uv sync builds clean - exo imports clean (Runner, MlxBuilder, ExoBatchGenerator, prefill, sharding, disaggregated adapter, types, opt_batch_gen) - exo starts + shuts down clean (master elected, API on :52415, file server on :52416, no errors).

* fix: map presence_penalty and frequency_penalty from ChatCompletionRequest (exo-explore#1991) Upstream PR exo-explore#1947 added `presence_penalty` and `frequency_penalty` to `TextGenerationTaskParams` and the mlx-lm generator call sites, but missed wiring them up in the API adapter so they were silently dropped from incoming requests. This fixes the API mapping. Co-authored-by: Adam Durham <adam@example.com> * Add DeepSeek V4 Flash/Pro (exo-explore#1978) Wait for upstream merge. --------- Co-authored-by: Evan <evanev7@gmail.com> * Extend bench/eval tooling (exo-explore#1905) ## Motivation Extend bench/eval tooling with robustness features, streaming support, and align model configs with vllm eval for reproducible comparisons. ## Changes - **exo_eval**: Checkpoint/resume (JSONL), instance health monitoring + early abort, `top_k`/`min_p`/`enable_thinking` params, LCB `--release-version`/`--offset` - **exo_bench**: Streaming SSE (`--stream`), Kimi tokenizer fix for transformers 5.x - **Both tools**: Auto-detect running instances instead of requiring `--skip-instance-setup`; `--fresh-instance` to override - **harness**: SSE streaming client, `find_existing_instance()` shared helper, removed download timeout, settle-timeout default 0→7200s - **models.toml**: Added `enable_thinking`, aligned `max_tokens`/temps with vllm, added new models - **API**: Streaming SSE for `/bench/chat/completions` ## Why It Works - Checkpoint/resume uses append-only JSONL + skip-on-load so interrupted evals resume without re-running completed questions - Health monitoring races an `asyncio.Event` against API calls for fast abort when the instance dies - Auto-detection queries `/state` for existing instances matching the model ID before attempting placement - Streaming reuses the existing `generate_chat_stream` infrastructure from the regular chat endpoint * fix: route by in-flight tasks only — completed tasks were skewing load balance (exo-explore#1989) The load balancer counted ALL tasks (Complete, Cancelled, TimedOut, Failed) instead of only Pending/Running ones. With 138 accumulated tasks and only 7 active, routing decisions were based on historical distribution, causing one node to appear permanently 'busier' and starving the other of work. Co-authored-by: Adam Durham <adam@example.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * MLX P/D (exo-explore#1993) ## Motivation MLX only prefill server for Apple Silicon * fix: uninstall-exo.sh removes both current and legacy bridge scripts (exo-explore#1998) ## Summary The standalone `app/EXO/uninstall-exo.sh` only knew about the legacy filename `disable_bridge_enable_dhcp.sh`. On machines installed with newer EXO versions, the current `/Library/Application Support/EXO/disable_bridge.sh` was left behind, and the script then reported `EXO support directory not empty, leaving in place`. This PR makes the script try both filenames, removing whichever ones exist. Tolerates **either**, **both**, or **neither** being present without erroring. The Swift `NetworkSetupHelper.makeUninstallScript()` already handles both paths correctly, so the GUI uninstall flow is unaffected — this is a script-only fix. Caught while running an end-to-end uninstall on a real machine for exo-explore#1997. ## Test plan Verified the new block in isolation against all four states: - [x] both `disable_bridge.sh` and `disable_bridge_enable_dhcp.sh` present → both removed - [x] only `disable_bridge.sh` present → removed cleanly - [x] only `disable_bridge_enable_dhcp.sh` present → removed cleanly (legacy install) - [x] neither present → prints the existing "already removed?" warning, exits 0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * implement engine abstraction for mlx and mflux (exo-explore#2000) refactor for future versions. * feat: keep-models option when uninstalling EXO (exo-explore#1997) ## Summary - Adds a **Keep downloaded models (~/.exo/models)** checkbox to the macOS uninstall confirmation dialog (Settings → Advanced → Danger Zone). The full `~/.exo` directory is now removed on uninstall by default; if the checkbox is checked, `~/.exo/models` is preserved. - The standalone `app/EXO/uninstall-exo.sh` gains a matching `--keep-models` flag and the same `~/.exo` cleanup so GUI and CLI flows stay in sync. Resolves the user home via `$SUDO_USER` since the script runs under `sudo`. Previously, "Uninstall EXO" only cleaned up system-level components (LaunchDaemon, network location, logs, app bundle) and left the entire `~/.exo` directory behind. Now uninstalling actually removes EXO's user data, with a one-click opt-out for the (potentially many GB) of downloaded models. ![Uninstall dialog with new checkbox](https://raw.githubusercontent.com/exo-explore/exo/703b7fbbf13441217ad2903bb199f07e92af4490/uninstall-dialog.png) > Note: the rendered icon in the screenshot above is the generic system folder icon because it was captured from a small standalone Swift binary (no app bundle / icon resource). When triggered from the actual EXO.app, the EXO app icon is shown. ## Test plan - [ ] Build EXO.app locally; open Settings → Advanced → Danger Zone → Uninstall EXO; confirm the new "Keep downloaded models (~/.exo/models)" checkbox is present and unchecked by default. - [ ] Uninstall with the checkbox **checked** → `~/.exo/models/` survives, all other entries under `~/.exo` are gone, system components removed, app moved to Trash. - [ ] Uninstall with the checkbox **unchecked** → `~/.exo` is fully removed. - [ ] `sudo app/EXO/uninstall-exo.sh --keep-models` → `~/.exo/models/` is preserved, the rest of `~/.exo` is removed. - [ ] `sudo app/EXO/uninstall-exo.sh` (no flag) → `~/.exo` is fully removed. - [ ] `app/EXO/uninstall-exo.sh --help` prints usage and exits 0; unknown args exit 2 with a usage hint. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Evan <evanev7@gmail.com> * feat(app): open Share Bug Report in a dedicated window (exo-explore#2003) ## Summary - Adds a top-level **Share Bug Report…** menu item to the macOS popover (between *Check for Updates* and *Quit*) with SF Symbol `ladybug`. - Clicking it opens a dedicated resizable `NSWindow` ("Send a Bug Report") that hosts the prompting / sending / success / failure flow. - Removes the description-less duplicate from Settings → Debug Info, and the dead `debugSection` it nominally lived behind. ## Why PR exo-explore#1959 added a user-description prompt to the bug-report flow, but its trigger lived inside `ContentView.debugSection` — a view that's defined but never rendered in the body. The path users actually hit was `SettingsView.sendBugReportButton`, which called `BugReportService.sendReport(isManual: true)` without ever passing `userDescription`. So the description prompt was unreachable in the built app. ## Approach Per Apple HIG, an action that requires further input before completing should open a dialog, not transform the menu inline. So: - Add a top-level menu entry that ends in `…` (HIG: ellipsis indicates "further input required"). - Move the prompting/sending/success/failure state machine into a standalone `BugReportWindowController` modeled after the existing `SettingsWindowController`. - Single-instance window with frame-autosave name, sensible `contentMinSize`, resizable, native button layout (`.cancelAction` / `.defaultAction` keyboard shortcuts), light/dark-mode-correct `.textBackgroundColor` and `.separatorColor`. - Auto-focus the description field on open. `Try Again` from failure, `Open GitHub Issue` + `Done` from success. ## Files - `app/EXO/EXO/Views/BugReportWindowController.swift` (new) — controller + view. - `app/EXO/EXO/EXOApp.swift` — wire `BugReportWindowController` as a `@StateObject` and inject as environment object. - `app/EXO/EXO/ContentView.swift` — replace inline state machine with menu item that calls `bugReportWindowController.open()`. Remove now-unused state, helpers, and dead `debugSection`. - `app/EXO/EXO/Views/SettingsView.swift` — remove duplicate `sendBugReportButton`, `sendBugReport()`, and related `@State`. Section "Debug Info" keeps Thunderbolt / interface / RDMA info. `BugReportService` is unchanged. ## Test plan - [ ] Open the menu-bar popover → confirm **Share Bug Report…** appears between *Check for Updates* and *Quit*, with a ladybug icon. - [ ] Click it → a window titled "Send a Bug Report" appears, centered, with the description editor focused. - [ ] Resize the window → size persists across re-opens (frame autosave). - [ ] Type a description, press Return → upload succeeds, success card with **Open GitHub Issue** + **Done** appears. - [ ] Click **Open GitHub Issue** → browser opens with the description pre-filled into the issue template. - [ ] Send with empty description → upload still succeeds. - [ ] Press Esc from the prompting state → window closes. - [ ] On failure (e.g., offline) → error card with **Try Again** + **Close** appears; Try Again returns to the editor with the description preserved. - [ ] Open the Settings window → Debug Info section is unchanged except the Send Bug Report button is gone. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(app): tighten Share Bug Report prompt layout (exo-explore#2008) ## Summary Follow-ups to exo-explore#2003 based on feedback that the Share Bug Report window felt visually weighty: too much padding above and below, and a description editor that invited an essay rather than a one-liner. ## Changes (one file) `app/EXO/EXO/Views/BugReportWindowController.swift`: - **Auto-size the window to its content.** Switched from `NSHostingView` + fixed `contentRect: 480x380` + SwiftUI `frame(minHeight: 320)` to `NSHostingController` with `sizingOptions = [.preferredContentSize, .minSize]`. The fixed-min combo was centering the form in dead vertical space. - **Smaller, lower-pressure editor.** Field is now labeled `Description (optional)` with a placeholder hint (`What were you doing when it broke?`) inside the editor. Editor height fixed at 72pt (was 120pt min). Replaced the long lead-in paragraph and headline with a single one-line caption between field and buttons: `Diagnostic logs will be uploaded with your report.` - **Tighter spacing.** Outer padding 20 -> 16, root spacing 16 -> 12, prompting-section spacing 12 -> 8. - **Remove em dash from copy.** `BugReportService` and the menu wiring are unchanged. ## Test plan - [ ] Click `Share Bug Report...` from the menu bar. - [ ] The window opens centered and sized to its content (no big empty bands top/bottom). - [ ] Description editor is visibly compact, with the placeholder hint showing when empty. - [ ] The optional-ness is conveyed by the field label (no separate help paragraph). - [ ] Caption `Diagnostic logs will be uploaded with your report.` appears in `.caption` style under the editor, above the buttons. - [ ] Resize the window: persists across re-opens (frame autosave still works). - [ ] Send/Cancel/Try Again/Done flows behave the same as before. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * A few targeted tweaks to address HF rate limits (exo-explore#2009) ## Motivation - exo bursts ~200 HF Hub-API requests on every cold start, blowing past the anonymous 500-req/5-min budget. - The existing retry loop catches 429 generically and gives up in ~3s — well before HF's reset window. - `file_meta` and `_download_file` had no 429 handling at all (became `AssertionError`). - Disk file-list cache was bypassed on every process restart. ## Changes All in `src/exo/download/download_utils.py` + tests. - Parse `t=` from HF's `RateLimit` header on 429; sleep `min(t, 300s) + jitter`. - Handle 429 at all three call sites (`_fetch_file_list`, `file_meta`, `_download_file`). - `n_attempts`: 3 → 5. - Disk cache now primary across restarts (24h mtime TTL). - `?recursive=true` instead of N+1 subdir walks. ## Why It Works `t=<seconds>` is HF's "wait this long and you'll be unblocked" — sleeping that long lets the window reset. Disk-cache-as-primary plus recursive listing cuts cold-start Hub-API traffic by ~10×. ## Test Plan ### Manual Testing MacBook Pro M1 Max. Tripped the real HF 429. Pre-fix: failed in 3.4s. Post-fix: slept (HF returned `t=158`) and recovered. ### Automated Testing - New `test_rate_limit_handling.py` (19 tests) — header parsing, retry-loop behaviour, plus HTTP-level coverage that mocks aiohttp to return a 429 and asserts each call site raises `HuggingFaceRateLimitError(retry_after=52.0)`. - New `TestFileListCacheTTL` in `test_offline_mode.py` — fresh cache hits, stale cache refetches. - 421 tests pass; basedpyright / ruff / nix fmt clean. * fix(macos-app): disable URL response caching for cluster-state polling (exo-explore#2005) Fixes exo-explore#2004. `ClusterStateService` polls `/state` at 2 Hz via `URLSession.shared`, which keeps an on-disk `URLCache` attached by default. Every polled response body gets persisted under `~/Library/Caches/exolabs.EXO/`, sustaining ~500–620 KB/sec of file-backed memory dirtied — far above macOS's ~25 KB/sec per-process daily-average baseline. Six microstackshot reports observed on a single Mac Studio M3 Ultra over eight days, with one 15-hour run accumulating 34.36 GB of cache writes. Heaviest stack on every diagnostic report (96–98% of samples): ``` _dispatch_workloop_worker_thread → _dispatch_block_async_invoke2 → __CFURLCache::CreateAndStoreCacheNode → write ``` Full diagnostic data and analysis in exo-explore#2004. ## What changed `ClusterStateService` now defaults to an ephemeral, non-caching `URLSession` instead of `URLSession.shared`. Cluster-state responses are time-sensitive and small; nothing benefits from being cached on disk. ```swift private static func makeNonCachingSession() -> URLSession { let config = URLSessionConfiguration.ephemeral config.urlCache = nil config.requestCachePolicy = .reloadIgnoringLocalCacheData return URLSession(configuration: config) } ``` The existing per-request `request.cachePolicy = .reloadIgnoringLocalCacheData` calls are kept as defense in depth — they only affect read behavior, but harmless to leave alongside the session-level config. ## Scope - **Behavioral**: none. Polled requests still go out at the same cadence; responses still parse the same; no semantic change to any API surface. - **Test injection**: the `session:` parameter remains in `init`, so tests can still inject a custom mock session unchanged. - **`BugReportService` and other `URLSession.shared` callers**: untouched. If maintainers prefer an app-wide URLCache disable instead, happy to switch the approach (issue body has the alternative spelled out). ## Verification Verified locally that compiling EXO with this change produces a working menubar app and `ClusterStateService` continues to fetch state correctly. After ~30 min of idle polling, no new entries in `/Library/Logs/DiagnosticReports/EXO_*.diag` and no growth in `~/Library/Caches/exolabs.EXO/`. ## Test plan - [ ] Build EXO from this branch on macOS 26.4 - [ ] Launch, let cluster state polling run for 30+ min - [ ] Confirm no new microstackshot diagnostic reports - [ ] Confirm `~/Library/Caches/exolabs.EXO/Cache.db*` does not grow 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Jordan Miller <jordan.d.miller@gmail.com> * feat: update rdma_ctl instructions (exo-explore#1977) ## Motivation The RDMA setup instructions were missing a step: after booting to Recovery mode, users need to open Terminal from the Utilities menu before they can run the `rdma_ctl` command. Without this step, users following the instructions wouldn't know how to access a terminal in Recovery mode. This step was already in the README just not in the UI notifications. ## Changes Added a missing instruction step — "Open Terminal from the Utilities menu" — to three instances of the RDMA setup flow in `dashboard/src/routes/+page.svelte`. ## Why It Works N/A copy change only. ## Test Plan ### Manual Testing Hardware: MacBook Pro M4 Max 48GB ### Automated Testing No automated tests affected; this is a UI copy change only. Co-authored-by: Sam Bradbury <sam@consultbradbury.com> * Initialise _cancelled_tasks in ImageEngine (exo-explore#2051) we yielded nonsense chunks from engines; we didn't initialize the image engine correctly. mostly rewrite of exo-explore#2049 --------- Co-authored-by: ciaranbor <ciaranborourke-dev@proton.me> * fix(inference): prevent TP collective deadlock via agree_on_tasks order (exo-explore#2048) If you have two machines and make two requests at the same time, it can crash. This is because the tasks can sometimes end up in different orders on different machines. We need to sort the tasks and mx_all_gather_tasks already sorts the tasks but the code ignores that ordering. The fix is to make sure the sort order is preserved. The rest is written by Sonnet (reviewed by me): Tensor-parallel inference requires that every rank enqueues tasks in the same order before running agree_on_tasks collectives. The old implementation filtered from _maybe_queue: self._queue.extend(task for task in self._maybe_queue if task in agreed) self._maybe_queue = [task for task in self._maybe_queue if task in different] Because _maybe_queue is independently ordered per-rank (tasks arrive via gRPC in whatever order the API server sends them), two concurrent requests could produce different _maybe_queue orderings on rank 0 vs rank 1. The filter then preserved those different orders into _queue, so each rank started processing tasks in a different sequence. The next mlx collective (all_reduce, all_gather, etc.) on rank 0 corresponded to a different task than on rank 1 → permanent deadlock. Fix: extend from agreed directly. mx_all_gather_tasks returns agreed as a list sorted by task_id on all ranks, so every rank appends the same sequence regardless of local arrival order. Applies to both SequentialGenerator and BatchGenerator. ## Motivation `agree_on_tasks` is called on every rank after accumulating new requests in `_maybe_queue`. Its job is to run an `all_gather` collective so all ranks agree on which tasks to promote to `_queue` before the next inference step. The old implementation re-imposed **local arrival order** when extending `_queue`: ```python self._queue.extend(task for task in self._maybe_queue if task in agreed) ``` `mx_all_gather_tasks` already returns `agreed` sorted by `task_id` — the same deterministic order on every rank. But iterating `self._maybe_queue` instead of `agreed` discarded that sort and substituted the local gRPC arrival order, which differs per rank under concurrent load. Two concurrent requests arriving in `[A, B]` order on rank 0 and `[B, A]` on rank 1 caused the first MLX collective in the next step to hang permanently: each rank was executing a different task's collective and would never match. ## Changes `SequentialGenerator.agree_on_tasks` and `BatchGenerator.agree_on_tasks`: ```python # Before self._queue.extend(task for task in self._maybe_queue if task in agreed) self._maybe_queue = [task for task in self._maybe_queue if task in different] # After self._queue.extend(agreed) # preserves mx_all_gather_tasks sort order self._maybe_queue = list(different) # already in local order; filter was redundant ``` ## Why It Works `mx_all_gather_tasks` (in `utils_mlx.py`) computes the agreed set then sorts by `task_id`: ```python agreed = [local_tasks[tid] for tid in sorted(agreed_ids)] ``` Because `task_id` is a UUID and the sort is lexicographic, every rank produces the same `agreed` list regardless of local arrival order. Using `agreed` directly preserves this guarantee. The `different` list (tasks not yet seen on all ranks) is built by iterating `tasks` in local order, which is already correct. ## Test Plan ### Manual Testing **Hardware:** 2× Mac Studio M3 Ultra 512 GB, Thunderbolt 5 direct bridge, `MlxJaccl` RDMA tensor-parallel (`moonshotai/Kimi-K2.6`, 595 GB INT4, 61 layers). - Sent concurrent streaming requests; confirmed all complete without deadlock. - This hardware configuration (sub-millisecond inter-node latency) is the most likely to trigger the race, as requests from separate HTTP connections can reach rank 0 and rank 1 in opposite order before `agree_on_tasks` runs. ### Automated Testing All existing tests pass: `pytest src -m "not slow" --import-mode=importlib` — 422/422 passed. The existing `test_event_ordering.py` covers the `agree_on_tasks` call path with a mock that returns tasks in consistent order; the race requires real distributed hardware to reproduce deterministically. --------- Co-authored-by: Adam Durham <amdnative@gmail.com> Co-authored-by: Adam Durham <adam@example.com> Co-authored-by: rltakashige <rl.takashige@gmail.com> Co-authored-by: Evan <evanev7@gmail.com> Co-authored-by: ciaranbor <81697641+ciaranbor@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com> Co-authored-by: ecohash-co <team@ecohash.co> Co-authored-by: Jordan Miller <jordan.d.miller@gmail.com> Co-authored-by: Sam Bradbury <31943456+sambradbury@users.noreply.github.com> Co-authored-by: Sam Bradbury <sam@consultbradbury.com> Co-authored-by: ciaranbor <ciaranborourke-dev@proton.me> Co-authored-by: Drifter4242 <davehind@yahoo.co.uk> Co-authored-by: jw-wcv <101585096+jw-wcv@users.noreply.github.com>

Earlier note said exo-explore was 'severely backlogged' — the merge data contradicts that. exo-explore merged 10 PRs in the 2 weeks before 2026-05-27 (exo-explore#2087-exo-explore#2114), but every one was by a core/frequent contributor (AndreiCravtov, Evanev7, Heidar-An). Zero external-fork PRs merged. Accurate read: 'merges insider work fast, ignores external PRs' — so waiting does nothing for our 6 untouched exo PRs; they need a maintainer to actively adopt. Low ROI to chase. Also added mlx-lm read: not hostile, just paused ~3 weeks (was actively merging external work 36-38d ago). exo-explore#1216 has a community +1. Nudged exo#2121 framing it as a regression from exo-explore#2000 (engine abstraction), @-mentioned Evanev7 who authored that refactor and is an active merger.

Evanev7 mentioned this pull request Apr 28, 2026

add engine interface and impls for mlx and mflux #1946

Closed

Evanev7 force-pushed the runner-refactor-post-prefill branch from 7e5b6ef to 0272df2 Compare April 28, 2026 00:28

rltakashige reviewed Apr 28, 2026

View reviewed changes

Evanev7 force-pushed the runner-refactor-post-prefill branch from 0272df2 to f55ce12 Compare April 28, 2026 00:50

rltakashige approved these changes Apr 28, 2026

View reviewed changes

implement engine abstraction for mlx and mflux

03f0808

Evanev7 force-pushed the runner-refactor-post-prefill branch from f55ce12 to 03f0808 Compare April 28, 2026 00:53

Evanev7 enabled auto-merge (squash) April 28, 2026 00:54

Evanev7 merged commit c80b10c into main Apr 28, 2026
6 checks passed

Evanev7 deleted the runner-refactor-post-prefill branch April 28, 2026 00:58

team-wcv mentioned this pull request May 7, 2026

Sync with upstream exo-explore/exo (15 commits) team-wcv/exo#13

Merged

5 tasks

adurham mentioned this pull request May 27, 2026

fix(runner): only rank 0 emits ChunkGenerated under tensor-parallel execution #2121

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement engine abstraction for mlx and mflux#2000

implement engine abstraction for mlx and mflux#2000
Evanev7 merged 1 commit into
mainfrom
runner-refactor-post-prefill

Evanev7 commented Apr 28, 2026

Uh oh!

rltakashige Apr 28, 2026

Uh oh!

rltakashige Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Evanev7 commented Apr 28, 2026

Uh oh!

rltakashige Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

rltakashige Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants