feat: refactor shard strategy & discovery #15

erhant · 2025-10-23T08:34:23Z

- Discovery now uses UDP instead of mDNS (see feat: openai compat & udp-based discovery #14)
- Added a post-processing step for the k=1 case: devices assigned a single layer are ignored, and their layers are given to their immediate neighbors (see chore: remove model_name in favor of topology.model and small edits #29)
- Variables renamed (e.g. to `instance`) (see chore: remove model_name in favor of topology.model and small edits #29)
- The loaded model is now provided via `topology.model` instead of `self.model` (see chore: remove model_name in favor of topology.model and small edits #29)
- ...
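
The k=1 post-process can be pictured with a minimal sketch. It assumes each device holds a contiguous range of layers listed in ring order, and that a dropped device's layer goes to the previous kept device (or to the next kept device when there is no previous one), which keeps every shard contiguous. The function name and data layout are hypothetical, not dnet's API.

```python
from typing import List, Tuple

# (device name, contiguous layer indices), listed in ring order. Hypothetical layout.
Assignment = List[Tuple[str, List[int]]]


def absorb_single_layer_devices(assignment: Assignment) -> Assignment:
    """Drop devices holding exactly one layer and hand that layer to a neighbor.

    Sketch only: the layer is appended to the previous kept device when one
    exists, otherwise prepended to the next kept device.
    """
    if all(len(layers) == 1 for _, layers in assignment):
        # No device can absorb anything; return an unchanged copy.
        return [(name, list(layers)) for name, layers in assignment]

    result: Assignment = []
    pending: List[int] = []  # layers waiting for the next kept device
    for name, layers in assignment:
        if len(layers) == 1:
            if result:
                result[-1][1].extend(layers)  # give it to the previous neighbor
            else:
                pending.extend(layers)  # first device(s): defer to the next neighbor
            continue
        result.append((name, pending + list(layers)))
        pending = []
    return result
```

For example, `[("a", [0, 1, 2]), ("b", [3]), ("c", [4, 5])]` becomes `[("a", [0, 1, 2, 3]), ("c", [4, 5])]`.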

- Inter-device overlap only (idle prefetch after TX); no intra-device overlap.
- Resident window size = 1, enforced by immediate eviction/shrink.
- MLX kernels remain hot (no cache flushing in the hot path).
- Prefetch plumbing and config complexity removed.

This matches exactly what is modelled in distilp, yielding more deterministic planning for layer assignment.
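
One device's schedule under these rules can be traced with a small sketch: compute a window, transmit (TX), then evict it immediately and prefetch the next window while the other devices compute. It assumes each device cycles through a fixed list of layer windows per token, so the last window prefetches the first one for the next token. The function and event names are hypothetical, and the code does not model distilp itself.

```python
from typing import List, Optional


def device_timeline(windows: List[List[int]], tokens: int = 1) -> List[str]:
    """Hypothetical event trace of one device's offload loop (resident window size 1)."""
    events: List[str] = []
    resident: Optional[List[int]] = None
    for _ in range(tokens):
        for i, window in enumerate(windows):
            if resident != window:
                events.append(f"load {window}")  # cold load: nothing was prefetched
                resident = window
            events.append(f"compute {window}")
            events.append("tx")  # hand activations to the next device
            nxt = windows[(i + 1) % len(windows)]  # last window wraps to the first (head)
            if nxt != window:
                events.append(f"evict {window}")  # immediate eviction: one window resident
                events.append(f"prefetch {nxt}")  # overlaps with other devices' compute
                resident = nxt
    return events
```

Compute and load never overlap on the same device; the only overlap is between this device's prefetch and the rest of the ring, which is what makes per-device timing predictable.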

- Add per-layer repack utility with manifest; repack only assigned layers
- Enable mx.load fast-path by default in sliding_fit; disable OS prefetch
- Implement true delta-swap in sliding_fit (no whole-window churn)
- Wire repack in load_model; always log [REPACK] with timing (profile or not)
- Add shard/API endpoints to clean repacked folders
- Fix lints in benches; warn when fastpath used without per-layer repacks
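
The difference between a delta swap and whole-window churn comes down to two set differences: layers shared by the old and new windows stay resident, and only the rest is evicted or loaded. A minimal sketch; the function name is hypothetical and not the `sliding_fit` code.

```python
from typing import Iterable, List, Set, Tuple


def delta_swap(resident: Set[int], next_window: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (layers_to_evict, layers_to_load) when sliding to next_window."""
    target = set(next_window)
    evict = sorted(resident - target)  # no longer needed
    load = sorted(target - resident)   # not yet resident
    return evict, load
```

Sliding from layers `{0, 1, 2, 3}` to `[2, 3, 4, 5]` evicts `[0, 1]` and loads `[4, 5]`, instead of reloading all four. Evicting before loading keeps residency at no more than one window.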

andthattoo and others added 10 commits October 21, 2025 22:00

simplified I/O paths to mlx.load and mmap+madvise
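
The mmap+madvise path can be sketched with the standard-library `mmap` module (Python 3.8+ exposes `madvise` where the platform supports it). The helper name is hypothetical, and using `MADV_WILLNEED` as the advice flag is an assumption; the commit does not say which flags dnet passes.

```python
import mmap


def map_weights(path: str) -> mmap.mmap:
    """Memory-map a weight file read-only and hint the kernel to read it ahead (sketch)."""
    with open(path, "rb") as f:
        # mmap keeps its own handle, so closing the file afterwards is safe.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if hasattr(mmap, "MADV_WILLNEED"):
        mm.madvise(mmap.MADV_WILLNEED)  # ask the OS to start paging the file in
    return mm
```

Pages are only copied into memory when touched, so a shard can map a whole file and read just the tensors for its assigned layers.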

703022a

cleaned up logs related to removed I/O paths; removed getattr/hasattr

ruff SIM fixes

93c7ef3

fixed missing set_prefetch mode after config is set

20f46e5

prefetch fixes for config inits

d3b2686

fit mode should stream

e33e635

added metrics for streaming via profile:"true"

phase 1: prefer TB over wifi for token callbacks

418c994
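
Preferring Thunderbolt over Wi-Fi for token callbacks amounts to picking the peer address from an ordered preference list. A minimal sketch, assuming each peer advertises one address per link type; the link labels and function name are hypothetical.

```python
from typing import Dict, Optional

# Assumed preference order: Thunderbolt first, Wi-Fi as the fallback.
PREFERENCE = ("thunderbolt", "wifi")


def pick_callback_address(addresses: Dict[str, str]) -> Optional[str]:
    """Return the address on the most preferred link type the peer exposes."""
    for kind in PREFERENCE:
        if kind in addresses:
            return addresses[kind]
    return None
```

When both links are available the Thunderbolt address wins; Wi-Fi is used only when it is the sole option.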

mlx.lock added to warmup to prevent:

6f1c25f

Python(35541,0x1f90d4f40) malloc: Double free of object 0x11db3c620
Python(35541,0x1f90d4f40) malloc: *** set a breakpoint in malloc_error_break to debug

bump discovery, add necessary fixes, add openai compat & test boilerplate

57faee8

rm unused

4d7297b

Merge pull request #14 from firstbatchxyz/erhant/openai-compat-bump

a28700e

feat: openai compat & udp-based discovery

erhant assigned andthattoo Oct 23, 2025

andthattoo added 6 commits October 23, 2025 12:27

- TB used for final-token callbacks (no Wi‑Fi hop in the loop).

c1a2d1e

- Streaming metrics only when profile=true.
- Warmup serialized with MLX lock; no unload during warmup.
- Offload uses mmap-only loads (mx.load fast-path disabled unless explicitly enabled).
- No getattr in hot paths.
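
Serializing warmup with an MLX lock, and refusing to unload while warmup holds it, can be sketched with `threading.Lock`. The class and method names are hypothetical; the sketch shows only the locking pattern, not dnet's runtime.

```python
import threading
from typing import Callable


class ModelRuntime:
    """Hypothetical runtime where all MLX work goes through one lock."""

    def __init__(self) -> None:
        self._mlx_lock = threading.Lock()  # serializes every MLX-touching call in this class
        self.loaded = True

    def warmup(self, run_step: Callable[[], None]) -> None:
        # Hold the lock for the whole warmup so no other MLX call interleaves with it.
        with self._mlx_lock:
            run_step()  # e.g. a dummy forward pass

    def try_unload(self) -> bool:
        """Free weights only when no warmup (or other MLX work) holds the lock."""
        if not self._mlx_lock.acquire(blocking=False):
            return False  # busy: skip rather than free buffers still in use
        try:
            self.loaded = False
            return True
        finally:
            self._mlx_lock.release()
```

The non-blocking acquire makes "no unload during warmup" explicit: an unload request that arrives mid-warmup is declined instead of freeing memory that the warmup pass is still using.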

repacking fix for tokenizers

01aea59

gitignore update for repacking

c1ef76c

read ahead for mx.load path

ba49a81

activations typed as fp16 instead of fp32; pool size reduced; warmup cancelled for offload mode
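
Typing activations as fp16 halves both the pooled buffers and the bytes sent per hop. A back-of-the-envelope sketch with an illustrative shape (not taken from the PR):

```python
def activation_bytes(batch: int, seq_len: int, hidden: int, bytes_per_elem: int) -> int:
    """Bytes in one [batch, seq_len, hidden] activation tensor."""
    return batch * seq_len * hidden * bytes_per_elem


# Illustrative shape: one 2048-token prompt, hidden size 4096.
fp32 = activation_bytes(1, 2048, 4096, 4)
fp16 = activation_bytes(1, 2048, 4096, 2)
print(f"fp32: {fp32 / 2**20:.0f} MiB, fp16: {fp16 / 2**20:.0f} MiB")  # fp32: 32 MiB, fp16: 16 MiB
```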

_warmup_completed was missing from attrib.py

5b3d39e

path reset fixed

554d26f

andthattoo force-pushed the refactor/shard-strategy-pattern branch from b831aed to 554d26f October 24, 2025 00:52

andthattoo added 4 commits October 24, 2025 15:10

added llama (3 series)

da4c747

repacking now repacks api weights too

disable fast path for prima-like testing

9875350

fewer logs

7364920

GandalfTea mentioned this pull request Oct 27, 2025

feat: add sharding support for mlx-lm models #5 (Open, 18 tasks)

andthattoo and others added 7 commits October 28, 2025 03:00

- changed kv-cache quant flags from int8 to 8bit

37659ca

- Remove redundant sequential_io attribute
- Remove dead code that caused "Error processing activation: name 'next_window' is not defined"

submodule updates

c7f3d72

redundant comments removed

rename service / name to instance, some minor fixes

d7ab225

Merge pull request #26 from firstbatchxyz/erhant/refactor-renaming

77895e6

chore: rename `service` / `name` to `instance`, some minor fixes

remove topology.model in favor of model_name and small edits

5f6e61d

added kv cache bits, seq len to topology request

47005fc

remove model_name, add few types

875ddd4

andthattoo and others added 6 commits October 30, 2025 20:20

distilp update

68d7338

pass kv-bits from API to shards

db501a5

the ring is quite cool frodo:

b12c671

warmup for offloading dtype in fit mode offload = self

Merge branch 'refactor/shard-strategy-pattern' into erhant/rm-topology-model

c98e66c

Merge pull request #29 from firstbatchxyz/erhant/rm-topology-model

8977089

chore: remove `model_name` in favor of `topology.model` and small edits

andthattoo force-pushed the refactor/shard-strategy-pattern branch from 4afed30 to 8977089 October 31, 2025 11:45

andthattoo added 5 commits October 31, 2025 16:05

small fixes

4bb0fd1

delta-swap fix

1e1627a

delta-swap fix 2

456bcd4

Improve offload overlap and correctness

4fe1f00

- Force materialization of mx.load weights during prepare
- Schedule offload prefetch before network I/O; remove duplicate post-RPC prefetch
- Track prepared windows per nonce, not globally
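
Tracking prepared windows per nonce means a window prepared for one request does not look prepared to another in-flight request. A minimal sketch, assuming a nonce identifies one request; the class and method names are hypothetical.

```python
from typing import Dict, Set


class PreparedWindows:
    """Hypothetical prepared-window bookkeeping keyed by request nonce."""

    def __init__(self) -> None:
        self._by_nonce: Dict[str, Set[int]] = {}

    def mark_prepared(self, nonce: str, window_id: int) -> None:
        self._by_nonce.setdefault(nonce, set()).add(window_id)

    def is_prepared(self, nonce: str, window_id: int) -> bool:
        return window_id in self._by_nonce.get(nonce, set())

    def finish(self, nonce: str) -> None:
        # Dropping one request's state leaves concurrent requests untouched.
        self._by_nonce.pop(nonce, None)
```

With a single global set, finishing or evicting for one request could clear state another request still relies on; keying by nonce keeps those lifetimes separate.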

final window prefetches first window (head)

59a73d4

erhant mentioned this pull request Oct 31, 2025

feat: refactors & shard screen & fixes firstbatchxyz/dnet-tui#9 (Merged, 4 tasks)

andthattoo added 13 commits November 2, 2025 23:18

remove redundant mmap.close()

0b14b36

remove duplicate code by using delta_swap_eviction function

9de6aca

add gpt-oss

f0a7a6c

remove lazy params in model init not used for quantized anyways

add gpt-oss sanitize logic from mlx-lm

a5c1dbf

remove redundant mlx.core import

9de4294

path fix for API config load

3d06105

fix mxfp4 overrides in gpt-oss

05fdc4e

tiny fix

735e2ba

tiny fix

405d53a

simplified model implementations

75d8a1f

fixed quantization errors; removed hardcoded guards and fallbacks; added gpt-oss reformat

add sanitize functions

8ccda19

move shared function to base.py for models

696c0d1

removed repacking; default KV cache is 4 bits
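
A 4-bit default cuts KV-cache memory to a quarter of a 16-bit cache. A rough size estimate, with an illustrative model shape that is not taken from the PR; the formula ignores the per-group scales and biases a quantized cache also stores.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int, bits: int) -> float:
    """Approximate KV-cache size in bytes (keys + values, all layers and positions)."""
    elements = 2 * layers * kv_heads * head_dim * seq_len  # 2 = keys and values
    return elements * bits / 8
```

For 32 layers, 8 KV heads, `head_dim` 128 and 4096 tokens, this gives 512 MiB at 16 bits, 256 MiB at 8 bits and 128 MiB at 4 bits.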

force fp16 cache for gpt-oss

9b5f64f

andthattoo marked this pull request as ready for review November 7, 2025 13:51

andthattoo merged commit 213cc2a into master Nov 7, 2025

erhant mentioned this pull request Nov 7, 2025

Feat repacking discovery #24 (Closed)

Yuvrajxms09 pushed a commit to Yuvrajxms09/dnet that referenced this pull request Dec 21, 2025

Merge pull request firstbatchxyz#15 from firstbatchxyz/refactor/shard-strategy-pattern

028dc44

firstbatchxyz#30 firstbatchxyz#28 firstbatchxyz#9 firstbatchxyz#5

3 participants