Conversation

@erhant
Member

@erhant erhant commented Oct 23, 2025

andthattoo and others added 10 commits October 21, 2025 22:00
cleaned logs related to removed I/O paths
removed getattr/hasattr
added metrics for streaming via profile:"true"
Python(35541,0x1f90d4f40) malloc: Double free of object 0x11db3c620
Python(35541,0x1f90d4f40) malloc: *** set a breakpoint in malloc_error_break to debug
feat: openai compat & udp-based discovery
  - Streaming metrics only when profile=true.
  - Warmup serialized with MLX lock; no unload during warmup.
  - Offload uses mmap-only loads (mx.load fast-path disabled unless explicitly enabled).
  - No getattr in hot paths.
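
A minimal sketch of the "warmup serialized with a lock; no unload during warmup" point above, using only the standard library. The class name `ShardRuntime`, its methods, and the `events` list are hypothetical and are not taken from this PR; the real code presumably guards MLX calls rather than appending strings.

```python
import threading


class ShardRuntime:
    """Hypothetical stand-in for a shard runtime; not the class used in this PR."""

    def __init__(self) -> None:
        # One lock guards every operation that touches the model (assumed design).
        self._mlx_lock = threading.Lock()
        self.events = []

    def warmup(self) -> None:
        # Holding the lock for the whole warmup keeps unload() from interleaving.
        with self._mlx_lock:
            self.events.append("warmup:start")
            # ... a dummy forward pass would run here ...
            self.events.append("warmup:end")

    def unload(self) -> None:
        # unload() waits on the same lock, so it runs before or after warmup,
        # never in the middle of it.
        with self._mlx_lock:
            self.events.append("unload")


runtime = ShardRuntime()
threads = [
    threading.Thread(target=runtime.warmup),
    threading.Thread(target=runtime.unload),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins the race, `"warmup:start"` is always immediately followed by `"warmup:end"` in `runtime.events`.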
activations typed as fp16 instead of fp32
pool size reduced
warmup cancelled for offload mode
@andthattoo andthattoo force-pushed the refactor/shard-strategy-pattern branch from b831aed to 554d26f Compare October 24, 2025 00:52
repacking now repacks api weights too
  - Inter-device overlap only (idle prefetch after TX), no intra-device overlap.
  - Resident window size = 1 enforced by immediate eviction/shrink.
  - MLX kernels remain hot (no cache flushing in the hot path).
  - Prefetch plumbing and config complexity removed

  This matches exactly what's modelled in distilp, yielding rather deterministic planning for layer assignment.
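
One way to picture "resident window size = 1 enforced by immediate eviction" is the sketch below. `ResidentWindow`, `load_layer`, and `resident_ids` are hypothetical names chosen for illustration, and the loader is a plain callable standing in for whatever actually reads the weights.

```python
from typing import Callable, Dict, List


class ResidentWindow:
    """Hypothetical layer cache that keeps at most `max_resident` layers loaded."""

    def __init__(self, load_layer: Callable[[int], object], max_resident: int = 1) -> None:
        if max_resident < 1:
            raise ValueError("max_resident must be at least 1")
        self._load_layer = load_layer
        self._max_resident = max_resident
        self._resident: Dict[int, object] = {}  # insertion-ordered: oldest first

    def get(self, layer_id: int) -> object:
        if layer_id in self._resident:
            return self._resident[layer_id]
        # Evict before loading so the window never exceeds its limit, even briefly.
        while len(self._resident) >= self._max_resident:
            oldest = next(iter(self._resident))
            del self._resident[oldest]
        self._resident[layer_id] = self._load_layer(layer_id)
        return self._resident[layer_id]

    def resident_ids(self) -> List[int]:
        return list(self._resident)


window = ResidentWindow(load_layer=lambda i: f"weights-{i}")
for layer in (0, 1, 2):
    window.get(layer)
print(window.resident_ids())  # [2]
```

Evicting before loading, rather than after, is what keeps peak residency at one layer, which is presumably why the planner's model and the runtime behaviour can agree.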
andthattoo and others added 7 commits October 28, 2025 03:00
andthattoo and others added 6 commits October 30, 2025 20:20
- Add per-layer repack utility with manifest; repack only assigned layers
- Enable mx.load fast-path by default in sliding_fit; disable OS prefetch
- Implement true delta-swap in sliding_fit (no whole-window churn)
- Wire repack in load_model; always log [REPACK] with timing (profile or not)
- Add shard/API endpoints to clean repacked folders
- Fix lints in benches; warn when fastpath used without per-layer repacks
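
The "true delta-swap (no whole-window churn)" item can be sketched as follows: when the window moves, only the layers that differ between the old and new window are evicted or loaded. The function `delta_swap` and its parameters are hypothetical and are not the PR's API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def delta_swap(
    resident: Dict[int, object],
    target: Iterable[int],
    load_layer: Callable[[int], object],
) -> Tuple[List[int], List[int]]:
    """Mutate `resident` to hold exactly `target`; return (evicted, loaded) ids."""
    wanted = list(dict.fromkeys(target))  # drop duplicates, keep order
    wanted_set = set(wanted)

    # Evict only layers that are no longer needed.
    evicted = [i for i in resident if i not in wanted_set]
    for i in evicted:
        del resident[i]

    # Load only layers that are not already resident.
    loaded = [i for i in wanted if i not in resident]
    for i in loaded:
        resident[i] = load_layer(i)

    return evicted, loaded


resident = {0: "w0", 1: "w1", 2: "w2"}
evicted, loaded = delta_swap(resident, [1, 2, 3], lambda i: f"w{i}")
print(evicted, loaded)  # [0] [3]
```

Layers 1 and 2 are left untouched in this example, so sliding the window by one costs one eviction and one load instead of a full reload.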
warmup for offloading
dtype in fit mode offload = self
chore: remove `model_name` in favor of `topology.model` and small edits
@andthattoo andthattoo force-pushed the refactor/shard-strategy-pattern branch from 4afed30 to 8977089 Compare October 31, 2025 11:45
- Force materialization of mx.load weights during prepare
- Schedule offload prefetch before network I/O; remove duplicate post-RPC prefetch
- Track prepared windows per nonce, not globally
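
A rough sketch of "track prepared windows per nonce, not globally", assuming a nonce identifies one in-flight request. `PreparedWindows` and its methods are hypothetical names; the point is only that state keyed by nonce keeps concurrent requests from seeing each other's prepared windows.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class PreparedWindows:
    """Hypothetical bookkeeping: which windows are prepared for each request nonce."""

    def __init__(self) -> None:
        self._by_nonce: DefaultDict[str, Set[int]] = defaultdict(set)

    def mark_prepared(self, nonce: str, window_index: int) -> None:
        self._by_nonce[nonce].add(window_index)

    def is_prepared(self, nonce: str, window_index: int) -> bool:
        # .get avoids creating an empty entry for unknown nonces.
        return window_index in self._by_nonce.get(nonce, set())

    def finish(self, nonce: str) -> None:
        # Drop all state for a completed request.
        self._by_nonce.pop(nonce, None)


tracker = PreparedWindows()
tracker.mark_prepared("req-a", 0)
print(tracker.is_prepared("req-a", 0))  # True
print(tracker.is_prepared("req-b", 0))  # False
```

With a single global set, the second lookup would have returned `True` and request `req-b` could have skipped a preparation step it still needed.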
@andthattoo andthattoo marked this pull request as ready for review November 7, 2025 13:51
@andthattoo andthattoo merged commit 213cc2a into master Nov 7, 2025
@erhant erhant mentioned this pull request Nov 7, 2025
3 participants