feat: refactor shard strategy & discovery #15
Merged
Conversation
cleaned up logs related to removed I/O paths; removed getattr/hasattr
added streaming metrics, enabled via profile: "true"
```
Python(35541,0x1f90d4f40) malloc: Double free of object 0x11db3c620
Python(35541,0x1f90d4f40) malloc: *** set a breakpoint in malloc_error_break to debug
```
feat: openai compat & udp-based discovery
- Streaming metrics only when profile=true.
- Warmup serialized with the MLX lock; no unload during warmup.
- Offload uses mmap-only loads (mx.load fast-path disabled unless explicitly enabled).
- No getattr in hot paths.
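The first two points above can be sketched minimally: metrics recording becomes a no-op unless profiling is enabled, and warmup holds the same lock that unload would need. All names here (`StreamMetrics`, `warmup`, `mlx_lock`, `forward_dummy`) are illustrative assumptions, not dnet's actual API.

```python
import threading
import time

# Illustrative: serializes warmup against unload; an unload() that takes
# this same lock cannot run while warmup touches the weights.
mlx_lock = threading.Lock()

class StreamMetrics:
    """Records per-token timestamps, but only when profile=true."""

    def __init__(self, profile: bool):
        self.profile = profile
        self.token_times: list[float] = []

    def record_token(self) -> None:
        # No-op unless profiling, keeping the streaming hot path cheap.
        if self.profile:
            self.token_times.append(time.perf_counter())

def warmup(model_runner) -> None:
    # Warmup runs a dummy forward pass under the lock.
    with mlx_lock:
        model_runner.forward_dummy()
```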
activations typed as fp16 instead of fp32; pool size reduced; warmup cancelled for offload mode
Force-pushed from b831aed to 554d26f
repacking now covers API weights too
- Inter-device overlap only (idle prefetch after TX), no intra-device overlap.
- Resident window size = 1, enforced by immediate eviction/shrink.
- MLX kernels remain hot (no cache flushing in the hot path).
- Prefetch plumbing and config complexity removed.

This matches exactly what's modelled in distilp, yielding fairly deterministic planning for layer assignment.
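A minimal sketch of the resident-window-of-1 idea above: a device holds at most one layer, evicts it immediately after use, and prefetches its next layer only after TX, so the load overlaps another device's compute rather than its own. Everything here (`WindowOfOne`, `apply_layer`, the list-based weights) is a hypothetical stand-in for dnet's real machinery.

```python
def apply_layer(weights, activations):
    # Stand-in for the real MLX forward pass.
    return [w + a for w, a in zip(weights, activations)]

class WindowOfOne:
    """At most one layer's weights resident at any time."""

    def __init__(self, loader):
        self.loader = loader                   # callable: layer_id -> weights
        self.resident: dict[int, object] = {}  # holds at most one entry

    def run_layer(self, layer_id: int, activations):
        weights = self.resident.pop(layer_id, None)
        if weights is None:
            weights = self.loader(layer_id)    # miss: synchronous load
        out = apply_layer(weights, activations)
        # Eviction is immediate: nothing from this layer stays resident.
        return out

    def idle_prefetch(self, next_layer_id: int) -> None:
        # Called after TX completes, overlapping the load with the
        # downstream device's compute (inter-device overlap only).
        self.resident = {next_layer_id: self.loader(next_layer_id)}
```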
- remove redundant sequential_io attribute
- remove dead code causing "Error processing activation: name 'next_window' is not defined"
redundant comments removed
chore: rename `service` / `name` to `instance`, some minor fixes
- Add per-layer repack utility with manifest; repack only assigned layers
- Enable mx.load fast-path by default in sliding_fit; disable OS prefetch
- Implement true delta-swap in sliding_fit (no whole-window churn)
- Wire repack in load_model; always log [REPACK] with timing (profile or not)
- Add shard/API endpoints to clean repacked folders
- Fix lints in benches; warn when fastpath used without per-layer repacks
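The delta-swap bullet above can be sketched as a pure function: when the window slides, evict only layers leaving it and load only layers entering it, instead of dropping and reloading the whole window. This is an illustrative sketch, not sliding_fit's actual code; `load` is a hypothetical loader callback.

```python
def delta_swap(resident: dict, new_window: set, load):
    """Mutate `resident` in place so its keys equal `new_window`,
    touching only the layers that actually change."""
    for layer_id in set(resident) - new_window:
        del resident[layer_id]                 # leaving the window: evict
    for layer_id in new_window - set(resident):
        resident[layer_id] = load(layer_id)    # entering the window: load
    return resident
```

Layers already resident are kept as-is, so sliding by one position costs one eviction and one load rather than a full window reload.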
warmup for offloading; dtype in fit mode; offload = self
chore: remove `model_name` in favor of `topology.model` and small edits
Force-pushed from 4afed30 to 8977089
- Force materialization of mx.load weights during prepare
- Schedule offload prefetch before network I/O; remove duplicate post-RPC prefetch
- Track prepared windows per nonce, not globally
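The last bullet can be sketched as per-request bookkeeping: prepared-window state is keyed by the request nonce, so concurrent requests cannot observe each other's progress, and all state is dropped when a request finishes. `PreparedWindows` and its method names are assumptions for illustration, not dnet's actual types.

```python
from collections import defaultdict

class PreparedWindows:
    """Tracks which windows are prepared, per request nonce."""

    def __init__(self):
        self._by_nonce: dict[str, set[int]] = defaultdict(set)

    def mark_prepared(self, nonce: str, window_id: int) -> None:
        self._by_nonce[nonce].add(window_id)

    def is_prepared(self, nonce: str, window_id: int) -> bool:
        return window_id in self._by_nonce[nonce]

    def finish(self, nonce: str) -> None:
        # Drop all per-request state when the request completes,
        # so stale windows from finished requests cannot leak.
        self._by_nonce.pop(nonce, None)
```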
remove lazy params in model init; not used for quantized models anyway
fixed quantization errors; removed hardcoded guards and fallbacks; added gpt-oss reformat
repacking removal; default KV is 4 bits
Yuvrajxms09 pushed a commit to Yuvrajxms09/dnet that referenced this pull request on Dec 21, 2025.
- `k=1` case where devices with a single layer are ignored and their layers are given to immediate neighbors (see chore: remove `model_name` in favor of `topology.model` and small edits #29)
- use `instance` (see #29)
- use `topology.model` instead of `self.model` (see #29)
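The `k=1` rule above can be sketched as a pass over the planned assignment: a device holding only one layer is dropped from the pipeline and its layer is absorbed by the immediate upstream neighbor. The `(device, layers)` list layout and function name are hypothetical, chosen only to illustrate the rule.

```python
def absorb_single_layer_devices(assignment):
    """Drop devices assigned exactly one layer, handing that layer to
    the immediate upstream neighbor in the pipeline order.

    `assignment` is a list of (device_id, [layer_ids]) in pipeline order.
    Returns a new list; the input is not mutated.
    """
    result = []
    for device, layers in assignment:
        if len(layers) == 1 and result:
            # Lone layer: absorb into the previous (upstream) neighbor.
            result[-1][1].extend(layers)
        else:
            # Copy the layer list so the caller's plan is untouched.
            result.append((device, list(layers)))
    return result
```

A device at the head of the pipeline with a single layer has no upstream neighbor and is kept as-is in this sketch; the PR may handle that edge differently.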