Phase 4: Radix-trie and block-pool unification (scheduler wiring) #121
Open
Labels
- area:architecture (Architecture and code structure changes)
- area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers)
- priority:high (High priority)
- status:backlog (In the backlog, not yet ready)
- type:enhancement (New features, capabilities, or significant additions)
Context
This is the actual unification and the core of the epic. Today the radix trie stores whole dense `DetachedCacheSet` snapshots, adoption moves the tensors into a new sequence, and the whole path is disabled under the paged backend. Make trie nodes reference physical blocks and share them across requests via refcounting, reusing the copy-on-write primitive that already exists in `cache/paged_detach.rs`.

Tasks
- … (`src/server/prompt_cache/entry.rs`, `store.rs`) to hold physical block references via `DetachedPagedCacheSet` / `retain_block` instead of a dense `DetachedCacheSet`.
- Wire `CachePool::detach_paged` / `adopt_paged` into `donate_finished_sequence_cache` and `try_adopt_cached_prefix` (`scheduler.rs`).
- Remove the `decode_storage_backend == Paged -> return None` guard in `try_adopt_cached_prefix` and replace it with the paged adopt path.

Acceptance criteria
- `cached_tokens` in the usage payload is reported correctly for paged hits.

Dependencies
Blocked by Phase 3 (paged prefill).
Part of #116
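For reference, here is a minimal, std-only model of the refcounted block sharing described under Context and Tasks. It is a hypothetical sketch, not the mlxcel implementation. `BlockPool`, `PrefixTrie`, `TrieNode`, `donate`, `adopt`, `retain`, and `make_writable` are all illustrative names, not real crate APIs. `retain` only mirrors the role `retain_block` plays in this issue, and a block-granular trie (one full block of tokens per edge) stands in for the real radix trie. Copying K/V data and trie eviction are omitted.

```rust
use std::collections::HashMap;

/// Physical block index (hypothetical stand-in for the real pool's block handle).
type BlockId = usize;

/// Hypothetical refcounted block pool (NOT the mlxcel `CachePool` API).
/// A block is free when its refcount is 0.
struct BlockPool {
    refcounts: Vec<u32>,
    free: Vec<BlockId>,
}

impl BlockPool {
    fn new(num_blocks: usize) -> Self {
        // Reversed so that `pop` hands out low ids first.
        Self { refcounts: vec![0; num_blocks], free: (0..num_blocks).rev().collect() }
    }

    fn alloc(&mut self) -> Option<BlockId> {
        let id = self.free.pop()?;
        self.refcounts[id] = 1;
        Some(id)
    }

    /// Take an extra reference on a live block (the role `retain_block` plays in the issue).
    fn retain(&mut self, id: BlockId) {
        assert!(self.refcounts[id] > 0, "retain on a free block");
        self.refcounts[id] += 1;
    }

    /// Drop one reference; the block returns to the free list when it reaches zero.
    fn release(&mut self, id: BlockId) {
        assert!(self.refcounts[id] > 0, "release on a free block");
        self.refcounts[id] -= 1;
        if self.refcounts[id] == 0 {
            self.free.push(id);
        }
    }

    /// Copy-on-write: an exclusively owned block is returned as-is; a shared one is
    /// swapped for a fresh private block (the K/V copy itself is elided here).
    fn make_writable(&mut self, id: BlockId) -> Option<BlockId> {
        assert!(self.refcounts[id] > 0, "write to a free block");
        if self.refcounts[id] == 1 {
            return Some(id);
        }
        let fresh = self.alloc()?;
        // A real implementation would copy block `id`'s K/V data into `fresh` here.
        self.release(id);
        Some(fresh)
    }

    fn refcount(&self, id: BlockId) -> u32 {
        self.refcounts[id]
    }
}

/// Hypothetical trie node: each edge is one full block of tokens, mapped to the
/// physical block caching that chunk's K/V plus the child node.
#[derive(Default)]
struct TrieNode {
    children: HashMap<Vec<u32>, (BlockId, TrieNode)>,
}

struct PrefixTrie {
    block_size: usize,
    root: TrieNode,
}

impl PrefixTrie {
    /// Donate a finished sequence's full blocks. The trie retains each block it
    /// newly stores; the caller still releases the sequence's own references.
    fn donate(&mut self, pool: &mut BlockPool, tokens: &[u32], blocks: &[BlockId]) {
        let block_size = self.block_size;
        let mut node = &mut self.root;
        for (chunk, &id) in tokens.chunks_exact(block_size).zip(blocks) {
            let entry = node.children.entry(chunk.to_vec()).or_insert_with(|| {
                pool.retain(id);
                (id, TrieNode::default())
            });
            node = &mut entry.1;
        }
    }

    /// Match the longest cached full-block prefix of `tokens`. Each matched block is
    /// retained for the adopting sequence; `cached_tokens` counts whole blocks only.
    fn adopt(&self, pool: &mut BlockPool, tokens: &[u32]) -> (Vec<BlockId>, usize) {
        let mut node = &self.root;
        let mut blocks = Vec::new();
        for chunk in tokens.chunks_exact(self.block_size) {
            match node.children.get(chunk) {
                Some((id, child)) => {
                    pool.retain(*id);
                    blocks.push(*id);
                    node = child;
                }
                None => break,
            }
        }
        let cached_tokens = blocks.len() * self.block_size;
        (blocks, cached_tokens)
    }
}
```

In this model, a prefix hit only bumps refcounts: no tensors move, and a partially filled trailing block is never shared. Counting only whole matched blocks is one simple way to make `cached_tokens` match what was actually reused on a paged hit, though the real scheduler may account for it differently.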