
Phase 4: Radix-trie and block-pool unification (scheduler wiring) #121

@inureyes

Context

This phase is the actual unification and the core of the epic. Today the radix trie stores whole dense DetachedCacheSet snapshots, adopting a cached prefix moves tensors into a new sequence, and the whole path is disabled under the paged backend. The goal is to make trie nodes reference physical blocks and share them across requests by refcount, reusing the copy-on-write primitive that already exists in cache/paged_detach.rs.
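To make the refcount-plus-COW idea concrete, here is a minimal, self-contained sketch. It is not the code in cache/paged_detach.rs: `ToyPool`, `BlockId`, and every method name are hypothetical, and the "tensors" are plain `Vec<u32>` stand-ins. It only illustrates the intended behaviour: sharing a block is a refcount bump, and the first write to a shared block copies it.

```rust
/// Hypothetical physical block handle; the real pool's handle type may differ.
type BlockId = usize;

/// Toy block pool: each physical block holds some data plus a reference count.
/// Blocks are never recycled here; a real pool would return them to a free list.
#[derive(Default)]
struct ToyPool {
    blocks: Vec<Vec<u32>>,
    refcounts: Vec<usize>,
}

impl ToyPool {
    /// Allocate a new block owned by exactly one holder.
    fn alloc(&mut self, data: Vec<u32>) -> BlockId {
        self.blocks.push(data);
        self.refcounts.push(1);
        self.blocks.len() - 1
    }

    /// Share an existing block with one more holder (no data copy).
    fn retain(&mut self, id: BlockId) {
        self.refcounts[id] += 1;
    }

    /// Drop one holder's reference.
    fn release(&mut self, id: BlockId) {
        assert!(self.refcounts[id] > 0, "release of an unreferenced block");
        self.refcounts[id] -= 1;
    }

    /// Write through a sequence's block table. If the target block is shared,
    /// copy it first and repoint the table entry at the private copy.
    fn write(&mut self, table: &mut [BlockId], slot: usize, offset: usize, value: u32) {
        let id = table[slot];
        let target = if self.refcounts[id] > 1 {
            let copy = self.blocks[id].clone();
            self.release(id);
            let private = self.alloc(copy);
            table[slot] = private;
            private
        } else {
            id
        };
        self.blocks[target][offset] = value;
    }
}
```

In this model the trie entry and the adopting sequence each hold one reference to a matched block, so the shared block stays untouched when the sequence writes.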

Tasks

  • Change prompt-cache entries (src/server/prompt_cache/entry.rs, store.rs) to hold physical block references via DetachedPagedCacheSet / retain_block instead of dense DetachedCacheSet.
  • Wire CachePool::detach_paged / adopt_paged into donate_finished_sequence_cache and try_adopt_cached_prefix (scheduler.rs).
  • Remove the decode_storage_backend == Paged -> return None guard in try_adopt_cached_prefix and replace it with the paged adopt path.
  • On a prefix hit, point the new sequence's block table at the matched physical blocks (refcount bump), with no tensor copy; the first write triggers the existing COW automatically.
  • Reuse the APC block-hash chains as the block-granular match key so matching aligns to block boundaries.
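The block-granular match key in the last task can be sketched as follows. This is an illustration under stated assumptions, not the existing APC implementation: `chain_hashes`, `match_prefix`, the `HashMap<u64, usize>` index, and the use of the standard library's `DefaultHasher` are all hypothetical choices. Each full block's hash folds in the previous block's hash, so a block only matches when everything before it also matched, and a trailing partial block never participates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash every full block of `tokens`, chaining each hash onto its predecessor.
/// A trailing partial block is ignored. `block_size` must be non-zero.
fn chain_hashes(tokens: &[u32], block_size: usize) -> Vec<u64> {
    let mut parent: u64 = 0; // assumed seed for the first block
    let mut out = Vec::new();
    for block in tokens.chunks_exact(block_size) {
        let mut hasher = DefaultHasher::new();
        parent.hash(&mut hasher);
        block.hash(&mut hasher);
        parent = hasher.finish();
        out.push(parent);
    }
    out
}

/// Walk the request's hash chain and collect the physical blocks of the
/// longest cached prefix, stopping at the first block that is not indexed.
fn match_prefix(index: &HashMap<u64, usize>, tokens: &[u32], block_size: usize) -> Vec<usize> {
    chain_hashes(tokens, block_size)
        .iter()
        .map_while(|hash| index.get(hash).copied())
        .collect()
}
```

The returned block ids are what a new sequence's block table would point at after the refcount bump; the match length is always a whole number of blocks.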

Acceptance criteria

  • Prefix reuse and paged batching run simultaneously (the documented "Dense backend only" limitation no longer applies).
  • A hit reuses physical blocks with zero tensor copy, verified by observing refcount > 1 on the shared blocks.
  • cached_tokens in the usage payload is reported correctly for paged hits.
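For the cached_tokens criterion, one plausible definition for a block-aligned hit is the number of matched blocks times the block size. The helper below is a hypothetical sketch of that arithmetic, not the project's accounting code; the clamp to the prompt length is a defensive assumption.

```rust
/// Hypothetical helper: tokens served from cache for a block-aligned prefix hit.
/// Clamped so the reported value can never exceed the prompt length.
fn cached_tokens(matched_blocks: usize, block_size: usize, prompt_len: usize) -> usize {
    (matched_blocks * block_size).min(prompt_len)
}
```

Under this definition, a 40-token prompt that matches two 16-token blocks would report 32 cached tokens, and a miss would report 0.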

Dependencies

Blocked by Phase 3 (paged prefill).

Part of #116

Labels

  • area:architecture (Architecture and code structure changes)
  • area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers)
  • priority:high (High priority)
  • status:backlog (In the backlog, not yet ready)
  • type:enhancement (New features, capabilities, or significant additions)
