
[RFC]: [Store] KVCache offloading to SSD in DFS #578

@SgtPepperr

Description


Changes proposed

As mentioned in the earlier issues #171 and #333, offloading the KV cache to SSDs to support Mooncake's multi-level caching mechanism further improves the KV cache reuse rate and addresses limited DRAM capacity in certain scenarios.

We have currently implemented Version 1 of KV cache offloading in #437, with the following mechanisms:

  • Client-side persistence: We offload and persist the KV cache on DFS (3FS) to facilitate unified file synchronization across nodes. All read/write/query operations on KV cache objects are performed entirely on the client side; the master node remains unaware of them. The mapping from keys to KV cache objects in the file system uses a fixed indexing scheme: each file holds one KV cache object, and the filename serves as the key.
  • POSIX read/write: All file I/O currently goes through POSIX interfaces. For put/batchput, a persistence request is submitted to the thread pool only after the in-memory write succeeds; the file write itself is not further verified, and if it fails the file is deleted automatically so that other instances cannot index it. For get, synchronous reads are used, while batchget issues asynchronous batch reads to improve throughput. (A minimal sketch of this client-side path follows the list.)
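
Below is a minimal sketch of this client-side path, assuming a plain DFS mount directory and an illustrative class name (DfsPersister); it is not the actual Mooncake implementation, only the shape of the key-to-filename mapping, the delete-on-failure put, and the synchronous get described above.

```cpp
// Sketch of client-side persistence over a DFS mount (names are assumptions).
#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <vector>

class DfsPersister {
 public:
  explicit DfsPersister(std::string mount_dir) : mount_dir_(std::move(mount_dir)) {}

  // Fixed indexing: one file per KV cache object, the key is the filename.
  std::string PathForKey(const std::string& key) const {
    return mount_dir_ + "/" + key;
  }

  // Called from the persistence thread pool after the in-memory write has
  // already succeeded. On any I/O failure the partially written file is
  // removed so that other instances never index an incomplete object.
  bool Put(const std::string& key, const std::vector<char>& value) {
    const std::string path = PathForKey(key);
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    ssize_t written = ::write(fd, value.data(), value.size());
    ::close(fd);
    if (written != static_cast<ssize_t>(value.size())) {
      ::unlink(path.c_str());  // delete on failure to keep the index clean
      return false;
    }
    return true;
  }

  // get uses a plain synchronous POSIX read; batchget would issue these
  // reads asynchronously (e.g. via a thread pool) to improve throughput.
  bool Get(const std::string& key, std::vector<char>& value) {
    const std::string path = PathForKey(key);
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    off_t size = ::lseek(fd, 0, SEEK_END);
    ::lseek(fd, 0, SEEK_SET);
    value.resize(static_cast<size_t>(size));
    ssize_t n = ::read(fd, value.data(), value.size());
    ::close(fd);
    return n == static_cast<ssize_t>(size);
  }

 private:
  std::string mount_dir_;
};
```

A data node would construct something like `DfsPersister("/mnt/3fs/kvcache")` (path chosen here only for illustration) and hand `Put` tasks to its persistence thread pool after each successful in-memory write.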

Future To-Do List

  1. Native 3FS Interface (Merged)
    Since the ultimate goal is to support this persistence feature natively on 3FS, and the current POSIX implementation (via FUSE) still limits I/O performance, we plan to introduce a 3FS-native plugin interface to further optimize file read performance for get/batchget (a rough interface sketch appears after this list).

  2. Master-Managed KV Cache in SSD (Merged)
    The current implementation manages the SSD KV cache on the client side, with metadata synchronization handled by DFS (the master remains unaware). While this keeps components loosely coupled, the lack of centralized management introduces consistency and performance issues. We plan to migrate KV cache metadata to the master, extending the replica mechanism to cover both memory and disk modes (a metadata sketch appears after this list). Benefits include:

    • Reduced query latency: Currently, query/exist operations require filesystem access, incurring high overhead for large datasets. Moving metadata to the master enables single-RPC lookups for SSD/memory status.
    • Consistent behavior: Ensures alignment with memory semantics for operations like removeAll and tearDownAll.
    • Race condition mitigation: Resolves issues like "remove-before-write" through centralized coordination.
  3. File Eviction Mechanism (WIP)
    Currently, file deletion relies on manual user calls (remove/removeAll) or admin intervention. Without automatic eviction, long-running clusters risk storage bloat. Future versions will introduce monitoring and auto-eviction policies (one possible policy is sketched after this list).

  4. Master-Triggered Eviction & Persistence (WIP)
    Presently, every successful put triggers persistence, effectively backing up KV cache entries. We aim to shift persistence to the master’s eviction phase, where evicted data is written to SSDs. Challenges include:

    • The master currently handles only metadata, not data flow.
    • Data distribution across nodes complicates persistence during eviction.
      A well-designed solution will be explored in future iterations (a rough sketch of the eviction-time spill path appears after this list).
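
For item 1, a pluggable backend is one natural shape for a 3FS-native plugin interface. The sketch below is only an assumed illustration (the names StorageBackend and BatchRead are hypothetical), not the merged plugin's actual API.

```cpp
// Assumed sketch of a pluggable storage-backend interface.
#include <cstddef>
#include <string>
#include <vector>

struct ReadRequest {
  std::string key;         // object to read
  std::vector<char>* out;  // destination buffer
};

class StorageBackend {
 public:
  virtual ~StorageBackend() = default;
  virtual bool Write(const std::string& key, const std::vector<char>& value) = 0;
  virtual bool Read(const std::string& key, std::vector<char>& value) = 0;
  // Batch reads benefit most from a native interface: the backend can submit
  // all requests at once instead of paying one FUSE round trip per file.
  virtual size_t BatchRead(std::vector<ReadRequest>& requests) = 0;
};
// A POSIX backend would implement these calls with open/read/write as above,
// while a 3FS-native backend would map them onto 3FS's own I/O primitives.
```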
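For item 2, the following is a minimal sketch of master-side metadata extended with a storage medium, assuming an in-process map stands in for the master's metadata store; the type names (ReplicaMedium, ReplicaInfo, MasterMetadata) are illustrative only.

```cpp
// Sketch: replica metadata covering both memory and disk modes.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

enum class ReplicaMedium : uint8_t { kMemory, kDisk };

struct ReplicaInfo {
  ReplicaMedium medium;
  std::string location;  // segment address for memory, file path for disk
};

struct ExistResult {
  bool in_memory = false;
  bool on_disk = false;
};

class MasterMetadata {
 public:
  void AddReplica(const std::string& key, ReplicaInfo info) {
    replicas_[key].push_back(std::move(info));
  }

  // A single lookup answers both "is it in DRAM?" and "is it on SSD?",
  // so exist/query no longer needs to touch the filesystem.
  ExistResult Exist(const std::string& key) const {
    ExistResult result;
    auto it = replicas_.find(key);
    if (it == replicas_.end()) return result;
    for (const auto& r : it->second) {
      if (r.medium == ReplicaMedium::kMemory) result.in_memory = true;
      else result.on_disk = true;
    }
    return result;
  }

  // Centralized removal keeps removeAll semantics consistent across memory
  // and disk replicas and avoids remove-before-write races.
  void RemoveAll(const std::string& key) { replicas_.erase(key); }

 private:
  std::unordered_map<std::string, std::vector<ReplicaInfo>> replicas_;
};
```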
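For item 3, one possible auto-eviction policy is a periodic scan that deletes the least recently accessed files until usage drops below a high-water mark. The policy, thresholds, and function name below are assumptions; the actual mechanism is still being designed.

```cpp
// Sketch: capacity-based LRU eviction over the KV cache directory.
#include <sys/stat.h>
#include <algorithm>
#include <cstdint>
#include <ctime>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

void EvictUntilBelow(const std::string& dir, uintmax_t high_water_bytes) {
  struct Entry { fs::path path; std::time_t atime; uintmax_t size; };
  std::vector<Entry> entries;
  uintmax_t total = 0;

  for (const auto& e : fs::directory_iterator(dir)) {
    if (!e.is_regular_file()) continue;
    struct stat st{};
    if (::stat(e.path().c_str(), &st) != 0) continue;
    entries.push_back({e.path(), st.st_atime, static_cast<uintmax_t>(st.st_size)});
    total += static_cast<uintmax_t>(st.st_size);
  }
  if (total <= high_water_bytes) return;

  // Evict the least recently accessed files first.
  std::sort(entries.begin(), entries.end(),
            [](const Entry& a, const Entry& b) { return a.atime < b.atime; });
  for (const auto& e : entries) {
    if (total <= high_water_bytes) break;
    if (fs::remove(e.path)) total -= e.size;
  }
}
```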
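For item 4, the rough idea is that persistence moves from "on every put" to "when the master evicts". Because the master holds only metadata, it would have to ask the node owning the data to spill it to SSD before the memory is freed. The handler below is a hedged sketch of that spill path; all names are hypothetical and the data-flow question remains open.

```cpp
// Sketch: evict-time spill to SSD on the data node, triggered by the master.
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

class DataNodeEvictionHandler {
 public:
  using PersistFn = std::function<bool(const std::string&, const std::vector<char>&)>;

  explicit DataNodeEvictionHandler(PersistFn persist) : persist_(std::move(persist)) {}

  void PutInMemory(const std::string& key, std::vector<char> value) {
    memory_[key] = std::move(value);
  }

  // Called (e.g. via RPC) when the master selects `key` for eviction.
  // The entry is freed only after the SSD write succeeds, so the master can
  // switch its metadata for the key from "memory" to "disk" atomically.
  bool OnEvict(const std::string& key) {
    auto it = memory_.find(key);
    if (it == memory_.end()) return false;         // data already gone
    if (!persist_(key, it->second)) return false;  // keep in DRAM on failure
    memory_.erase(it);
    return true;
  }

 private:
  PersistFn persist_;
  std::unordered_map<std::string, std::vector<char>> memory_;
};
```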

We welcome feedback and suggestions on this design and implementation.

