Feature Request: Implement KV Cache Persistence on Disk for system+user on llama-server webui #17107

@jhemmond

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

The slot-persistence tutorial shows how a client can save the KV cache to disk and restore it. The underlying code already exists and works for anyone building their own client; however, the feature is currently missing from the webui.
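For reference, a minimal sketch of what the webui would need to call, using the server's existing slot save/restore endpoints (flag and endpoint names taken from the llama-server README; the port, slot id, and filename are placeholders):

```shell
# Start the server with slot persistence enabled; --slot-save-path
# designates the directory where KV cache files may be written.
llama-server -m model.gguf --slot-save-path ./kv-cache/

# Save slot 0's KV cache to disk (filename is relative to --slot-save-path)
curl -X POST "http://localhost:8080/slots/0?action=save" \
  -H "Content-Type: application/json" \
  -d '{"filename": "conversation-1.bin"}'

# Later, even after restarting the server: restore the cache into the slot
# instead of reprocessing the whole prompt.
curl -X POST "http://localhost:8080/slots/0?action=restore" \
  -H "Content-Type: application/json" \
  -d '{"filename": "conversation-1.bin"}'
```

The webui would presumably issue the save call when a conversation is closed or exported, and the restore call when one is reopened.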

Motivation

The prompt_cache is enabled by default and works well in llama.cpp today with most clients. The issue is that once the model is unloaded (which is common on consumer hardware), the entire KV cache is discarded and the prompt must be reprocessed if the user resumes the same conversation minutes, hours, or days later.

I see the llama-server webui implementing export and import features for conversations, which strengthens the case for faster reloading of older conversations.

Possible Implementation

The tutorial linked above is a good starting point.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests