Open
Labels
enhancement (New feature or request)
Description
Prerequisites
- I am running the latest code, and have mentioned the version where possible.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
There is a tutorial showing how a client can save the KV cache to disk and restore it using slot persistence. The underlying code already exists and works for anyone building their own client; however, the feature is currently missing from the web UI.
Motivation
The prompt cache is enabled by default and works well in llama.cpp today with most clients. The problem is that once the model is unloaded (which is common on consumer hardware), the entire KV cache is discarded and the prompt must be reprocessed from scratch if the user wants to resume the same conversation minutes, hours, or days later.
I see that the llama-server web UI is implementing an export and import feature for conversations, which reinforces the need for a faster way to reload older conversations.
Possible Implementation
Look at the tutorial here as a start.
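To illustrate what the web UI would need to call, here is a minimal client-side sketch using llama-server's slot save/restore endpoints (`POST /slots/{id}?action=save|restore`). It assumes the server was started with `--slot-save-path` so it is allowed to write cache files; the base URL, slot id, and filename below are illustrative.

```python
# Sketch of a client persisting and restoring a slot's KV cache via
# llama-server's HTTP API. Assumes the server was launched with
# `--slot-save-path /some/dir/`; URL, slot id, and filename are examples.
import json
import urllib.request


def slot_action(base_url: str, slot_id: int, action: str, filename: str):
    """Build the POST request for /slots/{id}?action=save|restore|erase."""
    url = f"{base_url}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def save_slot(base_url: str, slot_id: int, filename: str) -> dict:
    # Ask the server to write the slot's KV cache under its --slot-save-path.
    with urllib.request.urlopen(slot_action(base_url, slot_id, "save", filename)) as r:
        return json.load(r)


def restore_slot(base_url: str, slot_id: int, filename: str) -> dict:
    # Reload a previously saved KV cache into the slot, e.g. after the
    # model was unloaded and loaded again.
    with urllib.request.urlopen(slot_action(base_url, slot_id, "restore", filename)) as r:
        return json.load(r)


# Example usage (requires a running llama-server):
# save_slot("http://127.0.0.1:8080", 0, "conversation-1234.bin")
# ... later, after the model has been reloaded ...
# restore_slot("http://127.0.0.1:8080", 0, "conversation-1234.bin")
```

The web UI could call the save endpoint when a conversation is closed or exported, and the restore endpoint when an older conversation is reopened, skipping prompt reprocessing.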