Save/Load Just One Sequence #5843

Closed
martindevans opened this issue Mar 3, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@martindevans
Contributor

Feature Description

Would it be possible to create functions that look something like this:

  • llama_kv_save_seq(struct llama_context * ctx, llama_seq_id seq_id, uint8_t * dst);
  • llama_kv_load_seq(struct llama_context * ctx, llama_seq_id seq_id, uint8_t * src);
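A hypothetical usage sketch of the proposed pair (note: llama_kv_seq_size is invented here purely for illustration, since such an API would presumably also need a size query that the proposal doesn't name):

```c
// Hypothetical only: llama_kv_seq_size does not exist in llama.h; it stands in
// for whatever size query would accompany the proposed save/load functions.
size_t size = llama_kv_seq_size(ctx, seq_id);
uint8_t * buf = malloc(size);

llama_kv_save_seq(ctx, seq_id, buf);   // proposed: snapshot one sequence's KV state
// ... persist buf, start a fresh conversation, read buf back in ...
llama_kv_load_seq(ctx, seq_id, buf);   // proposed: restore just that sequence

free(buf);
```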

Motivation

In llama.cpp it is possible to save and load the entire context state in one operation with llama_copy_state_data and llama_set_state_data. For example, this could be used to evaluate a large system prompt once, save it to disk, and then load the state every time a new conversation is started.
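A rough sketch of that whole-context flow (error handling kept minimal; signatures as in llama.h, where llama_get_state_size reports the buffer size needed by llama_copy_state_data):

```c
#include <stdio.h>
#include <stdlib.h>
#include "llama.h"

// Evaluate the system prompt once, then snapshot the whole context to disk.
static void save_state(struct llama_context * ctx, const char * path) {
    const size_t size = llama_get_state_size(ctx);
    uint8_t * buf = malloc(size);
    llama_copy_state_data(ctx, buf);   // serializes the full state (KV cache, logits, ...)

    FILE * f = fopen(path, "wb");
    fwrite(buf, 1, size, f);
    fclose(f);
    free(buf);
}

// Restore that snapshot at the start of each new conversation.
static void load_state(struct llama_context * ctx, const char * path) {
    const size_t size = llama_get_state_size(ctx);
    uint8_t * buf = malloc(size);

    FILE * f = fopen(path, "rb");
    const size_t n_read = fread(buf, 1, size, f);
    fclose(f);

    if (n_read > 0) {
        llama_set_state_data(ctx, buf);
    }
    free(buf);
}
```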

However, with batch decoding this isn't really possible: if many sequences are being evaluated at once, you can only save and load all of them simultaneously.

@martindevans martindevans added the enhancement New feature or request label Mar 3, 2024
@ngxson
Collaborator

ngxson commented Mar 3, 2024

+1 for this. Also, I'd prefer to pass a struct save_config to llama_session_save / llama_session_load.

The reason is that we may have other save options in the future. For example, in my playground I'm experimenting with the ability to use an f16 KV cache but save/load it as q4_K.

With that, we could also choose whether or not to save embeddings / logits.
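A purely illustrative shape for such a struct (none of these names exist in llama.h; they just mirror the options mentioned above):

```c
// Illustrative only: a possible save_config covering KV re-quantization on
// save and optional inclusion of the embeddings / logits buffers.
struct llama_save_config {
    enum ggml_type kv_type;    // e.g. GGML_TYPE_Q4_K to save an f16 cache as q4_K
    bool save_embeddings;      // include the embeddings buffer in the saved state
    bool save_logits;          // include the logits buffer in the saved state
};

// bool llama_session_save(struct llama_context * ctx, const char * path,
//                         const struct llama_save_config * cfg);
```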

@kaetemi
Collaborator

kaetemi commented Mar 9, 2024

+1, for saving and loading cache of individual slots in server.

@kaetemi
Collaborator

kaetemi commented Mar 27, 2024

Working on this. :)

@kaetemi
Collaborator

kaetemi commented Apr 20, 2024

Implemented as llama_state_seq_get_size, llama_state_seq_get_data, and llama_state_seq_set_data in commit beea6e1.
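A minimal usage sketch of those functions (signatures as of that commit; later llama.h revisions add an explicit buffer-size parameter, so check your header):

```c
// Serialize the state of a single sequence...
const size_t size = llama_state_seq_get_size(ctx, seq_id);
uint8_t * buf = malloc(size);
llama_state_seq_get_data(ctx, buf, seq_id);

// ...and later restore it into a destination sequence id, which may differ
// from the original. Returns 0 if the data could not be restored.
llama_state_seq_set_data(ctx, buf, dest_seq_id);
free(buf);
```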

@kaetemi kaetemi closed this as completed Apr 20, 2024