Feature Request: llama-server hot swapping cvectors via API like we can do with LoRA adapters now #10685

@gghfez

### Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

### Feature Description

The ability to load/unload cvectors and adjust their scale via the API, similar to the recently implemented LoRA scale/hot-swap feature:

POST /lora-adapters: Set list of LoRA adapters

To disable an adapter, either remove it from the list below, or set scale to 0.
Request format

To know the id of the adapter, use GET /lora-adapters

[
  {"id": 0, "scale": 0.2},
  {"id": 1, "scale": 0.8}
]

I read in the change log that this was inspired by cvector scaling which is already implemented, so would it be possible to expose this via the API as well?

### Motivation

During creative writing, I often use control vectors to steer the AI's responses.

Currently, I've written a wrapper API/web UI with sliders for the different vectors so I can adjust them as needed.
However, after each change to a scale, or after toggling a cvector on or off, I have to restart llama-server and reload the model.

If we could get this in the llama-server API instead, it would make cvectors useful for a lot of other people, and I could do away with the entire wrapper server I wrote.

### Possible Implementation

This could be exposed the same way LoRA adapters are right now:
GET /cvectors
[
    {
        "id": 0,
        "path": "language-ornate_vs_simple.gguf",
        "scale": 0.7
    },
    {
        "id": 1,
        "path": "character-focus-naration_vs_dialogue.gguf",
        "scale": 0.2
    }
]

POST /cvectors
[
  {"id": 0, "scale": 0.5},
  {"id": 1, "scale": 0.5}
]
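
As a rough client-side sketch of the request shape above: the helper below builds, but does not send, a POST with the `[{"id": ..., "scale": ...}]` body. `/cvectors` is only the endpoint proposed in this issue and does not exist in llama-server; `build_scale_request`, the base URL and the port are assumptions for illustration. The same helper would fit the existing `/lora-adapters` endpoint, since the body format is identical.

```python
import json
import urllib.request


def build_scale_request(base_url, endpoint, scales):
    """Build (but do not send) a POST request that sets per-id scales.

    `scales` maps an adapter/vector id to its scale, e.g. {0: 0.5, 1: 0.5}.
    """
    payload = [{"id": i, "scale": s} for i, s in sorted(scales.items())]
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "/cvectors" is hypothetical (proposed here); the host and port are assumed.
req = build_scale_request("http://localhost:8080", "/cvectors", {0: 0.5, 1: 0.5})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running server: urllib.request.urlopen(req)
```

With an endpoint like this, each slider change in a UI would become one small request instead of a server restart.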


For reference, this is how they're called via command line at the moment:

--control-vector XXXXX-language__debias.gguf \
--control-vector-scaled XXXXX-language__ornate.gguf 0.20
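
For comparison, here is a minimal sketch of what a wrapper has to do today: rebuild the full command line from the current slider values, then restart the server with it. `build_server_args` and the `model.gguf` path are hypothetical placeholders; the sketch also assumes that passing a vector through `--control-vector-scaled` with a scale of 1.0 is equivalent to plain `--control-vector`.

```python
def build_server_args(model_path, cvectors):
    """Return an argv list for llama-server with scaled control vectors.

    `cvectors` maps a control-vector GGUF path to its scale. Entries with a
    scale of 0 are skipped, which is how a vector is toggled off.
    """
    args = ["llama-server", "-m", model_path]
    for path, scale in cvectors.items():
        if scale == 0:
            continue  # a disabled vector is simply left off the command line
        args += ["--control-vector-scaled", path, f"{scale:.2f}"]
    return args


args = build_server_args(
    "model.gguf",  # placeholder model path
    {
        "XXXXX-language__debias.gguf": 1.0,
        "XXXXX-language__ornate.gguf": 0.20,
    },
)
print(" ".join(args))
# A wrapper would stop the running server and relaunch it with `args`
# (for example via subprocess.Popen), reloading the model every time.
```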
