Feature Request: Apply LoRA adapters per-request

### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new and useful enhancement to share.

### Feature Description

The server now supports hot-swapping LoRA adapters via the `/lora-adapters` endpoint, which changes the global adapter config.

With this, the only "safe" moment to apply LoRA changes is when all slots are idle.

However, this is not practical when the server handles a **high number of requests** (ref: https://github.com/ggerganov/llama.cpp/issues/10374). With continuous batching, all slots are rarely idle at the same time.

### Motivation

-

### Possible Implementation

1. Group requests so that each batch only contains requests using the same LoRA config
2. Call `common_lora_adapters_apply` before processing the batch (remember to clear the KV cache if needed)
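The two steps above could look roughly like the sketch below. It is not llama.cpp code: `request`, `lora_config`, `group_by_lora` and `process_batches` are hypothetical names made up for illustration, and the `apply_lora` callback only stands in for a call such as `common_lora_adapters_apply`, whose real signature is not assumed here. KV cache clearing is left out.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; these are not llama.cpp structures.
// A LoRA config is a list of (adapter id, scale) pairs, assumed to be kept
// sorted by id so that equal configs compare equal.
using lora_config = std::vector<std::pair<int, float>>;

struct request {
    int         id;
    lora_config lora;
};

// Step 1: group requests so that each batch only contains requests sharing
// the same LoRA config. Arrival order is preserved inside each group.
std::vector<std::vector<request>> group_by_lora(const std::vector<request> & pending) {
    std::map<lora_config, std::vector<request>> groups;
    for (const request & r : pending) {
        groups[r.lora].push_back(r);
    }
    std::vector<std::vector<request>> batches;
    for (auto & kv : groups) {
        batches.push_back(std::move(kv.second));
    }
    return batches;
}

// Step 2: apply the batch's config before processing it, but only when it
// differs from the currently active one.
// `apply_lora` is a placeholder for something like common_lora_adapters_apply.
// Returns how many times the adapters were actually switched.
std::size_t process_batches(
        const std::vector<std::vector<request>> & batches,
        lora_config & active,
        const std::function<void(const lora_config &)> & apply_lora,
        const std::function<void(const std::vector<request> &)> & decode) {
    std::size_t n_switches = 0;
    for (const auto & batch : batches) {
        if (batch.empty()) {
            continue;
        }
        const lora_config & wanted = batch.front().lora;
        if (wanted != active) {
            apply_lora(wanted);
            active = wanted;
            n_switches++;
        }
        decode(batch);
    }
    return n_switches;
}
```

Skipping the apply call when the config is unchanged keeps the common case (most requests using the same adapters) as cheap as it is today. A real scheduler would also need to bound how long a request waits for its group to be picked.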

Feature Request: Apply LoRA adapters per-request #10377
