fix: only reset LoRa configs when they have changed from previous batch #19280
ggerganov merged 2 commits into ggml-org:master from …
Conversation
ggerganov
left a comment
Better to move the check inside llama_context::set_adapter_lora() and not set the sched_need_reserve flag if the same adapters are passed.
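A minimal sketch of what that suggestion could look like, using simplified stand-in types (only `set_adapter_lora` and the `sched_need_reserve` flag are taken from the comment above; the real `llama_context` members differ):

```cpp
#include <map>

// Stand-in types for illustration only; not the actual llama.cpp declarations.
struct llama_adapter_lora; // opaque adapter handle

struct llama_context_sketch {
    std::map<llama_adapter_lora *, float> loras; // currently applied adapters -> scale
    bool sched_need_reserve = false;

    void set_adapter_lora(llama_adapter_lora * adapter, float scale) {
        // If this adapter is already applied with the same scale, the call is a
        // no-op: no state change and, crucially, no scheduler re-reserve.
        auto it = loras.find(adapter);
        if (it != loras.end() && it->second == scale) {
            return;
        }
        loras[adapter] = scale;
        sched_need_reserve = true; // reserve only when the adapter set actually changed
    }
};
```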
Force-pushed 8edc85d to b131ad4
Had to implement the check in the … Added some helpers to validate whether the LoRA configs are the same, and if they are, skip clearing and skip reserving the scheduler.
Can't we update the logic in llama.cpp/src/llama-context.cpp, lines 991 to 1001 (at 6ab881b)?
Apologies because I might be missing something. In the `common_set_adapter_lora` function, we call `…` to clear the LoRAs first. We could just remove that call (or move it to `llama_context::set_adapter_lora`), but I wasn't sure if there is some reason we wanted that in the common functionality. My latest change attempts to do what you're looking for, just in the common code.
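For illustration, here is a rough stand-in for the clear-then-reapply pattern being discussed (types and names are simplified, not the actual `common`/`llama` declarations); the point is that clearing unconditionally marks the scheduler for a reserve on every call, even when the requested adapters are identical to what is already applied:

```cpp
#include <vector>

// Simplified stand-ins; not the real llama.cpp/common types.
struct lora_request { int adapter_id; float scale; };

struct ctx_state {
    std::vector<lora_request> applied;
    bool need_reserve = false;
};

// Shape of the current behaviour under discussion: clear everything, then
// re-apply the requested adapters. Every call forces a reserve, even if
// `requested` matches what was already applied.
void set_adapters_clear_then_apply(ctx_state & ctx, const std::vector<lora_request> & requested) {
    ctx.applied.clear();
    ctx.need_reserve = true; // clearing already counts as a change
    for (const auto & r : requested) {
        if (r.scale != 0.0f) {
            ctx.applied.push_back(r);
        }
    }
}
```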
Force-pushed 483b739 to 3b82c3d
@ggerganov I was able to refactor by creating a new `…`. Is this more along the lines of what you had in mind?
@ggerganov I can confirm this fixes the issues I had; throughput goes from 8 tps to 360 tps for my small 1.2B-parameter model. @agent-enemy-2 great work, thank you 😄
@agent-enemy-2 @tugot17 Could you confirm that the last version that I just pushed works as expected?
@ggerganov Just tested your latest commit and it LGTM! @tugot17 Thanks for testing!
…-org#19280)" This reverts commit 2d8015e.
Overview
Fix for #19217
Currently we set the LoRA config for every token request, even if the configuration has not changed between batches. It appears that in this PR, at llama-context.cpp line 1078, scheduler reserving was added for when new LoRA configs are set, so now, whenever a LoRA config is in use, we reserve the scheduler for every batch we decode.
This change adds a field to the server slot that holds the previous batch's LoRA config. For each batch, we check whether the config is the same as the previous batch's, and only set it if it has changed; otherwise we proceed without resetting it. A minimal sketch of that idea is shown below.
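The sketch uses hypothetical field and type names (the actual server slot and LoRA config types in the repo differ):

```cpp
#include <vector>

// Hypothetical stand-ins for the server-side LoRA config and slot state.
struct lora_config { int adapter_id; float scale; };

inline bool operator==(const lora_config & a, const lora_config & b) {
    return a.adapter_id == b.adapter_id && a.scale == b.scale;
}

struct server_slot_sketch {
    // New field: the LoRA config applied for the previous batch.
    std::vector<lora_config> prev_lora;
};

// Apply the requested LoRA config only when it differs from what the slot
// already has; otherwise skip the clear/set and the scheduler reserve.
template <typename ApplyFn>
void maybe_set_lora(server_slot_sketch & slot,
                    const std::vector<lora_config> & requested,
                    ApplyFn apply) {
    if (slot.prev_lora == requested) {
        return; // unchanged since the previous batch
    }
    apply(requested);           // e.g. clear + set adapters on the context
    slot.prev_lora = requested; // remember for the next batch
}
```

Comparing against the previous batch's config is what avoids the repeated scheduler reserve that caused the slowdown.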
Testing
I was able to reproduce the issue relatively easily with some debug logs, and could see the scheduler reserve being triggered endlessly along with huge memory spikes. After these changes, the issue no longer appears and memory usage remains stable.