tests: enable kv_unified for test-backend-sampler by taronaeo · Pull Request #20645 · ggml-org/llama.cpp

taronaeo · 2026-03-16T14:51:13Z

While running tests to make sure my local server is ready to be onboarded as a GGML CI runner, I ran into an odd CUDA OOM error in the dist sampling test, as shown:

2026-03-15T15:43:18.5103166Z ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5320.00 MiB on device 0: cudaMalloc failed: out of memory
2026-03-15T15:43:18.5103360Z alloc_tensor_range: failed to allocate CUDA0 buffer of size 5578424320
2026-03-15T15:43:18.5103633Z llama_init_from_model: failed to initialize the context: failed to allocate buffer for kv cache
2026-03-15T15:43:18.5103764Z Error running test 'dist': failed to create context

Comparing the dist sampling test against the other available tests, I noticed that seq_id = 0 is used everywhere except test_backend_dist_sampling. Changing seq_id from 189 to 0 seems to have solved the OOM error.

~~cc: @danbev; please let me know if seq_id = 189 was intentional or if there is another approach to solving this :)~~

Edit: PR has been updated to change only kv_unified = true, as suggested in #20645 (comment).

danbev · 2026-03-16T15:01:47Z

> please let me know if seq_id = 189 was intentional or if there is another approach to solving this :)

This was just to use something other than 0, which I used in most of the other tests, and I did not take this into consideration.

ggerganov · 2026-03-16T15:10:46Z

Hm, it should work even with seq_id = 189. Any seq_id up to LLAMA_MAX_SEQ = 256 should work with the unified kv cache. It might be worth tracing down the root cause of this OOM.

ggerganov · 2026-03-16T15:20:39Z

We are actually not enabling the unified KV cache. This patch should fix it:

diff --git a/tests/test-backend-sampler.cpp b/tests/test-backend-sampler.cpp
index d4cd62c71..58361ae80 100644
--- a/tests/test-backend-sampler.cpp
+++ b/tests/test-backend-sampler.cpp
@@ -89,6 +89,7 @@ struct test_context {
         cparams.n_batch = 512;
         cparams.samplers = configs.data();
         cparams.n_samplers = configs.size();
+        cparams.kv_unified = true;
 
         // If n_seq_max is not specified, calculate it from configs
         if (n_seq_max < 0) {
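
A rough back-of-the-envelope check of why the missing flag could produce the allocation in the log above. This is a toy model, not llama.cpp's actual allocator. It assumes that n_seq_max is derived as the largest seq_id plus one (190 for seq_id = 189), and that without kv_unified each sequence gets its own stream of at least 256 cells. Both are assumptions, as is every name in the sketch. The per-cell size is inferred from the logged buffer size, not stated anywhere in this thread.

```cpp
#include <cstdint>

// Buffer size reported by alloc_tensor_range in the log above, in bytes.
constexpr int64_t kLoggedBufferBytes = 5578424320LL;

// ASSUMPTION: cells reserved per sequence when the KV cache is not unified.
constexpr int64_t kAssumedCellsPerSeq = 256;

// Hypothetical helper: sequences needed so that max_seq_id is addressable.
constexpr int64_t n_seq_max_for(int64_t max_seq_id) {
    return max_seq_id + 1;
}

// Toy model: one stream per sequence, each with kAssumedCellsPerSeq cells.
constexpr int64_t split_cells(int64_t max_seq_id) {
    return n_seq_max_for(max_seq_id) * kAssumedCellsPerSeq;
}

// INFERRED from the log, not from the thread: bytes per KV cell.
constexpr int64_t kInferredBytesPerCell = kLoggedBufferBytes / split_cells(189);

// Total KV buffer size under the toy model.
constexpr int64_t split_bytes(int64_t max_seq_id) {
    return split_cells(max_seq_id) * kInferredBytesPerCell;
}

// Compile-time checks of the arithmetic.
static_assert(n_seq_max_for(189) == 190, "seq_id 189 needs 190 sequences");
static_assert(split_cells(189) == 48640, "190 streams of 256 cells");
static_assert(kLoggedBufferBytes % split_cells(189) == 0, "logged size divides evenly");
static_assert(split_bytes(189) / (1024 * 1024) == 5320, "matches the logged 5320.00 MiB");
static_assert(split_bytes(0) / (1024 * 1024) == 28, "seq_id = 0 under the same model");
```

Under these assumptions the logged size factors cleanly into 190 streams of 256 cells, while seq_id = 0 would need only about 28 MiB, which would be consistent with the earlier workaround making the OOM disappear. With kv_unified = true the sequences share a single cache, so the total would not be expected to grow with n_seq_max in this way.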

taronaeo · 2026-03-17T17:10:59Z

Sorry for the delay. Thank you, that patch fixed it! I'll update this PR to reflect that change :)

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

taronaeo · 2026-03-18T06:14:41Z

Will merge in a few hours if there are no further comments :)


taronaeo requested a review from ggerganov as a code owner March 16, 2026 14:51

taronaeo requested a review from danbev March 16, 2026 14:51

danbev approved these changes Mar 16, 2026


github-actions bot added the testing Everything test related label Mar 16, 2026

tests: enable kv_unified to prevent cuda oom error on rtx 2060

86a7113

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

taronaeo force-pushed the feat/self-hosted-ci branch from 9d385f3 to 86a7113 Compare March 17, 2026 17:12

taronaeo changed the title ~~tests: set dist_sampling seq_id to 0~~ tests: enable kv_unified for test-backend-sampler Mar 17, 2026

taronaeo merged commit fe00a84 into ggml-org:master Mar 18, 2026
54 of 56 checks passed

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026

tests: enable kv_unified to prevent cuda oom error on rtx 2060 (ggml-…

e6deecd

…org#20645) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
