Skip to content

tests: enable kv_unified for test-backend-sampler#20645

Merged
taronaeo merged 1 commit intoggml-org:masterfrom
taronaeo:feat/self-hosted-ci
Mar 18, 2026
Merged

tests: enable kv_unified for test-backend-sampler#20645
taronaeo merged 1 commit intoggml-org:masterfrom
taronaeo:feat/self-hosted-ci

Conversation

@taronaeo
Copy link
Copy Markdown
Member

@taronaeo taronaeo commented Mar 16, 2026

While running tests to ensure that my local server is ready to be onboard as a GGML CI runner, I ran into an odd CUDA OOM error for the dist sampling test as shown:

2026-03-15T15:43:18.5103166Z ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5320.00 MiB on device 0: cudaMalloc failed: out of memory
2026-03-15T15:43:18.5103360Z alloc_tensor_range: failed to allocate CUDA0 buffer of size 5578424320
2026-03-15T15:43:18.5103633Z llama_init_from_model: failed to initialize the context: failed to allocate buffer for kv cache
2026-03-15T15:43:18.5103764Z Error running test 'dist': failed to create context

Comparing the dist sampling test across all the other available tests, I noticed that seq_id = 0 has been used everywhere except test_backend_dist_sampling. Setting seq_id from 189 to 0 seem to have solved the OOM error.

cc: @danbev; please let me know if seq_id = 189 was intentional or if there is another approach to solving this :)

Edit: PR has been updated to change only kv_unified = true as suggested #20645 (comment).

@taronaeo taronaeo requested a review from ggerganov as a code owner March 16, 2026 14:51
@taronaeo taronaeo requested a review from danbev March 16, 2026 14:51
@danbev
Copy link
Copy Markdown
Member

danbev commented Mar 16, 2026

please let me know if seq_id = 189 was intentional or if there is another approach to solving this :)

This was just to use something other than 0 which I used in most other test, and I did not take this into consideration.

@ggerganov
Copy link
Copy Markdown
Member

Hm, it should work even with seq_id = 189. Any seq_id up to LLAMA_MAX_SEQ = 256 should work with the unified kv cache. Might be worth to trace down the root cause for this OOM.

@ggerganov
Copy link
Copy Markdown
Member

We are actually not enabling the unified KV cache. This patch should fix it:

diff --git a/tests/test-backend-sampler.cpp b/tests/test-backend-sampler.cpp
index d4cd62c71..58361ae80 100644
--- a/tests/test-backend-sampler.cpp
+++ b/tests/test-backend-sampler.cpp
@@ -89,6 +89,7 @@ struct test_context {
         cparams.n_batch = 512;
         cparams.samplers = configs.data();
         cparams.n_samplers = configs.size();
+        cparams.kv_unified = true;
 
         // If n_seq_max is not specified, calculate it from configs
         if (n_seq_max < 0) {

@github-actions github-actions bot added the testing Everything test related label Mar 16, 2026
@taronaeo
Copy link
Copy Markdown
Member Author

Sorry for the delay. Thank you, that patch fixed it! I'll update this PR to reflect that change :)

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
@taronaeo taronaeo force-pushed the feat/self-hosted-ci branch from 9d385f3 to 86a7113 Compare March 17, 2026 17:12
@taronaeo taronaeo changed the title tests: set dist_sampling seq_id to 0 tests: enable kv_unified for test-backend-sampler Mar 17, 2026
@taronaeo
Copy link
Copy Markdown
Member Author

Merge in a few hours if no further comments :)

@taronaeo taronaeo merged commit fe00a84 into ggml-org:master Mar 18, 2026
54 of 56 checks passed
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants