Kv_unified, n_seq_max #17421
RedDragonGecko asked this question in Q&A (unanswered)
Okay, preamble.
I've been running GLM 4.6 (UD-Q5_K_XL) since its release and hadn't updated to a new build of llama.cpp since then.
I recently had a long conversation going. I had previously capped my context at 32k and hit that limit, so I expanded my context to 64k. I also removed "-ctk q8_0 -ctv q8_0" to see how that would affect performance.
With this loaded I got a couple of messages further before it started repeating a single character forever, e.g. "?????????????". Changing Top K would change which character it repeated ("//////////////", for instance), but nothing I tried would stop it.
My first thought was to try a different quant of the model. I loaded up the same messages and got the exact same behavior. My second thought was to upgrade to the latest build of llama.cpp.
Now it cannot load; it runs out of VRAM with the same settings. I looked at what was happening and discovered:
llama_context: constructing llama_context
llama_context: n_seq_max = 4
llama_context: n_ctx = 64000
llama_context: n_ctx_seq = 64000
llama_context: n_batch = 4096
llama_context: n_ubatch = 4096
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = true
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (64000) < n_ctx_train (202752) -- the full capacity of the model will not be utilized
llama_context: CUDA_Host output buffer size = 2.31 MiB
llama_kv_cache: CUDA0 KV buffer size = 9000.00 MiB
llama_kv_cache: CUDA1 KV buffer size = 4250.00 MiB
llama_kv_cache: CUDA2 KV buffer size = 4250.00 MiB
llama_kv_cache: CUDA3 KV buffer size = 4250.00 MiB
llama_kv_cache: CUDA4 KV buffer size = 1250.00 MiB
llama_kv_cache: size = 23000.00 MiB ( 64000 cells, 92 layers, 4/1 seqs), K (f16): 11500.00 MiB, V (f16): 11500.00 MiB
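For reference, here is a rough back-of-the-envelope sketch of the KV cache sizing implied by that log. The kv_cache_mib helper and the 1024-elements-per-layer figure are my own back-calculation from the reported 23000 MiB, not anything pulled from llama.cpp or the model config, and q8_0 is approximated at about 1.0625 bytes per element (34-byte block per 32 elements):

```python
# Rough KV cache sizing sketch based on the numbers in the log above.
# Assumptions (not verified against the GLM 4.6 config): 92 layers and
# ~1024 K elements per layer per cell, back-calculated from the reported
# 23000 MiB at f16; q8_0 taken as ~1.0625 bytes per element.

MiB = 1024 * 1024

def kv_cache_mib(n_cells, n_layers=92, elems_per_layer=1024, bytes_per_elem=2.0):
    """Approximate total K+V cache size in MiB."""
    one_tensor = n_cells * n_layers * elems_per_layer * bytes_per_elem  # K (or V) alone
    return 2 * one_tensor / MiB                                         # K + V together

print(kv_cache_mib(64000, bytes_per_elem=2.0))     # ~23000 MiB, matches the log above
print(kv_cache_mib(32000, bytes_per_elem=1.0625))  # ~6109 MiB, my old 32k + q8_0 setup
```

So doubling the context and dropping the q8_0 cache types already roughly quadruples the KV cache compared to my old 32k + q8_0 setup, independent of which build is running.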
Now, I'm not sure what n_seq_max and kv_unified do, and I haven't found a way to turn them back off to test whether they are causing the increased VRAM usage. Looking at the load output from my older version, they are set to 1 and false respectively.