Random purging of the one single slot in use... #17196

@whoreson

Description

Name and Version

86fde91

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

libllama (core library)

Command line

./llama-server --grammar-file fuck-emdashes.gbnf -ngl 99 --host 0.0.0.0 -c 190000 -m Qwen3-4B-Instruct-2507-Q8_0.gguf  -fa auto -cram 0 --slots --slot-save-path kv/qwen3-4b --no-mmap

Problem description & steps to reproduce

slot update_slots: id  1 | task 9700 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.033170
decode: failed to find a memory slot for batch of size 2048
srv  try_purge_id: purging slot 2 with 1724 tokens
srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 2048, ret = 1
decode: failed to find a memory slot for batch of size 2048
srv  try_purge_id: purging slot 3 with 184145 tokens
srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 2048, ret = 1
slot update_slots: id  1 | task 9700 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id  1 | task 9700 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, prog

There aren't any other slots in use; after 4-5 requests the code randomly decides that it's out of space and nukes the slot's cached content.
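For reference, the traffic here is an ordinary single-client completion loop, roughly of the shape sketched below (it uses the server's /completion and /slots endpoints; the 127.0.0.1:8080 address assumes the default port, and the prompt and request count are placeholders):

for i in 1 2 3 4 5; do
  # each request reuses the same slot's cached prefix (cache_prompt is the server's prompt-caching field)
  curl -s http://127.0.0.1:8080/completion \
    -d '{"prompt": "<long prompt with a shared prefix>", "n_predict": 64, "cache_prompt": true}' > /dev/null
done
# with --slots enabled, this lists each slot's current state
curl -s http://127.0.0.1:8080/slots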

Could you please not do that. Best regards.

Also, you might as well revert the RAM caching misfeature: tmpfs (and other ramdrives) already exist, and llama.cpp can't detect the total memory size anyway, so it gets OOM-killed every time.

#16736
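For what it's worth, the tmpfs route is just a mount plus the existing --slot-save-path machinery; a rough sketch (the mount point, size and filename are arbitrary, the flags are taken from the command line above, and the address assumes the default port):

mkdir -p /mnt/kvcache
mount -t tmpfs -o size=8G tmpfs /mnt/kvcache
./llama-server -m Qwen3-4B-Instruct-2507-Q8_0.gguf -c 190000 -ngl 99 -fa auto --slots --slot-save-path /mnt/kvcache
# save / restore a slot's KV cache through the endpoints that --slot-save-path enables
curl -s -X POST 'http://127.0.0.1:8080/slots/1?action=save' -d '{"filename": "slot1.bin"}'
curl -s -X POST 'http://127.0.0.1:8080/slots/1?action=restore' -d '{"filename": "slot1.bin"}'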

First Bad Commit

No response

Relevant log output

See the excerpt under the problem description above.
