server: fix --cache-ram not preventing RAM OOM #23561

Open: zzhenyao wants to merge 2 commits into ggml-org:master from zzhenyao:fix/server-cache-ram-oom


@zzhenyao zzhenyao commented May 23, 2026

Overview

Fix a RAM OOM crash in server_prompt_cache when saving a prompt cache entry with a --cache-ram limit set.

A previous fix added pre-allocation checks, but checkpoint data was not included in the cache size calculation, so the real cache size could still exceed --cache-ram.

Also, update() did not account for the pending allocation, and the states.size() > 1 guard skipped eviction when only one entry existed.

Fixed:

  • include checkpoint data in the cache size calculation
  • pass pending allocation size and token count to update() so it evicts until the new entry fits
  • remove states.size() > 1 guard
  • remove catch(bad_alloc) recovery
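
The eviction change in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `toy_state`, `toy_cache`, and `make_room()` are hypothetical names, and the sketch assumes that both a byte limit and a token limit are always enforced and that the oldest entry is evicted first.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical, simplified stand-in for one cached prompt state.
struct toy_state {
    size_t n_bytes;   // RAM held by this entry (all of its buffers)
    size_t n_tokens;  // number of prompt tokens this entry covers
};

// Hypothetical cache type; it only mirrors the eviction idea described above.
struct toy_cache {
    std::list<toy_state> states;  // oldest entry at the front
    size_t limit_bytes;           // assumed byte budget
    size_t limit_tokens;          // assumed token budget

    size_t size_bytes() const {
        size_t res = 0;
        for (const auto & s : states) { res += s.n_bytes; }
        return res;
    }

    size_t size_tokens() const {
        size_t res = 0;
        for (const auto & s : states) { res += s.n_tokens; }
        return res;
    }

    // Would the cache be over either limit once the pending entry is added?
    bool exceeds(size_t pending_bytes, size_t pending_tokens) const {
        return size_bytes()  + pending_bytes  > limit_bytes ||
               size_tokens() + pending_tokens > limit_tokens;
    }

    // Evict oldest entries until the pending entry fits.
    // Returns false if it does not fit even in an empty cache.
    bool make_room(size_t pending_bytes, size_t pending_tokens) {
        // No `states.size() > 1` guard: a lone entry may be evicted as well.
        while (!states.empty() && exceeds(pending_bytes, pending_tokens)) {
            states.pop_front();
        }
        const bool fits = !exceeds(pending_bytes, pending_tokens);
        assert(fits || states.empty());
        return fits;
    }
};
```

Counting the pending entry inside `exceeds()` is what lets the loop keep evicting until the new entry actually fits, instead of only trimming the cache down to its current limit.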

Additional information

alloc() checked the limit using only the KV cache and draft state sizes, and its try block allocated only those buffers. Because each cache entry also holds checkpoint data, actual RAM usage could still exceed the configured limit and trigger an OOM kill.
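
To make the accounting gap concrete, here is a small sketch of a pre-allocation size check that counts checkpoint data as well. The names `toy_pending`, `pending_total()`, and `fits_limit()` are hypothetical and do not come from the PR; the sketch assumes the size of every buffer is known before anything is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of an entry that is about to be cached.
// All sizes are in bytes and are known before any buffer is allocated.
struct toy_pending {
    size_t kv_bytes;                       // main KV cache state
    size_t draft_bytes;                    // draft model state
    std::vector<size_t> checkpoint_bytes;  // one size per checkpoint
};

// Total RAM the entry will occupy, checkpoints included.
size_t pending_total(const toy_pending & p) {
    size_t total = p.kv_bytes + p.draft_bytes;
    for (size_t n : p.checkpoint_bytes) {
        total += n;
    }
    return total;
}

// Decide before allocating whether the entry can be cached at all.
bool fits_limit(const toy_pending & p, size_t limit_bytes) {
    return pending_total(p) <= limit_bytes;
}
```

A check that summed only `kv_bytes` and `draft_bytes` would accept an entry whose checkpoints push it past the limit, which is the situation described above.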

Before fix:

saving prompt with length 146764, total state size = 5332.813 MiB (draft: 307.363 MiB)
oom-kill: ... task=llama-server,pid=92787
Out of memory: Killed process 92787 (llama-server) ...

After fix:

saving prompt with length 72820, total state size = 2721.372 MiB (draft: 152.505 MiB)
single state (2721.372 MiB) exceeds cache limit (360.000 MiB), skipping cache

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, llama.cpp + claude code, used for checking code style

@zzhenyao zzhenyao requested a review from a team as a code owner May 23, 2026 04:18
@zzhenyao zzhenyao changed the title from "server : fix OOM crash in prompt cache by checking size limit before allocation" to "server: fix --cache-ram not preventing RAM OOM" May 24, 2026
@zzhenyao
Author

@aldehir @ggerganov could you please take a look when convenient?
Possibly related: #22925, #22629, #21690

Follow-up: if --cache-ram is meant to cover checkpoint data too, this fix should stop the prompt cache from exceeding the limit and causing RAM OOM. That also means the default value may now be too small. Should it be adjusted?
