server : fix swa-full logic#22288

Merged
ggerganov merged 1 commit into master from gg/server-fix-n-swa on Apr 24, 2026

Conversation

@ggerganov
Member

Overview

fix #21468
alt #21749

Simplify the logic by augmenting llama_model_n_swa with a server_context.n_swa member. When --swa-full is passed, we set n_swa = 0 to simulate a non-SWA model.
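A minimal sketch of the idea (the struct and function names below are illustrative, not the actual server code): once --swa-full allocates a full-size cache, reporting an effective n_swa of 0 lets the rest of the server treat the model as non-SWA.

```cpp
#include <cassert>
#include <cstdint>

// Hedged sketch, not the llama.cpp implementation: names are hypothetical.
struct server_context_sketch {
    int32_t n_swa = 0; // effective SWA window; 0 means "behave like a non-SWA model"
};

// With --swa-full a full-size KV cache is allocated, so the SWA window no
// longer constrains cache reuse; reporting 0 simulates a non-SWA model.
int32_t effective_n_swa(int32_t model_n_swa, bool swa_full) {
    return swa_full ? 0 : model_n_swa;
}
```

This keeps the branching in one place: callers only ever consult the effective value instead of re-checking the --swa-full flag.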

@shipped-it

I'm a bit unsure about the approach. Why would the model be reported as non-SWA? It is still SWA, just with a full-size cache.

Also, you suggested using --swa-full --no-mmproj. Correct me if I'm wrong, but in practice I've seen repetition issues when using Gemma 4.

I will confirm shortly if this fixes the issue, or if I get the repetition problem.

@ggerganov
Member Author

Also, you've made the suggestion to use --swa-full --no-mmproj.

Cache reuse does not work with mmproj.

Correct me if I'm wrong, but in practice I've seen repetitions issues when using Gemma 4.

I am not following. This fixes the cache reuse logic - I am not aware of any repetition issues.

@shipped-it

I confirm that it is fixed with this PR or #21749 (tested on ROCm).

Without PR:
warm req: prompt_n=821, prompt_ms=982

With PR:
warm req: prompt_n=5, prompt_ms=71 (about 13x faster)
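The drop from prompt_n=821 to prompt_n=5 is what cache reuse buys: only the tokens after the longest prefix shared with the cached context need to be evaluated again. A hedged illustration of that accounting (not llama.cpp code, names hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of prompt tokens that must be (re)processed on a warm request,
// given the tokens already in the cache. Tokens are plain ints here.
size_t tokens_to_process(const std::vector<int> & cached, const std::vector<int> & prompt) {
    size_t n = 0; // length of the longest shared prefix
    while (n < cached.size() && n < prompt.size() && cached[n] == prompt[n]) {
        n++;
    }
    return prompt.size() - n;
}
```

On a cold request the cache is empty, so the whole prompt is processed; on a warm request with a nearly identical prompt, only the few differing suffix tokens are.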

@ggerganov ggerganov merged commit ffdd983 into master Apr 24, 2026
42 of 47 checks passed
@ggerganov ggerganov deleted the gg/server-fix-n-swa branch April 24, 2026 07:17

Development

Successfully merging this pull request may close these issues.

cache reuse is not supported for Gemma 4 models despite -fa enabled and --swa-full
