ggerganov (Member) commented Nov 14, 2025

fix #17260

  • If we overrun the total context size, clear the active slots
  • Rename purge -> clear
  • Remove obsolete kv_cache_clear()
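
For readers without the diff at hand, here is a rough sketch of the first bullet: when the tokens held by all slots exceed the total context size, the active slots are cleared. The `Slot` struct, `tokens_in_use()`, and `clear_active_on_overrun()` are hypothetical names invented for illustration and are not the server's actual types or functions. Releasing a cleared slot (marking it inactive) and leaving idle slots untouched are also assumptions, not something the description above confirms.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a server slot (not the actual llama.cpp server type).
struct Slot {
    int         id;
    bool        active;    // currently processing a request
    std::size_t n_tokens;  // tokens this slot holds in the shared context
};

// Sum of the tokens held by all slots, active or idle.
std::size_t tokens_in_use(const std::vector<Slot> & slots) {
    std::size_t total = 0;
    for (const auto & s : slots) {
        total += s.n_tokens;
    }
    return total;
}

// If the slots together overrun the total context size, clear every active slot.
// Idle slots are left alone (assumption). Returns the number of slots cleared.
std::size_t clear_active_on_overrun(std::vector<Slot> & slots, std::size_t n_ctx_total) {
    if (tokens_in_use(slots) <= n_ctx_total) {
        return 0; // still within budget, nothing to do
    }
    std::size_t n_cleared = 0;
    for (auto & s : slots) {
        if (!s.active) {
            continue;
        }
        s.n_tokens = 0;     // drop the slot's cached tokens
        s.active   = false; // assumption: a cleared slot is also released
        ++n_cleared;
    }
    return n_cleared;
}
```

In this sketch the check runs over all slots but only the active ones are reset, so an idle slot keeps whatever it had cached.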

ggerganov force-pushed the gg/server-fix-decode-error-handling branch from 82eb17b to 741baaf on November 14, 2025 12:04
ggerganov marked this pull request as ready for review on November 14, 2025 12:05
ggerganov requested a review from ngxson as a code owner on November 14, 2025 12:05
ggerganov merged commit 5b2093b into master on Nov 16, 2025
72 checks passed
ggerganov deleted the gg/server-fix-decode-error-handling branch on November 16, 2025 07:23

Development

Successfully merging this pull request may close these issues.

Eval bug: Recent updates lead to /infill requests on the Qwen2.5-Coder model failing and ultimately crashing.

3 participants