
[Fix] Resolve occasional infinite loops during vLLM inference #228

Merged: MSLDCherryPick merged 2 commits into microsoft:main from Damon-Salvetore:vllm-1 on Feb 3, 2026.


@MSLDCherryPick (Contributor)

We add repetition detection to the inference loop. When harmful repetition is detected (the model looping on the same tokens), the sampling temperature is increased to break the loop. The tokens generated so far are moved into the prefill stage, i.e., resubmitted as part of the prompt, so they are processed in a single prefill pass rather than decoded again token by token. As a result, the overall impact on inference latency is minimal.
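The description does not include the implementation itself. Below is a minimal sketch of the idea under stated assumptions. `generate(prefix, temperature, max_new_tokens)` is a hypothetical hook that wraps the inference engine (for example, a vLLM generate call on the concatenated prefix) and returns fewer tokens than requested only when it stops early. The repetition test (a block of up to `max_period` tokens repeated `min_repeats` times at the tail), the trimming of the looping tail, and the temperature schedule are all illustrative choices, not the PR's actual logic.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical engine hook: (prefix token ids, temperature, max_new_tokens) -> new
# token ids. It should return fewer tokens than requested only when generation
# stopped early (e.g. an end-of-sequence token was produced).
GenerateFn = Callable[[List[int], float, int], List[int]]


def find_repetition(tokens: Sequence[int], max_period: int = 32,
                    min_repeats: int = 4) -> Optional[int]:
    """Return the start index of a repeating tail, or None if there is none.

    The tail is treated as a loop when the last `period` tokens
    (1 <= period <= max_period) appear at least `min_repeats` times back to back.
    """
    tokens = list(tokens)
    n = len(tokens)
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if needed > n:
            break  # longer periods need even more tokens
        block = tokens[n - period:]
        if all(tokens[n - (r + 1) * period:n - r * period] == block
               for r in range(1, min_repeats)):
            # Walk backwards over any further copies to find where the loop began.
            start = n - needed
            while start >= period and tokens[start - period:start] == block:
                start -= period
            return start
    return None


def generate_with_repetition_guard(generate: GenerateFn, prompt: List[int],
                                   max_new_tokens: int, chunk_size: int = 64,
                                   temperature: float = 0.0,
                                   temperature_step: float = 0.3,
                                   max_temperature: float = 1.2,
                                   max_restarts: int = 3) -> List[int]:
    """Decode in chunks; on a detected loop, trim it and resume at a higher temperature."""
    generated: List[int] = []
    restarts = 0
    while len(generated) < max_new_tokens:
        budget = min(chunk_size, max_new_tokens - len(generated))
        # Tokens accepted so far are sent back as part of the prefix, so the engine
        # handles them in its prefill pass instead of re-decoding them one by one.
        new_tokens = generate(prompt + generated, temperature, budget)
        generated.extend(new_tokens)
        start = find_repetition(generated)
        if start is not None and restarts < max_restarts:
            del generated[start:]  # drop the looping tail (illustrative choice)
            # The raised temperature is kept for the rest of this request.
            temperature = min(temperature + temperature_step, max_temperature)
            restarts += 1
            continue
        if len(new_tokens) < budget:
            break  # engine stopped early, e.g. on end-of-sequence
    return generated
```

Starting from greedy decoding (`temperature=0.0`), each detected loop raises the temperature by `temperature_step` up to `max_temperature`, and `max_restarts` bounds the retries so the guard itself cannot loop forever. The PR does not state whether the real change trims the repeated tokens, keeps the raised temperature, or uses a different repetition test.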

MSLDCherryPick changed the title from "Vllm 1" to "[Fix] Resolve occasional infinite loops during vLLM inference" on Feb 3, 2026.
MSLDCherryPick merged commit e16491d into microsoft:main on Feb 3, 2026 (1 check was pending).


2 participants