[deps][llm] Upgrade to vLLM 0.16.0 #61389
Merged

kouroshHakha merged 3 commits into master on Mar 4, 2026
Conversation
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Contributor
Author
First pass with CC. Running through tests now to see if it misses anything.
Contributor
Code Review
This pull request upgrades the vLLM dependency to version 0.16.0. The changes are primarily focused on updating dependency files, including Dockerfiles, requirements files, and numerous lock files. The code has also been adapted to the new vLLM version by updating import paths and removing workarounds for issues that have been resolved in the new release. All changes appear to be correct and consistent with the goal of upgrading vLLM.
aslonnie
approved these changes
Feb 28, 2026
# vLLM 0.12.0 ignores truncate_prompt_tokens in the pooling_params.
# TODO (jeffreywang): Remove the following line once
# https://github.com/vllm-project/vllm/issues/31012 is fixed.
truncate_prompt_tokens=request.params.truncate_prompt_tokens,
Contributor
Author
AI @jeffreywang-anyscale validate the behavior of truncate_prompt_tokens.
kouroshHakha
approved these changes
Mar 3, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
kouroshHakha
approved these changes
Mar 4, 2026
bittoby
pushed a commit
to bittoby/ray
that referenced
this pull request
Mar 6, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: bittoby <bittoby@users.noreply.github.com>
ryanaoleary
pushed a commit
to ryanaoleary/ray
that referenced
this pull request
Mar 13, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Description
Upgrade to vLLM 0.16.0 and adapt to breaking API changes in `init_app_state` and `truncate_prompt_tokens`.

**Breaking change 1 (Figured out by Claude)**
vLLM 0.16.0's `init_app_state` no longer unconditionally initializes all serving endpoints. Instead, it expects a `supported_tasks` tuple to decide which serving objects to create. Without it, vLLM falls back to `('generate',)` only.

Fix: query `supported_tasks` from `engine_client.get_supported_tasks()` (which inspects the model's actual capabilities) and pass it to `init_app_state`, matching what vLLM's own API server entrypoint does.
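A minimal sketch of the fallback behavior and the fix. The names `init_app_state` and `get_supported_tasks` mirror the vLLM entrypoint described above, but these are simplified stand-ins, not vLLM's real implementation:

```python
# Sketch only: stand-ins modeled on the behavior described above,
# not the actual vLLM 0.16.0 API surface.

DEFAULT_TASKS = ("generate",)


def init_app_state(state, supported_tasks=None):
    """Initialize only the serving objects for the given tasks.

    Without an explicit supported_tasks tuple, fall back to
    ('generate',) only -- the breaking behavior in 0.16.0.
    """
    tasks = supported_tasks if supported_tasks is not None else DEFAULT_TASKS
    for task in tasks:
        state[task] = f"serving-{task}"
    return state


class FakeEngineClient:
    """Stand-in for the engine client: reports the model's capabilities."""

    def get_supported_tasks(self):
        return ("generate", "embed")


# Before the fix: the embedding endpoint is never initialized.
state_before = init_app_state({})

# After the fix: query the engine and pass its tasks through.
engine = FakeEngineClient()
state_after = init_app_state({}, supported_tasks=engine.get_supported_tasks())
```

Without the `supported_tasks` argument, `state_before` only contains the `generate` serving object, which is why embedding/pooling endpoints silently disappeared after the upgrade.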
**Breaking change 2**

`truncate_prompt_tokens` is deprecated in pooling/embedding.

Fix:
- `tokenization_kwargs` is added as a new optional input column for embedding/pooling tasks, passed through to `engine.encode()`.
- If a request still sets `truncate_prompt_tokens` in `pooling_params`, it is automatically converted to the equivalent `tokenization_kwargs` with a deprecation warning.
- `truncate_prompt_tokens: -1` is resolved to `max_model_len`.
- Tests cover both the new `tokenization_kwargs` path and the legacy `truncate_prompt_tokens` compatibility path.

Related issues
Additional information
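The legacy conversion described under breaking change 2 can be sketched as follows. The helper name and the exact shape of `tokenization_kwargs` (`truncation`/`max_length`, Hugging Face tokenizer style) are illustrative assumptions, not the actual helper in this PR:

```python
# Hypothetical sketch of the truncate_prompt_tokens -> tokenization_kwargs
# compatibility path. Names and kwarg shape are assumptions for illustration.
import warnings


def resolve_tokenization_kwargs(pooling_params, max_model_len):
    """Convert a legacy truncate_prompt_tokens in pooling_params into
    equivalent tokenization_kwargs, emitting a deprecation warning."""
    truncate = pooling_params.pop("truncate_prompt_tokens", None)
    if truncate is None:
        return {}
    warnings.warn(
        "truncate_prompt_tokens in pooling_params is deprecated; "
        "pass tokenization_kwargs instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # -1 means "truncate to the model's maximum context length".
    if truncate == -1:
        truncate = max_model_len
    return {"truncation": True, "max_length": truncate}
```

Resolving `-1` to `max_model_len` here keeps the legacy sentinel working while the downstream `engine.encode()` call only ever sees a concrete token limit.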