
[deps][llm] Upgrade to vLLM 0.16.0#61389

Merged
kouroshHakha merged 3 commits into master from vllm-0.16.0
Mar 4, 2026
Conversation

@jeffreywang-anyscale
Contributor

@jeffreywang-anyscale jeffreywang-anyscale commented Feb 27, 2026

Description

Upgrade to vLLM 0.16.0 and adapt to breaking API changes in init_app_state and truncate_prompt_tokens.

Breaking change 1 (Figured out by Claude)

vLLM 0.16.0's init_app_state no longer unconditionally initializes all serving endpoints. Instead, it expects a supported_tasks tuple to decide which serving objects to create. Without it, vLLM falls back to ('generate',) only.

Fix: query supported_tasks from engine_client.get_supported_tasks() (which inspects the model's actual capabilities) and pass it to init_app_state, matching what vLLM's own API server entrypoint does.
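The pattern can be sketched as follows. This is a minimal, self-contained illustration of the handshake described above, not Ray's or vLLM's actual code: StubEngineClient and this init_app_state are stand-ins for vLLM's engine client (whose real get_supported_tasks inspects the loaded model) and vLLM's real app-state initializer.

```python
# Illustrative sketch: query the engine for its supported tasks and forward
# them to the app-state initializer. Without the hint, only 'generate' comes
# up, mirroring vLLM 0.16.0's fallback. All names here are stand-ins.
import asyncio


class StubEngineClient:
    """Stand-in for vLLM's engine client; the real one inspects the model."""

    async def get_supported_tasks(self):
        # e.g. a model that supports both generation and embedding.
        return ("generate", "embed")


async def init_app_state(engine_client, supported_tasks=None):
    # vLLM 0.16.0 only creates serving objects for the tasks it is told
    # about; with no hint it assumes generation only.
    if supported_tasks is None:
        supported_tasks = ("generate",)
    return {task: f"serving_{task}" for task in supported_tasks}


async def main():
    client = StubEngineClient()
    # Old call pattern: only the 'generate' endpoint is initialized.
    default_state = await init_app_state(client)
    # Fixed pattern: ask the engine first, then pass the answer through.
    tasks = await client.get_supported_tasks()
    full_state = await init_app_state(client, supported_tasks=tasks)
    return default_state, full_state


default_state, full_state = asyncio.run(main())
print(sorted(default_state))  # ['generate']
print(sorted(full_state))     # ['embed', 'generate']
```

This mirrors what vLLM's own API server entrypoint does: query once at startup, then hand the tuple to init_app_state so embedding/pooling endpoints are created for models that support them.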

Breaking change 2

truncate_prompt_tokens in pooling_params is deprecated for pooling/embedding tasks.

Fix

  • Add tokenization_kwargs as a new optional input column for embedding/pooling tasks, passed through to engine.encode().
  • Add backward-compatible shim: if users still pass truncate_prompt_tokens in pooling_params, it is automatically converted to the equivalent tokenization_kwargs with a deprecation warning. truncate_prompt_tokens: -1 is resolved to max_model_len.
  • Update tests to cover both the new tokenization_kwargs path and the legacy truncate_prompt_tokens compatibility path.
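The shim described in the bullets above could look roughly like this. A hedged sketch only: the function name, the pooling_params dict shape, and the "truncation"/"max_length" keys inside tokenization_kwargs are assumptions for illustration, not Ray's actual internals.

```python
# Hypothetical sketch of the backward-compatible shim: a legacy
# truncate_prompt_tokens in pooling_params is converted to the equivalent
# tokenization_kwargs, with -1 resolved to max_model_len. Key names in the
# returned dict are assumed, not taken from vLLM's real API.
import warnings


def resolve_tokenization_kwargs(pooling_params: dict, max_model_len: int) -> dict:
    """Pop legacy truncate_prompt_tokens and return tokenization_kwargs."""
    tokenization_kwargs: dict = {}
    truncate = pooling_params.pop("truncate_prompt_tokens", None)
    if truncate is not None:
        warnings.warn(
            "truncate_prompt_tokens in pooling_params is deprecated; "
            "pass tokenization_kwargs instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # -1 historically meant "truncate to the model's maximum length".
        if truncate == -1:
            truncate = max_model_len
        tokenization_kwargs["truncation"] = True
        tokenization_kwargs["max_length"] = truncate
    return tokenization_kwargs


# Legacy callers keep working, new callers pass tokenization_kwargs directly:
print(resolve_tokenization_kwargs({"truncate_prompt_tokens": -1}, 4096))
# {'truncation': True, 'max_length': 4096}
print(resolve_tokenization_kwargs({}, 4096))
# {}
```

The resulting dict would then be passed through to engine.encode() alongside the cleaned pooling_params, so both the legacy and new input paths converge on one code path.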


Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale added the go (add ONLY when ready to merge, run all tests) label Feb 27, 2026
@jeffreywang-anyscale
Contributor Author

jeffreywang-anyscale commented Feb 27, 2026

First pass with CC.

mkdir -p ~/.claude/skills/vllm-upgrade
touch ~/.claude/skills/vllm-upgrade/SKILL.md

Running through tests now to see if it misses anything.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request upgrades the vLLM dependency to version 0.16.0. The changes are primarily focused on updating dependency files, including Dockerfiles, requirements files, and numerous lock files. The code has also been adapted to the new vLLM version by updating import paths and removing workarounds for issues that have been resolved in the new release. All changes appear to be correct and consistent with the goal of upgrading vLLM.

Comment thread on python/ray/llm/_internal/serve/engines/vllm/vllm_engine.py
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue llm labels Feb 28, 2026
# vLLM 0.12.0 ignores truncate_prompt_tokens in the pooling_params.
# TODO (jeffreywang): Remove the following line once
# https://github.com/vllm-project/vllm/issues/31012 is fixed.
truncate_prompt_tokens=request.params.truncate_prompt_tokens,
Contributor Author


AI @jeffreywang-anyscale validate the behavior of truncate_prompt_tokens.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@kouroshHakha kouroshHakha merged commit 2aa11b9 into master Mar 4, 2026
6 checks passed
@kouroshHakha kouroshHakha deleted the vllm-0.16.0 branch March 4, 2026 03:13
bittoby pushed a commit to bittoby/ray that referenced this pull request Mar 6, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: bittoby <bittoby@users.noreply.github.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Mar 13, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

Labels

go (add ONLY when ready to merge, run all tests), llm, serve (Ray Serve Related Issue)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

3 participants