This repository was archived by the owner on May 27, 2026. It is now read-only.

update readme for vLLM 0.17.0 release on Intel GPU #971

Merged: jitendra42 merged 3 commits into intel:main from yma11:017 (Mar 26, 2026)

Conversation

yma11 (Contributor) commented Mar 25, 2026

Description

Release notes for the v0.17.0 release.

Related Issue

Changes Made

  • The code follows the project's coding standards.
  • No Intel Internal IP is present within the changes.
  • The documentation has been updated to reflect any changes in functionality.

Validation

  • I have tested any changes in container groups locally with test_runner.py with all existing tests passing, and I have added new tests where applicable.

Signed-off-by: Yan Ma <yan.ma@intel.com>
yma11 (Contributor, Author) commented Mar 25, 2026

@rogerxfeng8 please take a look. Thanks.

Comment thread vllm/0.17.0-xpu.md Outdated
| KMD Driver | 6.14.0 |
| oneAPI | 2025.3.2.4 with hotfix |
| PyTorch | 2.10 |
| vllm-xpu-kernels | 0.14.0 |
Review comment (Contributor):

0.1.4

Comment thread vllm/0.17.0-xpu.md

* **torch.compile**: Can be enabled for the FP16/BF16 path.
* **speculative decoding**: Supports methods `n-gram`, `EAGLE`, `EAGLE3`, `medusa` and `suffix`. For detailed usage, refer [document](https://docs.vllm.ai/en/stable/features/speculative_decoding/).
* **async scheduling**: Can be enabled by `--async-scheduling`. This may help reduce the CPU overheads, leading to better latency and throughput.
Review comment (Contributor):

Async scheduling is not supported in this release.

Reply (Contributor, Author):

It's disabled by default, but users can enable it explicitly. It doesn't fail in every case, so I think we can call it experimental.

Signed-off-by: Yan Ma <yan.ma@intel.com>
Comment thread vllm/0.17.0-xpu.md Outdated
In addition, features such as [reasoning_outputs](https://docs.vllm.ai/en/latest/features/reasoning_outputs.html), [structured_outputs](https://docs.vllm.ai/en/latest/features/structured_outputs.html), and [tool calling](https://docs.vllm.ai/en/latest/features/tool_calling.html) are supported. The following experimental features are also available:

* **torch.compile**: Can be enabled for the FP16/BF16 path.
* **speculative decoding**: Supports methods `n-gram`, `EAGLE`, `EAGLE3`, `medusa` and `suffix`. For detailed usage, refer [document](https://docs.vllm.ai/en/stable/features/speculative_decoding/).
Review comment (Contributor):

refer -> refer to

Signed-off-by: Yan Ma <yan.ma@intel.com>
jitendra42 merged commit 573910e into intel:main on Mar 26, 2026
7 checks passed
3 participants