Conversation

@dependabot dependabot bot commented on behalf of github Feb 25, 2025

Updates the requirements on vllm to permit the latest version.

Release notes

Sourced from vllm's releases.

v0.7.3

Highlights

🎉 253 commits from 93 contributors, including 29 new contributors!

  • Deepseek enhancements:
    • Support for DeepSeek Multi-Token Prediction, 1.69x speedup in low QPS scenarios (#12755)
    • AMD support: DeepSeek tunings, yielding 17% latency reduction (#13199)
    • Using FlashAttention3 for MLA (#12807)
    • Align the expert selection code path with official implementation (#13474)
    • Optimize moe_align_block_size for deepseek_v3 (#12850)
    • Expand MLA to support most types of quantization (#13181)
  • V1 Engine:
    • LoRA Support (#10957, #12883)
    • Logprobs and prompt logprobs support (#9880), min_p sampling support (#13191), logit_bias in v1 Sampler (#13079)
    • Use msgpack for core request serialization (#12918)
    • Pipeline parallelism support (#12996, #13353, #13472, #13417, #13315)
    • Metrics enhancements: GPU prefix cache hit rate % gauge (#12592), iteration_tokens_total histogram (#13288), several request timing histograms (#12644)
    • Initial speculative decoding support with ngrams (#12193, #13365)
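The ngram-based speculative decoding listed above works by prompt lookup: match the most recent n tokens against an earlier occurrence in the sequence and propose the tokens that followed that match as a cheap draft for the target model to verify. A minimal pure-Python sketch of the proposal step (illustrative only; the function name and parameters are hypothetical, not vLLM's API):

```python
def propose_ngram_draft(tokens, n=3, k=5):
    """Propose up to k draft tokens by matching the last n tokens
    against the most recent earlier occurrence in the sequence."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Scan backwards for the most recent earlier match of the tail ngram.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == tail:
            # Propose the tokens that followed the matched ngram.
            return tokens[start + n : start + n + k]
    return []
```

In the real system the draft tokens are then verified in a single forward pass of the target model, and only the verified prefix is accepted.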

Model Support

  • Enhancements to Qwen2.5-VL: BNB support (#12944), LoRA (#13261), Optimizations (#13155)
  • Support GPTQModel Dynamic [2,3,4,8]bit GPTQ quantization (#7086)
  • Support Unsloth Dynamic 4bit BnB quantization (#12974)
  • IBM/NASA Prithvi Geospatial model (#12830)
  • Support Mamba2 (Codestral Mamba) (#9292), Bamba Model (#10909)
  • Ultravox Model: Support v0.5 Release (#12912)
  • transformers backend
    • Enable quantization support for transformers backend (#12960)
    • Set torch_dtype in TransformersModel (#13088)
  • VLM:
    • Implement merged multimodal processor for Mllama (#11427), GLM4V (#12449), Molmo (#12966)
    • Separate text-only and vision variants of the same model architecture (#13157)
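Several bullets above add low-bit quantization paths (Unsloth Dynamic 4-bit BnB, [2,3,4,8]-bit GPTQ). The core idea they share is mapping float weights to a small integer range with a per-block scale. A deliberately simplified pure-Python sketch of symmetric 4-bit block quantization (illustrative only; the real GPTQ/BnB schemes add calibration, non-uniform codebooks, and packed storage):

```python
def quantize_4bit(block):
    """Symmetric 4-bit quantization of one weight block:
    map floats to integers in [-7, 7] using a per-block scale."""
    amax = max(abs(w) for w in block) or 1.0
    scale = amax / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in block]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from quantized values."""
    return [v * scale for v in q]
```

The round-trip error per weight is bounded by half the scale, which is why "dynamic" schemes pick scales per block rather than per tensor.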

Hardware Support

  • Pluggable platform-specific scheduler (#13161)
  • NVIDIA: Support nvfp4 quantization (#12784)
  • AMD:
    • Per-Token-Activation Per-Channel-Weight FP8 (#12501)
    • Tuning for Mixtral on MI325 and Qwen MoE on MI300 (#13503), Mixtral8x7B on MI300 (#13577)
    • Add initial ROCm support to V1 (#12790)
  • TPU: V1 Support (#13049)
  • Neuron: Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921)
  • Gaudi:
    • Support Contiguous Cache Fetch (#12139)
    • Enable long-contexts + LoRA support (#12812)
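The pluggable platform-specific scheduler above suggests a registry pattern: each platform registers a scheduler class, and the engine looks one up by name. A small hypothetical sketch of that pattern (all names here are illustrative, not vLLM's actual plugin interface):

```python
# Hypothetical registry sketch; not vLLM's actual plugin interface.
_SCHEDULERS = {}

def register_scheduler(platform):
    """Class decorator that registers a scheduler under a platform name."""
    def deco(cls):
        _SCHEDULERS[platform] = cls
        return cls
    return deco

@register_scheduler("example-cpu")
class FIFOScheduler:
    """Trivial scheduler: always pick the oldest queued request."""
    def pick(self, queue):
        return queue[0]

def get_scheduler(platform):
    """Instantiate the scheduler registered for a platform."""
    try:
        cls = _SCHEDULERS[platform]
    except KeyError:
        raise ValueError(f"no scheduler registered for {platform!r}")
    return cls()
```

The decorator keeps platform code out of the core engine: a hardware backend only needs to import and register its class.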

Engine Features

  • Add sleep and wake-up endpoints, with v1 support (#12987)
  • Add /v1/audio/transcriptions OpenAI API endpoint (#12909)
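Both endpoints above are plain HTTP POSTs against the OpenAI-compatible server. A sketch that only builds the requests without sending them (the base URL, the sleep path, and the `level` query parameter are assumptions about a local deployment; check your server's docs before relying on exact paths):

```python
import urllib.request

# Assumed local server address; adjust for your deployment.
BASE = "http://localhost:8000"

def build_sleep_request(level=1):
    """Build (but do not send) a POST for the sleep endpoint.
    The path and `level` parameter are assumptions for illustration."""
    return urllib.request.Request(f"{BASE}/sleep?level={level}", method="POST")

def build_transcription_request():
    """Build the POST target for the OpenAI-style transcription endpoint;
    a real call would attach multipart form data with the audio file."""
    return urllib.request.Request(f"{BASE}/v1/audio/transcriptions", method="POST")
```

In practice the transcription endpoint is typically called through an OpenAI-compatible client rather than raw urllib.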

... (truncated)

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies Pull requests that update a dependency file label Feb 25, 2025
@dependabot dependabot bot force-pushed the dependabot/pip/vllm-lte-0.7.3 branch from ef8de4d to fa8ef4b on February 25, 2025 03:10
Updates the requirements on [vllm](https://github.com/vllm-project/vllm) to permit the latest version.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Commits](vllm-project/vllm@v0.1.0...v0.7.3)

---
updated-dependencies:
- dependency-name: vllm
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/vllm-lte-0.7.3 branch from fa8ef4b to 1baac1d on March 2, 2025 08:28

dependabot bot commented on behalf of github Mar 31, 2025

Superseded by #9.

@dependabot dependabot bot closed this Mar 31, 2025
@dependabot dependabot bot deleted the dependabot/pip/vllm-lte-0.7.3 branch March 31, 2025 14:47