Project URL
https://pypi.org/project/vllm/
Does this project already exist?
- Yes
New Limit
400
Update issue title
- I have updated the title.
Which indexes
PyPI
About the project
vLLM is a fast and easy-to-use library for LLM inference and serving. It plans to ship nvidia-nccl-cu12==2.18.3 within the package.
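For context, below is a minimal sketch of how a wheel could bundle such a shared object as package data; the file names and layout are assumptions for illustration, not vllm's actual build configuration.

```python
# Hypothetical setup.py fragment: one way a wheel could bundle a pinned NCCL
# shared object as package data. File names and layout are illustrative only.
from setuptools import setup, find_packages

setup(
    name="vllm",
    packages=find_packages(),
    package_data={
        # Ship a pinned libnccl (e.g. built from nccl 2.18.3) inside the wheel;
        # this is what pushes the wheel size past the default PyPI limit.
        "vllm": ["_nccl/libnccl.so.2"],
    },
    include_package_data=True,
)
```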
Reasons for the request
We identified a bug in nccl>=2.19 that significantly increases GPU memory overhead, so we have to pin and ship an nccl version ourselves. We cannot simply pip install nvidia-nccl-cu12==2.18.3, because we depend on torch, which has a binary dependency on nvidia-nccl-cu12==2.19.5. This leaves us in dependency hell, so we have to bundle an nccl library ourselves, loading it explicitly at runtime as sketched below.
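The snippet below is a hedged sketch of how a package could load a bundled NCCL shared object explicitly via ctypes, bypassing the copy that torch pulls in transitively. The _nccl directory and file name are hypothetical; only ncclGetVersion and its version encoding come from the public NCCL C API.

```python
# Hypothetical sketch: load a libnccl.so.2 bundled inside the package instead of
# the one installed transitively by torch. Paths and names are illustrative only.
import ctypes
from pathlib import Path

# Assume the wheel ships the pinned library alongside this module, e.g.
#   vllm/_nccl/libnccl.so.2  (built from nccl 2.18.3)
_BUNDLED_NCCL = Path(__file__).parent / "_nccl" / "libnccl.so.2"


def load_bundled_nccl() -> ctypes.CDLL:
    """Load the bundled NCCL shared object and report its version."""
    lib = ctypes.CDLL(str(_BUNDLED_NCCL))
    version = ctypes.c_int()
    # ncclGetVersion(int* version) is part of the public NCCL C API.
    result = lib.ncclGetVersion(ctypes.byref(version))
    if result != 0:
        raise RuntimeError(f"ncclGetVersion failed with error code {result}")
    # NCCL encodes 2.18.3 as 21803 (major*10000 + minor*100 + patch).
    print(f"Loaded bundled NCCL version code: {version.value}")
    return lib
```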
vllm is a popular library for LLM inference and is used by many tech companies. Shipping nccl with vllm can increase its throughput and the quality of LLM serving. The downside is that the package wheel becomes much larger, which is why we are asking for a larger file size limit.
Code of Conduct
- I agree to follow the PSF Code of Conduct