File Limit Request: vllm - 400 MiB #3792
Comments
bump up 👀
+1, it would be great to have this!
From README.md:
Project maintainers are having to limit or cut architecture/GPU/format support in order to fit under 100 MB.
Kindly cc @cmaureir for visibility. vLLM is the most popular open-source LLM serving engine in the world right now. A larger package limit would help us support more types of hardware and help democratize LLMs for the vast majority of developers.
Hi @cmaureir, I'm also a maintainer of vLLM. We do make our best effort to keep the binary size small, but it's increasingly difficult to meet the current limit since vLLM is rapidly growing with new features and optimizations that require new GPU kernels (binaries). Increasing the limit would be very helpful for the development of vLLM.
Hello @youkaichao 👋 Additionally, I see you ship one package per Python version, which heavily increases the total release size. I recommend looking into the Python Limited API in order to provide one wheel per platform: https://docs.python.org/3/c-api/stable.html Have a nice rest of the week 🚀
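For readers who want to try this, here is a minimal sketch of a Limited API (stable-ABI) build with setuptools; the module name, source path, and CPython 3.8 floor are assumptions for illustration, not vLLM's actual build configuration:

```python
# Minimal sketch of an abi3 (stable-ABI) extension build with setuptools.
# "_example_C" and "csrc/module.c" are hypothetical placeholders.
from setuptools import Extension, setup

ext = Extension(
    "_example_C",
    sources=["csrc/module.c"],
    # Compile against the Limited API of CPython 3.8 and newer only.
    define_macros=[("Py_LIMITED_API", "0x03080000")],
    py_limited_api=True,
)

setup(
    name="example-pkg",
    version="0.1.0",
    ext_modules=[ext],
    # Tag the wheel cp38-abi3 so one artifact installs on 3.8, 3.9, 3.10, ...
    options={"bdist_wheel": {"py_limited_api": "cp38"}},
)
```

Building with `python -m build` (or `pip wheel .`) then yields a single `cp38-abi3` wheel per platform instead of one wheel per interpreter version.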
@cmaureir thanks for your support! We will try to see if we can build just one wheel for all python versions. |
@cmaureir is it possible to build one wheel for all supported Python versions when we have extensions? I find the wheel name always contains the Python version. Not sure how to build a Python-agnostic wheel.
I did a quick investigation into using the Python Limited API to provide one wheel per platform. I tried, but since we use pybind11, which does not support the Python Limited API (c.f. pybind/pybind11#1755), we have to build one wheel for each Python version. Sorry for the trouble :(
Hi @cmaureir, I would like to ask about the current total usage of vLLM's packages and whether the 10 GB project limit can be increased. We have made quite some progress over the last few months, and we are finally releasing version-agnostic wheels.
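For illustration, the difference shows up directly in the wheel tags; these filenames are hypothetical, not actual vLLM releases:

```
vllm-0.4.0-cp311-cp311-manylinux1_x86_64.whl   # one wheel per CPython version
vllm-0.4.0-cp38-abi3-manylinux1_x86_64.whl     # one stable-ABI wheel for CPython >= 3.8
```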
Project URL
https://pypi.org/project/vllm/
Does this project already exist?
Yes
New Limit
400 MiB
Update issue title
Which indexes
PyPI
About the project
vLLM is a fast and easy-to-use library for LLM inference and serving. It plans to ship nvidia-nccl-cu12==2.18.3 within the package.
Reasons for the request
We identified a bug in nccl>=2.19 that largely increases GPU memory overhead, so we have to pin and ship nccl versions ourselves. We cannot simply pip install nvidia-nccl-cu12==2.18.3 because we depend on torch, which has a binary dependency on nvidia-nccl-cu12==2.19.5. So we are in dependency hell, and we have to ship a nccl library ourselves.
vLLM is a popular library for LLM inference, and it is used by many tech companies. Shipping nccl with vllm can increase its throughput and the quality of LLM serving. However, the downside is that the package wheel will become much larger, so we have come here to ask for a larger file size limit.
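To make the vendoring concrete, here is a minimal sketch of how a package can load a bundled NCCL ahead of the copy torch links against; the bundled path, the environment-variable override, and the module layout are illustrative assumptions, not vLLM's actual mechanism:

```python
import ctypes
import os

# Hypothetical location of the NCCL shared library vendored inside the wheel.
_BUNDLED_NCCL = os.path.join(os.path.dirname(__file__), "lib", "libnccl.so.2")

def load_nccl() -> ctypes.CDLL:
    """Load the pinned NCCL, preferring a user-supplied override."""
    # EXAMPLE_NCCL_SO_PATH is an illustrative name, not a real vLLM setting.
    path = os.environ.get("EXAMPLE_NCCL_SO_PATH", _BUNDLED_NCCL)
    # RTLD_GLOBAL exposes the pinned NCCL's symbols to CUDA extensions
    # loaded afterwards, so they resolve against 2.18.3 rather than the
    # 2.19.x copy that torch's dependency tree would otherwise provide.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)

nccl = load_nccl()

# ncclGetVersion(int*) is part of NCCL's public C API; it verifies which
# copy was actually loaded (2.18.3 encodes as 21803).
version = ctypes.c_int()
nccl.ncclGetVersion(ctypes.byref(version))
print("Loaded NCCL version code:", version.value)
```

Loading the library explicitly like this bypasses pip's resolver, which is exactly why the `.so` must live inside the wheel and count against the file size limit.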
Code of Conduct