How to enable vllm #536
Feature request
How to enable vllm
Motivation
How to enable vllm
Your contribution
How to enable vllm
Comments
Use 0.9 and a supported model.
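For readers looking for a concrete starting point, here is a minimal sketch of querying a text-generation-inference 0.9 server over its REST API once it is running with a supported (flash) model. The URL, prompt, and generation parameters below are placeholder assumptions for illustration, not values from this thread:

```python
# Hypothetical example: querying a running text-generation-inference 0.9
# server. Assumes the server was launched with a flash-attention-supported
# model; the host and port are placeholders for a local deployment.
import requests

TGI_URL = "http://127.0.0.1:8080/generate"  # assumed endpoint

payload = {
    "inputs": "What is PagedAttention?",
    "parameters": {"max_new_tokens": 64},
}

resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```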
@OlivierDehaene Hi, I don't know if you are staff at Hugging Face or not,
Please reread your initial "Feature request" and tell me you made the effort to actually express a feature request intelligibly. There are many ways you could have phrased it, starting with a request to improve the docs because you couldn't find what you wanted. What is the actual question you had, where did you look for an answer, and what did you find instead? Also, fill in the template properly instead of repeating the same thing over and over. Our effort in replying scales with your effort.
There's no doc for it because, well, it's not necessary: if you use flash models, you just get it. It's also NOT vllm. It's a custom variant of PagedAttention, which is what makes vllm faster. We do reuse a slightly modified version of their low-level kernel.
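For intuition only, here is a toy sketch of the paged KV-cache bookkeeping that PagedAttention-style kernels rely on: the cache is split into fixed-size blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks, so memory is claimed on demand rather than reserved for the maximum length up front. All names and sizes are invented for illustration; this is not the text-generation-inference or vllm code:

```python
# Toy illustration of paged KV-cache bookkeeping (names and sizes invented;
# NOT the text-generation-inference or vllm implementation).
BLOCK_SIZE = 16  # tokens per cache block, an assumed value

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))   # pool of physical blocks
        self.block_tables = {}                       # seq_id -> [block ids]
        self.lengths = {}                            # seq_id -> token count

    def append_token(self, seq_id: int) -> None:
        """Reserve cache space for one new token of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                 # current block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())     # allocate on demand
        self.lengths[seq_id] = length + 1

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(20):
    cache.append_token(seq_id=0)  # 20 tokens -> 2 blocks of 16
print(cache.block_tables[0])      # [7, 6]: non-contiguous physical blocks
cache.free(0)
```

The point of the block table is that a sequence's cache no longer has to be contiguous, which is what lets a server pack many concurrent sequences into one fixed memory pool.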
@Narsil Hi, where is the custom PagedAttention kernel included?