Investigate problems with VLLM running on windows #151

Closed
nieznanysprawiciel opened this issue Jun 13, 2024 · 1 comment

nieznanysprawiciel commented Jun 13, 2024

Investigate potential problems and list possible approaches.

nieznanysprawiciel self-assigned this Jun 13, 2024

nieznanysprawiciel commented Jun 13, 2024

Problems:

  1. The problems seem to boil down to the lack of a Windows build of the triton dependency.
    I don't see any indication of a fundamental problem. The dev team closed PRs claiming they have no capacity to maintain Windows builds (source).
    The code seems to have worked for people involved in that PR, at least a few months ago.
  2. There are additional checks in the code that need to be disabled (at least in vllm, but possibly in dependencies as well).
    Example here, but there might be more.
  3. PyTorch is installed in its CPU version by default and must be manually replaced with the CUDA version (source). A quick way to verify both this and point 1 is sketched after this list.
  4. It may turn out along the way that other dependencies need our attention as well.
    Other suspicious dependencies (claimed here):
  5. We can't be sure that we won't encounter runtime problems after creating custom builds.
  6. There seems to be a small performance penalty on WSL (source).
  7. Custom builds of triton can carry a performance penalty (source).
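
A quick environment probe (a minimal sketch, not part of vllm; it assumes only that torch is already installed, and the file name is just illustrative) could confirm points 1 and 3 on a given Windows machine:

```python
# probe_env.py -- minimal sketch for checking points 1 and 3 above.
import importlib.util

import torch

# Point 3: a CPU-only PyTorch wheel reports torch.version.cuda as None.
print(f"torch {torch.__version__}, built against CUDA: {torch.version.cuda}")
print(f"CUDA available at runtime: {torch.cuda.is_available()}")

# Point 1: triton ships no official Windows wheel, so this is expected to fail
# unless a community build or a custom fork has been installed.
if importlib.util.find_spec("triton") is None:
    print("triton is NOT installed -- vllm's compiled kernels will not be usable")
else:
    import triton
    print(f"triton {triton.__version__} found")
```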

Options:

  1. Try to use one of the unofficial Windows builds of triton made by the community.
    There are a few options available:
  2. Maintain a fork of triton and build the packages ourselves.
    There have been some successful attempts to do this:
  3. Delegate preparing vllm for Windows to an external company.
  4. Distribute vllm as an optional package with GamerHash that requires WSL (the user will be asked and warned that the installation requires elevated privileges).
  5. Get rid of torch.compile in the vllm code.
    torch.compile appears to be an optimization step that makes the code run faster on CUDA. Maybe it is possible to omit this step and accept the resulting performance penalty (note: this is not the same as running on CPU, it is just running unoptimized CUDA code). A rough way to test this is sketched after this list.
    @pwalski managed to run whisper (which depends on PyTorch) without running into problems with triton or PyTorch. That could mean that triton is not strictly necessary for PyTorch.
  6. Give up integrating vllm :(
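
One way to test option 5 without patching every call site, assuming an unoptimized eager CUDA path is acceptable, would be to monkeypatch torch.compile into a no-op before vllm is imported. This is a rough sketch of the idea, not something vllm supports officially:

```python
# disable_torch_compile.py -- rough sketch; must run before importing vllm
# (or any module that applies torch.compile at import time).
import torch


def _no_compile(model=None, **kwargs):
    """Stand-in for torch.compile: returns the model (or a pass-through
    decorator) unchanged, so plain eager CUDA kernels are used instead of
    triton-compiled ones."""
    if model is None:
        # torch.compile(**kwargs) used as a decorator factory
        return lambda fn: fn
    return model


torch.compile = _no_compile

# import vllm  # import only after the patch has been applied (illustrative)
```

The model would still run on the GPU in eager mode; only the compiled kernels are skipped, which matches the performance trade-off described in option 5.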
