
Tensor-parallelism for multi-gpu support #213

Open
SalomonKisters opened this issue Apr 29, 2024 · 1 comment
Labels
wontfix This will not be worked on

Comments

@SalomonKisters

Feature request

Being able to split models across multiple GPUs, as with the vLLM/Aphrodite engines for LLMs.

Motivation

It would be extremely helpful to be able to split larger models across multiple GPUs.
Also, without TP, VRAM usage is uneven: one GPU loses a lot of VRAM while the other does not, which makes it impossible to run tensor parallelism in another program at the same time without giving up as much VRAM on the under-utilized GPU.
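
For reference, a minimal sketch of the requested behavior as vLLM exposes it (the model name is illustrative, and `tensor_parallel_size` is vLLM's argument, not anything Infinity currently offers):

```python
from vllm import LLM

# Tensor parallelism: the model's weight matrices are sharded across
# two GPUs, so each device holds roughly half of the parameters.
llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=2)

outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```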

Your contribution

Communicating the feature request.

@michaelfeil
Owner

You typically do data-parallel-style inference with sentence-transformers. TP is used when one GPU can't handle the desired batch size, or can't fit the model at all. Unless there are some compelling benchmarks for bert-base, there is no need for tensor parallelism.
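
For context, a minimal sketch of that data-parallel pattern using sentence-transformers' built-in multi-process pool (the model name and device list are placeholders):

```python
from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    # Data parallelism: every GPU loads a full copy of the model,
    # and the input batch is sharded across the worker processes.
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")

    pool = model.start_multi_process_pool(target_devices=["cuda:0", "cuda:1"])
    embeddings = model.encode_multi_process(["some text"] * 10_000, pool)
    model.stop_multi_process_pool(pool)

    print(embeddings.shape)  # (10000, embedding_dim)
```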

@michaelfeil michaelfeil added the wontfix This will not be worked on label May 29, 2024