Feature request
Being able to split models across multiple GPUs, as the vLLM/Aphrodite engines do for LLMs.
Motivation
It would be extremely helpful to be able to split larger models across multiple GPUs.
Also, without TP, one GPU carries the model's entire VRAM footprint while the other sits mostly idle. That makes it impossible for another program to use tensor parallelism at the same time without giving up just as much VRAM on the under-utilized GPU.
Your contribution
Communicating the feature.
You typically do data-parallel style inference with sentence-transformers. TP is used when one GPU can't handle the desired batch size, or can't fit the model at all. Unless there are compelling benchmarks for bert-base, there is no need for tensor parallelism.
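To illustrate the data-parallel approach suggested above, here is a minimal, framework-free sketch: the input batch is sharded across devices and each device encodes its own shard independently. The `fake_encode` function is a stand-in for a real per-device `model.encode` call (sentence-transformers offers this pattern natively via `start_multi_process_pool()` / `encode_multi_process()`); the device names and helper functions are illustrative, not part of any library API.

```python
# Minimal sketch of data-parallel inference: shard the input batch across
# devices and let each device process its own shard independently.

def shard(items, n_shards):
    """Split items into n_shards near-equal contiguous chunks."""
    k, r = divmod(len(items), n_shards)
    out, start = [], 0
    for i in range(n_shards):
        end = start + k + (1 if i < r else 0)
        out.append(items[start:end])
        start = end
    return out

def fake_encode(batch, device):
    # Placeholder for a per-device model.encode(batch) call.
    return [f"{device}:{text}" for text in batch]

def data_parallel_encode(sentences, devices):
    # Each device gets one shard; results are concatenated in input order.
    # In practice the per-device calls would run in separate processes.
    results = []
    for device, batch in zip(devices, shard(sentences, len(devices))):
        results.extend(fake_encode(batch, device))
    return results

sentences = [f"sentence {i}" for i in range(5)]
print(data_parallel_encode(sentences, ["cuda:0", "cuda:1"]))
```

This keeps a full model replica on every GPU, which is exactly the trade-off the issue describes: it scales throughput, but does not reduce per-GPU VRAM the way tensor parallelism would.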