🚀 Feature

Currently, running distributed.sh with ZeRO-3 and FSDP disabled uses quite a lot more VRAM than using accelerate + SFTTrainer natively. I believe this is because each GPU receives its own full copy of the model (which is why each GPU sits at roughly 99% VRAM, as opposed to native accelerate + SFTTrainer, where you can see n-1 GPUs at 0% while one of them is at 100%).
The feature would be an option to run a multi-GPU training script in this manner, which would lower the VRAM required.
For example:

- LLM Studio: 512 seq len, Mistral with FlashAttention-2 and int4 quantization, batch size 1, lora=true: each GPU has approximately 10.6 GB of VRAM occupied.
- SFTTrainer: 512 seq len, Mistral with FlashAttention-2 and nf4 quantization via bitsandbytes, batch size 1, the same LoraConfig, device_map="auto": three GPUs use 2.4 GB of VRAM each and the last GPU uses 3.2 GB, so total VRAM usage is about 10.4 GB (a rough sketch of this setup is below).
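Roughly, the SFTTrainer setup above looks like this (a minimal sketch using the older TRL SFTTrainer keyword arguments; the checkpoint, dataset, and LoRA hyperparameters are illustrative assumptions, not the exact values used):

```python
# Minimal sketch of the SFTTrainer setup described above. The checkpoint,
# dataset, and LoRA hyperparameters are assumptions, not the exact values used.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed Mistral checkpoint

# nf4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" shards the layers across the available GPUs
# instead of putting a full replica on every GPU (as DDP would).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

peft_config = LoraConfig(  # illustrative LoRA settings
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=load_dataset("timdettmers/openassistant-guanaco", split="train"),  # placeholder dataset
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=peft_config,
)
trainer.train()
```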
Motivation
This would reduce the VRAM requirements in a multi-GPU setup.
I am not sure I really understand the question. If you disable DeepSpeed, you will not do sharding but DDP, and DDP will always keep a full copy of the weights on each GPU.
So if you want sharding, we already support DeepSpeed.
DeepSpeed does not support nf4 quantization, so the VRAM requirements when sharding will be higher there. It would be nice to shard without using DeepSpeed. Native accelerate/transformers can do that by using device_map="auto" or by specifying a device_map. It would be good to have that ability to support the quantization use case.
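For example, something along these lines (a rough sketch; the checkpoint and the per-GPU memory caps are placeholder values):

```python
# Sketch of loading an nf4-quantized model sharded across GPUs without DeepSpeed,
# using native transformers/accelerate. The checkpoint and the per-GPU memory
# caps are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # assumed checkpoint
    quantization_config=bnb_config,
    device_map="auto",             # let accelerate place layers across GPUs
    max_memory={0: "3GiB", 1: "3GiB", 2: "3GiB", 3: "4GiB"},  # optional per-GPU caps
)

# Shows which layers ended up on which device, i.e. the resolved device_map.
print(model.hf_device_map)
```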
Accelerate wraps FSDP, DeepSpeed, DDP and so on, so this is probably a duplicate of #631.
Rewriting to use accelerate could be an option. Last time we did that, we ran into issues because it isn't fully customizable, but it might be good to reconsider.