I have an EC2 instance with 4 GPUs, each with 16 GB of memory.
By default, Hugging Face uses data parallelism and tries to fit a full copy of the model on each GPU. Of course, 16 GB is not enough, and I ran into a CUDA out-of-memory error.
I tried to use tensor parallelism and pipeline parallelism through Megatron-LM. For example, with a pipeline-parallel degree of 4, Megatron-LM is supposed to split the model across the 4 GPUs, giving 64 GB of GPU memory in total. However, I still ran into a CUDA out-of-memory error.
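For reference, this is roughly the kind of setup I mean; a minimal sketch using Accelerate's `MegatronLMPlugin`, where the argument names (`tp_degree`, `pp_degree`, `num_micro_batches`) are taken from the Accelerate docs and may need adjusting for your versions:

```python
# Minimal sketch: pipeline parallelism through Accelerate's Megatron-LM plugin.
# Argument names follow the Accelerate docs; they may differ across versions.
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

megatron_plugin = MegatronLMPlugin(
    tp_degree=1,          # tensor-parallel degree (no tensor splitting here)
    pp_degree=4,          # pipeline-parallel degree: shard layers across the 4 GPUs
    num_micro_batches=4,  # micro-batches to keep all pipeline stages busy
)
accelerator = Accelerator(megatron_lm_plugin=megatron_plugin)
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```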
I also tried to use ZeRO Stage 2 and Stage 3 through DeepSpeed. However, I still ran into a CUDA out-of-memory error.
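(As I understand it, ZeRO Stage 2 still keeps a full copy of the parameters on every GPU; only Stage 3 shards the weights themselves.) A minimal sketch of the Stage 3 setup via Accelerate's `DeepSpeedPlugin`, again with argument names taken from the Accelerate docs:

```python
# Minimal sketch: ZeRO Stage 3 with CPU offload through Accelerate's DeepSpeed plugin.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    zero_stage=3,                    # Stage 3 shards parameters as well as
                                     # optimizer states and gradients
    offload_optimizer_device="cpu",  # move optimizer states to host RAM
    offload_param_device="cpu",      # move parameters to host RAM
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```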
In both cases, I used Hugging Face Accelerate to configure Megatron-LM or DeepSpeed.
I wonder whether the diffusers repo works out of the box with Megatron-LM and DeepSpeed, or whether I need to make code changes to get tensor/pipeline parallelism working.