
Make all Transformer models compatible with model parallelism #22561

Closed
41 tasks done
sgugger opened this issue Apr 4, 2023 · 27 comments · Fixed by #22703
Comments

@sgugger
Collaborator

sgugger commented Apr 4, 2023

Accelerate makes it easy to load a model on multiple GPUs with device_map="auto". This in turn allows users to train models with naive model parallelism if they have several GPUs.
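For example, loading a checkpoint sharded across all visible GPUs looks like this (a minimal sketch; the checkpoint name is only illustrative, and `accelerate` must be installed):

```python
from transformers import AutoModelForSeq2SeqLM

# device_map="auto" lets Accelerate split the weights across the available GPUs
# (falling back to CPU/disk offload if they do not fit).
model = AutoModelForSeq2SeqLM.from_pretrained("t5-large", device_map="auto")
```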

A problem that happens in Transformers with models with heads (so not XxxModel but, for instance, XxxModelForSequenceClassification) is that the labels end up on a different device than the logits, and there is a device mismatch error.

Thankfully, there is an easy fix for that! #22535 shows how to fix this for T5 by simply moving the labels to the same device as the logits they are compared to. This is a no-op when the devices are the same, and fixes the issue when they differ.
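For reference, here is a self-contained sketch of the pattern applied in #22535 (shapes and variable names are illustrative, not the actual model code):

```python
import torch
from torch.nn import CrossEntropyLoss

# Stand-ins for a head's logits and the batch labels. With naive model
# parallelism these can end up on different devices (e.g. logits on the
# last GPU of the pipeline, labels still on the first one).
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])

# The fix: move the labels to the logits' device before computing the loss.
# This is a no-op when both tensors already live on the same device.
labels = labels.to(logits.device)

loss = CrossEntropyLoss()(logits.view(-1, logits.size(-1)), labels.view(-1))
```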

We would like help from the community to extend this to all models that support model parallelism, which are:

  • BART
  • BigBirdPegasus
  • BLIP2
  • BLOOM
  • BridgeTower
  • CamemBERT
  • CLIP
  • CLIPSeg
  • CodeGen
  • Data2Vec Text
  • Deit
  • ESM
  • GPT-2
  • GPT-Neo
  • GPT-NeoX
  • GPT-NeoX Japanese
  • GPT-J
  • GPT-San
  • JukeBox
  • Lilt
  • LLaMA (LlamaForSequenceClassification only)
  • Longformer
  • LongT5
  • Luke
  • M2M100
  • mBART
  • mT5
  • NLLB
  • OPT
  • Owl-ViT
  • Pix2Struct
  • PLBART
  • RoBERTa
  • RoBERTa PreLayerNorm
  • SwitchTransformer
  • T5
  • Vilt
  • ViT
  • ViT-Hybrid
  • Whisper
  • XLM-RoBERTa

If you would like to grab one of those models and apply the same fix as #22535 to all the models with heads, please leave a comment here!

@muaid-mughrabi

I think I can help with this Issue :)

@iamarunbrahma
Contributor

iamarunbrahma commented Apr 4, 2023

I would like to work on this issue - BART model :)

@kausmeows
Contributor

Hi, I can take this up 🙌🏻

@zsc

zsc commented Apr 5, 2023

Indeed, this fix is required for BLOOM. main...zsc:transformers:main (my fix is hacky and not PR-ready. Just FYI)

@TerryCM

TerryCM commented Apr 5, 2023

Just to make sure: does LlamaForCausalLM support this feature already (#22546)? It seems there are still some errors when using device_map="auto" for this task.

@mollerup23
Contributor

Hi, I'd like to pick up the GPT-2 model!

@xssChauhan
Contributor

Hi! I am taking this up for LlamaForSequenceClassification.

@kooshi
Contributor

kooshi commented Apr 6, 2023

Just to make sure: does LlamaForCausalLM support this feature already (#22546)? It seems there are still some errors when using device_map="auto" for this task.

It does (#22329). I have started seeing similar errors to #22546, but only after updating my drivers from 525 to 530, similar to #22546 (comment)

(Which is good news to me; I had no idea why that GPU started disappearing occasionally. It seems it can happen when that GPU is under any load, not just during training.)

Edit: seems like the errors I was getting were actually caused by GPU sag. I haven't yet reproduced that exact error, but it has been reported elsewhere. It is certainly not consistent though.

@innat

innat commented Apr 7, 2023

@younesbelkada @sgugger
Is this fix (moving the labels to the same device as the logits) supposed to make model parallelism work for all the models listed above, or is it just a crucial step toward it? Also, is this fix only for PyTorch models and not for JAX or TF?

@younesbelkada
Contributor

I think it is supposed to work for all models listed above, as long as you are loading your model with device_map=xxx. And yes, this should be for PyTorch only, though I am not really aware of how model parallelism works on TF & JAX.

@innat

innat commented Apr 7, 2023

I think it is supposed to work for all models listed above, as long as you are loading your model with device_map=xxx

I tried such a fix here: #22591 (comment), but sadly it didn't work out. Any catch?

@innat

innat commented Apr 8, 2023

@sgugger
As the goal of this ticket is to enable model parallelism with an easy fix, have the merged PR(s) been checked on multi-GPU? I couldn't find any test script regarding that in #22663.

@shahad-mahmud
Contributor

I would love to work on BridgeTower.

@trantuantdt

Hi. I would like to try with "Whisper"

Asugawara added a commit to Asugawara/transformers that referenced this issue Apr 10, 2023
@mollerup23
Contributor

I'd like to claim the OPT model if no one else has picked it up.

@mayankagarwals
Contributor

Taking this up for the remaining GPT models

@jprivera44
Contributor

jprivera44 commented Apr 11, 2023

Hello, I just completed the GPT-J code. Just filing the PR now.

@oscar-garzon
Contributor

Hello! I'd like to work on the Whisper model.

@sgugger sgugger reopened this Apr 14, 2023
@abhigyan631

Hi, is there any model on which I can work, please? Thanks.

@Tanmaypatil123
Contributor

Is there any remaining model on which I can work? Thanks.

@JuheonChu
Contributor

@sgugger Hello, can I work on the JukeBox?

@elabongaatuo
Contributor

elabongaatuo commented Apr 18, 2023

Hello @sgugger , I'd like to work on m2m100

@Batese2001
Contributor

@sgugger I would love to work on CodeGen if it is unclaimed

@katiele47
Contributor

Hi @sgugger I can work on Luke if it has not been taken

@VomV

VomV commented Apr 23, 2023

@sgugger I would like to work on SwitchTransformer, if not taken.

@sushmanthreddy
Contributor

sushmanthreddy commented Apr 25, 2023

@sgugger I think all the models are covered; I have checked the others as well. For example, SwitchTransformer already has parallelism implemented, so I think we can close this issue. The only pending models are CLIP, JukeBox, OWL-ViT, and NLLB; maybe model parallelism is not applicable to some of those models.

@sgugger
Collaborator Author

sgugger commented Apr 25, 2023

Indeed, all models have been covered. Thanks a lot everyone!

@sgugger sgugger closed this as completed Apr 25, 2023
novice03 pushed a commit to novice03/transformers that referenced this issue Jun 23, 2023
* add GPTNeoXForSequenceClassification

* move the labels to logits.device (ref: huggingface#22561)

* fix