Add parallelization support for T5EncoderModel #9082
Conversation
add model parallelism to T5EncoderModel
Very cool! Could you also enable the parallelization tests for these models? You can check how it was done in the initial model parallel PR; here's the commit related to the tests. You can just add the
Thanks for the tip.
This LGTM. Looking into it, it seems we have an error in T5Stack, as it is creating the device map with torch.cuda.device_count() rather than the range of that value like you're doing here. Since we're always passing the device map to T5Stack (it's never used as a standalone model) we don't see it, but it doesn't seem correct. What do you think? If you think this is true, do you mind adding a range in T5Stack so that we can merge it together? Thanks!
Also it would be great if you could run
Yes, you are correct, T5Stack should also use range, since the get_device_map function applies len to it.
Done, and the code quality checks pass.
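The device-map issue discussed above can be sketched as follows. This is a simplified stand-in for the real get_device_map helper in transformers, not its exact implementation; the point is that the helper applies len to its devices argument, so it must receive an iterable of device ids (e.g. range(torch.cuda.device_count())), not the raw device count.

```python
import math

def get_device_map(n_layers, devices):
    """Evenly assign layer indices to the given device ids (simplified sketch)."""
    devices = list(devices)  # len() is applied here, so an int count would fail
    n_blocks = int(math.ceil(n_layers / len(devices)))
    layers = list(range(n_layers))
    return {
        device: layers[i * n_blocks:(i + 1) * n_blocks]
        for i, device in enumerate(devices)
    }

# Buggy call (what T5Stack effectively did): passes a count, not an iterable
# get_device_map(12, torch.cuda.device_count())        # TypeError
# Fixed call, as in this PR: pass the range of device ids
# get_device_map(12, range(torch.cuda.device_count()))
print(get_device_map(12, range(4)))
```

With 12 layers and 4 devices, each device receives a contiguous block of 3 layers.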
Wonderful! |
What does this PR do?
Extend T5EncoderModel to support model parallelization across multiple GPUs.
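Usage would follow the existing T5 model-parallel API. A hedged sketch: the layer count (12, as in t5-base) and the two-GPU split are illustrative assumptions, and the parallelize call itself requires multiple GPUs, so it is only shown commented out here; the runnable part just builds and sanity-checks a device map.

```python
# A device map assigns encoder block indices to GPU ids.
# Splitting 12 blocks (t5-base) across two GPUs is an illustrative choice.
device_map = {
    0: list(range(0, 6)),   # first six encoder blocks on GPU 0
    1: list(range(6, 12)),  # remaining six blocks on GPU 1
}

# Sanity check: every block appears exactly once across all devices.
assigned = sorted(sum(device_map.values(), []))
assert assigned == list(range(12))

# With multiple GPUs available, this would spread the encoder across them:
# from transformers import T5EncoderModel
# model = T5EncoderModel.from_pretrained("t5-base")
# model.parallelize(device_map)
# ... run inference ...
# model.deparallelize()  # move everything back to a single device
print(assigned)
```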
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
T5: @patrickvonplaten