
Add multi-GPU support for seq2seq example #1982

Closed
wants to merge 1 commit

Conversation

@gmittal (Collaborator) commented Mar 11, 2022

What does this PR do?

Adds multi-GPU support to the seq2seq example by using pmap to implement data-parallel training. We used this modified example as part of our research and hope that others will find it useful as well.
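For context, here is a minimal sketch of the pmap data-parallel pattern the PR applies. The actual diff isn't reproduced in this thread, so the model, loss, and helper names below (`Stub`, `mse_loss`, `shard`) are illustrative stand-ins, not code from the PR; the real example trains the seq2seq LSTM model.

```python
# Minimal sketch of pmap-based data-parallel training, assuming Flax + Optax.
# Stub, mse_loss, and shard are hypothetical stand-ins for the seq2seq
# example's model, loss, and input pipeline.
import functools

import jax
import jax.numpy as jnp
import numpy as np
import optax
import flax.linen as nn
from flax import jax_utils
from flax.training import train_state


class Stub(nn.Module):
    """Stand-in for the example's seq2seq model."""

    @nn.compact
    def __call__(self, x):
        return nn.Dense(features=1)(x)


def mse_loss(params, apply_fn, batch):
    preds = apply_fn({'params': params}, batch['x'])
    return jnp.mean((preds - batch['y']) ** 2)


@functools.partial(jax.pmap, axis_name='batch')
def train_step(state, batch):
    loss, grads = jax.value_and_grad(mse_loss)(state.params, state.apply_fn, batch)
    # Average gradients across devices so every replica applies the same update.
    grads = jax.lax.pmean(grads, axis_name='batch')
    return state.apply_gradients(grads=grads), jax.lax.pmean(loss, axis_name='batch')


def shard(batch):
    # Split the leading batch axis across local devices: [B, ...] -> [D, B/D, ...].
    d = jax.local_device_count()
    return jax.tree_util.tree_map(
        lambda x: x.reshape((d, x.shape[0] // d) + x.shape[1:]), batch)


model = Stub()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 8)))['params']
state = train_state.TrainState.create(
    apply_fn=model.apply, params=params, tx=optax.adam(1e-3))
state = jax_utils.replicate(state)  # copy the train state to every device

b = jax.local_device_count() * 4
batch = {'x': np.ones((b, 8), np.float32), 'y': np.zeros((b, 1), np.float32)}
state, loss = train_step(state, shard(batch))
```

With a single device this degenerates to ordinary training (the leading device axis has size 1), which is why the same code can run unchanged on CPU or on a multi-GPU host.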

Checklist

  • This PR fixes a minor issue (e.g. a typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
  • This change is discussed in a GitHub issue/discussion (please add a link).
  • The documentation and docstrings adhere to the documentation guidelines.
  • This change includes necessary high-coverage tests. (No quality testing = no merge!)

@marcvanzee (Collaborator)

Thanks for filing this PR @gmittal! To be honest, I am not sure it is a good idea to complicate the seq2seq example further. The main thing we want to demonstrate in this example is how to implement a recurrent network, so we intentionally keep everything else as simple as possible (input pipeline, training, metrics, ...).

Also, since the dataset is so simple and training takes only a few minutes on CPU (thanks to your change!), data-parallel training doesn't seem necessary either.

However, I do think this example is very nice, and one thing we could do is turn it into a HOWTO called "Data-parallel training on multiple devices". We could then show the minimal changes required to rewrite the seq2seq example, similar to our existing HOWTO Ensembling on multiple devices.

Our HOWTO system makes it very easy to show side-by-side diff views (see the source of ensembling.rst), so that part should be quite simple to implement.

I think it could be a really great addition to our HOWTOs, since many people are interested in doing data-parallel training. Would you be interested in doing this?

@gmittal (Collaborator, Author) commented Mar 11, 2022

That makes sense (I suspected that DP was a bit overkill for this anyway), and I'd be happy to make a HOWTO tutorial with the side-by-side diffs.

@marcvanzee (Collaborator)

> That makes sense (I suspected that DP was a bit overkill for this anyway), and I'd be happy to make a HOWTO tutorial with the side-by-side diffs.

That is really awesome! I'll create an issue and assign it to you.

@marcvanzee (Collaborator)

Closing this since we decided it would be better to add this in a HOWTO.

@marcvanzee closed this May 16, 2022