
Add multi-GPU support for seq2seq example #1982

Closed
wants to merge 1 commit

Conversation

@gmittal (Collaborator) commented Mar 11, 2022

What does this PR do?

Adds multi-GPU support to the seq2seq example by using pmap to implement data-parallel training. We used this modified example as part of our research and hope that others will find it useful as well.
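For context, here is a minimal sketch of the pmap data-parallel pattern the PR applies. The actual diff isn't reproduced in this thread, so the model, loss, and helper names below (`Stub`, `mse_loss`, `shard`) are illustrative stand-ins, not code from the PR; the real example trains the seq2seq LSTM model.

```python
# Minimal sketch of pmap-based data-parallel training, assuming Flax + Optax.
# Stub, mse_loss, and shard are hypothetical stand-ins for the seq2seq
# example's model, loss, and input pipeline.
import functools

import jax
import jax.numpy as jnp
import numpy as np
import optax
import flax.linen as nn
from flax import jax_utils
from flax.training import train_state


class Stub(nn.Module):
    """Stand-in for the example's seq2seq model."""

    @nn.compact
    def __call__(self, x):
        return nn.Dense(features=1)(x)


def mse_loss(params, apply_fn, batch):
    preds = apply_fn({'params': params}, batch['x'])
    return jnp.mean((preds - batch['y']) ** 2)


@functools.partial(jax.pmap, axis_name='batch')
def train_step(state, batch):
    loss, grads = jax.value_and_grad(mse_loss)(state.params, state.apply_fn, batch)
    # Average gradients across devices so every replica applies the same update.
    grads = jax.lax.pmean(grads, axis_name='batch')
    return state.apply_gradients(grads=grads), jax.lax.pmean(loss, axis_name='batch')


def shard(batch):
    # Split the leading batch axis across local devices: [B, ...] -> [D, B/D, ...].
    d = jax.local_device_count()
    return jax.tree_util.tree_map(
        lambda x: x.reshape((d, x.shape[0] // d) + x.shape[1:]), batch)


model = Stub()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 8)))['params']
state = train_state.TrainState.create(
    apply_fn=model.apply, params=params, tx=optax.adam(1e-3))
state = jax_utils.replicate(state)  # copy the train state to every device

b = jax.local_device_count() * 4
batch = {'x': np.ones((b, 8), np.float32), 'y': np.zeros((b, 1), np.float32)}
state, loss = train_step(state, shard(batch))
```

With a single device this degenerates to ordinary training (the leading device axis has size 1), which is why the same code can run unchanged on CPU or on a multi-GPU host.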

Checklist

  • This PR fixes a minor issue (e.g. a typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
  • This change is discussed in a GitHub issue/discussion (please add a link).
  • The documentation and docstrings adhere to the documentation guidelines.
  • This change includes necessary high-coverage tests. (No quality testing = no merge!)

@marcvanzee (Collaborator)

Thanks for filing this PR @gmittal! To be honest, I am not sure it is a good idea to complicate the seq2seq example further. The main thing we want to demonstrate in this example is how to implement a recurrent network, so we intentionally keep everything else as simple as possible (input pipeline, training, metrics, ...).

Also, since the dataset is so simple and training takes only a few minutes on CPU (thanks to your change!), data-parallel training doesn't seem necessary either.

However, I do think this example is very nice, and one thing we could do is turn it into a HOWTO called "Data-parallel training on multiple devices". We could then show the minimal changes required to rewrite the seq2seq example, similar to our existing HOWTO Ensembling on multiple devices.

Our HOWTO system makes it very easy to show side-by-side diff views (see the source of ensembling.rst), so that part should be quite simple to implement.

I think it could be a really great addition to our HOWTOs, since many people are interested in doing data-parallel training. Would you be interested in doing this?

@gmittal (Collaborator, Author) commented Mar 11, 2022

That makes sense (I suspected that DP was a bit overkill for this anyway), and I'd be happy to make a HOWTO tutorial with the side-by-side diffs.

@marcvanzee (Collaborator)

> That makes sense (I suspected that DP was a bit overkill for this anyway), and I'd be happy to make a HOWTO tutorial with the side-by-side diffs.

That is really awesome! I'll create an issue and assign it to you.

@marcvanzee (Collaborator)

Closing this since we decided it would be better to add this in a HOWTO.

@marcvanzee closed this May 16, 2022