Add possibility to switch between APEX and AMP in Trainer #9137

Merged 4 commits into master on Dec 15, 2020

Conversation

sgugger (Collaborator) commented Dec 15, 2020

What does this PR do?

When PyTorch >= 1.6 is installed, Trainer always uses native AMP. This PR adds the option to switch between AMP and APEX, which can be useful:

  • because of the memory leak in AMP (fixed in PyTorch 1.7.1 but still present in 1.6)
  • to benchmark APEX vs. AMP

It also simplifies the internals of Trainer a bit.
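
For reference, a rough sketch of what the switch looks like from the user side. It assumes the option lands as a fp16_backend training argument with choices "auto", "amp" and "apex" (the "auto" default picking native AMP on PyTorch >= 1.6 and APEX otherwise), as discussed in the review below:

from transformers import TrainingArguments

# Sketch only: fp16_backend selects the mixed-precision backend.
# "auto" -> native AMP when PyTorch >= 1.6, APEX otherwise;
# "amp" or "apex" force one backend explicitly.
args = TrainingArguments(
    output_dir="output",  # placeholder path
    fp16=True,            # enable mixed-precision training
    fp16_backend="apex",  # or "amp", or the default "auto"
)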

trainer.add_callback(EarlyStoppingCallback(1, 0.0001))
train_output = trainer.train()
self.assertLess(train_output.global_step, 20 * 64 / 16)
with tempfile.TemporaryDirectory() as tmp_dir:
sgugger (Collaborator, Author) commented on this change:
This test was saving things in a regression folder and adding lots of unwanted files. Moving it to a temp folder.

try:
    trainer.train()
except AssertionError:
    with tempfile.TemporaryDirectory() as tmp_dir:
sgugger (Collaborator, Author) commented on this change:
Same here.

stas00 (Contributor) left a comment:

Absolutely fantastic! Thank you, @sgugger!

I added a few small suggestions in the comments.

Resolved review threads (outdated):
  • src/transformers/training_args.py (2 threads)
  • src/transformers/trainer.py
stas00 (Contributor) commented Dec 15, 2020

This PR also removes _use_ddp_no_sync, since transformers presumably no longer supports PyTorch < 1.2.
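
For context, _use_ddp_no_sync gated Trainer's use of DistributedDataParallel's no_sync() context manager behind a PyTorch version check; with PyTorch < 1.2 out of scope, it can be used unconditionally. A rough sketch of the gradient-accumulation pattern involved (illustrative names, not Trainer's actual code; model, dataloader, optimizer and accumulation_steps are assumed to exist):

import contextlib

def grad_sync_context(model, at_boundary):
    # Skip the gradient all-reduce on intermediate accumulation steps
    # by entering DDP's no_sync(); on the boundary step, return a no-op
    # context so gradients are synchronized as usual.
    if not at_boundary and hasattr(model, "no_sync"):
        return model.no_sync()
    return contextlib.nullcontext()

for step, batch in enumerate(dataloader):
    at_boundary = (step + 1) % accumulation_steps == 0
    with grad_sync_context(model, at_boundary):
        loss = model(**batch).loss / accumulation_steps
        loss.backward()
    if at_boundary:
        optimizer.step()
        optimizer.zero_grad()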

sgugger and others added 2 commits December 15, 2020 15:26
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
LysandreJik (Member) left a comment:
Great, LGTM. The new auto is clean.

@sgugger sgugger merged commit ad895af into master Dec 15, 2020
@sgugger sgugger deleted the apex_amp_switch branch December 15, 2020 21:38
stas00 (Contributor) commented Dec 15, 2020

There was one nit you agreed with but didn't integrate. I'm fine if it remains as merged; it's just a potential for divergence down the road...

sgugger (Collaborator, Author) commented Dec 15, 2020

Oh, which one did I miss?

stas00 (Contributor) commented Dec 15, 2020

#9137 (comment)

stas00 (Contributor) commented Dec 15, 2020

Argh, one sec. I see what happened; that wasn't what I meant, sorry for not being clear. choices is crucial here: since the user-provided values aren't validated otherwise, dropping it is error-prone.

What I tried to suggest was not repeating the options in the help text. Please let's have choices back, and duplicate them in the help if you prefer the explicit repetition there. Thanks.
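
For illustration, a sketch of the requested pattern: HfArgumentParser forwards a dataclass field's metadata (help, choices, ...) to argparse, so declaring choices makes an invalid value fail at parse time. The exact help string below is illustrative, and note this only validates CLI parsing, not direct construction of the dataclass:

from dataclasses import dataclass, field

@dataclass
class MyTrainingArguments:
    # With "choices" in the metadata, the argument parser rejects any
    # value outside the list (e.g. a typo like "apx") instead of
    # silently accepting it.
    fp16_backend: str = field(
        default="auto",
        metadata={
            "help": "The backend to use for mixed precision: 'auto', 'amp' or 'apex'.",
            "choices": ["auto", "amp", "apex"],
        },
    )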

sgugger (Collaborator, Author) commented Dec 15, 2020

Fixed directly on master in this commit.

stas00 (Contributor) commented Dec 15, 2020

That's perfect. Thank you, @sgugger!

guyrosin pushed a commit to guyrosin/transformers that referenced this pull request Jan 15, 2021
Add possibility to switch between APEX and AMP in Trainer (huggingface#9137)

* Add possibility to switch between APEX and AMP in Trainer

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>