Add AdaFactor optimizer from fairseq #6722
Conversation
… MLM -- reduced memory consumption compared to ADAM.
Hey @sshleifer -- here is the belated PR for AdaFactor. Please let me know how to edit this properly, and what tests or examples we should add. Thanks!
This is gonna be awesome!
Want to add a test similar to test_adamw here?
Also, I can take over whenever!
We will integrate into examples/ in a separate PR, I think.
Thanks @sshleifer -- let me try to make those changes. Agree that I should be able to add a single test -- appreciate the link -- and you can add examples in a separate PR. If I don't get this figured out soon, I'm happy for you to make the changes yourself :-)
…ransformers into add_fairseq_adafactor
Hey @sshleifer -- I think I got a test working finally. We can squash the commits. Still not sure what I need to clean up for the code standards/linter. Please advise, thanks!
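For context, a rough sketch of what a convergence test in the style of test_adamw might look like, assuming the Adafactor class added in this PR; the hyperparameters and tolerance here are illustrative and may differ from the merged test:

```python
# Rough sketch of an Adafactor convergence test in the style of test_adamw;
# values shown are illustrative assumptions, not the exact merged test.
import torch

from transformers import Adafactor


def test_adafactor_converges():
    w = torch.tensor([0.1, -0.2, -0.1], requires_grad=True)
    target = torch.tensor([0.4, 0.2, -0.5])
    criterion = torch.nn.MSELoss()
    # A fixed learning rate (relative_step=False) keeps the test deterministic.
    optimizer = Adafactor(
        params=[w],
        lr=1e-2,
        relative_step=False,
        scale_parameter=False,
        warmup_init=False,
    )
    for _ in range(1000):
        loss = criterion(w, target)
        loss.backward()
        optimizer.step()
        w.grad.detach_()  # avoid carrying graph history between iterations
        w.grad.zero_()
    assert torch.allclose(w, target, atol=1e-2)
```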
For local style checking, you need:

```bash
sty () {
    make style
    flake8 examples templates tests src utils
}
```

and then run `sty`.
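As a rough guide, here is what the helper boils down to; `black` and `isort` being behind `make style` is an assumption based on the later commit titles in this PR, and the exact flags may differ by repo version:

```bash
# Approximate expansion of the sty helper above; flags omitted, targets as listed.
black examples templates tests src utils    # auto-format code
isort examples templates tests src utils    # sort imports
flake8 examples templates tests src utils   # report remaining lint issues (does not auto-fix)
```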
Also, squashing happens automatically at merge time, so don't worry about that.
Hmm. Is there a way for
If you also run the flake8 command, it should just fix it.
I think I fixed the formatting, as requested. Took a sec to figure that all out...
src/transformers/optimization.py (Outdated)

# Alternatively, relative_step with warmup_init can also be used.
# Training without LR warmup or clip threshold is not recommended. Additional optimizer operations
# like gradient clipping should not be used.
(nit)
This "second docstring" breaks style convention. I am OK with leaving it here because it is very useful, but I would prefer to consolidate it with the class docstring below.
Gotcha. It's up to you -- happy to move it, or you can consolidate the docstring in a future PR.
Let me try to make the change and see if you like it.
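For reference, a minimal sketch of how the recommendations quoted above from optimization.py (relative step sizes with warmup_init, no external LR warmup, no gradient clipping) translate into constructing the optimizer; the model choice and keyword values are illustrative assumptions:

```python
# Minimal sketch, assuming the Adafactor class introduced in this PR.
from transformers import Adafactor, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")

# lr=None with relative_step=True lets Adafactor derive its own step size,
# and warmup_init=True handles warmup internally -- so no external LR
# scheduler or gradient clipping is layered on top.
optimizer = Adafactor(
    model.parameters(),
    lr=None,
    relative_step=True,
    warmup_init=True,
    scale_parameter=True,
)
```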
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
@sshleifer -- any idea what happened with the
Yes they did, sorry about that. I did some cleanup on this branch.
Codecov Report

```
@@            Coverage Diff             @@
##           master    #6722      +/-   ##
==========================================
- Coverage   78.96%   78.94%   -0.03%
==========================================
  Files         157      157
  Lines       28486    28571      +85
==========================================
+ Hits        22495    22555      +60
- Misses       5991     6016      +25
```

Continue to review full report at Codecov.
Awesome. Thanks @sshleifer. I'll start working more on the other, less mature PRs we discussed. And please ping me if/when you write tests or examples for this -- happy to contribute to that as well if you need.
Great, thanks a lot! Cool test as well.
I've added Adafactor to the docs and slightly changed the style of the docstrings in #6765.
Thanks! I'll add a
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
  Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring

Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
This reverts commit 006deb9.
Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.
Fixes #1256