
Allow to set Adam beta1, beta2 in TrainingArgs #5592

Merged 2 commits into master on Jul 27, 2020

Conversation

gonglinyuan (Contributor)

In some models, beta1 and beta2 of the Adam optimizer are set to values different from the defaults (0.9, 0.999). For example, RoBERTa sets beta2 = 0.98. It is therefore necessary to add beta1 and beta2 to TrainingArgs if the user wants to fine-tune RoBERTa and other similar models. Another Adam hyperparameter, adam_epsilon, has already been added to TrainingArgs, so for consistency it would be better if adam_beta1 and adam_beta2 were added as well.
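
For illustration, a minimal sketch of how the proposed arguments would be used for RoBERTa-style fine-tuning. The beta2 = 0.98 value is the example from the description; the other values and names (such as output_dir) are placeholder choices, not part of this PR:

from transformers import TrainingArguments

# Fine-tuning configuration using the Adam hyperparameters proposed here.
# adam_beta2=0.98 follows the RoBERTa recipe mentioned above; adam_epsilon
# was already configurable before this PR.
training_args = TrainingArguments(
    output_dir="./roberta-finetune",  # placeholder path
    learning_rate=2e-5,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    num_train_epochs=3,
)

# The arguments are then passed to a Trainer as usual:
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()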


codecov bot commented Jul 8, 2020

Codecov Report

Merging #5592 into master will increase coverage by 1.21%.
The diff coverage is 75.00%.


@@            Coverage Diff             @@
##           master    #5592      +/-   ##
==========================================
+ Coverage   76.95%   78.16%   +1.21%     
==========================================
  Files         145      145              
  Lines       25317    25319       +2     
==========================================
+ Hits        19482    19790     +308     
+ Misses       5835     5529     -306     
Impacted Files Coverage Δ
src/transformers/trainer.py 37.96% <0.00%> (ø)
src/transformers/trainer_tf.py 16.53% <ø> (ø)
src/transformers/optimization_tf.py 57.65% <100.00%> (ø)
src/transformers/training_args.py 78.00% <100.00%> (+0.44%) ⬆️
src/transformers/modeling_tf_roberta.py 43.98% <0.00%> (-49.38%) ⬇️
src/transformers/generation_tf_utils.py 83.20% <0.00%> (-1.76%) ⬇️
src/transformers/file_utils.py 79.26% <0.00%> (-0.34%) ⬇️
src/transformers/modeling_tf_mobilebert.py 96.77% <0.00%> (+73.38%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cfbb982...9bdb1a5.
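
Given the impacted files above (notably src/transformers/trainer.py and src/transformers/training_args.py), the trainer-side change presumably threads the new arguments into the optimizer's standard betas/eps parameters. A minimal sketch under that assumption, with placeholder values rather than the PR's actual diff:

import torch
from transformers import AdamW  # Adam with weight-decay fix used by Trainer

# Placeholder stand-ins for a real model and the new TrainingArgs fields.
model = torch.nn.Linear(768, 2)
learning_rate, adam_beta1, adam_beta2, adam_epsilon = 2e-5, 0.9, 0.98, 1e-6

# Plausible wiring: the new arguments map onto AdamW's betas/eps parameters.
optimizer = AdamW(
    model.parameters(),
    lr=learning_rate,
    betas=(adam_beta1, adam_beta2),
    eps=adam_epsilon,
)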

LysandreJik (Member) left a comment


LGTM, love this! @julien-c @jplu

jplu (Contributor) commented Jul 9, 2020

LGTM! Really nice!!!

julien-c (Member)

I'm fine with this
