[test schedulers] adjust to test the first step's reading #6429

Merged Aug 27, 2020 (2 commits into huggingface:master from stas00:sched2)

Conversation

@stas00 (Contributor) commented Aug 12, 2020

As I was working on a new scheduler, it was difficult to match numbers, since the first step's reading was dropped by the unwrap_schedule wrappers (they took the measurement after stepping). This PR adjusts the wrappers to take a reading first and then step.

This PR also makes a small refactoring that moves all the unwrapping into the script, so the test just compares two lists (avoiding repeated [l[0] for l in lrs_1] expressions); a sketch of the adjusted wrapper follows.
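
For reference, a minimal sketch of what the adjusted wrapper looks like after this change (the real test code may differ in details; get_lr()[0] assumes a single param group, as in these tests):

    def unwrap_schedule(scheduler, num_steps=10):
        lrs = []
        for _ in range(num_steps):
            # read first, so the lr of the very first step is captured
            lrs.append(scheduler.get_lr()[0])
            scheduler.step()
        return lrs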

The updated table is:

        scheds = {
            get_constant_schedule: ({}, [10.0] * self.num_steps),
            get_constant_schedule_with_warmup: (
                {"num_warmup_steps": 4},
                [0.0, 2.5, 5.0, 7.5, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0],
            ),
            get_linear_schedule_with_warmup: (
                {**common_kwargs},
                [0.0, 5.0, 10.0, 8.75, 7.5, 6.25, 5.0, 3.75, 2.5, 1.25],
            ),
            get_cosine_schedule_with_warmup: (
                {**common_kwargs},
                [0.0, 5.0, 10.0, 9.61, 8.53, 6.91, 5.0, 3.08, 1.46, 0.38],
            ),
            get_cosine_with_hard_restarts_schedule_with_warmup: (
                {**common_kwargs, "num_cycles": 2},
                [0.0, 5.0, 10.0, 8.53, 5.0, 1.46, 10.0, 8.53, 5.0, 1.46],
            ),
            get_polynomial_decay_schedule_with_warmup: (
                {**common_kwargs, "power": 2.0, "lr_end": 1e-7},
                [0.0, 5.0, 10.0, 7.656, 5.625, 3.906, 2.5, 1.406, 0.625, 0.156],
            ),
        }
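
As a sanity check on these expected lists, the linear-with-warmup row can be re-derived by hand. The sketch below assumes common_kwargs is {"num_warmup_steps": 2, "num_training_steps": 10} with an initial lr of 10.0, which is consistent with the numbers above though not shown in this excerpt:

    # re-derive the get_linear_schedule_with_warmup expectations
    def linear_lambda(step, warmup=2, total=10):
        if step < warmup:
            return step / warmup  # linear warmup from 0
        return max(0.0, (total - step) / (total - warmup))  # linear decay

    print([round(10.0 * linear_lambda(s), 2) for s in range(10)])
    # [0.0, 5.0, 10.0, 8.75, 7.5, 6.25, 5.0, 3.75, 2.5, 1.25]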

Unrelated to the changes suggested in this PR, it exposes two minor issues:

  1. We definitely have an off-by-one problem there, as the last step's reading is taken one step too early (which this change exposes): the schedule never completes its intended cycle. This is probably unimportant over hundreds of steps, but it definitely stands out when developing a new scheduler.

To illustrate, see this change in the reported numbers for get_polynomial_decay_schedule_with_warmup:

-                [5.0, 10.0, 7.656, 5.625, 3.906, 2.5, 1.406, 0.625, 0.156, 1e-07],
+                [0.0, 5.0, 10.0, 7.656, 5.625, 3.906, 2.5, 1.406, 0.625, 0.156],

The expected last reading of 1e-07 is not there, and it never was, even before this change.
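
To see why, note that reading before each step samples the lr at steps 0 through num_steps - 1, so the value produced by the final step() call is never observed. With the wrapper sketched above, one extra reading would capture it:

    lrs = []
    for _ in range(num_steps):
        lrs.append(scheduler.get_lr()[0])
        scheduler.step()
    # one extra reading after the final step would observe the end value,
    # e.g. lr_end = 1e-07 for the polynomial decay schedule
    lrs.append(scheduler.get_lr()[0])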

  2. Also, the first step's reading is 0.0 in all schedulers except get_constant_schedule, so the first step does nothing. This could be fixed by adding a min_lr=1e-7 floor to all schedulers, as @sshleifer suggested in one of the recent scheduler-related PRs; see the sketch after this list.
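
A minimal sketch of what such a floor could look like inside a warmup lambda (min_lr_ratio is a hypothetical argument, not something these schedulers currently accept):

    # hypothetical constant-with-warmup lambda with a lower bound,
    # so step 0 yields min_lr_ratio * initial_lr instead of 0.0
    def lr_lambda(current_step, num_warmup_steps=4, min_lr_ratio=1e-8):
        if current_step < num_warmup_steps:
            return max(min_lr_ratio, current_step / max(1, num_warmup_steps))
        return 1.0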

Let me know if this better fits into its own issue, since these points have nothing to do with the PR itself. Or perhaps the two issues are just unimportant...

@codecov bot commented Aug 12, 2020

Codecov Report

Merging #6429 into master will increase coverage by 0.05%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #6429      +/-   ##
==========================================
+ Coverage   79.89%   79.94%   +0.05%     
==========================================
  Files         153      153              
  Lines       27902    27902              
==========================================
+ Hits        22291    22307      +16     
+ Misses       5611     5595      -16     
Impacted Files Coverage Δ
src/transformers/tokenization_albert.py 28.84% <0.00%> (-58.66%) ⬇️
src/transformers/modeling_tf_distilbert.py 64.47% <0.00%> (-32.95%) ⬇️
src/transformers/tokenization_utils.py 90.40% <0.00%> (+0.40%) ⬆️
src/transformers/tokenization_bert.py 91.51% <0.00%> (+0.44%) ⬆️
src/transformers/configuration_utils.py 96.59% <0.00%> (+0.68%) ⬆️
src/transformers/tokenization_openai.py 84.09% <0.00%> (+1.51%) ⬆️
src/transformers/tokenization_utils_fast.py 94.28% <0.00%> (+2.14%) ⬆️
src/transformers/tokenization_auto.py 97.72% <0.00%> (+2.27%) ⬆️
src/transformers/tokenization_transfo_xl.py 42.48% <0.00%> (+3.75%) ⬆️
src/transformers/generation_tf_utils.py 86.71% <0.00%> (+5.01%) ⬆️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@LysandreJik (Member) left a comment

Great, cleaner! Thanks a lot @stas00

@LysandreJik LysandreJik merged commit dbfe34f into huggingface:master Aug 27, 2020
@stas00 stas00 deleted the sched2 branch August 27, 2020 16:41
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020