refactor almost identical tests #6339

stas00 · 2020-08-08T03:36:10Z

In preparation for adding more schedulers, this PR refactors these almost identical tests.

Unfortunately, pytest.mark.parametrize can't be used here, so the only drawback is that this turns them all into a single test. It would have been nice to parametrize instead.
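
For reference, one stdlib option that keeps the cases in a single `unittest.TestCase` method while still reporting each failing case by name is `subTest`. This is only a minimal sketch: the `SCHEDULES` table and the toy schedule functions are hypothetical stand-ins, not the real schedulers under test.

```python
import unittest


# Hypothetical stand-ins for the real scheduler factories: each maps a step
# number to a learning-rate multiplier.
def constant(step):
    return 1.0


def linear_warmup(step, warmup_steps=4):
    return min(1.0, step / warmup_steps)


# name -> (function, expected multipliers for steps 0..4)
SCHEDULES = {
    "constant": (constant, [1.0, 1.0, 1.0, 1.0, 1.0]),
    "linear_warmup": (linear_warmup, [0.0, 0.25, 0.5, 0.75, 1.0]),
}


class ScheduleTest(unittest.TestCase):
    def test_schedules(self):
        for name, (func, expected) in SCHEDULES.items():
            # Each subTest is reported separately, so one failing schedule
            # does not hide the others and its name shows up in the report.
            with self.subTest(schedule=name):
                actual = [func(step) for step in range(len(expected))]
                self.assertEqual(actual, expected)
```

It is still counted as one test method, but a failure is labelled with the `schedule=...` value of the sub-test that broke.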

codecov · 2020-08-08T03:42:08Z

Codecov Report

Merging #6339 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #6339      +/-   ##
==========================================
- Coverage   78.37%   78.34%   -0.04%     
==========================================
  Files         148      148              
  Lines       27196    27196              
==========================================
- Hits        21316    21307       -9     
- Misses       5880     5889       +9

Impacted Files	Coverage Δ
src/transformers/generation_tf_utils.py	`84.21% <0.00%> (-2.26%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 322dffc...92d3825.

stas00 · 2020-08-08T03:43:23Z

We could also modify unwrap_schedule and unwrap_and_save_reload_schedule to return a clean list of numbers, and then it'd be just:

        for scheduler_func, data in scheds.items():
            kwargs, expected_learning_rates = data

            scheduler = scheduler_func(self.optimizer, **kwargs)
            lrs_1 = unwrap_schedule(scheduler, self.num_steps)
            self.assertListAlmostEqual(lrs_1, expected_learning_rates, tol=1e-2)

            scheduler = scheduler_func(self.optimizer, **kwargs)
            lrs_2 = unwrap_and_save_reload_schedule(scheduler, self.num_steps)
            self.assertListEqual(lrs_1, lrs_2)

but perhaps it'd be less intuitive for those reading the test code.
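
A rough sketch of what such a flattened helper might look like, assuming a single parameter group. `FakeScheduler` and `unwrap_schedule_flat` are hypothetical names used only for illustration, and the step-then-read order is an assumption, not a copy of the existing helper.

```python
class FakeScheduler:
    """Hypothetical stand-in for an LR scheduler with one parameter group."""

    def __init__(self, lrs):
        self._lrs = lrs
        self._step = -1

    def step(self):
        self._step += 1

    def get_lr(self):
        # Schedulers typically return one learning rate per parameter group.
        return [self._lrs[self._step]]


def unwrap_schedule_flat(scheduler, num_steps=10):
    """Collect one learning rate per step as a flat list of floats."""
    lrs = []
    for _ in range(num_steps):
        scheduler.step()
        # Tuple unpacking raises unless there is exactly one parameter group,
        # which replaces a separate length assertion in the test body.
        (lr,) = scheduler.get_lr()
        lrs.append(lr)
    return lrs
```

With a helper like this, the test can compare the result against the expected learning rates directly, at the cost of hiding the single-group assumption inside the helper.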

sshleifer · 2020-08-08T03:46:56Z

Does this impact tracebacks in a bad way? Previously I would know which scheduler I broke if test_warmup_constant_scheduler failed.

stas00 · 2020-08-08T03:59:29Z

That's super-important, @sshleifer, thank you for flagging that!

Added an assert message to make it clear what fails. For example, if I break the data for the sake of a demo, we now get:

        for scheduler_func, data in scheds.items():
            kwargs, expected_learning_rates = data

            scheduler = scheduler_func(self.optimizer, **kwargs)
            lrs_1 = unwrap_schedule(scheduler, self.num_steps)
            self.assertEqual(len(lrs_1[0]), 1)
            self.assertListAlmostEqual(
>               [l[0] for l in lrs_1], expected_learning_rates, tol=1e-2, msg=f"failed for {scheduler_func}"
            )

tests/test_optimization.py:126:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_optimization.py:92: in assertListAlmostEqual
    self.assertAlmostEqual(a, b, delta=tol, msg=msg)
E   AssertionError: 2.5 != 3.5 within 0.01 delta (1.0 difference) : failed for <function get_constant_schedule_with_warmup at 0x7f5da6f0bdd0>
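
The traceback above shows the `msg` being forwarded to `assertAlmostEqual`. A minimal, self-contained sketch of that pattern follows; the helper body is an assumption based on the single line visible in the traceback, and `get_constant_schedule` here is a hypothetical toy function. Using `__name__` instead of the function's repr is one way to avoid the memory address in the message.

```python
import unittest


def get_constant_schedule(step):
    # Hypothetical toy schedule, standing in for a real scheduler factory.
    return 1.0


class ListAssertTest(unittest.TestCase):
    def assertListAlmostEqual(self, list1, list2, tol, msg=None):
        # Forward msg to every underlying assertion so any failure names
        # the schedule being checked.
        self.assertEqual(len(list1), len(list2), msg=msg)
        for a, b in zip(list1, list2):
            self.assertAlmostEqual(a, b, delta=tol, msg=msg)

    def test_message_names_the_schedule(self):
        func = get_constant_schedule
        actual = [func(step) for step in range(3)]
        # Deliberately wrong expected data, to inspect the failure message.
        with self.assertRaises(AssertionError) as ctx:
            self.assertListAlmostEqual(
                actual, [1.0, 1.0, 3.5], tol=1e-2, msg=f"failed for {func.__name__}"
            )
        self.assertIn("failed for get_constant_schedule", str(ctx.exception))
```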

stas00 · 2020-08-08T04:02:32Z

hmm, not sure whether the last commit, to make the assert message even more specific, was needed.

Alternatively, I can move the code out of the unittest class and use pytest parametrization, so it'll be self-documenting on assert, à la:

transformers/examples/seq2seq/test_seq2seq_examples.py

Line 238 in 175cd45

@pytest.mark.parametrize(
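
A sketch of how the parametrized variant could look as a module-level test function. The `CASES` table and toy schedule functions are hypothetical and are not taken from the referenced file.

```python
import pytest


# Hypothetical toy schedules standing in for the real scheduler factories.
def constant(step):
    return 1.0


def linear_warmup(step, warmup_steps=4):
    return min(1.0, step / warmup_steps)


CASES = [
    (constant, [1.0, 1.0, 1.0]),
    (linear_warmup, [0.0, 0.25, 0.5]),
]


# ids= gives each case a readable name in the report,
# e.g. test_schedule[linear_warmup].
@pytest.mark.parametrize("func, expected", CASES, ids=[f.__name__ for f, _ in CASES])
def test_schedule(func, expected):
    actual = [func(step) for step in range(len(expected))]
    assert actual == pytest.approx(expected, abs=1e-2)
```

Each case then shows up as its own test, so a failure identifies the schedule without a custom assert message.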

sshleifer · 2020-08-08T05:06:49Z

LGTM as is, but won't merge it myself.

LysandreJik

Looks good to me, looking forward to new schedulers!

* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner
* Add metadata Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Temporarily de-activate TPU CI
* Update modeling_tf_utils.py (#6372) fix typo: ckeckpoint->checkpoint
* the test now works again (#6371)
* correct pl link in readme (#6364)
* refactor almost identical tests (#6339)
* refactor almost identical tests
* important to add a clear assert error message
* make the assert error even more descriptive than the original bt
* Small docfile fixes (#6328)
* Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}
* Patch models
* BERT and TF BERT info s
* Update check_repo
* Ci GitHub caching (#6382)
* Cache Github Actions CI
* Remove useless file
* Colab button (#6389)
* Add colab button
* Add colab link for tutorials
* Fix links for open in colab (#6391)
* Update src/transformers/optimization.py consistently use lr_end=1e-7 default Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* Update src/transformers/optimization.py consistently use lr_end=1e-7 default Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove dup (leftover from merge)
* convert the test into the new refactored format
* stick to using the current_step as is, without ++

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

refactor almost identical tests

16de4fb

important to add a clear assert error message

9d7e1a2

make the assert error even more descriptive than the original bt

92d3825

LysandreJik approved these changes Aug 10, 2020

LysandreJik merged commit 1429b92 into huggingface:master Aug 10, 2020

stas00 deleted the opt-refact branch August 10, 2020 16:29
