[s2s test_finetune_trainer] failing multigpu test #8400

@sshleifer

Sam, this test fails for me because the model is not learning anything:

```
RUN_SLOW=1 pytest examples/seq2seq/test_finetune_trainer.py::TestFinetuneTrainer::test_finetune_trainer_slow
```

```
>       assert first_step_stats["eval_bleu"] < last_step_stats["eval_bleu"]  # model learned nothing
E       AssertionError: assert 0.0 < 0.0
```

The logs show the model gaining some knowledge during the first half of the epochs, then dropping back to 0.00 in the last ones.

Changing the learning rate to 3e-3 (this PR) seems to make training more stable. However, this could be card-specific: I ran it on an RTX 3090.

Alternatively, the test could use a more flexible check than comparing the first and last metrics. Either way, the test feels too dependent on the card/config. A more resilient long-term fix might be to feed it more than 8 records.

@sshleifer
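Here is one possible shape for the "more flexible" check. It is a minimal sketch, not the existing test code. It assumes the per-evaluation metric dicts are available as a chronological list (for example, filtered from the trainer's logged history). The helper name `assert_model_learned` is hypothetical. The check compares the first BLEU score against the best later score rather than only the last one, so a late collapse back to 0.0 does not hide earlier learning.

```python
from typing import Dict, List


def assert_model_learned(eval_metrics: List[Dict[str, float]], key: str = "eval_bleu") -> None:
    """Hypothetical tolerant check: pass if any later evaluation beats the first one.

    Assumes `eval_metrics` holds per-evaluation dicts in chronological order;
    entries without `key` (e.g. training-loss logs) are skipped.
    """
    scores = [m[key] for m in eval_metrics if key in m]
    assert len(scores) >= 2, f"need at least two evaluations with {key!r}"
    first, best_later = scores[0], max(scores[1:])
    # Using the best later score instead of the last one tolerates the
    # late drop back to 0.0 seen on some cards.
    assert first < best_later, f"model learned nothing: first={first}, best={best_later}"
```

With this check, a run that logs `eval_bleu` values of 0.0, 1.2, 0.0 passes, whereas the first-vs-last comparison would fail it. A run that stays flat at 0.0 still fails. The trade-off is that the check no longer catches a model that learns and then regresses, which may or may not be acceptable for a smoke test.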

Commits on Nov 8, 2020