
Fix gradients flushed at the end for 'supervised_train_step' (#2459) #2470

Closed
wants to merge 5 commits

Conversation

egaznep
Contributor

@egaznep egaznep commented Feb 16, 2022

I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.

Fixes #2459

Description: #583 reported that gradient accumulation did not work at all. The fix applied at the time broke gradient loggability for the sake of correct gradient accumulation. This fix achieves both objectives.

The behavior for AMP, APEX and TPU is not changed as I cannot test them.
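
For context, the core of the change is where optimizer.zero_grad() is called: flushing gradients at the start of each accumulation window, rather than right after optimizer.step(), leaves them intact at the end of the iteration so handlers such as GradsScalarHandler can still log them. A minimal sketch of that idea, assuming a plain (non-AMP/APEX/TPU) update and illustrative names (this is not the exact ignite implementation):

```python
import torch

def make_train_step(model, optimizer, loss_fn, gradient_accumulation_steps=1):
    def train_step(engine, batch):
        model.train()
        # Flush gradients only when a new accumulation window begins,
        # not after optimizer.step(), so they remain readable by loggers.
        if (engine.state.iteration - 1) % gradient_accumulation_steps == 0:
            optimizer.zero_grad()
        x, y = batch
        loss = loss_fn(model(x), y) / gradient_accumulation_steps
        loss.backward()
        if engine.state.iteration % gradient_accumulation_steps == 0:
            optimizer.step()  # gradients are still populated after the step
        return loss.item()
    return train_step
```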

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

Fix gradients flushed at the end for 'supervised_train_step' (#2459)

I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.
@github-actions github-actions bot added the module: engine Engine module label Feb 16, 2022
@vfdev-5
Collaborator

vfdev-5 commented Feb 16, 2022

Thanks for the PR @egaznep !

The behavior for AMP, APEX and TPU is not changed as I cannot test them.

If you use Docker, you can try the https://hub.docker.com/r/pytorchignite/apex image, where APEX is installed. AMP is a built-in torch option, so it can also be tested.

As for TPU, it is a bit more tricky (either Colab or XLA on CPU); I think I can test it myself.
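
For reference, a rough sketch of how the same zero_grad placement could be combined with torch's built-in AMP (torch.cuda.amp); this mirrors the non-AMP sketch above and is illustrative only, not the ignite implementation:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def make_amp_train_step(model, optimizer, loss_fn, accumulation_steps=1):
    scaler = GradScaler()

    def train_step(engine, batch):
        model.train()
        if (engine.state.iteration - 1) % accumulation_steps == 0:
            optimizer.zero_grad()  # flush at the start of the window
        x, y = batch
        with autocast():
            loss = loss_fn(model(x), y) / accumulation_steps
        scaler.scale(loss).backward()  # scaled gradients stay in .grad
        if engine.state.iteration % accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
        return loss.item()

    return train_step
```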

@vfdev-5
Collaborator

vfdev-5 commented Feb 21, 2022

@egaznep can you update the code for AMP, APEX and TPU, please?

@egaznep
Contributor Author

egaznep commented Feb 24, 2022

@egaznep can you update the code for AMP, APEX and TPU, please?

I just found the time today and made the changes. Is there some built-in "training test" that I can run before submitting? Also, sadly, I don't have Docker and can't install it either, as I don't have admin rights on this machine.

@vfdev-5
Collaborator

vfdev-5 commented Feb 24, 2022

@egaznep we have the mnist example, which uses create_supervised_trainer: https://github.com/pytorch/ignite/blob/master/examples/mnist/mnist.py, but it does not expose the grad accumulation option.
I think you can update the code and I'll test it myself.
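
For illustration, a hedged sketch of how such an example could exercise the accumulation path end to end, assuming create_supervised_trainer exposes a gradient_accumulation_steps argument (the model and data here are placeholders, not the actual mnist example):

```python
import torch
import torch.nn as nn
from ignite.engine import create_supervised_trainer

model = nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

trainer = create_supervised_trainer(
    model, optimizer, criterion, gradient_accumulation_steps=4
)

# Placeholder data instead of the real MNIST loader
data = [(torch.rand(8, 784), torch.randint(0, 10, (8,))) for _ in range(8)]
trainer.run(data, max_epochs=1)
```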

@vfdev-5
Collaborator

vfdev-5 commented Feb 24, 2022

@egaznep thanks for the update! Can I ask you to add a test to https://github.com/pytorch/ignite/blob/master/tests/ignite/engine/test_create_supervised.py checking that model grads are not empty between iterations (and thus can be logged)?
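
One way such a test could look, as a hedged sketch (not necessarily the test that was added to test_create_supervised.py): run a few iterations and assert after each one that parameter gradients are populated and non-zero, i.e. still available for logging:

```python
import torch
import torch.nn as nn
from ignite.engine import Events, create_supervised_trainer


def test_grads_not_flushed_after_iteration():
    model = nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    trainer = create_supervised_trainer(model, optimizer, nn.MSELoss())

    @trainer.on(Events.ITERATION_COMPLETED)
    def check_grads(engine):
        # Gradients should survive the update step, so GradsScalarHandler
        # would log non-zero values at this point.
        for p in model.parameters():
            assert p.grad is not None
            assert torch.count_nonzero(p.grad) > 0

    data = [(torch.rand(4, 2), torch.rand(4, 1)) for _ in range(3)]
    trainer.run(data, max_epochs=1)
```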

sadra-barikbin added a commit to sadra-barikbin/ignite that referenced this pull request Apr 17, 2022
sadra-barikbin added a commit to sadra-barikbin/ignite that referenced this pull request Apr 17, 2022
sadra-barikbin added a commit to sadra-barikbin/ignite that referenced this pull request Apr 19, 2022
vfdev-5 added a commit that referenced this pull request May 1, 2022
* Remove unnecessary code in BaseOutputHandler

Closes #2438

* Add ReduceLROnPlateauScheduler

Closes #1754

* Fix indentation issue

* Fix another indentation issue

* Fix PEP8 related issues

* Fix other PEP8 related issues

* Fix hopefully the last PEP8 related issue

* Fix hopefully the last PEP8 related issue

* Remove ReduceLROnPlateau's specific params and add link to it

Also fix bug in min_lr check

* Fix state_dict bug and add a test

* Update docs

* Fix gradients flushed at the end for 'supervised_train_step' (#2459)
I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.

* Add doctest and fix typo

* Fix gradient loggability for AMP, APEX and TPU (#2459)

* Fix zero_grad place in trainer step

Closes #2459 with help of PR #2470

* Improve tests and fix bug

* Remove redundant stmts after pytest parametrize

* Refactor tests

* autopep8 fix

* Improvement

* Fix bug

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Unal Ege Gaznepoglu <egaznep@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
vfdev-5 added a commit that referenced this pull request May 2, 2022
* Remove unnecessary code in BaseOutputHandler

Closes #2438

* Add ReduceLROnPlateauScheduler

Closes #1754

* Fix indentation issue

* Fix another indentation issue

* Fix PEP8 related issues

* Fix other PEP8 related issues

* Fix hopefully the last PEP8 related issue

* Fix hopefully the last PEP8 related issue

* Remove ReduceLROnPlateau's specific params and add link to it

Also fix bug in min_lr check

* Fix state_dict bug and add a test

* Update docs

* Fix gradients flushed at the end for 'supervised_train_step' (#2459)
I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.

* Add doctest and fix typo

* Fix gradient loggability for AMP, APEX and TPU (#2459)

* Fix zero_grad place in trainer step

Closes #2459 with help of PR #2470

* Improve tests and fix bug

* Remove redundant stmts after pytest parametrize

* Refactor tests

* autopep8 fix

* Improvement

* Fix bug

* Fix test bugs in test_create_supervised

* Revert refactor

* Empty commit

* Fix pep

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Unal Ege Gaznepoglu <egaznep@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
@vfdev-5
Collaborator

vfdev-5 commented May 2, 2022

Closing in favor of #2560

@vfdev-5 vfdev-5 closed this May 2, 2022
vfdev-5 added a commit that referenced this pull request May 3, 2022
* Remove unnecessary code in BaseOutputHandler

Closes #2438

* Add ReduceLROnPlateauScheduler

Closes #1754

* Fix indentation issue

* Fix another indentation issue

* Fix PEP8 related issues

* Fix other PEP8 related issues

* Fix hopefully the last PEP8 related issue

* Fix hopefully the last PEP8 related issue

* Remove ReduceLROnPlateau's specific params and add link to it

Also fix bug in min_lr check

* Fix state_dict bug and add a test

* Update docs

* Add doctest and fix typo

* Add whitelist param and refactor

Closes #2548

* Fix docstrings and a bug

* Change reduction parameter

* Fix zero_grad place in trainer step

Closes #2459 with help of PR #2470

* autopep8 fix

* Fix bugs

* Fix bugs in loggers

* Fix bug in test_create_supervised

* Change reduction type hint in base_logger

* Fix mypy error

* Fix bug causing missing clearml histograms

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
@egaznep
Contributor Author

egaznep commented May 3, 2022

Closing in favor of #2560

I was extremely busy with my master's thesis. I am sorry that I could not complete the remaining tasks on time. I was planning to address the changes soon but it seems that someone else has already done it nicely. Thank you all for maintaining this immensely useful library.

@vfdev-5
Collaborator

vfdev-5 commented May 3, 2022

@egaznep no problem, I hope your master's thesis is done and in good shape!
Yes, your commits from this PR were reused in the above-referenced PR, so your contribution is also recorded in the commit history.

@egaznep
Contributor Author

egaznep commented May 3, 2022

@vfdev-5 yes it finally is! Thank you, I appreciate it.

Labels
module: engine Engine module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GradsScalarHandler logs 0 gradients if default update function is used
2 participants