
Fix gradients flushed at the end for 'supervised_train_step' (#2459) #2470

Closed
wants to merge 5 commits

Conversation

egaznep
Contributor

@egaznep egaznep commented Feb 16, 2022

I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.

Fixes #2459

Description: #583 reported that gradient accumulation did not work at all. The fix applied at the time broke gradient loggability for the sake of correct gradient accumulation. This fix achieves both objectives.

The behavior for AMP, APEX and TPU is not changed as I cannot test them.
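
For context, the core of the change is where optimizer.zero_grad() is called: flushing gradients at the start of each accumulation window, rather than right after optimizer.step(), leaves them intact at the end of the iteration so handlers such as GradsScalarHandler can still log them. A minimal sketch of that idea, assuming a plain (non-AMP/APEX/TPU) update and illustrative names (this is not the exact ignite implementation):

```python
import torch

def make_train_step(model, optimizer, loss_fn, gradient_accumulation_steps=1):
    def train_step(engine, batch):
        model.train()
        # Flush gradients only when a new accumulation window begins,
        # not after optimizer.step(), so they remain readable by loggers.
        if (engine.state.iteration - 1) % gradient_accumulation_steps == 0:
            optimizer.zero_grad()
        x, y = batch
        loss = loss_fn(model(x), y) / gradient_accumulation_steps
        loss.backward()
        if engine.state.iteration % gradient_accumulation_steps == 0:
            optimizer.step()  # gradients are still populated after the step
        return loss.item()
    return train_step
```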

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

Fix gradients flushed at the end for 'supervised_train_step' (#2459)

I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.
@github-actions github-actions bot added the module: engine Engine module label Feb 16, 2022
@vfdev-5
Collaborator

vfdev-5 commented Feb 16, 2022

Thanks for the PR @egaznep !

The behavior for AMP, APEX and TPU is not changed as I cannot test them.

If you use Docker, you can try the https://hub.docker.com/r/pytorchignite/apex image, where APEX is installed. AMP is a built-in torch option, so it can also be tested.

As for TPU, it is a bit more tricky (either Colab or XLA on CPU); I think I can test it myself.
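
For reference, a rough sketch of how the same zero_grad placement could be combined with torch's built-in AMP (torch.cuda.amp); this mirrors the non-AMP sketch above and is illustrative only, not the ignite implementation:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def make_amp_train_step(model, optimizer, loss_fn, accumulation_steps=1):
    scaler = GradScaler()

    def train_step(engine, batch):
        model.train()
        if (engine.state.iteration - 1) % accumulation_steps == 0:
            optimizer.zero_grad()  # flush at the start of the window
        x, y = batch
        with autocast():
            loss = loss_fn(model(x), y) / accumulation_steps
        scaler.scale(loss).backward()  # scaled gradients stay in .grad
        if engine.state.iteration % accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
        return loss.item()

    return train_step
```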

@vfdev-5
Collaborator

vfdev-5 commented Feb 21, 2022

@egaznep can you update the code for AMP, APEX and TPU, please?

@egaznep
Contributor Author

egaznep commented Feb 24, 2022

@egaznep can you update the code for AMP, APEX and TPU, please?

I just found the time today and made the changes. Is there some built-in "training test" that I can run before submitting? Also, sadly, I don't have Docker and can't install it either, as I don't have admin rights on this machine.

@vfdev-5
Collaborator

vfdev-5 commented Feb 24, 2022

@egaznep we have the mnist example, which uses create_supervised_trainer: https://github.com/pytorch/ignite/blob/master/examples/mnist/mnist.py, but it does not expose the grad accumulation option.
I think you can update the code and I'll test it myself.
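
For illustration, a hedged sketch of how such an example could exercise the accumulation path end to end, assuming create_supervised_trainer exposes a gradient_accumulation_steps argument (the model and data here are placeholders, not the actual mnist example):

```python
import torch
import torch.nn as nn
from ignite.engine import create_supervised_trainer

model = nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

trainer = create_supervised_trainer(
    model, optimizer, criterion, gradient_accumulation_steps=4
)

# Placeholder data instead of the real MNIST loader
data = [(torch.rand(8, 784), torch.randint(0, 10, (8,))) for _ in range(8)]
trainer.run(data, max_epochs=1)
```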

@vfdev-5
Collaborator

vfdev-5 commented Feb 24, 2022

@egaznep thanks for the update! Can I ask you to add a test to https://github.com/pytorch/ignite/blob/master/tests/ignite/engine/test_create_supervised.py checking that model grads are not empty between iterations (and thus can be logged)?
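
One way such a test could look, as a hedged sketch (not necessarily the test that was added to test_create_supervised.py): run a few iterations and assert after each one that parameter gradients are populated and non-zero, i.e. still available for logging:

```python
import torch
import torch.nn as nn
from ignite.engine import Events, create_supervised_trainer


def test_grads_not_flushed_after_iteration():
    model = nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    trainer = create_supervised_trainer(model, optimizer, nn.MSELoss())

    @trainer.on(Events.ITERATION_COMPLETED)
    def check_grads(engine):
        # Gradients should survive the update step, so GradsScalarHandler
        # would log non-zero values at this point.
        for p in model.parameters():
            assert p.grad is not None
            assert torch.count_nonzero(p.grad) > 0

    data = [(torch.rand(4, 2), torch.rand(4, 1)) for _ in range(3)]
    trainer.run(data, max_epochs=1)
```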

sadra-barikbin added a commit to sadra-barikbin/ignite that referenced this pull request Apr 17, 2022
sadra-barikbin added a commit to sadra-barikbin/ignite that referenced this pull request Apr 17, 2022
sadra-barikbin added a commit to sadra-barikbin/ignite that referenced this pull request Apr 19, 2022
vfdev-5 added a commit that referenced this pull request May 1, 2022
* Remove unnecessary code in BaseOutputHandler

Closes #2438

* Add ReduceLROnPlateauScheduler

Closes #1754

* Fix indentation issue

* Fix another indentation issue

* Fix PEP8 related issues

* Fix other PEP8 related issues

* Fix hopefully the last PEP8 related issue

* Fix hopefully the last PEP8 related issue

* Remove ReduceLROnPlateau's specific params and add link to it

Also fix bug in min_lr check

* Fix state_dict bug and add a test

* Update docs

* Fix gradients flushed at the end for 'supervised_train_step' (#2459)
I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.

* Add doctest and fix typo

* Fix gradient loggability for AMP, APEX and TPU (#2459)

* Fix zero_grad place in trainer step

Closes #2459 with help of PR #2470

* Improve tests and fix bug

* Remove redundant stmts after pytest parametrize

* Refactor tests

* autopep8 fix

* Improvement

* Fix bug

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Unal Ege Gaznepoglu <egaznep@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
vfdev-5 added a commit that referenced this pull request May 2, 2022
* Remove unnecessary code in BaseOutputHandler

Closes #2438

* Add ReduceLROnPlateauScheduler

Closes #1754

* Fix indentation issue

* Fix another indentation issue

* Fix PEP8 related issues

* Fix other PEP8 related issues

* Fix hopefully the last PEP8 related issue

* Fix hopefully the last PEP8 related issue

* Remove ReduceLROnPlateau's specific params and add link to it

Also fix bug in min_lr check

* Fix state_dict bug and add a test

* Update docs

* Fix gradients flushed at the end for 'supervised_train_step' (#2459)
I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.

* Add doctest and fix typo

* Fix gradient loggability for AMP, APEX and TPU (#2459)

* Fix zero_grad place in trainer step

Closes #2459 with help of PR #2470

* Improve tests and fix bug

* Remove redundant stmts after pytest parametrize

* Refactor tests

* autopep8 fix

* Improvement

* Fix bug

* Fix test bugs in test_create_supervised

* Revert refactor

* Empty commit

* Fix pep

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Unal Ege Gaznepoglu <egaznep@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
@vfdev-5
Collaborator

vfdev-5 commented May 2, 2022

Closing in favor of #2560

@vfdev-5 vfdev-5 closed this May 2, 2022
vfdev-5 added a commit that referenced this pull request May 3, 2022
* Remove unnecessary code in BaseOutputHandler

Closes #2438

* Add ReduceLROnPlateauScheduler

Closes #1754

* Fix indentation issue

* Fix another indentation issue

* Fix PEP8 related issues

* Fix other PEP8 related issues

* Fix hopefully the last PEP8 related issue

* Fix hopefully the last PEP8 related issue

* Remove ReduceLROnPlateau's specific params and add link to it

Also fix bug in min_lr check

* Fix state_dict bug and add a test

* Update docs

* Add doctest and fix typo

* Add whitelist param and refactor

Closes #2548

* Fix docstrings and a bug

* Change reduction parameter

* Fix zero_grad place in trainer step

Closes #2459 with help of PR #2470

* autopep8 fix

* Fix bugs

* Fix bugs in loggers

* Fix bug in test_create_supervised

* Change reduction type hint in base_logger

* Fix mypy error

* Fix bug causing missing clearml histograms

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
@egaznep
Contributor Author

egaznep commented May 3, 2022

Closing in favor of #2560

I was extremely busy with my master's thesis. I am sorry that I could not complete the remaining tasks on time. I was planning to address the changes soon but it seems that someone else has already done it nicely. Thank you all for maintaining this immensely useful library.

@vfdev-5
Collaborator

vfdev-5 commented May 3, 2022

@egaznep no problem, I hope your master's thesis is done and in good shape!
Yes, your commits from this PR were reused in the above-referenced PR, so your contribution is also recorded in the commit history.

@egaznep
Contributor Author

egaznep commented May 3, 2022

@vfdev-5 yes it finally is! Thank you, I appreciate it.

Labels
module: engine Engine module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GradsScalarHandler logs 0 gradients if default update function is used
2 participants