Fix gradients flushed at the end for 'supervised_train_step' (#2459) #2470
Conversation
I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test them.
Thanks for the PR @egaznep!
If you use docker, you can try the https://hub.docker.com/r/pytorchignite/apex image, where APEX is installed. AMP is a built-in torch option, so it can also be tested. As for TPU, it is a bit more tricky: either Colab or XLA on CPU. I think I can test that myself.
@egaznep can you update the code for AMP, APEX and TPU please?
I just found the time today and made the changes. Is there a built-in "training test" of some sort that I can run before submitting? Also, sadly, I don't have docker and can't install it either, as I don't have admin rights on this machine.
@egaznep we have the mnist example, which uses create_supervised_trainer: https://github.com/pytorch/ignite/blob/master/examples/mnist/mnist.py, but it doesn't expose the grad accumulation option (a sketch of how it could is shown below).
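For context, a minimal sketch of what exposing that option could look like; the `gradient_accumulation_steps` keyword is assumed from recent ignite releases, and the tiny model here merely stands in for the example's real Net:

```python
import torch
import torch.nn.functional as F
from torch import nn
from ignite.engine import create_supervised_trainer

# Stand-in for the mnist example's model and optimizer.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=-1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical tweak: forward an accumulation option straight to the factory.
trainer = create_supervised_trainer(
    model,
    optimizer,
    F.nll_loss,
    gradient_accumulation_steps=4,  # assumed keyword; accumulate 4 batches per step
)
```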
@egaznep thanks for the update! Can I ask you to add a test to https://github.com/pytorch/ignite/blob/master/tests/ignite/engine/test_create_supervised.py that checks that model grads are not empty between iterations (and thus can be logged)? A rough sketch follows.
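A rough sketch of such a test; the name and exact assertions are illustrative, not the test that eventually landed, and it assumes gradients are non-zero for random data:

```python
import torch
from torch import nn
from ignite.engine import Events, create_supervised_trainer


def test_grads_not_flushed_after_iteration():
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    trainer = create_supervised_trainer(model, optimizer, nn.MSELoss())

    @trainer.on(Events.ITERATION_COMPLETED)
    def assert_grads_present(engine):
        # After the fix, grads should survive until the next iteration starts,
        # so a logging handler attached here can still read them.
        for p in model.parameters():
            assert p.grad is not None
            assert torch.any(p.grad != 0.0)

    data = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(3)]
    trainer.run(data, max_epochs=1)
```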
Closes pytorch#2459 with help of PR pytorch#2470
* Remove unnecessary code in BaseOutputHandler. Closes #2438
* Add ReduceLROnPlateauScheduler. Closes #1754
* Fix indentation issue
* Fix another indentation issue
* Fix PEP8 related issues
* Fix other PEP8 related issues
* Fix hopefully the last PEP8 related issue
* Fix hopefully the last PEP8 related issue
* Remove ReduceLROnPlateau's specific params and add link to it. Also fix bug in min_lr check
* Fix state_dict bug and add a test
* Update docs
* Fix gradients flushed at the end for 'supervised_train_step' (#2459). I have not changed the behavior of AMP, APEX, TPU as I don't have the means to test it.
* Add doctest and fix typo
* Fix gradient loggability for AMP, APEX and TPU (#2459)
* Fix zero_grad place in trainer step. Closes #2459 with help of PR #2470
* Improve tests and fix bug
* Remove redundant stmts after pytest parametrize
* Refactor tests
* autopep8 fix
* Improvement
* Fix bug
* Fix test bugs in test_create_supervised
* Revert refactor
* Empty commit
* Fix pep

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Unal Ege Gaznepoglu <egaznep@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
Closing in favor of #2560
* Remove unnecessary code in BaseOutputHandler. Closes #2438
* Add ReduceLROnPlateauScheduler. Closes #1754
* Fix indentation issue
* Fix another indentation issue
* Fix PEP8 related issues
* Fix other PEP8 related issues
* Fix hopefully the last PEP8 related issue
* Fix hopefully the last PEP8 related issue
* Remove ReduceLROnPlateau's specific params and add link to it. Also fix bug in min_lr check
* Fix state_dict bug and add a test
* Update docs
* Add doctest and fix typo
* Add whitelist param and refactor. Closes #2548
* Fix docstrings and a bug
* Change reduction parameter
* Fix zero_grad place in trainer step. Closes #2459 with help of PR #2470
* autopep8 fix
* Fix bugs
* Fix bugs in loggers
* Fix bug in test_create_supervised
* Change reduction type hint in base_logger
* Fix mypy error
* Fix bug causing missing clearml histograms

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: sadra-barikbin <sadra-barikbin@users.noreply.github.com>
I was extremely busy with my master's thesis. I am sorry that I could not complete the remaining tasks on time. I was planning to address the changes soon, but it seems that someone else has already done it nicely. Thank you all for maintaining this immensely useful library.
@egaznep no problem, I hope your master's thesis is done and in good shape!
@vfdev-5 yes, it finally is! Thank you, I appreciate it.
Fixes #2459
Description: #583 reported that gradient accumulation did not work at all. The fix made at the time broke gradient loggability for the sake of correct gradient accumulation. This fix achieves both objectives.
The behavior for AMP, APEX and TPU is not changed, as I cannot test them; a sketch of the idea is shown below.
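To illustrate the core of the fix, here is a minimal sketch of the changed zero_grad placement; it is not the exact ignite implementation, and the factory/closure names are made up for this example:

```python
def make_supervised_train_step(model, optimizer, loss_fn, gradient_accumulation_steps=1):
    # Sketch: grads are zeroed at the START of an accumulation window rather
    # than right after optimizer.step(), so p.grad stays populated at the end
    # of every iteration and handlers (e.g. gradient loggers) can read it.
    def train_step(engine, batch):
        # engine.state.iteration is 1-based, so iterations 1, k+1, 2k+1, ...
        # open a new accumulation window.
        if (engine.state.iteration - 1) % gradient_accumulation_steps == 0:
            optimizer.zero_grad()
        model.train()
        x, y = batch
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
        if gradient_accumulation_steps > 1:
            loss = loss / gradient_accumulation_steps
        loss.backward()
        # Step only at the end of the window; grads are NOT flushed here.
        if engine.state.iteration % gradient_accumulation_steps == 0:
            optimizer.step()
        return loss.item()

    return train_step
```

An AMP variant would presumably keep the same zero_grad placement and only wrap the backward/step calls with a torch.cuda.amp.GradScaler.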
Check list: