
Conversation

@jmarintur (Contributor) commented on Mar 15, 2024

Fixes #2799 and #2693

Description

Change the backpropagation steps in the tutorial to the usual order, i.e. call `optimizer.zero_grad()` before `loss.backward()`.
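
For context, here is a minimal sketch of the order this PR proposes (illustrative only, not the tutorial's exact code; `model`, `loss_fn`, `optimizer`, and `dataloader` are the usual placeholder names from the tutorial's training loop):

```python
for batch, (X, y) in enumerate(dataloader):
    # Forward pass
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backpropagation in the "usual" order
    optimizer.zero_grad()   # clear gradients left over from the previous iteration
    loss.backward()         # compute gradients for this batch
    optimizer.step()        # update the parameters
```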

Checklist

  • The issue that is being fixed is referred to in the description
  • [ ] Only one issue is addressed in this pull request
  • [ ] Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

cc @subramen @albanD

@pytorch-bot (bot) commented on Mar 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2800

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e1d1add with merge base 30e14df:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@svekars added the core (Tutorials of any level of difficulty related to the core pytorch functionality) and intro labels on Mar 15, 2024
@albanD (Contributor) left a comment


Both are equivalent in terms of the computed values, and we actually prefer to `zero_grad` after the optimizer step to clear the memory and not keep it around during the forward pass / beginning of the backward pass.
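
For comparison, a sketch of the ordering preferred here (again illustrative only, reusing the same placeholder names; it assumes `zero_grad(set_to_none=True)`, which recent PyTorch releases use by default, so the gradient tensors are actually released rather than just filled with zeros):

```python
for batch, (X, y) in enumerate(dataloader):
    # The forward pass runs without stale .grad tensors still allocated
    pred = model(X)
    loss = loss_fn(pred, y)

    loss.backward()         # gradients are (re)allocated here
    optimizer.step()        # update the parameters
    optimizer.zero_grad()   # release gradient memory right away, instead of
                            # keeping it through the next forward pass and the
                            # beginning of the next backward pass
```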

@jmarintur (Contributor, Author) commented

Hi @albanD, funny thing: the accepted answer here was to do it before (I do it too), but I can perfectly understand why, in scenarios where you want to free memory as soon as you can, doing it right after may be the best strategy. Thanks for the clarification. Perhaps we should also close #2799 and #2693. Wdyt?

@jmarintur jmarintur closed this Mar 28, 2024
@albanD (Contributor) commented on Mar 29, 2024

In general, any order works if you're not memory constrained (which was the case for most people in 2018 :D ). But these days, I think people care a lot more about memory use!

Thanks for pointing out these issues, they can indeed be closed.

@geopapa11 commented

Hi @albanD, out of curiosity, why is placing `optimizer.zero_grad()` right after `optimizer.step()` more memory-efficient?

Is it because you are resetting all gradients right before the next iteration of the inner loop `for batch, (X, y) in enumerate(dataloader)`, and hence you release the respective memory before the forward propagation step runs?

Whereas when `optimizer.zero_grad()` comes before `optimizer.step()`, memory for the gradients is still allocated when we enter the forward propagation step, which means less memory is available for forward propagation?

Is that the logic? Thanks in advance for your response! 😊
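
For what it's worth, here is a small self-contained check (a hypothetical example, not from the tutorial) showing that `zero_grad(set_to_none=True)` really drops the gradient tensors, which is what frees that memory before the next forward pass; with `set_to_none=False` the tensors are only filled with zeros and stay allocated:

```python
import torch

# A tiny throwaway model and optimizer, just to inspect the .grad attributes.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()                         # .grad tensors are allocated here
optimizer.step()                        # parameters are updated using .grad

optimizer.zero_grad(set_to_none=True)   # .grad attributes are set to None...
print(all(p.grad is None for p in model.parameters()))  # ...so this prints True
```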

@jmarintur jmarintur deleted the correct-backpropagation-steps branch March 29, 2024 07:58