
Conversation

@jmarintur (Contributor) commented on Mar 15, 2024

Fixes #2799 and #2693

Description

Change the backpropagation steps in the tutorial to the usual order, i.e. call `optimizer.zero_grad()` before `loss.backward()`.
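
For context, here is a minimal sketch of the order this PR proposes (illustrative only, not the tutorial's exact code; `model`, `loss_fn`, `optimizer`, and `dataloader` are the usual placeholder names from the tutorial's training loop):

```python
for batch, (X, y) in enumerate(dataloader):
    # Forward pass
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backpropagation in the "usual" order
    optimizer.zero_grad()   # clear gradients left over from the previous iteration
    loss.backward()         # compute gradients for this batch
    optimizer.step()        # update the parameters
```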

Checklist

  • The issue that is being fixed is referred to in the description
  • [ ] Only one issue is addressed in this pull request
  • [ ] Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

cc @subramen @albanD

@pytorch-bot (bot) commented on Mar 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2800

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e1d1add with merge base 30e14df:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@svekars added the core (Tutorials of any level of difficulty related to the core pytorch functionality) and intro labels on Mar 15, 2024
@albanD (Contributor) left a comment


Both are equivalent in terms of the computed values, and we actually prefer to `zero_grad` after the optimizer step to clear the memory and not keep it around during the forward pass / beginning of the backward pass.
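
For comparison, a sketch of the ordering preferred here (again illustrative only, reusing the same placeholder names; it assumes `zero_grad(set_to_none=True)`, which recent PyTorch releases use by default, so the gradient tensors are actually released rather than just filled with zeros):

```python
for batch, (X, y) in enumerate(dataloader):
    # The forward pass runs without stale .grad tensors still allocated
    pred = model(X)
    loss = loss_fn(pred, y)

    loss.backward()         # gradients are (re)allocated here
    optimizer.step()        # update the parameters
    optimizer.zero_grad()   # release gradient memory right away, instead of
                            # keeping it through the next forward pass and the
                            # beginning of the next backward pass
```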

@jmarintur (Contributor, Author) commented

Hi @albanD, funny thing: the accepted answer here was to do it before (I do it too), but I can perfectly understand why, in scenarios where you want to free memory as soon as you can, doing it right after may be the best strategy. Thanks for the clarification. Perhaps we should also close #2799 and #2693. Wdyt?

@jmarintur jmarintur closed this Mar 28, 2024
@albanD (Contributor) commented on Mar 29, 2024

In general, any order works if you're not memory constrained (which was the case for most people in 2018 :D ). But these days, I think people care a lot more about memory use!

Thanks for pointing out these issues, they can indeed be closed.

@geopapa11 commented

Hi @albanD, out of curiosity, why is placing `optimizer.zero_grad()` right after `optimizer.step()` more memory-efficient?

Is it because you are resetting all gradients right before the next iteration of the inner loop `for batch, (X, y) in enumerate(dataloader)`, and hence you release the respective memory before the forward propagation step runs?

Whereas when `optimizer.zero_grad()` comes before `optimizer.step()`, memory for the gradients is still allocated when we enter the forward propagation step, which means less memory is available for forward propagation?

Is that the logic? Thanks in advance for your response! 😊
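
For what it's worth, here is a small self-contained check (a hypothetical example, not from the tutorial) showing that `zero_grad(set_to_none=True)` really drops the gradient tensors, which is what frees that memory before the next forward pass; with `set_to_none=False` the tensors are only filled with zeros and stay allocated:

```python
import torch

# A tiny throwaway model and optimizer, just to inspect the .grad attributes.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()                         # .grad tensors are allocated here
optimizer.step()                        # parameters are updated using .grad

optimizer.zero_grad(set_to_none=True)   # .grad attributes are set to None...
print(all(p.grad is None for p in model.parameters()))  # ...so this prints True
```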

@jmarintur jmarintur deleted the correct-backpropagation-steps branch March 29, 2024 07:58