
Add an option to zero out the gradient before the forward #4905

Closed
wants to merge 1 commit

Conversation


@sf-wind sf-wind commented Apr 10, 2023

Summary:
Currently the optimizer zeros the gradients after the forward pass and before the backward pass. A recent PyTorch change sets all gradients to None by default, which reduces memory consumption (the gradient tensors are freed rather than zero-filled).

However, doing this after the forward pass provides no memory saving, because memory consumption peaks at the end of the forward pass.

It does not matter whether the gradients are set to None before or after the forward pass, so we should set them before the forward pass to realize the memory saving.

For now, we add a flag to enable this behavior instead of making it the default. Since users can override the zero_grad function (as the comment indicates), we do not know exactly what happens inside it; gating the change behind a flag keeps existing flows from breaking.

Once the existing flows have been verified, the flag should be enabled by default.
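
As an illustration, here is a minimal sketch of the gated training step described above. The class name, flag name, and the loss-dict convention are assumptions made for the example, not the exact detectron2 trainer code:

```python
class TrainStepSketch:
    """Minimal sketch of a training step with a zero-grad-before-forward flag.

    The flag name and class are illustrative; the actual wiring inside
    detectron2's trainer may differ.
    """

    def __init__(self, model, optimizer, zero_grad_before_forward=False):
        self.model = model
        self.optimizer = optimizer
        self.zero_grad_before_forward = zero_grad_before_forward

    def run_step(self, data):
        if self.zero_grad_before_forward:
            # Freeing .grad tensors here lowers the peak memory reached at the
            # end of the forward pass (set_to_none=True drops them entirely).
            self.optimizer.zero_grad(set_to_none=True)

        # Forward pass; assumes the model returns a dict of losses in training
        # mode, as detectron2 models do.
        losses = sum(self.model(data).values())

        if not self.zero_grad_before_forward:
            # Original placement: after the forward, before the backward.
            # The forward activations are already allocated at this point,
            # so freeing the gradients here does not reduce peak memory.
            self.optimizer.zero_grad(set_to_none=True)

        losses.backward()
        self.optimizer.step()
```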

Reviewed By: tglik

Differential Revision: D44264848

@facebook-github-bot added the CLA Signed and fb-exported labels on Apr 10, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D44264848


sf-wind added a commit to sf-wind/detectron2 that referenced this pull request Apr 10, 2023
Pull Request resolved: facebookresearch#4905
Reviewed By: tglik
Differential Revision: D44264848
fbshipit-source-id: 7fa9b44dc44b10ff2b52adc5c57162b80d537efe

Pull Request resolved: facebookresearch#4905
Reviewed By: tglik
Differential Revision: D44264848
fbshipit-source-id: 9d79eef6d667ebc9753d3ac22573d42f72f98622

@facebook-github-bot
Contributor

This pull request has been merged in 88217ca.

danielm322 pushed a commit to danielm322/detectron2 that referenced this pull request Jun 9, 2023
Pull Request resolved: facebookresearch#4905
Reviewed By: tglik
Differential Revision: D44264848
fbshipit-source-id: a68c7cbd36439faf65801f0f771ae8bc9c130699
Labels
CLA Signed, fb-exported, Merged