[FSDP][2/N] Remove `params_with_grad` (#87480)
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87480
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures, 3 Pending as of commit d69725e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 0a98d576c4887b164322c01170aad00c927a43f9 Pull Request resolved: #87480
This PR removes the property `params_with_grad` from `FullyShardedDataParallel`. It was introduced when implementing `clip_grad_norm_()` but was not consistently used. Personally, I do not think it makes sense for `FullyShardedDataParallel` to expose this helper because it is not a common paradigm. This PR is technically BC-breaking. However, I checked that no one internally is using this API.
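For anyone who did rely on the removed helper, the same filtering is easy to reproduce on the caller side. A minimal sketch follows; the `Param` class here is a hypothetical stand-in for `torch.nn.Parameter` so the snippet runs without PyTorch installed. With a real module you would filter `module.parameters()` the same way.

```python
class Param:
    """Hypothetical stand-in for torch.nn.Parameter; only .grad matters here."""
    def __init__(self, grad=None):
        self.grad = grad

def params_with_grad(params):
    # Mirrors what the removed FSDP property computed: keep only the
    # parameters whose gradient has been populated (is not None).
    return [p for p in params if p.grad is not None]

# Two of the three parameters below have a populated gradient.
params = [Param(grad=0.5), Param(), Param(grad=-1.0)]
print(len(params_with_grad(params)))  # 2
```

Because this is a one-line comprehension over `parameters()`, keeping it as a dedicated property on `FullyShardedDataParallel` added API surface without saving callers meaningful work.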
Sounds good. Please add the BC-breaking label for release tracking purposes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#87480. Approved by: https://github.com/rohan-varma
Stack from ghstack:
- #87480 [FSDP][2/N] Remove `params_with_grad`
- #87479 [FSDP][1/N] Rework `clip_grad_norm_()` and tests
cc @ezyang @gchanan