[FSDP][2/N] Remove params_with_grad (pytorch#87480)
This PR removes the property `params_with_grad` from `FullyShardedDataParallel`. It was introduced when implementing `clip_grad_norm_()` but was not consistently used. Personally, I do not think it makes sense for `FullyShardedDataParallel` to expose this helper because it is not a common paradigm.

This PR is technically BC-breaking. However, I checked that no one internally is using this API.
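Any caller that relied on the removed property can recover the same list with a one-line comprehension over `Module.parameters()`. A minimal sketch of the replacement, assuming an FSDP-wrapped module bound to an illustrative variable `model`:

```python
# Equivalent to the removed `FullyShardedDataParallel.params_with_grad`
# property: collect every parameter whose gradient is currently populated.
# `model` is a hypothetical FSDP-wrapped nn.Module.
params_with_grad = [p for p in model.parameters() if p.grad is not None]
```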

cc @ezyang @gchanan
Pull Request resolved: pytorch#87480
Approved by: https://github.com/rohan-varma
awgu authored and kulinseth committed Dec 9, 2022
1 parent d84cd1f commit 27dc00b
Showing 1 changed file with 0 additions and 9 deletions.
torch/distributed/fsdp/fully_sharded_data_parallel.py
```diff
@@ -52,8 +52,6 @@
     _sync_params_and_buffers,
     _to_kwargs,
 )
-from torch.nn.parameter import Parameter
-
 from ._optim_utils import (
     _broadcast_pos_dim_tensor_states,
     _broadcast_processed_optim_state_dict,
@@ -3913,13 +3911,6 @@ def no_sync(self) -> Generator:
                 )
                 m._sync_gradients = old_flag
 
-    @property
-    def params_with_grad(self) -> List[Parameter]:
-        """
-        Recursively returns a list of all module parameters that have a gradient.
-        """
-        return [p for p in self.parameters() if p.grad is not None]
-
     @torch.no_grad()
     def clip_grad_norm_(
         self, max_norm: Union[float, int], norm_type: Union[float, int] = 2.0
```
