[FSDP][3/N] Unify `fully_shard` auto wrap #104408
Commits on Jun 29, 2023
- Commit 960ce5c
- Commit e6e27b7: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
  This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. one `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was one `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules. [ghstack-poisoned]
- Commit c0097c2: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit b5e6340: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit a6b5800: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
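
The "one `fully_shard` per wrap" behavior described in the commit message can be pictured with a small toy model. This is only a sketch in plain Python, not PyTorch's implementation: `ToyModule`, `toy_fully_shard`, and `toy_auto_wrap` are hypothetical stand-ins invented here, and the policy is assumed to be a simple predicate on a module.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyModule:
    """Hypothetical stand-in for a module tree node (not torch.nn.Module)."""
    name: str
    children: List["ToyModule"] = field(default_factory=list)
    sharded: bool = False


def toy_fully_shard(module: ToyModule) -> ToyModule:
    """Hypothetical stand-in for a single fully_shard call on one module."""
    module.sharded = True
    return module


def toy_auto_wrap(root: ToyModule, policy: Callable[[ToyModule], bool]) -> List[str]:
    """Call toy_fully_shard once per target submodule, children first.

    The root itself is not touched here; the caller applies it separately.
    """
    applied: List[str] = []

    def visit(module: ToyModule) -> None:
        for child in module.children:
            visit(child)  # handle deeper submodules before their parent
            if policy(child):
                toy_fully_shard(child)
                applied.append(child.name)

    visit(root)
    return applied


model = ToyModule("root", [
    ToyModule("block0", [ToyModule("linear0")]),
    ToyModule("block1", [ToyModule("linear1")]),
    ToyModule("head"),
])
# Assumed policy: every submodule whose name starts with "block" is a target.
applied = toy_auto_wrap(model, lambda m: m.name.startswith("block"))
toy_fully_shard(model)  # the root gets its own separate call
print(applied)
```

Each matching submodule receives its own call, which is the shape the commit message attributes to the `_init_param_handle_from_module()` path, as opposed to one call covering every match of a policy.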
Commits on Jun 30, 2023
- Commit 90a74eb: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit 0458533: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit 14a0ef4: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit e35cc2e: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
Commits on Jul 5, 2023
- Commit f3be93d: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
  This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. one `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was one `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
  This includes several important fixes:
  - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
  - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
  - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, the composable path was buggy in some cases).
  [ghstack-poisoned]
- Commit 141f2e8: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
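
The first two fixes listed in the commit message (hooks registered whether or not the module has managed parameters, and an empty handle list being acceptable) can be illustrated with a toy example. This is a sketch under stated assumptions, not the FSDP code: `ToyModule`, `toy_module_handles`, and `toy_register_hooks` are hypothetical names, and a "handle" is modeled as a plain tuple of parameter names.

```python
from typing import Callable, List, Tuple


class ToyModule:
    """Hypothetical module with a name, parameter names, and hook lists."""

    def __init__(self, name: str, params: List[str]) -> None:
        self.name = name
        self.params = list(params)
        self.pre_hooks: List[Callable[["ToyModule"], None]] = []
        self.post_hooks: List[Callable[["ToyModule"], None]] = []

    def forward(self) -> None:
        for hook in self.pre_hooks:
            hook(self)
        # (the actual computation would run here)
        for hook in self.post_hooks:
            hook(self)


def toy_module_handles(module: ToyModule) -> List[Tuple[str, ...]]:
    """Return one toy handle if the module manages parameters, else []."""
    return [tuple(module.params)] if module.params else []


def toy_register_hooks(module: ToyModule, events: List[Tuple[str, str, int]]) -> None:
    """Register pre/post-forward hooks unconditionally.

    The hooks tolerate an empty handle list instead of being skipped.
    """
    handles = toy_module_handles(module)  # may legitimately be []

    def pre(m: ToyModule) -> None:
        events.append((m.name, "pre", len(handles)))

    def post(m: ToyModule) -> None:
        events.append((m.name, "post", len(handles)))

    module.pre_hooks.append(pre)
    module.post_hooks.append(post)


events: List[Tuple[str, str, int]] = []
linear = ToyModule("linear", ["weight", "bias"])
relu = ToyModule("relu", [])  # no managed parameters
for mod in (linear, relu):
    toy_register_hooks(mod, events)
    mod.forward()
print(events)
```

In this toy version the parameterless module still runs both hooks, just with zero handles to process, which mirrors the intent stated in the fix list.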
Commits on Jul 6, 2023
- Commit ecbfd72: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
Commits on Jul 7, 2023
- Commit 6a01c9a: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit 954b142: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"
- Commit c501068: Update on "[FSDP][3/N] Unify `fully_shard` auto wrap"