[FSDP][3/N] Unify fully_shard auto wrap #104408

Closed
wants to merge 15 commits into from

Commits on Jun 29, 2023

  1. [FSDP][3/N] Unify fully_shard auto wrap

    [ghstack-poisoned]
    awgu committed Jun 29, 2023
    960ce5c
  2. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 29, 2023
    e6e27b7
  3. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 29, 2023
    c0097c2
  4. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 29, 2023
    b5e6340
  5. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 29, 2023
    a6b5800

Commits on Jun 30, 2023

  1. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 30, 2023
    90a74eb
  2. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 30, 2023
    0458533
  3. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 30, 2023
    14a0ef4
  4. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    [ghstack-poisoned]
    awgu committed Jun 30, 2023
    e35cc2e

Commits on Jul 5, 2023

  1. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    This includes several important fixes:
    - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
    - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
    - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, composable path was buggy in some cases).
    
    [ghstack-poisoned]
    awgu committed Jul 5, 2023
    f3be93d
  2. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    This includes several important fixes:
    - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
    - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
    - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, composable path was buggy in some cases).
    
    [ghstack-poisoned]
    awgu committed Jul 5, 2023
    141f2e8

Commits on Jul 6, 2023

  1. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    This includes several important fixes:
    - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
    - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
    - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, composable path was buggy in some cases).
    
    [ghstack-poisoned]
    voznesenskym committed Jul 6, 2023
    ecbfd72

Commits on Jul 7, 2023

  1. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    This includes several important fixes:
    - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
    - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
    - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, composable path was buggy in some cases).
    
    [ghstack-poisoned]
    awgu committed Jul 7, 2023
    6a01c9a
  2. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    This includes several important fixes:
    - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
    - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
    - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, composable path was buggy in some cases).
    
    [ghstack-poisoned]
    awgu committed Jul 7, 2023
    954b142
  3. Update on "[FSDP][3/N] Unify fully_shard auto wrap"

    This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.
    
    This includes several important fixes:
    - We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
    - We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
    - We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, composable path was buggy in some cases).
    
    [ghstack-poisoned]
    awgu committed Jul 7, 2023
    c501068
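
For readers following the description, here is a rough sketch of the "1 `fully_shard` per wrap" idea in plain Python. It is not the actual `_auto_wrap()` or `fully_shard` code: `ToyModule`, `ToyShardState`, `toy_fully_shard`, `toy_auto_wrap`, and `all_params` are hypothetical stand-ins, and parameters are modeled as plain strings. It assumes a post-order walk that applies the function once per target submodule and then once on the root, with each application managing only the parameters not already claimed by a nested application. Under that assumption the root can end up with no managed parameters, which is the case where an empty handle list and unconditional hook registration matter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ToyModule:
    """Hypothetical stand-in for a module: a name, its own params, its children."""
    name: str
    params: List[str] = field(default_factory=list)
    children: List["ToyModule"] = field(default_factory=list)


def all_params(module: ToyModule) -> List[str]:
    """Collect parameter names from the module and all of its descendants."""
    found = list(module.params)
    for child in module.children:
        found.extend(all_params(child))
    return found


class ToyShardState:
    """Records what each toy_fully_shard call manages (illustration only)."""

    def __init__(self) -> None:
        self.claimed: Set[str] = set()
        self.handles: Dict[str, List[str]] = {}
        self.hooked: List[str] = []

    def toy_fully_shard(self, module: ToyModule) -> None:
        # Manage only parameters not already claimed by a nested application.
        owned = [p for p in all_params(module) if p not in self.claimed]
        self.claimed.update(owned)
        # The handle list may be empty when nested applications claimed everything.
        self.handles[module.name] = owned
        # Hooks are recorded regardless of whether any parameters are managed.
        self.hooked.append(module.name)


def toy_auto_wrap(
    root: ToyModule,
    should_wrap: Callable[[ToyModule], bool],
    apply_fn: Callable[[ToyModule], None],
) -> None:
    """Post-order walk: apply_fn on each matching submodule, then on the root."""

    def visit(module: ToyModule) -> None:
        for child in module.children:
            visit(child)
        if should_wrap(module):
            apply_fn(module)

    for child in root.children:
        visit(child)
    apply_fn(root)


# A root with no parameters of its own and two parameterized blocks.
root = ToyModule(
    "root",
    children=[
        ToyModule("block0", ["block0.weight"]),
        ToyModule("block1", ["block1.weight"]),
    ],
)
state = ToyShardState()
toy_auto_wrap(root, lambda m: m.name.startswith("block"), state.toy_fully_shard)
```

In this toy run the two blocks each claim their own weight, so the root is left with an empty handle list but is still recorded as hooked.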