
Commit
device mesh doc update
wz337 authored and pytorchmergebot committed May 23, 2024
1 parent d62b025 commit 863e19d
Showing 1 changed file with 8 additions and 0 deletions.
torch/distributed/fsdp/fully_sharded_data_parallel.py: 8 additions & 0 deletions
@@ -398,6 +398,14 @@ class FullyShardedDataParallel(nn.Module, _FSDPState):
``ignored_modules`` soon. For backward compatibility, we keep both
``ignored_states`` and ``ignored_modules``, but FSDP only allows one
of them to be specified as not ``None``.
device_mesh (Optional[DeviceMesh]): DeviceMesh can be used as an alternative to
process_group. When device_mesh is passed, FSDP will use the underlying process
groups for all-gather and reduce-scatter collective communications. Therefore,
device_mesh and process_group are mutually exclusive: only one of them may be
specified. For hybrid sharding strategies such as ``ShardingStrategy.HYBRID_SHARD``,
users can pass in a 2D DeviceMesh instead of a tuple of process groups. For
2D FSDP + TP, users are required to pass in device_mesh instead of process_group.
For more DeviceMesh information, please visit:
https://pytorch.org/tutorials/recipes/distributed_device_mesh.html
"""

def __init__(
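As a usage illustration of the ``device_mesh`` argument documented above (not part of this commit), here is a minimal sketch that builds a 2D DeviceMesh for ``ShardingStrategy.HYBRID_SHARD`` and passes it to FSDP in place of a tuple of process groups. The world size (2 nodes x 4 GPUs), the mesh dimension names, and the toy model are illustrative assumptions.

import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Assumes torch.distributed is already initialized across 8 ranks
# (2 nodes x 4 GPUs each), e.g. launched with torchrun.
# The outer mesh dimension replicates across nodes; the inner one
# shards within a node, matching HYBRID_SHARD semantics.
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

model = nn.Linear(16, 16).cuda()  # toy model for illustration only

# device_mesh is passed instead of process_group; the two arguments
# are mutually exclusive.
fsdp_model = FSDP(
    model,
    device_mesh=mesh_2d,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)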
