[device mesh] only check when world size > num_devices per host (#111091)

as titled
Pull Request resolved: #111091
Approved by: https://github.com/awgu, https://github.com/wz337
ghstack dependencies: #110898, #110900
wanchaol authored and pytorchmergebot committed Oct 12, 2023
1 parent 9316c8b commit 097defb
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion torch/distributed/_tensor/device_mesh.py
@@ -188,7 +188,10 @@ def _get_or_create_default_group(self):
 # automatically set the current cuda/cuda-like device base on num of gpu devices available in each host
 # NOTE: This device selection would only work for homogeneous hardware.
 num_devices_per_host = device_handle.device_count()
-if world_size % num_devices_per_host != 0:
+if (
+    world_size > num_devices_per_host
+    and world_size % num_devices_per_host != 0
+):
     raise RuntimeError(
         f"DeviceMesh only support homogeneous hardware, but found "
         f"{world_size} ranks and {num_devices_per_host} {self.device_type} devices!"
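
To illustrate the effect of the change, here is a minimal standalone sketch that compares the old and new conditions. It does not call into `torch`. The helper names `old_check` and `new_check` and the sample `(world_size, num_devices_per_host)` pairs are hypothetical and chosen only for illustration; the two boolean expressions are copied from the diff above.

```python
def old_check(world_size: int, num_devices_per_host: int) -> bool:
    """Condition before this commit. True means the RuntimeError would be raised."""
    return world_size % num_devices_per_host != 0


def new_check(world_size: int, num_devices_per_host: int) -> bool:
    """Condition after this commit: the divisibility test only applies
    when there are more ranks than devices on a single host."""
    return (
        world_size > num_devices_per_host
        and world_size % num_devices_per_host != 0
    )


# Hypothetical (world_size, num_devices_per_host) pairs, for illustration only.
cases = [(2, 8), (8, 8), (16, 8), (12, 8)]

# Map each case to (old condition, new condition).
results = {case: (old_check(*case), new_check(*case)) for case in cases}

for (world_size, num_devices), (old, new) in results.items():
    print(
        f"world_size={world_size}, devices per host={num_devices}: "
        f"old raises={old}, new raises={new}"
    )
```

Under these assumptions, the only case whose outcome changes is the one with fewer ranks than devices on the host (for example, 2 ranks on an 8-device host): the old condition would raise, the new one does not. A world size that exceeds the per-host device count and is not a multiple of it is still rejected.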