fix batch_isend_irecv example incorrect usage (#110408)
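Mismatched dtypes between the sent and received tensors silently lead to wrong outputs with NCCL. In the old example, `torch.arange(2)` defaults to int64 while `torch.randn(2)` defaults to float32, so the receive buffers ended up holding garbage: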

```
1:recv_tensor=tensor([0., 0.], device='cuda:1')
0:recv_tensor=tensor([2.8026e-45, 0.0000e+00], device='cuda:0')
```
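One plausible reading of these values, offered as an assumption rather than a confirmed description of NCCL internals, is that the receiver copies the raw bytes of the int64 payload into its float32 buffer without any conversion. The sketch below reproduces that reinterpretation with the standard library only; the helper name `reinterpret_int64_as_float32` is hypothetical, and little-endian byte order is assumed.

```python
import struct


def reinterpret_int64_as_float32(values, recv_count=2):
    """Mimic a byte-wise copy of int64 data into a float32 buffer.

    Assumption: only as many bytes as the float32 receive buffer holds
    are copied, and no dtype conversion takes place.
    """
    # Serialize the sender's int64 values as little-endian bytes.
    sent_bytes = struct.pack(f"<{len(values)}q", *values)
    # The float32 receive buffer holds recv_count * 4 bytes.
    recv_nbytes = recv_count * struct.calcsize("<f")
    # Read those bytes back as float32 values.
    return struct.unpack(f"<{recv_count}f", sent_bytes[:recv_nbytes])


# With world_size == 2, rank 0 receives from rank 1, which sends arange(2) + 2.
rank0_view = reinterpret_int64_as_float32([2, 3])
# Rank 1 receives from rank 0, which sends arange(2) + 0.
rank1_view = reinterpret_int64_as_float32([0, 1])

print("0:", rank0_view)
print("1:", rank1_view)
```

Under these assumptions, rank 0 sees a tiny denormal (about 2.8026e-45) followed by zero, and rank 1 sees two zeros, which matches the output above.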

Pull Request resolved: #110408
Approved by: https://github.com/awgu, https://github.com/Neilblaze
H-Huang authored and pytorchmergebot committed Oct 4, 2023
1 parent 8672d64 commit 0949d97
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions torch/distributed/distributed_c10d.py
@@ -1779,8 +1779,8 @@ def batch_isend_irecv(p2p_op_list):
 Examples:
 >>> # xdoctest: +SKIP("no rank")
->>> send_tensor = torch.arange(2) + 2 * rank
->>> recv_tensor = torch.randn(2)
+>>> send_tensor = torch.arange(2, dtype=torch.float32) + 2 * rank
+>>> recv_tensor = torch.randn(2, dtype=torch.float32)
 >>> send_op = dist.P2POp(dist.isend, send_tensor, (rank + 1)%world_size)
 >>> recv_op = dist.P2POp(dist.irecv, recv_tensor, (rank - 1 + world_size)%world_size)
 >>> reqs = batch_isend_irecv([send_op, recv_op])