[DataLoader] Share seed via Distributed Store to get rid of CUDA dependency (#79829) (#79890)

Fixes #79828

In a distributed environment, before this PR, DataLoader created a Tensor holding the shared seed on RANK 0 and sent that Tensor to the other processes. However, when `NCCL` is used as the distributed backend, the Tensor must be moved to CUDA before it can be broadcast from RANK 0 to the other RANKs. DataLoader did not move the Tensor to CUDA before sharing it over `NCCL`, which caused the issue.

After an offline discussion with @mrshenli, we concluded that the distributed Store is a better solution, since the shared seed is just an integer value. This removes the dependency on NCCL and CUDA when DataLoader shares information between distributed processes.

Pull Request resolved: #79829
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
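The idea of sharing the seed through a key-value Store instead of broadcasting a Tensor can be sketched in plain Python. This is a minimal illustration, not the DataLoader implementation: it assumes a blocking store with `set`/`get` semantics similar to the distributed Store (for example `torch.distributed.TCPStore`), and uses threads in place of processes. `InMemoryStore`, `share_seed`, `SEED_KEY`, and `run_ranks` are hypothetical names introduced only for this sketch.

```python
import random
import threading


class InMemoryStore:
    """Hypothetical stand-in for a distributed key-value Store.

    get() blocks until the key has been set (or the timeout expires),
    loosely mirroring how a distributed Store waits for a key.
    """

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def set(self, key: str, value: str) -> None:
        with self._cond:
            self._data[key] = value.encode()  # stores hold bytes, not Tensors
            self._cond.notify_all()

    def get(self, key: str, timeout: float = 10.0) -> bytes:
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._data, timeout=timeout):
                raise TimeoutError(f"key {key!r} was not set within {timeout}s")
            return self._data[key]


SEED_KEY = "shared_seed"  # hypothetical key name


def share_seed(store: InMemoryStore, rank: int, rng: random.Random) -> int:
    """Rank 0 draws the seed and publishes it; every other rank reads it."""
    if rank == 0:
        seed = rng.getrandbits(63)
        # The seed is just an integer, so it travels as text: no Tensor,
        # no device placement, and no dependency on NCCL or CUDA.
        store.set(SEED_KEY, str(seed))
        return seed
    # Non-zero ranks block until rank 0 has published the seed.
    return int(store.get(SEED_KEY))


def run_ranks(world_size: int = 4) -> list:
    """Simulate `world_size` ranks with threads and collect their seeds."""
    store = InMemoryStore()
    results = [None] * world_size

    def worker(rank: int) -> None:
        # Each rank has its own RNG; only rank 0's draw is used.
        results[rank] = share_seed(store, rank, random.Random(rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ranks(4))
```

Every simulated rank ends up with rank 0's seed regardless of thread start order. A real implementation would also have to handle resharing a fresh seed (for example each epoch), such as by clearing or versioning the key once all ranks have read it; that bookkeeping is omitted here.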