[PyTorch] Modify data_parallel to work with small tensors #37704
Conversation
Summary: If the input tensor cannot be chunked, run `parallel_apply` on fewer devices. Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.

Test Plan: Run `test/cpp/api/parallel` on a machine with 6 GPUs.

Differential Revision: D21365416

fbshipit-source-id: c4a9dba62be76b06b8677615ce0f6cb22e552fb3
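The gist of the change, as described above, is to run `parallel_apply` on only as many devices as the input actually produces chunks for. The sketch below is a minimal illustration of that bookkeeping using libtorch, not the actual patch; `limit_devices_to_chunks` and the 6-GPU device list are assumptions made for the example.

```cpp
// Minimal sketch of the idea behind the fix, not the actual PyTorch patch.
// Assumes libtorch; `limit_devices_to_chunks` is a hypothetical helper name.
#include <torch/torch.h>

#include <iostream>
#include <vector>

// Keep only as many devices as `input` yields chunks along `dim`.
std::vector<torch::Device> limit_devices_to_chunks(
    const torch::Tensor& input,
    std::vector<torch::Device> devices,
    int64_t dim = 0) {
  // chunk() may return fewer pieces than requested when the chunked
  // dimension is smaller than the device count (e.g. a batch of 2 on 6 GPUs).
  auto chunks = input.chunk(static_cast<int64_t>(devices.size()), dim);
  if (chunks.size() < devices.size()) {
    devices.resize(chunks.size());
  }
  return devices;
}

int main() {
  // A "small" input: a batch of 2 samples that 6 devices cannot all share.
  auto input = torch::randn({2, 10});

  std::vector<torch::Device> devices;
  for (int i = 0; i < 6; ++i) {
    devices.emplace_back(torch::kCUDA, i);
  }

  auto usable = limit_devices_to_chunks(input, devices, /*dim=*/0);
  std::cout << "parallel_apply would run on " << usable.size() << " of "
            << devices.size() << " devices\n";  // prints "2 of 6"
}
```

With a small batch, `parallel_apply` can then be invoked with the shortened device list instead of leaving extra replicas without any input.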
This pull request was exported from Phabricator. Differential Revision: D21365416
Thanks for fixing!
💊 Build failures summary and remediations
As of commit 6839496 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following build failures do not appear to be due to upstream breakages:
This pull request has been merged in c0ff085.
…37704)

Summary:
Pull Request resolved: pytorch#37704

If the input tensor cannot be chunked, run `parallel_apply` on fewer devices. Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.

Test Plan: Run `test/cpp/api/parallel` on a machine with 6 GPUs.

Differential Revision: D21365416

fbshipit-source-id: 60fdfed4a0e6256b2c966c2ea3e8d0bfb298d9a8
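On the test side, the summary describes resizing the test input so it can be chunked by however many CUDA devices the machine exposes. Below is a minimal sketch of that sizing idea, assuming libtorch with CUDA; the concrete sizes and output are illustrative, not the actual `DataParallelUsesAllAvailableCUDADevices_CUDA` test body.

```cpp
// Sketch only: make the leading (batch) dimension a multiple of the device
// count so chunk(device_count, 0) yields exactly one chunk per device.
#include <torch/torch.h>

#include <iostream>

int main() {
  const auto device_count =
      static_cast<int64_t>(torch::cuda::device_count());
  if (device_count == 0) {
    std::cout << "No CUDA devices available, skipping.\n";
    return 0;
  }
  // With 6 GPUs this builds a {12, 2} tensor; chunk(6, 0) gives 6 pieces.
  auto input = torch::ones({2 * device_count, 2});
  auto chunks = input.chunk(device_count, /*dim=*/0);
  std::cout << chunks.size() << " chunks for " << device_count
            << " CUDA devices\n";
}
```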