[PyTorch] Modify data_parallel to work with small tensors #37704
Conversation
Summary: If the input tensor cannot be chunked, run `parallel_apply` on fewer devices. Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.

Test Plan: Run `test/cpp/api/parallel` on a machine with 6 GPUs.

Differential Revision: D21365416

fbshipit-source-id: c4a9dba62be76b06b8677615ce0f6cb22e552fb3
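The gist of the change, as described above, is to run `parallel_apply` on only as many devices as the input actually produces chunks for. The sketch below is a minimal illustration of that bookkeeping using libtorch, not the actual patch; `limit_devices_to_chunks` and the 6-GPU device list are assumptions made for the example.

```cpp
// Minimal sketch of the idea behind the fix, not the actual PyTorch patch.
// Assumes libtorch; `limit_devices_to_chunks` is a hypothetical helper name.
#include <torch/torch.h>

#include <iostream>
#include <vector>

// Keep only as many devices as `input` yields chunks along `dim`.
std::vector<torch::Device> limit_devices_to_chunks(
    const torch::Tensor& input,
    std::vector<torch::Device> devices,
    int64_t dim = 0) {
  // chunk() may return fewer pieces than requested when the chunked
  // dimension is smaller than the device count (e.g. a batch of 2 on 6 GPUs).
  auto chunks = input.chunk(static_cast<int64_t>(devices.size()), dim);
  if (chunks.size() < devices.size()) {
    devices.resize(chunks.size());
  }
  return devices;
}

int main() {
  // A "small" input: a batch of 2 samples that 6 devices cannot all share.
  auto input = torch::randn({2, 10});

  std::vector<torch::Device> devices;
  for (int i = 0; i < 6; ++i) {
    devices.emplace_back(torch::kCUDA, i);
  }

  auto usable = limit_devices_to_chunks(input, devices, /*dim=*/0);
  std::cout << "parallel_apply would run on " << usable.size() << " of "
            << devices.size() << " devices\n";  // prints "2 of 6"
}
```

With a small batch, `parallel_apply` can then be invoked with the shortened device list instead of leaving extra replicas without any input.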
This pull request was exported from Phabricator. Differential Revision: D21365416
Thanks for fixing!
💊 Build failures summary and remediations
As of commit 6839496 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following build failures do not appear to be due to upstream breakages:
This pull request has been merged in c0ff085.
…37704)

Summary:
Pull Request resolved: pytorch#37704

If the input tensor cannot be chunked, run `parallel_apply` on fewer devices. Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.

Test Plan: Run `test/cpp/api/parallel` on a machine with 6 GPUs.

Differential Revision: D21365416

fbshipit-source-id: 60fdfed4a0e6256b2c966c2ea3e8d0bfb298d9a8
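On the test side, the summary describes resizing the test input so it can be chunked by however many CUDA devices the machine exposes. Below is a minimal sketch of that sizing idea, assuming libtorch with CUDA; the concrete sizes and output are illustrative, not the actual `DataParallelUsesAllAvailableCUDADevices_CUDA` test body.

```cpp
// Sketch only: make the leading (batch) dimension a multiple of the device
// count so chunk(device_count, 0) yields exactly one chunk per device.
#include <torch/torch.h>

#include <iostream>

int main() {
  const auto device_count =
      static_cast<int64_t>(torch::cuda::device_count());
  if (device_count == 0) {
    std::cout << "No CUDA devices available, skipping.\n";
    return 0;
  }
  // With 6 GPUs this builds a {12, 2} tensor; chunk(6, 0) gives 6 pieces.
  auto input = torch::ones({2 * device_count, 2});
  auto chunks = input.chunk(device_count, /*dim=*/0);
  std::cout << chunks.size() << " chunks for " << device_count
            << " CUDA devices\n";
}
```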