@malfet commented May 2, 2020

Summary:
If the input tensor cannot be chunked across all devices, run `parallel_apply` on fewer devices.
Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.

Test Plan: Run `test/cpp/api/parallel` on a machine with 6 GPUs

Differential Revision: D21365416

fbshipit-source-id: c4a9dba62be76b06b8677615ce0f6cb22e552fb3
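The idea behind the fix can be sketched outside LibTorch: chunking along dim 0 can yield fewer pieces than the number of requested devices when the batch dimension is small, so `parallel_apply` should only be dispatched to as many devices as there are chunks. A minimal Python sketch of that chunk-counting rule (`chunk_sizes` is a hypothetical helper mimicking `torch::chunk`, not the actual ATen implementation):

```python
import math

def chunk_sizes(dim_size, num_chunks):
    """Mimic torch::chunk along dim 0: split dim_size elements into at
    most num_chunks contiguous pieces of size ceil(dim_size / num_chunks)."""
    step = math.ceil(dim_size / num_chunks)
    sizes = []
    remaining = dim_size
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= step
    return sizes

# With a batch of 5 and 6 requested devices, only 5 chunks exist,
# so parallel_apply should run on 5 devices rather than fail on 6.
print(len(chunk_sizes(5, 6)))  # 5
```

This also shows why the test input was changed: picking a batch dimension divisible by any plausible device count (e.g. a multiple of 1..6) guarantees every device receives a chunk.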
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D21365416

@mrshenli left a comment:

Thanks for fixing!


dr-ci bot commented May 2, 2020

💊 Build failures summary and remediations

As of commit 6839496 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test" (confirmed not flaky by 2 failures)

```
May 01 18:34:04 [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at ../aten/src/TH/generic/THTensorMath.cpp:41
May 01 18:34:01   test_debug_info (__main__.DistAutogradTestWithSpawn) ... skip (0.008s)
May 01 18:34:02   test_dist_autograd_profiling (__main__.DistAutogradTestWithSpawn) ... ok (1.215s)
May 01 18:34:03   test_embedding_bag_with_no_grad_tensors (__main__.DistAutogradTestWithSpawn) ... [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
May 01 18:34:03 [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
May 01 18:34:03 [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
May 01 18:34:03 [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
May 01 18:34:03 ok (1.253s)
May 01 18:34:04   test_error_in_context (__main__.DistAutogradTestWithSpawn) ... [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at ../aten/src/TH/generic/THTensorMath.cpp:41
May 01 18:34:04 [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at ../aten/src/TH/generic/THTensorMath.cpp:41
May 01 18:34:04 [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at ../aten/src/TH/generic/THTensorMath.cpp:41
May 01 18:34:04 [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at ../aten/src/TH/generic/THTensorMath.cpp:41
May 01 18:34:04 ok (1.039s)
May 01 18:34:05   test_grad_copy_sparse_indices_extra_ref (__main__.DistAutogradTestWithSpawn) ... /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py:1850: UserWarning: Argument order of nn.functional.embedding_bag was changed. Usage `embedding_bag(weight, input, ...)` is deprecated, and should now be `embedding_bag(input, weight, ...)`.
May 01 18:34:05   warnings.warn("Argument order of nn.functional.embedding_bag was changed. "
May 01 18:34:05 /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py:1850: UserWarning: Argument order of nn.functional.embedding_bag was changed. Usage `embedding_bag(weight, input, ...)` is deprecated, and should now be `embedding_bag(input, weight, ...)`.
May 01 18:34:05   warnings.warn("Argument order of nn.functional.embedding_bag was changed. "
May 01 18:34:05 /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py:1850: UserWarning: Argument order of nn.functional.embedding_bag was changed. Usage `embedding_bag(weight, input, ...)` is deprecated, and should now be `embedding_bag(input, weight, ...)`.
May 01 18:34:05   warnings.warn("Argument order of nn.functional.embedding_bag was changed. "
May 01 18:34:05 /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py:1850: UserWarning: Argument order of nn.functional.embedding_bag was changed. Usage `embedding_bag(weight, input, ...)` is deprecated, and should now be `embedding_bag(input, weight, ...)`.
May 01 18:34:05   warnings.warn("Argument order of nn.functional.embedding_bag was changed. "
May 01 18:34:05 [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
```

This comment was automatically generated by Dr. CI.

@facebook-github-bot

This pull request has been merged in c0ff085.

@malfet malfet deleted the export-D21365416 branch May 4, 2020 22:51
ShawnZhong pushed a commit to ShawnZhong/pytorch that referenced this pull request May 5, 2020
…37704)

Summary:
Pull Request resolved: pytorch#37704

If the input tensor cannot be chunked across all devices, run `parallel_apply` on fewer devices.
Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.

Test Plan: Run `test/cpp/api/parallel` on a machine with 6 GPUs

Differential Revision: D21365416

fbshipit-source-id: 60fdfed4a0e6256b2c966c2ea3e8d0bfb298d9a8
bharatr21 pushed a commit to bharatr21/pytorch that referenced this pull request May 5, 2020