
Conversation

r-barnes (Contributor)

Summary:
Fixes these Wextra compilation errors:

```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
```

I thought I'd fixed this previously using `std::is_unsigned`, but apparently that was insufficient.
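For context, here is a minimal standalone sketch (not the actual PyTorch diff) of why a dispatch macro such as `AT_DISPATCH_INTEGRAL_TYPES` trips `-Werror=type-limits`, and one way to guard the signedness check. `is_negative` and `floor_div_guarded` are illustrative names, and the sketch assumes C++17 for `if constexpr`:

```cpp
#include <iostream>
#include <type_traits>

// A dispatch macro instantiates the same lambda body for every integral
// scalar_t, signed and unsigned alike, so a literal "a < 0" is compiled even
// when scalar_t is unsigned -- and GCC's -Werror=type-limits rejects a
// comparison that is always false.

// Illustrative guard: only emit the "< 0" comparison for signed types.
template <typename T>
constexpr bool is_negative(T v) {
  if constexpr (std::is_unsigned<T>::value) {
    (void)v;
    return false;  // an unsigned value can never be negative
  } else {
    return v < 0;
  }
}

// Floor division in the spirit of div_floor_cuda, written against the guard
// so the same body compiles cleanly for both int and unsigned int.
template <typename scalar_t>
scalar_t floor_div_guarded(scalar_t a, scalar_t b) {
  scalar_t q = a / b;
  if (is_negative(a) != is_negative(b) && a % b != 0) {
    q -= 1;  // round toward negative infinity instead of toward zero
  }
  return q;
}

int main() {
  std::cout << floor_div_guarded<int>(-7, 2) << '\n';        // -4
  std::cout << floor_div_guarded<unsigned>(7u, 2u) << '\n';  // 3
}
```

The point is that the branch containing `v < 0` is discarded at compile time for unsigned instantiations, so the always-false comparison is never emitted and the warning never fires.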

Differential Revision: D31708173

pytorch-probot bot commented Oct 16, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/r-barnes/pytorch/blob/4cccfbe318f82567147cffe424febce10b71b3e0/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

| Workflow | Labels (enabled in bold) | Status |
|---|---|---|
| **Triggered Workflows** | | |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, **ciflow/default**, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-dynamic | ciflow/all, **ciflow/default**, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, **ciflow/default**, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/win | ✅ triggered |
| **Skipped Workflows** | | |
| caffe2-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| docker-builds | ciflow/all | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-py3-clang5-mobile-code-analysis | ciflow/all, ciflow/linux, ciflow/mobile | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun; "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot (Contributor) commented Oct 16, 2021


💊 CI failures summary and remediations

As of commit 4cccfbe (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.4xlarge.nvidia.gpu) (1/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-10-27T21:16:51.3377399Z what(): NCCL er...pNCCL.cpp:1078, invalid usage, NCCL version 21.0.3
2021-10-27T20:46:51.1404494Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testReduce (64 ms)
2021-10-27T20:46:51.1406275Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testAllgather
2021-10-27T20:46:51.2066310Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testAllgather (66 ms)
2021-10-27T20:46:51.2067874Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testAllgatherBase
2021-10-27T20:46:51.2726109Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testAllgatherBase (65 ms)
2021-10-27T20:46:51.2727731Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testReduceScatter
2021-10-27T20:46:51.3368979Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testReduceScatter (64 ms)
2021-10-27T20:46:51.3370917Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testSequenceNumInit
2021-10-27T21:16:51.3374007Z terminate called after throwing an instance of 'c10::Error'
2021-10-27T21:16:51.3375483Z terminate called recursively
2021-10-27T21:16:51.3377399Z   what():  NCCL error in: /var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1078, invalid usage, NCCL version 21.0.3
2021-10-27T21:16:51.3379758Z ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
2021-10-27T21:16:51.3381314Z Exception raised from getNCCLComm at /var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1078 (most recent call first):
2021-10-27T21:16:51.3383529Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fb19bb4aeab in /opt/conda/lib/python3.6/site-packages/torch/bin/libc10.so)
2021-10-27T21:16:51.3386394Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7fb19bb46afe in /opt/conda/lib/python3.6/site-packages/torch/bin/libc10.so)
2021-10-27T21:16:51.3389725Z frame #2: c10d::ProcessGroupNCCL::getNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<c10::Device, std::allocator<c10::Device> > const&, c10d::OpType, int, bool) + 0x18a8 (0x7fb19c3b6758 in /opt/conda/lib/python3.6/site-packages/torch/bin/libtorch_cuda_cpp.so)
2021-10-27T21:16:51.3391744Z frame #3: <unknown function> + 0x148c35 (0x7fb19c3b6c35 in /opt/conda/lib/python3.6/site-packages/torch/bin/libtorch_cuda_cpp.so)
2021-10-27T21:16:51.3392814Z frame #4: <unknown function> + 0xc9039 (0x7fb1a9283039 in /opt/conda/lib/libstdc++.so.6)
2021-10-27T21:16:51.3393947Z frame #5: <unknown function> + 0x76ba (0x7fb19e0c66ba in /lib/x86_64-linux-gnu/libpthread.so.0)
2021-10-27T21:16:51.3394994Z frame #6: clone + 0x6d (0x7fb18be3851d in /lib/x86_64-linux-gnu/libc.so.6)
2021-10-27T21:16:51.3395502Z 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (2/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Oct 27 06:12:08 test_add_done_callback_no_arg...arg() takes 0 positional arguments but 1 was given
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/suite.py(122): run
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/suite.py(84): __call__
Oct 27 06:12:08   /opt/conda/lib/python3.6/site-packages/xmlrunner/runner.py(66): run
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/main.py(256): runTests
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/main.py(95): __init__
Oct 27 06:12:08   /opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py(608): run_tests
Oct 27 06:12:08   test_futures.py(329): <module>
Oct 27 06:12:08 
Oct 27 06:12:08 ok (0.002s)
Oct 27 06:12:08   test_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.002s)
Oct 27 06:12:08   test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given
Oct 27 06:12:08 ok (0.001s)
Oct 27 06:12:08   test_add_done_callback_simple (__main__.TestFuture) ... ok (0.001s)
Oct 27 06:12:08   test_chained_then (__main__.TestFuture) ... ok (0.003s)
Oct 27 06:12:08   test_collect_all (__main__.TestFuture) ... ok (0.102s)
Oct 27 06:12:08   test_done (__main__.TestFuture) ... ok (0.001s)
Oct 27 06:12:08   test_done_exception (__main__.TestFuture) ... ok (0.001s)
Oct 27 06:12:08   test_interleaving_then_and_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.002s)
Oct 27 06:12:08   test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: ValueError: Expected error
Oct 27 06:12:08 
Oct 27 06:12:08 At:

1 job timed out:

  • pytorch_linux_xenial_py3_6_gcc5_4_test

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D31708173


r-barnes added a commit to r-barnes/pytorch that referenced this pull request Oct 20, 2021
Summary:
Pull Request resolved: pytorch#66753

Fixes:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2 (pytorch@44fd312604fff5244e6d4fa1b3e440dd9b9e959f)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
```
And also:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (pytorch@cff1ff7), but apparently that was insufficient.

Differential Revision: D31708173

fbshipit-source-id: 6378e934d5571847d96f36623eb39aafb93eac63
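The `c10::overflows` warning quoted above comes from the same pattern: a lower-bound range check that degenerates to a pointless `unsigned < 0` comparison. Below is a hedged, self-contained sketch of such a check with the comparison compiled out for unsigned source types. `overflows_sketch` is an illustrative name (not the c10 implementation), it assumes C++17, and it assumes `To`'s range is representable in `From` after the cast:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <type_traits>

// Illustrative "does the From value f overflow To?" check. The real
// c10::overflows is more general; this only shows how the lower-bound
// comparison can be skipped when it would be a pointless "unsigned < 0".
template <typename To, typename From>
bool overflows_sketch(From f) {
  using limits = std::numeric_limits<To>;
  if constexpr (std::is_unsigned<From>::value) {
    // No negative values are possible, so never compile "f < lowest()".
    return f > static_cast<From>(limits::max());
  } else {
    return f < static_cast<From>(limits::lowest()) ||
           f > static_cast<From>(limits::max());
  }
}

int main() {
  std::cout << std::boolalpha
            << overflows_sketch<int8_t>(300u) << '\n'   // true: 300 > 127
            << overflows_sketch<int8_t>(-5)   << '\n'   // false: -5 fits in int8_t
            << overflows_sketch<size_t>(42ul) << '\n';  // false, and no "< 0" is emitted
}
```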

Summary:
Pull Request resolved: pytorch#66753

Fixes these Wextra compilation errors:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2 (pytorch@44fd312604fff5244e6d4fa1b3e440dd9b9e959f)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
```
And also these warnings:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (pytorch@cff1ff7), but apparently that was insufficient.

Test Plan: Sandcastle

Reviewed By: malfet, ngimel

Differential Revision: D31708173

fbshipit-source-id: 220f0a1979c7b7c617b8b4eab97f690d3f518776

facebook-github-bot (Contributor)

This pull request has been merged in 9900310.
