
Conversation

r-barnes (Contributor)

Summary:
Fixes these Wextra compilation errors:

```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
```

I thought I'd fixed this previously using `std::is_unsigned`, but apparently that was insufficient.
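For context, here is a minimal standalone sketch (not the actual PyTorch diff) of why a dispatch macro such as `AT_DISPATCH_INTEGRAL_TYPES` trips `-Werror=type-limits`, and one way to guard the signedness check. `is_negative` and `floor_div_guarded` are illustrative names, and the sketch assumes C++17 for `if constexpr`:

```cpp
#include <iostream>
#include <type_traits>

// A dispatch macro instantiates the same lambda body for every integral
// scalar_t, signed and unsigned alike, so a literal "a < 0" is compiled even
// when scalar_t is unsigned -- and GCC's -Werror=type-limits rejects a
// comparison that is always false.

// Illustrative guard: only emit the "< 0" comparison for signed types.
template <typename T>
constexpr bool is_negative(T v) {
  if constexpr (std::is_unsigned<T>::value) {
    (void)v;
    return false;  // an unsigned value can never be negative
  } else {
    return v < 0;
  }
}

// Floor division in the spirit of div_floor_cuda, written against the guard
// so the same body compiles cleanly for both int and unsigned int.
template <typename scalar_t>
scalar_t floor_div_guarded(scalar_t a, scalar_t b) {
  scalar_t q = a / b;
  if (is_negative(a) != is_negative(b) && a % b != 0) {
    q -= 1;  // round toward negative infinity instead of toward zero
  }
  return q;
}

int main() {
  std::cout << floor_div_guarded<int>(-7, 2) << '\n';        // -4
  std::cout << floor_div_guarded<unsigned>(7u, 2u) << '\n';  // 3
}
```

The point is that the branch containing `v < 0` is discarded at compile time for unsigned instantiations, so the always-false comparison is never emitted and the warning never fires.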

Differential Revision: D31708173

pytorch-probot bot commented Oct 16, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/r-barnes/pytorch/blob/4cccfbe318f82567147cffe424febce10b71b3e0/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

| Workflow | Labels (enabled in bold) | Status |
|---|---|---|
| **Triggered Workflows** | | |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, **ciflow/default**, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-dynamic | ciflow/all, **ciflow/default**, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, **ciflow/default**, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/win | ✅ triggered |
| **Skipped Workflows** | | |
| caffe2-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| docker-builds | ciflow/all | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-py3-clang5-mobile-code-analysis | ciflow/all, ciflow/linux, ciflow/mobile | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun; "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot (Contributor) commented Oct 16, 2021


💊 CI failures summary and remediations

As of commit 4cccfbe (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.4xlarge.nvidia.gpu) (1/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2021-10-27T21:16:51.3377399Z what(): NCCL er...pNCCL.cpp:1078, invalid usage, NCCL version 21.0.3
2021-10-27T20:46:51.1404494Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testReduce (64 ms)
2021-10-27T20:46:51.1406275Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testAllgather
2021-10-27T20:46:51.2066310Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testAllgather (66 ms)
2021-10-27T20:46:51.2067874Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testAllgatherBase
2021-10-27T20:46:51.2726109Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testAllgatherBase (65 ms)
2021-10-27T20:46:51.2727731Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testReduceScatter
2021-10-27T20:46:51.3368979Z �[0;32m[       OK ] �[mProcessGroupNCCLTest.testReduceScatter (64 ms)
2021-10-27T20:46:51.3370917Z �[0;32m[ RUN      ] �[mProcessGroupNCCLTest.testSequenceNumInit
2021-10-27T21:16:51.3374007Z terminate called after throwing an instance of 'c10::Error'
2021-10-27T21:16:51.3375483Z terminate called recursively
2021-10-27T21:16:51.3377399Z   what():  NCCL error in: /var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1078, invalid usage, NCCL version 21.0.3
2021-10-27T21:16:51.3379758Z ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
2021-10-27T21:16:51.3381314Z Exception raised from getNCCLComm at /var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1078 (most recent call first):
2021-10-27T21:16:51.3383529Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fb19bb4aeab in /opt/conda/lib/python3.6/site-packages/torch/bin/libc10.so)
2021-10-27T21:16:51.3386394Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7fb19bb46afe in /opt/conda/lib/python3.6/site-packages/torch/bin/libc10.so)
2021-10-27T21:16:51.3389725Z frame #2: c10d::ProcessGroupNCCL::getNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<c10::Device, std::allocator<c10::Device> > const&, c10d::OpType, int, bool) + 0x18a8 (0x7fb19c3b6758 in /opt/conda/lib/python3.6/site-packages/torch/bin/libtorch_cuda_cpp.so)
2021-10-27T21:16:51.3391744Z frame #3: <unknown function> + 0x148c35 (0x7fb19c3b6c35 in /opt/conda/lib/python3.6/site-packages/torch/bin/libtorch_cuda_cpp.so)
2021-10-27T21:16:51.3392814Z frame #4: <unknown function> + 0xc9039 (0x7fb1a9283039 in /opt/conda/lib/libstdc++.so.6)
2021-10-27T21:16:51.3393947Z frame #5: <unknown function> + 0x76ba (0x7fb19e0c66ba in /lib/x86_64-linux-gnu/libpthread.so.0)
2021-10-27T21:16:51.3394994Z frame #6: clone + 0x6d (0x7fb18be3851d in /lib/x86_64-linux-gnu/libc.so.6)
2021-10-27T21:16:51.3395502Z 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (2/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Oct 27 06:12:08 test_add_done_callback_no_arg...arg() takes 0 positional arguments but 1 was given
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/suite.py(122): run
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/suite.py(84): __call__
Oct 27 06:12:08   /opt/conda/lib/python3.6/site-packages/xmlrunner/runner.py(66): run
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/main.py(256): runTests
Oct 27 06:12:08   /opt/conda/lib/python3.6/unittest/main.py(95): __init__
Oct 27 06:12:08   /opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py(608): run_tests
Oct 27 06:12:08   test_futures.py(329): <module>
Oct 27 06:12:08 
Oct 27 06:12:08 ok (0.002s)
Oct 27 06:12:08   test_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.002s)
Oct 27 06:12:08   test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given
Oct 27 06:12:08 ok (0.001s)
Oct 27 06:12:08   test_add_done_callback_simple (__main__.TestFuture) ... ok (0.001s)
Oct 27 06:12:08   test_chained_then (__main__.TestFuture) ... ok (0.003s)
Oct 27 06:12:08   test_collect_all (__main__.TestFuture) ... ok (0.102s)
Oct 27 06:12:08   test_done (__main__.TestFuture) ... ok (0.001s)
Oct 27 06:12:08   test_done_exception (__main__.TestFuture) ... ok (0.001s)
Oct 27 06:12:08   test_interleaving_then_and_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.002s)
Oct 27 06:12:08   test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: ValueError: Expected error
Oct 27 06:12:08 
Oct 27 06:12:08 At:

1 job timed out:

  • pytorch_linux_xenial_py3_6_gcc5_4_test

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D31708173


r-barnes added a commit to r-barnes/pytorch that referenced this pull request Oct 20, 2021
Summary:
Pull Request resolved: pytorch#66753

Fixes:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2 (pytorch@44fd312604fff5244e6d4fa1b3e440dd9b9e959f)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
```
And also:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (pytorch@cff1ff7), but apparently that was insufficient.

Differential Revision: D31708173

fbshipit-source-id: 6378e934d5571847d96f36623eb39aafb93eac63
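The `c10::overflows` warning quoted above comes from the same pattern: a lower-bound range check that degenerates to a pointless `unsigned < 0` comparison. Below is a hedged, self-contained sketch of such a check with the comparison compiled out for unsigned source types. `overflows_sketch` is an illustrative name (not the c10 implementation), it assumes C++17, and it assumes `To`'s range is representable in `From` after the cast:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <type_traits>

// Illustrative "does the From value f overflow To?" check. The real
// c10::overflows is more general; this only shows how the lower-bound
// comparison can be skipped when it would be a pointless "unsigned < 0".
template <typename To, typename From>
bool overflows_sketch(From f) {
  using limits = std::numeric_limits<To>;
  if constexpr (std::is_unsigned<From>::value) {
    // No negative values are possible, so never compile "f < lowest()".
    return f > static_cast<From>(limits::max());
  } else {
    return f < static_cast<From>(limits::lowest()) ||
           f > static_cast<From>(limits::max());
  }
}

int main() {
  std::cout << std::boolalpha
            << overflows_sketch<int8_t>(300u) << '\n'   // true: 300 > 127
            << overflows_sketch<int8_t>(-5)   << '\n'   // false: -5 fits in int8_t
            << overflows_sketch<size_t>(42ul) << '\n';  // false, and no "< 0" is emitted
}
```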

Summary:
Pull Request resolved: pytorch#66753

Fixes these Wextra compilation errors:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2 (pytorch@44fd312604fff5244e6d4fa1b3e440dd9b9e959f)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
```
And also these warnings:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (pytorch@cff1ff7), but apparently that was insufficient.

Test Plan: Sandcastle

Reviewed By: malfet, ngimel

Differential Revision: D31708173

fbshipit-source-id: 220f0a1979c7b7c617b8b4eab97f690d3f518776

facebook-github-bot (Contributor)

This pull request has been merged in 9900310.
