[v1.5.1] fix asserts in cuda code (#39047) #39418

gchanan · 2020-06-02T22:58:14Z

Summary:
Gets rid of some in-kernel asserts where they can be replaced with static_asserts.
Replaces the bare in-kernel assert in one case with CUDA_KERNEL_ASSERT, where a runtime check is necessary.
Replaces host code asserts with TORCH_INTERNAL_ASSERT.
Another group of asserts is in the fractional max pooling kernels, which should be fixed regardless (#39044); the problems there are not just asserts.
I've audited the remaining in-kernel asserts. They behave more like TORCH_INTERNAL_ASSERT, so invalid user data should not trigger them. I think it's OK to leave them as is.
Pull Request resolved: #39047

Differential Revision: D21750392

Pulled By: ngimel

fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
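
For readers unfamiliar with the distinction the summary draws, here is a minimal host-only C++ sketch of two of the three replacements: a check that depends only on a template parameter moves to `static_assert`, and a host-side invariant moves from a bare `assert` to a macro that reports failure by throwing. `SKETCH_INTERNAL_ASSERT`, `reduction_steps`, `checked_launch_blocks`, and `throws_on_bad_block_size` are hypothetical names made up for this illustration; they are not taken from the PR and do not reproduce PyTorch's actual macros. The device-side `CUDA_KERNEL_ASSERT` case is not shown because it needs a CUDA compiler.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a host-side internal assert (in the spirit of
// TORCH_INTERNAL_ASSERT): it stays active in release builds and throws
// instead of aborting the process.
#define SKETCH_INTERNAL_ASSERT(cond)                                  \
  do {                                                                \
    if (!(cond)) {                                                    \
      throw std::logic_error(std::string("internal assert failed: ")  \
                             + #cond);                                \
    }                                                                 \
  } while (0)

// The block size is a template parameter, so the power-of-two invariant
// can be checked at compile time rather than with a runtime assert.
template <int BlockSize>
int reduction_steps() {
  static_assert(BlockSize > 0 && (BlockSize & (BlockSize - 1)) == 0,
                "BlockSize must be a positive power of two");
  int steps = 0;
  for (int s = BlockSize; s > 1; s >>= 1) {
    ++steps;  // one halving step of a tree reduction
  }
  return steps;
}

// Host-side launch helper: the arguments are only known at run time, so
// the invariants are checked with the throwing macro.
int checked_launch_blocks(int numel, int block_size) {
  SKETCH_INTERNAL_ASSERT(block_size > 0);
  SKETCH_INTERNAL_ASSERT(numel >= 0);
  return (numel + block_size - 1) / block_size;  // ceiling division
}

// Returns true if a zero block size is rejected with an exception.
bool throws_on_bad_block_size() {
  try {
    checked_launch_blocks(16, 0);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

With this shape, `reduction_steps<100>()` fails to compile, while a bad runtime argument surfaces as a catchable error on the host rather than being silently compiled out under `NDEBUG`.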

dr-ci · 2020-06-02T23:09:10Z

💊 CI failures summary and remediations

As of commit ab76bba (more details on the Dr. CI page):

1/2 failures possibly* introduced in this PR
- 1/1 non-CircleCI failure(s)
1/2 tentatively recognized as flaky ❄️

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

pytorch_libtorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build (1/1)

Step: "Set Up CI Environment After attach_workspace" ❄️

E: Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64/./nvidia-container-runtime-hook_1.4.0-1_amd64.deb  GnuTLS recv error (-54): Error in the pull function.

Get:26 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 intel-microcode amd64 3.20191115.1ubuntu0.16.04.2 [2,408 kB]
Get:27 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 amd64-microcode amd64 3.20191021.1+really3.20180524.1~ubuntu0.16.04.2 [30.8 kB]
Get:28 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-generic amd64 4.4.0.179.187 [2,528 B]
Get:29 http://archive.ubuntu.com/ubuntu xenial/universe amd64 moreutils amd64 0.57-1 [55.0 kB]
Get:30 http://archive.ubuntu.com/ubuntu xenial/main amd64 tcl8.6 amd64 8.6.5+dfsg-2 [14.2 kB]
Get:31 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 thermald amd64 1.5-2ubuntu4 [187 kB]
Get:32 http://archive.ubuntu.com/ubuntu xenial/main amd64 tk8.6 amd64 8.6.5-1 [12.2 kB]
Get:33 http://archive.ubuntu.com/ubuntu xenial/main amd64 xterm amd64 322-1ubuntu1 [607 kB]
Fetched 131 MB in 6s (19.3 MB/s)
W: --force-yes is deprecated, use one of the options starting with --allow instead. 
E: Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64/./nvidia-container-runtime-hook_1.4.0-1_amd64.deb  GnuTLS recv error (-54): Error in the pull function. 
 
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

ci.pytorch.org: 1 failed

Failed: pr/caffe2-py3.6-devtoolset7-rocmrpm-centos7-test

This comment was automatically generated by Dr. CI (expand for details).

This comment has been revised 2 times.

gchanan mentioned this pull request Jun 2, 2020

[v1.5.1] Release Tracker #39104

Closed

gchanan added this to the 1.5.1 milestone Jun 2, 2020

gchanan merged commit 5d01f87 into pytorch:release/1.5 Jun 3, 2020
