[v1.5.1] fix asserts in cuda code (#39047) #39418

Merged Jun 3, 2020

gchanan (Contributor) commented Jun 2, 2020

Summary:
Gets rid of some in-kernel asserts where they can be replaced with static_asserts.
Replaces a bare in-kernel `assert` with `CUDA_KERNEL_ASSERT` in the one case where it is necessary.
Replaces host-code `assert`s with `TORCH_INTERNAL_ASSERT`.
Another group of asserts is in the fractional max pooling kernels, which should be fixed regardless (#39044); the problems there are not just asserts.
I've audited the remaining in-kernel asserts. They are more like `TORCH_INTERNAL_ASSERT`, so they should not be triggered by invalid user data. I think it's OK to leave them as is.
Pull Request resolved: #39047

Differential Revision: D21750392

Pulled By: ngimel

fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
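To illustrate the kinds of replacement listed in the summary, here is a minimal host-only C++ sketch. It is not code from this PR. `SKETCH_INTERNAL_ASSERT` and `flattened_index` are hypothetical names invented for the example; the macro only stands in for `TORCH_INTERNAL_ASSERT`, whose real definition lives in PyTorch's headers and is not reproduced here. The device-side `CUDA_KERNEL_ASSERT` case is not shown, since it needs a CUDA toolchain.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a host-side internal assert (NOT the real
// TORCH_INTERNAL_ASSERT). Unlike plain assert(), it is not compiled out
// when NDEBUG is defined; it throws on a violated invariant.
#define SKETCH_INTERNAL_ASSERT(cond)                                    \
  do {                                                                  \
    if (!(cond)) {                                                      \
      throw std::logic_error(std::string("internal assert failed: ") +  \
                             #cond);                                    \
    }                                                                   \
  } while (0)

// Hypothetical helper: row-major linear index for a small fixed rank.
template <std::size_t Dims>
int flattened_index(const int (&idx)[Dims], const int (&sizes)[Dims]) {
  // The rank is a template parameter, so this condition is known at
  // compile time and can be a static_assert instead of a runtime assert.
  static_assert(Dims >= 1 && Dims <= 4, "Dims must be between 1 and 4");

  int linear = 0;
  for (std::size_t d = 0; d < Dims; ++d) {
    // Runtime invariant on host code: checked in every build type.
    SKETCH_INTERNAL_ASSERT(idx[d] >= 0 && idx[d] < sizes[d]);
    linear = linear * sizes[d] + idx[d];
  }
  return linear;
}
```

With `sizes = {3, 4}`, calling `flattened_index` on `{1, 2}` walks both dimensions without tripping the check, while an out-of-range index such as `{3, 0}` throws `std::logic_error`. A rank above 4 would be rejected at compile time rather than at run time.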

gchanan added this to the 1.5.1 milestone Jun 2, 2020
dr-ci bot commented Jun 2, 2020

💊 CI failures summary and remediations

As of commit ab76bba (more details on the Dr. CI page):



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_libtorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build (1/1)

Step: "Set Up CI Environment After attach_workspace" ❄️

E: Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64/./nvidia-container-runtime-hook_1.4.0-1_amd64.deb GnuTLS recv error (-54): Error in the pull function.
Get:26 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 intel-microcode amd64 3.20191115.1ubuntu0.16.04.2 [2,408 kB]
Get:27 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 amd64-microcode amd64 3.20191021.1+really3.20180524.1~ubuntu0.16.04.2 [30.8 kB]
Get:28 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-generic amd64 4.4.0.179.187 [2,528 B]
Get:29 http://archive.ubuntu.com/ubuntu xenial/universe amd64 moreutils amd64 0.57-1 [55.0 kB]
Get:30 http://archive.ubuntu.com/ubuntu xenial/main amd64 tcl8.6 amd64 8.6.5+dfsg-2 [14.2 kB]
Get:31 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 thermald amd64 1.5-2ubuntu4 [187 kB]
Get:32 http://archive.ubuntu.com/ubuntu xenial/main amd64 tk8.6 amd64 8.6.5-1 [12.2 kB]
Get:33 http://archive.ubuntu.com/ubuntu xenial/main amd64 xterm amd64 322-1ubuntu1 [607 kB]
Fetched 131 MB in 6s (19.3 MB/s)
W: --force-yes is deprecated, use one of the options starting with --allow instead. 
E: Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64/./nvidia-container-runtime-hook_1.4.0-1_amd64.deb  GnuTLS recv error (-54): Error in the pull function. 
 
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? 

ci.pytorch.org: 1 failed



gchanan merged commit 5d01f87 into pytorch:release/1.5 Jun 3, 2020