RuntimeError: CUDA error: unspecified launch failure at random places #39872
Labels
module: cuda
Related to torch.cuda, and CUDA support in general
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Bug
I am getting RuntimeError: CUDA error: unspecified launch failure at random places while the CUDA_LAUNCH_BLOCKING flag is set to 0. However, if it is set to 1, everything works fine, apart from a huge performance decrease.
To Reproduce
Steps to reproduce the behavior:
There is no specific way to reproduce the behavior. Sometimes it happens at:
Sometimes at:
Sometimes it happens within the first epochs, and other times it takes 3-4 epochs.
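For reference, here is a minimal sketch of how the CUDA_LAUNCH_BLOCKING flag can be toggled from Python. It assumes the variable has to be in the environment before the first CUDA call, so it is set before importing torch. The helper name set_launch_blocking is hypothetical and not part of my training script, and the small tensor operation is only a placeholder for the real training loop.

```python
import os


def set_launch_blocking(enabled: bool) -> str:
    """Set CUDA_LAUNCH_BLOCKING for this process (hypothetical helper).

    Assumption: the variable is read when CUDA initializes, so this
    should run before the first CUDA call (safest: before importing torch).
    """
    value = "1" if enabled else "0"
    os.environ["CUDA_LAUNCH_BLOCKING"] = value
    return value


# "1" makes kernel launches synchronous, so an error is reported at the
# call that caused it; "0" is the default asynchronous behaviour.
set_launch_blocking(True)

try:
    import torch
except ImportError:  # lets the sketch run on a machine without PyTorch
    torch = None

if torch is not None and torch.cuda.is_available():
    # Placeholder workload standing in for the real training step.
    x = torch.ones(4, device="cuda")
    y = x * 2
    # Explicit synchronization surfaces any pending asynchronous error here.
    torch.cuda.synchronize()
```

With the flag at 0, adding torch.cuda.synchronize() at a few checkpoints is a cheaper way to narrow down where an asynchronous error originates than running the whole job with blocking launches.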
Environment
PyTorch version: 1.6.0.dev20200610
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
CMake version: version 3.17.3
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.6.0.dev20200610
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect
Additional context
P.S. The driver version was not reported by the Python script you provide. According to the NVIDIA Control Panel, it is 446.14.
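As a possible workaround for the missing driver version, the sketch below queries it through nvidia-smi. It assumes nvidia-smi is on PATH, which is not always the case on Windows. The function name query_driver_version is hypothetical, and it returns None rather than failing when the tool is unavailable.

```python
import shutil
import subprocess
from typing import Optional


def query_driver_version() -> Optional[str]:
    """Return the NVIDIA driver version reported by nvidia-smi, or None.

    Hypothetical helper; assumes nvidia-smi is installed and on PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers non-zero exit codes, timeouts, and launch failures.
        return None
    lines = result.stdout.strip().splitlines()
    # One line is printed per GPU; the driver version is the same for all.
    return lines[0].strip() if lines else None


print(query_driver_version())
```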
I tried the stable version of PyTorch and saw exactly the same behaviour. I tried an earlier version of the GPU driver with no luck. I tried CUDA 10.1 instead of 10.2; nothing changed. The only thing that makes a difference is CUDA_LAUNCH_BLOCKING.
cc @ngimel