When running waveglow.infer, getting 'RuntimeError: CUDA error: invalid device function' #62

Closed
artificalnouveau opened this issue Oct 20, 2019 · 6 comments

artificalnouveau commented Oct 20, 2019

When running:

    with torch.no_grad():
        _, mel, _, _ = tacotron2.infer(sequence)
        audio = waveglow.infer(mel)

I get the following error:

RuntimeError                              Traceback (most recent call last)
in ()
      5 with torch.no_grad():
      6     _, mel, _, _ = tacotron2.infer(sequence)
----> 7     audio = waveglow.infer(mel)
      8 audio_numpy = audio[0].data.cpu().numpy()
      9 rate = 22050

2 frames

/root/.cache/torch/hub/nvidia_DeepLearningExamples_torchhub/PyTorch/SpeechSynthesis/Tacotron2/waveglow/model.py in forward(self, z, reverse)
     75                 W_inverse = W_inverse.half()
     76             self.W_inverse = W_inverse
---> 77         z = F.conv1d(z, self.W_inverse, bias=None, stride=1, padding=0)
     78         return z
     79     else:

RuntimeError: CUDA error: invalid device function

Kh4L commented Oct 29, 2019

The example itself works well on my machine, and the error seems to be specific to the Google Colab environment.

Actually, the error indicates that the PyTorch build (more precisely, the CUDA compute capabilities it was compiled for) is not compatible with the GPU, which is a Tesla K80 according to torch.cuda.get_device_name(0).

This might come from the discrepancy between torch.version.cuda:

'10.0.130'

and the CUDA runtime on the system:

 NVIDIA-SMI 430.50       Driver Version: 418.67       CUDA Version: 10.1 
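
The comparison above can be scripted. What follows is a minimal sketch, not part of the original report: `parse_arch`, `covers_device`, and `report` are hypothetical helper names, and `torch.cuda.get_arch_list()` is assumed to exist only in newer PyTorch releases. The check applies the general CUDA rule that SASS runs on devices with the same major version and an equal or higher minor version, while PTX can be JIT-compiled for newer devices. It only inspects PyTorch's own kernels, so it would not reveal a gap in a bundled library such as magma, which is where the later comments in this thread locate the problem.

```python
import re


def parse_arch(arch):
    """Split an entry such as 'sm_37' or 'compute_75' into (kind, major, minor).

    Returns None for strings that do not look like NVIDIA architectures.
    """
    match = re.fullmatch(r"(sm|compute)_(\d{2,})[a-z]?", arch)
    if match is None:
        return None
    kind, digits = match.group(1), match.group(2)
    # The last digit is the minor version; everything before it is the major.
    return kind, int(digits[:-1]), int(digits[-1])


def covers_device(arch_list, capability):
    """Rough check: can any entry in arch_list run on this compute capability?"""
    dev_major, dev_minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue  # skip entries this sketch does not understand
        kind, major, minor = parsed
        if kind == "sm":
            # SASS: same major version, compiled minor not above the device minor.
            if major == dev_major and minor <= dev_minor:
                return True
        elif (major, minor) <= (dev_major, dev_minor):
            # PTX: can be JIT-compiled for an equal or newer architecture.
            return True
    return False


def report():
    """Print the build and device facts compared in the comment above."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("torch.version.cuda:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("no CUDA device visible")
        return
    capability = torch.cuda.get_device_capability(0)
    print("device:", torch.cuda.get_device_name(0), capability)
    # Assumption: get_arch_list() may be missing on older PyTorch builds.
    if hasattr(torch.cuda, "get_arch_list"):
        archs = torch.cuda.get_arch_list()
        print("compiled for:", archs)
        print("covers device:", covers_device(archs, capability))


if __name__ == "__main__":
    report()
```

For a Tesla K80 (compute capability 3.7), `covers_device(["sm_50", "sm_60"], (3, 7))` is false, while a list containing `sm_35` or `sm_37` passes.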

soumith (Member) commented Oct 29, 2019

I've changed some things that might help. I'll wait for tonight's nightlies to verify. Let's see.

soumith (Member) commented Oct 31, 2019

I think I might have nailed it down; let's see.
I suspect this was because of a magma upgrade, and that the W.inverse() call is failing because the sm37 architecture is missing.
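
One way to test that suspicion in isolation is to run a small matrix inverse on the GPU by itself, away from the WaveGlow model. The sketch below is an illustration only: `check_cuda_inverse` is a hypothetical helper, and it assumes that calling `torch.cuda.synchronize()` right after the operation makes a deferred CUDA error more likely to surface at that point. Whether `inverse()` goes through magma depends on the PyTorch build, so a passing result on a recent release says little about the builds discussed here.

```python
def check_cuda_inverse(size=4):
    """Run a small matrix inverse on the GPU and report what happened."""
    try:
        import torch
    except ImportError:
        return "skipped: PyTorch is not installed"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"
    try:
        # A scaled identity is trivially invertible, so a failure here
        # points at the kernel rather than at the input data.
        x = 2.0 * torch.eye(size, device="cuda")
        inv = x.inverse()
        # Wait for queued GPU work before moving on (see assumption above).
        torch.cuda.synchronize()
        # Copy the result back and compare it with the known answer.
        ok = torch.allclose(inv.cpu(), 0.5 * torch.eye(size))
    except RuntimeError as err:
        return f"failed: {err}"
    return "ok" if ok else "wrong result"


print(check_cuda_inverse())
```

On an affected environment, a result starting with `failed:` would place the problem in the inverse itself rather than in the `F.conv1d` call named in the traceback.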

soumith (Member) commented Nov 3, 2019

I am moving the technical details of the fix here: pytorch/pytorch#29096

None of my fixes have worked yet, and I've been trying.

soumith (Member) commented Nov 4, 2019

Found a fix; the quoted issue has context. Tomorrow's nightlies and v1.3.1 will have a fix for this.
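
To check whether an installed build should include the fix, a rough version comparison could look like the sketch below. It assumes the fix shipped in v1.3.1 as announced in this comment. `FIXED_IN`, `release_tuple`, and `should_have_fix` are hypothetical names. Only the numeric release part is compared, so a nightly build dated before the fix would be misclassified as fixed.

```python
import re

# Assumption: the first stable release carrying the fix, per the comment above.
FIXED_IN = (1, 3, 1)


def release_tuple(version):
    """Extract the leading numeric 'X.Y[.Z]' part of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def should_have_fix(version):
    """True if the release number is at or beyond FIXED_IN."""
    return release_tuple(version) >= FIXED_IN


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(torch.__version__, "->", should_have_fix(torch.__version__))
```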

facebook-github-bot added a commit to pytorch/pytorch that referenced this issue Nov 8, 2019
Summary:
As part of pytorch/hub#62, I found that the stack trace of a failed kernel launch was being recorded elsewhere, even with CUDA_LAUNCH_BLOCKING=1.

So I started debugging and found that magma launches don't do error checking.

I eventually found the issue: I hadn't compiled sm37 SASS into the magma binary, and the failure was on `x.inverse()`. That's somehow a problem for magma 2.5.1 (but not 2.5.0).
Pull Request resolved: #29003

Differential Revision: D18397358

Pulled By: soumith

fbshipit-source-id: 04baca68eac209d7af773daddd0193697d4ab0d9
zdevito pushed a commit to zdevito/ATen that referenced this issue Nov 8, 2019

soumith (Member) commented Nov 12, 2019

Fixed now on Colab!

@soumith soumith closed this Nov 12, 2019