replication_pad1d raising "CUDA error: invalid configuration argument" on large inputs #49601

Comments
Hi @jcaw, thanks for the report and the detailed description. It looks like the fix needs to be applied in PyTorch, so I will move the issue there.
Can you please provide a reproducer with replication_pad1d instead of compute_deltas?
Sure. This will trigger it on my 970:

>>> torch._C._nn.replication_pad1d(torch.rand([100000, 1000], device="cuda"), 3)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-9ed6fa8428fe> in <module>
----> 1 torch._C._nn.replication_pad1d(torch.rand([100000, 1000], device="cuda"), 3)
RuntimeError: CUDA error: invalid configuration argument

Pushing it further will eventually cause OOM:

>>> torch._C._nn.replication_pad1d(torch.rand([500000, 1000], device="cuda"), 3)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-6-6c910ea7d7c8> in <module>
----> 1 torch._C._nn.replication_pad1d(torch.rand([500000, 1000], device="cuda"), 3)
RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 3.95 GiB total capacity; 381.70 MiB already allocated; 1.17 GiB free; 1.86 GiB reserved in total by PyTorch)

If I balance the dimensions, there's no issue:

>>> torch._C._nn.replication_pad1d(torch.rand([10000, 10000], device="cuda"), 3)
tensor([[0.8374, 0.8374, 0.8374, ..., 0.9226, 0.9226, 0.9226],
[0.8571, 0.8571, 0.8571, ..., 0.2462, 0.2462, 0.2462],
[0.0252, 0.0252, 0.0252, ..., 0.7778, 0.7778, 0.7778],
...,
[0.0578, 0.0578, 0.0578, ..., 0.3262, 0.3262, 0.3262],
[0.4410, 0.4410, 0.4410, ..., 0.1778, 0.1778, 0.1778],
[0.1558, 0.1558, 0.1558, ..., 0.4674, 0.4674, 0.4674]],
device='cuda:0')
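As a quick sanity check on the traceback above (this arithmetic is mine, not from the thread): a 500000 × 1000 float32 tensor is 2 × 10⁹ bytes, which is almost exactly the "Tried to allocate 1.86 GiB" figure, so the failed allocation is plausibly a tensor of that shape. Which specific tensor triggers it is not confirmed here.

```python
# A 500000 x 1000 float32 tensor occupies rows * cols * 4 bytes.
# 2e9 bytes / 2**30 bytes-per-GiB ~= 1.86 GiB, matching the OOM message.
n_bytes = 500_000 * 1_000 * 4  # sizeof(float32) == 4
gib = n_bytes / 2**30
assert round(gib, 2) == 1.86
```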
Ok, a bit more digging. This triggers the error:

>>> torch._C._nn.replication_pad1d(torch.rand([65536, 1], device="cuda"), 3)
...
RuntimeError: CUDA error: invalid configuration argument

But 65535 is fine:

>>> torch._C._nn.replication_pad1d(torch.rand([65535, 1], device="cuda"), 3)
tensor([[0.9838, 0.9838, 0.9838, ..., 0.9838, 0.9838, 0.9838],
...,
[0.8009, 0.8009, 0.8009, ..., 0.8009, 0.8009, 0.8009]],
device='cuda:0')

Is something exceeding a 16-bit limit? Interestingly, if I change the order of the dimensions things also work fine, with much higher numbers:

>>> torch._C._nn.replication_pad1d(torch.rand([1, 10000000], device="cuda"), 3)
tensor([[0.9373, 0.9373, 0.9373, ..., 0.4218, 0.4218, 0.4218]],
device='cuda:0')
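The 65535/65536 boundary observed above happens to match CUDA's hardware cap of 65535 on gridDim.y and gridDim.z. A minimal sketch of that hypothesis, assuming the kernel launches one grid-y block per batch row (an assumption inferred from the boundary, not from the kernel source):

```python
# CUDA caps gridDim.y and gridDim.z at 65535 (gridDim.x is much larger).
# If replication_pad1d maps one grid-y block per batch row -- a hypothesis,
# not confirmed from the kernel -- shapes past the cap would fail to launch
# with "invalid configuration argument".
CUDA_MAX_GRID_Y = 65535

def exceeds_grid_y(batch_rows: int) -> bool:
    """Predict whether a one-block-per-row launch exceeds the grid-y cap."""
    return batch_rows > CUDA_MAX_GRID_Y

# Matches the observations in this thread:
assert not exceeds_grid_y(65535)  # works
assert exceeds_grid_y(65536)      # "invalid configuration argument"
```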
Thanks for the reproduction! @xwang233, can you please look into this?
I've reproduced the same bug on a Tesla T4 on Google Colab, which might be easier to access.
@xwang233 I'm having the same issue with reflection padding. Has it been fixed?
@xwang233 Thanks! Waiting for the final merge.
🐛 Bug
torchaudio.functional.compute_deltas is raising a CUDA error: invalid configuration argument when the batch size is too large. (Edit: the underlying issue comes from replication_pad1d.)

To Reproduce
Steps to reproduce the behavior:
Calling compute_deltas with a large enough spectrogram and batch size triggers this error, e.g. compute_deltas(torch.rand([64, 2, 1000, 1000], device="cuda")).

Googling other triggers for this error suggests it is usually caused by exceeding the maximum CUDA block size (e.g. here). Since this varies by GPU, I'm not sure whether it will reproduce on newer cards.
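While this is unfixed, one possible workaround is to split the batch dimension into chunks below the 65535 boundary observed later in this thread and pad each chunk separately. A sketch of the chunking helper (the name `batch_chunks` and the chunked-padding usage are mine, assuming the failure is the grid-dimension limit):

```python
def batch_chunks(n_rows: int, limit: int = 65535):
    """Yield (start, end) index pairs covering n_rows rows in chunks
    of at most `limit` rows each (65535 is the CUDA grid-y cap)."""
    for start in range(0, n_rows, limit):
        yield start, min(start + limit, n_rows)

# Hypothetical usage (needs CUDA, so shown as a comment only):
# out = torch.cat([torch._C._nn.replication_pad1d(x[s:e], 3)
#                  for s, e in batch_chunks(x.shape[0])])

assert list(batch_chunks(100000)) == [(0, 65535), (65535, 100000)]
```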
Examples:
This appears to be distinct from an outright out-of-memory error. If I push the parameters further:
Reducing the size of the input tensor solves the issue:
Expected behavior

compute_deltas should return successfully (or explicitly produce an out-of-memory error, if that's the real issue).

Environment
Additional context
It appears to be CUDA-specific. I can't trigger this error on CPU.
cc @ngimel