
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input #32564

Closed
imransai opened this issue Jan 23, 2020 · 18 comments

@imransai

Most of the issues on this subject seem to be closed, so I am opening a new one.

Issue description

I am facing this issue when calling loss.backward() on my loss.
Some of the loss classes I am using are shown below.

```python
import torch
import torch.nn as nn

class CELoss_auxilary(nn.Module):

    def __init__(self, auxloss=True):
        super(CELoss_auxilary, self).__init__()
        self.logsoftmax = nn.LogSoftmax(dim=1)
        self.crossentropyloss = nn.KLDivLoss(reduction='batchmean')

    def forward(self, pred_auxdc, gt_target, nspatialscales=16, device=None, params=None):
        if device is None:
            device = torch.device("cpu")
        nChannels = params['dce_nChannels'] + 2
        # depth2dc is a project-specific helper (definition not shown in this issue).
        gt_resizeddc_target = depth2dc(gt_target, method='gaussian3hot', device=device)
        gt_resizeddc_target = gt_resizeddc_target.detach().permute(0, 2, 3, 1).view(-1, nChannels)
        valid_auxmask = (gt_target > 0).detach()
        valid_auxmask = valid_auxmask.view(-1)
        """
        # Earlier channels-last variant of the same selection:
        gt_resizeddc_target = gt_resizeddc_target.view(1, nChannels, -1)
        gt_resizeddc_targetsel = gt_resizeddc_target[..., valid_auxmask]
        pred_auxdc = pred_auxdc.view(1, nChannels, -1)
        pred_auxdcsel = pred_auxdc[..., valid_auxmask]
        """
        # Keep only pixels with valid (positive) ground truth.
        gt_resizeddc_targetsel = gt_resizeddc_target[valid_auxmask, ...]
        # Entropy of the target distribution (the constant part of the KL term).
        biased_aux = torch.sum(gt_resizeddc_targetsel * torch.log(gt_resizeddc_targetsel + 1e-7), 1)

        pred_auxdc = pred_auxdc.permute(0, 2, 3, 1).view(-1, nChannels)
        # valid_auxmask = (torch.sum(gt_resizeddc_target, 1) > 0.0).detach()
        pred_auxdcsel = pred_auxdc[valid_auxmask, ...]
        self.lossaux = self.crossentropyloss(self.logsoftmax(pred_auxdcsel), gt_resizeddc_targetsel)
        # self.lossaux = torch.mean(-torch.sum(gt_resizeddc_targetsel * self.logsoftmax(pred_auxdcsel), 1) + biased_aux)

        return self.lossaux
```

```python
class MaskedMSELoss(nn.Module):
    def __init__(self):
        super(MaskedMSELoss, self).__init__()

    def forward(self, pred, target):
        assert pred.dim() == target.dim(), "inconsistent dimensions"
        # Only pixels with a positive target contribute to the loss.
        valid_mask = (target > 0).detach()
        diff = target - pred
        diff = diff[valid_mask]
        self.loss = (diff ** 2).mean()
        return self.loss
```

I also checked with other loss functions.
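Worth checking, given the wording of the error: `permute()` returns a non-contiguous view, and both `.view()` and some cuDNN kernels reject non-contiguous tensors. A minimal standalone sketch of the check (not taken from the code above):

```python
import torch

x = torch.randn(2, 8, 4, 4)   # e.g. an NCHW activation
y = x.permute(0, 2, 3, 1)     # NHWC view; no copy is made, only strides change
print(y.is_contiguous())      # False

# .contiguous() materializes a contiguous copy; .reshape() also accepts
# non-contiguous inputs where .view() would raise a RuntimeError.
flat = y.contiguous().view(-1, 8)
```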

System Info

PyTorch 1.4, CUDA toolkit 10.1, Ubuntu 16.04.

I am facing this problem only during backward computation in training. My evaluation code runs fine.

Interestingly, my code runs fine with these combinations:
pytorch 1.3 + cuda-toolkit 10.0
pytorch 1.1 + cuda-toolkit 9.0

But I need the aforementioned combination (PyTorch 1.4 + CUDA toolkit 10.1) to access some sparse convolution tools that are only available on CUDA 10.1 or higher. Can anyone help in this regard?

@peterjc123
Collaborator

Maybe related to #32395.

@imransai
Author

For now, pytorch 1.4 nightly seems to have solved this problem! Thanks!

@BoltzmannBrain

Previous suggestions in this thread did not resolve my problem; I'm currently on pytorch-nightly (1.6.0.dev20200525) with CUDA 10.1.243. Oddly, reducing my batch size from 64 to 36 worked 🤷‍♂️

@Hadrien-Cornier

It happens to me as well with:
pytorch=1.5.0
py3.8
cuda10.1.243
cudnn7.6.3_0

It happens when I reach a batch-normalization layer with a huge batch size, but when I decrease the batch size the error is gone. It is probably a memory issue triggered when a batch is too big.
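If memory pressure is the suspicion, a quick way to test it (a rough sketch; where to call it is up to you) is to log allocator statistics just before the layer that fails:

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # If these numbers are close to the card's total memory, the cuDNN
    # failure is likely a masked out-of-memory condition.
    alloc_mib = torch.cuda.memory_allocated() / 2**20
    reserved_mib = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={alloc_mib:.0f} MiB, reserved={reserved_mib:.0f} MiB")

# e.g. call log_cuda_memory("pre-batchnorm") right before the suspect layer
```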

@DuckJ

DuckJ commented Jun 22, 2020

It happens to me as well with:
pytorch=1.2.0
py3
cuda 10.0

It happens when the network forwards through a conv layer. I use 6 V100 GPUs with Horovod for distributed training, and the batch size is not huge (128).
Strangely, this error did not happen before I expanded the dataset. After expanding it, the batch size stayed the same, but the error appeared.

@TangDL

TangDL commented Jul 22, 2020

Maybe the input size is too large.

@ShoufaChen

Just reduce the batch size and try again. It works for me.

@yarkable

Thanks, reducing the batch_size works. But it still seems like a bug?

@leopd

leopd commented Oct 7, 2020

I'm getting this with pytorch 1.6.0. Definitely seems like a bug. Reducing batch size is not a good workaround. Getting the right batch size is critical to certain algorithms & loss functions, such as when doing negative sampling for contrastive learning.

Perhaps related: sometimes my code instead hits the error `THCudaTensor sizes too large for THCDeviceTensor conversion`, which is logged in #24401.
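When the error only disappears at smaller batch sizes, gradient accumulation can sometimes recover the effective batch size. A sketch under the assumption that the loss is a per-sample mean (`model`, `loader`, `loss_fn`, and `optimizer` are placeholders); note it does not help losses that genuinely couple samples within one forward pass, such as in-batch negative sampling:

```python
accum_steps = 4  # e.g. 4 micro-batches of 16 approximate one step at batch size 64

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(loader):
    # Scale so the accumulated gradients match a single large-batch step.
    loss = loss_fn(model(inputs), targets) / accum_steps
    loss.backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```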

@CoderHHX

It also happens to me with:
pytorch=1.4.0
py3
cuda 10.1

Reducing the batch size did not work. Finally, I changed to PyTorch 1.7 with CUDA 11.0, and it works. lol~~~

@absorbguo

Reducing the batch size works for me.

@nadir121

nadir121 commented Jul 9, 2021

Why is this closed? Reducing the batch size does not solve it for me. It is still a bug.

@tdchua

tdchua commented Nov 19, 2021

I was experiencing this problem as well on pytorch=1.4.0... upgrading to pytorch=1.5 solved the issue 😄

@rongduo

rongduo commented Dec 1, 2021

> Just reduce the batch size and try again. It works for me.

Thanks for your suggestion. It resolves my issue.

@ooodragon94

> Why is this closed? Reducing the batch size does not solve it for me. It is still a bug.

Totally agreed. We ALWAYS need bigger batches.

@Sanqiang

It looks like OOM can cause this problem too.
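One way to tell a plain allocator OOM apart from this cuDNN error (a rough sketch; `loss` is a placeholder for the failing computation) is to inspect the RuntimeError text at the failing step:

```python
import torch

try:
    loss.backward()  # placeholder for the failing call
except RuntimeError as e:
    if "out of memory" in str(e):
        # Allocator OOM: reduce the batch size or free cached blocks.
        torch.cuda.empty_cache()
    elif "CUDNN_STATUS_NOT_SUPPORTED" in str(e):
        # cuDNN limit: try contiguous inputs or disable cuDNN.
        pass
    raise
```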

@meisa233

Add this line after `import torch`:

```python
torch.backends.cudnn.enabled = False
```
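If disabling cuDNN globally is too slow, a narrower variant (a sketch; `model` and `x` are placeholders) is the `torch.backends.cudnn.flags` context manager, which turns cuDNN off only for ops run inside the block:

```python
import torch

with torch.backends.cudnn.flags(enabled=False):
    # Ops executed in this block fall back to PyTorch's native CUDA kernels.
    out = model(x)
    out.sum().backward()
```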

@Sil3ntKn1ght

Any solution for Python 3.10+? It is so slow without it.

