
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED  #32921

@carlodavid012


Issue description

I am using Google Colab to train a bidirectional LSTM (BiRNN) model and get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-34-0029e71ae99b> in <module>()
     20             inputs, labels = inputs.to(device), labels.to(device)
     21 
---> 22             output = model(inputs)
     23             loss = criterion(output.squeeze(), labels.float())
     24             optimizer.zero_grad()

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
    524         if batch_sizes is None:
    525             result = _VF.lstm(input, hx, self._get_flat_weights(), self.bias, self.num_layers,
--> 526                               self.dropout, self.training, self.bidirectional, self.batch_first)
    527         else:
    528             result = _VF.lstm(input, batch_sizes, hx, self._get_flat_weights(), self.bias,

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
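One frequently reported cause of this error in embedding + LSTM models is a token id that is negative or not smaller than the vocabulary size passed to `nn.Embedding`. CUDA kernels run asynchronously, so on the GPU such a failure can surface later, in a different call such as the cuDNN LSTM. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized, or running one batch on the CPU, usually produces a more precise error. The minimal sketch below checks a batch of token ids on the CPU before it is moved to the GPU. `check_token_ids` is a hypothetical helper, and it assumes the inputs are integer token ids meant for an `nn.Embedding(n_vocab, ...)` layer.

```python
import torch


def check_token_ids(inputs: torch.Tensor, n_vocab: int) -> None:
    """Hypothetical helper: raise a readable error if any token id is outside [0, n_vocab).

    nn.Embedding(n_vocab, ...) only accepts integer indices in that range.
    """
    if inputs.dtype not in (torch.int64, torch.int32):
        raise TypeError(f"expected integer token ids, got {inputs.dtype}")
    if inputs.numel() == 0:
        return  # nothing to check in an empty batch
    lo, hi = int(inputs.min()), int(inputs.max())
    if lo < 0 or hi >= n_vocab:
        raise ValueError(f"token ids span [{lo}, {hi}] but n_vocab is {n_vocab}")
```

For example, calling `check_token_ids(inputs, n_vocab)` just before `inputs.to(device)` in the training loop would turn an out-of-range id into a `ValueError` raised on the CPU. This assumes `n_vocab` is the same value used to build the model.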

Here's my BiRNN model code:

import torch
import torch.nn as nn


class BiRNN(nn.Module):
    def __init__(self, n_vocab, n_embed, hidden_size, seq_len, num_layers, output_size, drop_prob):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.seq_len = seq_len
        self.num_layers = num_layers

        self.embedding = nn.Embedding(n_vocab, n_embed)
        self.lstm = nn.LSTM(n_embed, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        # Set initial states on the same device as the input
        # (num_layers * 2 because the LSTM is bidirectional)
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        # x holds token ids; each must be in [0, n_vocab)
        x = self.embedding(x)
        # Forward propagate LSTM: lstm_out is (batch, seq, 2 * hidden_size)
        lstm_out, _ = self.lstm(x, (h0, c0))
        # Split the last dimension into (direction, hidden_size), using the
        # actual batch and sequence length of this batch
        lstm_out = lstm_out.contiguous().view(x.size(0), x.size(1), 2, self.hidden_size)
        # get backward output at the first time step
        lstm_out_bw = lstm_out[:, 0, 1, :]
        # get forward output at the last time step
        lstm_out_fw = lstm_out[:, -1, 0, :]
        lstm_out = torch.cat((lstm_out_fw, lstm_out_bw), -1)
        drop_out = self.dropout(lstm_out)
        logits = self.fc(drop_out)

        return logits
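The `view(..., 2, hidden_size)` step relies on the output of a bidirectional, `batch_first` LSTM being laid out as `[forward, backward]` along the last dimension. It also relies on the sequence length in the view matching the real one. The small CPU check below tests that layout. It compares the forward output at the last time step and the backward output at the first time step against the top layer's entries in `h_n`. `check_direction_split` and the sizes it uses are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn


def check_direction_split(batch=4, seq_len=7, n_embed=8, hidden_size=5, num_layers=2):
    """Hypothetical check: the direction split used in BiRNN.forward matches h_n."""
    torch.manual_seed(0)
    lstm = nn.LSTM(n_embed, hidden_size, num_layers, batch_first=True, bidirectional=True)
    x = torch.randn(batch, seq_len, n_embed)
    out, (h_n, _) = lstm(x)  # out: (batch, seq_len, 2 * hidden_size)
    out = out.contiguous().view(batch, seq_len, 2, hidden_size)
    fw_last = out[:, -1, 0, :]  # forward direction, last time step
    bw_first = out[:, 0, 1, :]  # backward direction, first time step
    # h_n: (num_layers * 2, batch, hidden_size); the last two rows are the
    # top layer's forward and backward final hidden states
    return torch.allclose(fw_last, h_n[-2]) and torch.allclose(bw_first, h_n[-1])


print(check_direction_split())
```

If this prints `True` for the sizes used in training, the slicing logic is consistent with PyTorch's layout. An error in that part of the model would then more likely come from a mismatched sequence length in the `view` call.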

I tried solution 1 and solution 2 and still get the error.

cc @csarofeen @ptrblck

Labels: module: cudnn (related to torch.backends.cudnn and cuDNN support), triaged
