RuntimeError: Function 'CudnnConvolutionBackward' returned nan values in its 1th output. #57

fede-vaccaro · 2020-12-11T14:47:56Z

Hi there, I'm currently training an artifact removal / super-resolution model, a multilayer ESPCN, but I run into the error in the title after a few iterations of training.
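For anyone reproducing this: an error worded as "Function '...Backward' returned nan values in its Nth output" is what autograd raises when anomaly detection is enabled. The snippet below is a minimal sketch of that mechanism on a toy tensor, not my training code; the `sqrt`-at-zero expression is only an illustrative way to produce a NaN gradient.

```python
import torch

# Toy input: the gradient of sqrt at exactly zero is infinite.
x = torch.zeros(3, requires_grad=True)

caught = None
with torch.autograd.detect_anomaly():
    # Multiplying by 0 makes the incoming gradient 0, so the backward of
    # sqrt computes 0 / (2 * 0), which is NaN.
    y = torch.sqrt(x).mul(0).sum()
    try:
        y.backward()
    except RuntimeError as err:
        # Anomaly mode reports which backward function produced the NaN.
        caught = str(err)

print(caught)
```

Without anomaly detection the same backward pass finishes silently and leaves NaN in `x.grad`, so the flag is mainly useful for finding the first operation that goes wrong.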

This is how I instantiate the loss:

import lpips
lpips_loss = lpips.LPIPS(net='vgg')  # renamed so the lpips module is not shadowed

This is the model code:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ESPCNResBlock(nn.Module):
    def __init__(self, nf=64):
        super(ESPCNResBlock, self).__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)

    def forward(self, input):
        x = self.conv1(input)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        x = self.conv2(x)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        return x + input

class ESPCN(nn.Module):
    def __init__(self, scale_factor=2, n_blocks=4, nf=64, in_channels=3, out_channels=3):
        super(ESPCN, self).__init__()
        self.scale_factor = scale_factor
        layers = [nn.Conv2d(in_channels, nf, kernel_size=5, padding=5 // 2),
                  nn.Hardtanh()]
        for _ in range(n_blocks//2):
            layers += [ESPCNResBlock(nf)]  # pass nf so the block width matches the first conv

        layers += [
            nn.Conv2d(nf, 32, kernel_size=3, padding=3 // 2),
            nn.Hardtanh(),
        ]
        self.first_part = nn.Sequential(*layers)
        self.last_part = nn.Sequential(
            nn.Conv2d(32, out_channels * (scale_factor ** 2), kernel_size=3, padding=3 // 2),
            nn.PixelShuffle(scale_factor) if scale_factor > 1 else nn.Identity(),
            nn.Tanh()
        )

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.in_channels == 32:
                    nn.init.normal_(m.weight.data, mean=0.0, std=0.001)
                    nn.init.zeros_(m.bias.data)
                else:
                    nn.init.normal_(m.weight.data, mean=0.0,
                                    std=math.sqrt(2 / (m.out_channels * m.weight.data[0][0].numel())))
                    nn.init.zeros_(m.bias.data)

    def forward(self, input):
        x = self.first_part(input)
        x = self.last_part(x)

        x = x + F.interpolate(input,
                              scale_factor=self.scale_factor,
                              mode='bilinear',
                              align_corners=False)

        x = torch.clamp(x, min=-1, max=1)
        return x

I've traced the error to the normalize function, but I'm still looking for a fix.
The model is trained with Adam on batches of 64x64 images.
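To illustrate how a normalize step can produce NaN gradients, here is a minimal sketch. It rests on an assumption, not a confirmed diagnosis: that the normalization divides features by `sqrt(sum(x**2)) + eps` over the channel dimension. The names `normalize_eps_outside`, `normalize_eps_inside` and `grad_has_nan` are hypothetical helpers for this example. When a feature vector is exactly zero, the backward of the square root divides zero by zero; moving `eps` inside the square root keeps that derivative finite.

```python
import torch

def normalize_eps_outside(feat, eps=1e-10):
    # eps is added after the sqrt, so sqrt still sees an exact 0.
    norm = torch.sqrt(torch.sum(feat ** 2, dim=1, keepdim=True))
    return feat / (norm + eps)

def normalize_eps_inside(feat, eps=1e-10):
    # eps is added before the sqrt, so its argument is never 0.
    norm = torch.sqrt(torch.sum(feat ** 2, dim=1, keepdim=True) + eps)
    return feat / norm

def grad_has_nan(normalize_fn):
    # All-zero features, e.g. activations that are zero across every channel.
    feat = torch.zeros(1, 4, 2, 2, requires_grad=True)
    normalize_fn(feat).sum().backward()
    return bool(torch.isnan(feat.grad).any())

nan_outside = grad_has_nan(normalize_eps_outside)
nan_inside = grad_has_nan(normalize_eps_inside)
print(nan_outside, nan_inside)
```

If this is what happens in my case, it would explain why the NaN only shows up after some iterations, once some feature vector becomes exactly zero.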
