
Gradients blowing up when F.dropout is used with Conv1D #18169

Closed
shenkev opened this issue Mar 19, 2019 · 2 comments
Labels
awaiting response (this tag is deprecated) · module: nn (Related to torch.nn)

Comments


shenkev commented Mar 19, 2019

🐛 Bug

The gradients of the Conv1d layers blow up when dropout is applied after the convolution. I noticed this after migrating my code from PyTorch 0.3.1. The problem occurs on PyTorch 1.0.0 and 1.0.1.

The problem does not occur if:

  1. dropout is removed, or
  2. the same code is run on PyTorch 0.3.1.

To Reproduce

Steps to reproduce the behavior:

  1. initialize myModel with dropout set to True for the ConvBlock modules
  2. run a .backward() pass
  3. print out the gradient norms

Conv class:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    def __init__(self, ic, oc, ks, dropout=True):
        super(ConvBlock, self).__init__()

        self.dropout = dropout
        self.conv = nn.Conv1d(in_channels=ic, out_channels=oc, kernel_size=ks)
        self.nl = nn.PReLU()

    def forward(self, input):
        output = self.conv(input)
        output = self.nl(output)
        if self.dropout:
            # F.dropout defaults to p=0.5 and training=True
            output = F.dropout(output)
        return output
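As an aside, F.dropout defaults to training=True, so the forward above keeps dropping activations even under model.eval(); a variant tied to the module's training flag would be a one-line change (a sketch, not the reported code):

    def forward(self, input):
        output = self.conv(input)
        output = self.nl(output)
        if self.dropout:
            # tie the functional dropout to the module's mode so model.eval() disables it
            output = F.dropout(output, p=0.5, training=self.training)
        return output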

Model class:

class myModel(nn.Module):
    def __init__(self, ...):
        # ... (only the CNN stack is shown; self.dm_cnn_size is the final channel count)
        layers = []
        # // keeps the channel counts integral (the original ran on Python 2.7, where / truncates)
        layers.append(ConvBlock(ic=4, oc=self.dm_cnn_size // 4, ks=3, dropout=True))
        layers.append(ConvBlock(ic=self.dm_cnn_size // 4, oc=self.dm_cnn_size // 2, ks=3, dropout=True))
        layers.append(ConvBlock(ic=self.dm_cnn_size // 2, oc=self.dm_cnn_size, ks=2, dropout=True))
        self.CNN = nn.Sequential(*layers)

Printing out grad norms:

for tag, value in model.named_parameters():
    if value.grad is not None:
        print("{} : {} : {}".format(tag, torch.norm(value.grad), torch.std(value.grad)))

Expected behavior

With dropout=False for the ConvBlocks, the gradients for CNN.x look okay.

CNN.0.conv.weight : **0.0824664384127** : 0.00210229074582
CNN.0.conv.bias : 0.00828891899437 : 0.000735152629204
CNN.0.nl.weight : 0.258651942015
CNN.1.conv.weight : **0.16995434463** : 0.000541777233593
CNN.1.conv.bias : 0.00491173239425 : 0.000307150039589
CNN.1.nl.weight : 0.258507758379
CNN.2.conv.weight : **0.0553637072444** : 0.000107925392513
CNN.2.conv.bias : 0.00109431473538 : 4.76980530948e-05
CNN.2.nl.weight : 0.261523544788
caption_embedding.weight : 0.00663549778983 : 2.7642997793e-06
pos_embedding.weight : 0.00593235390261 : 0.00018768433074
caption_encoder.weight_ih_l0 : 0.000243749353103 : 3.88709878507e-07
caption_encoder.weight_hh_l0 : 0.00148745242041 : 3.34026594828e-06
caption_encoder.bias_ih_l0 : 0.00531084183604 : 0.00018980918685
caption_encoder.bias_hh_l0 : 0.00269527523778 : 9.62812337093e-05
caption_encoder.weight_ih_l0_reverse : 0.0127974180505 : 2.04083044082e-05
caption_encoder.weight_hh_l0_reverse : 0.00572102330625 : 1.29021454995e-05
caption_encoder.bias_ih_l0_reverse : 0.0109464693815 : 0.00039494028897
caption_encoder.bias_hh_l0_reverse : 0.00574303651229 : 0.000207231758395
pBlock.fc.weight : 0.101412273943 : 0.000896347570233
pBlock.fc.bias : 0.00601348327473 : 0.000363323837519
pBlock.nl.weight : 0.251136034727
iBlock.fc.weight : 0.325891286135 : 0.000318135833368
iBlock.fc.bias : 0.0104938279837 : 0.000456605921499
iBlock.nl.weight : 0.255489110947
fcO.weight : 0.195159777999 : 0.0060681742616
fcO.bias : 0.000131452688947
hBlock.fc.weight : 0.325331568718 : 0.00089861190645
hBlock.fc.bias : 0.0133226588368 : 0.000830166041851
hBlock.nl.weight : 0.253295183182
cBlock.fc.weight : 0.294092535973 : 0.000331619638018
cBlock.fc.bias : 0.00828121881932 : 0.000365915911971
cBlock.nl.weight : 0.241702094674

With dropout=True for the ConvBlocks, the gradients for CNN.x are a lot bigger. (Note: I tried setting dropout=True for the "fcX" fully connected layers, but those gradients did not blow up in the same way.)

CNN.0.conv.weight : **0.191433534026** : 0.00488360971212
CNN.0.conv.bias : 0.0371312983334 : 0.00328596774489
CNN.0.nl.weight : 0.00771592929959
CNN.1.conv.weight : **0.687223911285** : 0.00219102622941
CNN.1.conv.bias : 0.0338519923389 : 0.00211000745185
CNN.1.nl.weight : 0.00506342295557
CNN.2.conv.weight : **0.35084721446** : 0.000685138453264
CNN.2.conv.bias : 0.0203177817166 : 0.000895368750207
CNN.2.nl.weight : 0.00926813390106
caption_embedding.weight : 0.00728694908321 : 3.03568822346e-06
pos_embedding.weight : 0.00523405428976 : 0.000165584060596
caption_encoder.weight_ih_l0 : 0.000671249814332 : 1.07044900233e-06
caption_encoder.weight_hh_l0 : 0.00380119471811 : 8.54444351717e-06
caption_encoder.bias_ih_l0 : 0.013382458128 : 0.000479589944007
caption_encoder.bias_hh_l0 : 0.00679147057235 : 0.000243298694841
caption_encoder.weight_ih_l0_reverse : 0.0190020110458 : 3.03029191855e-05
caption_encoder.weight_hh_l0_reverse : 0.00748729193583 : 1.68848800968e-05
caption_encoder.bias_ih_l0_reverse : 0.0137226246297 : 0.000495043175761
caption_encoder.bias_hh_l0_reverse : 0.00733440229669 : 0.000264534697635
pBlock.fc.weight : 0.14434145391 : 0.00127569248434
pBlock.fc.bias : 0.0155886700377 : 0.000962826830801
pBlock.nl.weight : 0.0125334719196
iBlock.fc.weight : 0.559426605701 : 0.000546285416931
iBlock.fc.bias : 0.0302314385772 : 0.00133550027385
iBlock.nl.weight : 0.0120806191117
fcO.weight : 0.450666457415 : 0.0140901934355
fcO.bias : 9.31322574615e-10
hBlock.fc.weight : 0.440797358751 : 0.00121754407883
hBlock.fc.bias : 0.0252526719123 : 0.00158137583639
hBlock.nl.weight : 0.00768384197727
cBlock.fc.weight : 0.594391405582 : 0.000670248351526
cBlock.fc.bias : 0.0232854578644 : 0.00102955580223
cBlock.nl.weight : 0.0208837240934

Environment

PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 2.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN Xp
GPU 1: Quadro P400

Nvidia driver version: 384.130
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.15.4
[pip] torch==1.0.1.post2
[pip] torchvision==0.2.2
[conda] blas 1.0 mkl
[conda] mkl 2019.1 144
[conda] mkl_fft 1.0.6 py27hd81dba3_0
[conda] mkl_random 1.0.2 py27hd81dba3_0
[conda] pytorch 1.0.1 py2.7_cuda9.0.176_cudnn7.4.2_2 pytorch
[conda] torchvision 0.2.2 py_3 pytorch

Additional context

@vishwakftw added the module: nn label on Mar 19, 2019

fmassa commented Mar 25, 2019

Can you provide a full, self-contained and small example that reproduces the behavior?
With dummy data, or by using a single small file for data?

@fmassa added the awaiting response (this tag is deprecated) label on Mar 25, 2019

soumith commented Mar 26, 2019

@shenkev I read through your example. The gradients don't look a lot bigger; they look about 2x to 3x bigger, and even that not consistently. You must just be dealing with gradients that happen to be bigger by chance, nothing more.

If you think this is actually a bug, describe why it's a bug -- anecdotally it looks fine.
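One factor consistent with a roughly 2x difference (an illustration added here, not a statement from the thread): inverted dropout rescales the surviving activations by 1/(1 - p), i.e. 2x at the default p = 0.5 during training, and that scaling flows into the gradients of the preceding layers. A minimal sketch of the scaling:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Inverted dropout: kept elements are scaled by 1 / (1 - p),
# so at p = 0.5 the surviving activations are doubled during training.
y = F.dropout(x, p=0.5, training=True)
print(y.unique())                        # roughly tensor([0., 2.])
print(x.mean().item(), y.mean().item())  # means agree in expectation (~1.0)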

@soumith closed this as completed on Mar 26, 2019