Handle None gradients in nn.utils.clip_grad_norm #5650

Closed
monajalal opened this issue Mar 8, 2018 · 7 comments

monajalal commented Mar 8, 2018

I get this error:

python train.py --batch-size 20 --rnn_type GRU --cuda --gpu 1 --lr 0.0001 --mdl RNN --clip_norm 1 --opt Adam
/scratch/sjn-p2/anaconda/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
There are 2 CUDA devices
Setting torch GPU to 1
Using device:1 
Stored Environment:['term_len', 'word_index', 'glove', 'max_len', 'train', 'dev', 'test', 'index_word']
Loaded environment
Creating Model...
Setting Pretrained Embeddings
Initialized GRU model
Starting training
Namespace(aggregation='mean', attention_width=5, batch_size=20, clip_norm=1, cuda=True, dataset='Restaurants', dev=1, dropout_prob=0.5, embedding_size=300, epochs=50, eval=1, gpu=1, hidden_layer_size=300, l2_reg=0.0, learn_rate=0.0001, log=1, maxlen=0, mode='term', model_type='RNN', opt='Adam', pretrained=1, rnn_direction='uni', rnn_layers=1, rnn_size=300, rnn_type='GRU', seed=1111, term_model='mean', toy=False, trainable=1)
/scratch2/debate_tweets/sentiment/pytorch_sentiment_rnn/models/rnn.py:51: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  decoded = self.softmax(decoded)
Traceback (most recent call last):
  File "train.py", line 343, in <module>
    exp.train()
  File "train.py", line 326, in train
    loss = self.train_batch(i)
  File "train.py", line 303, in train_batch
    coeff = clip_gradient(self.mdl, self.args.clip_norm)
  File "train.py", line 35, in clip_gradient
    modulenorm = p.grad.data.norm()
AttributeError: 'NoneType' object has no attribute 'data'
[jalal@goku pytorch_sentiment_rnn]$ 


This is for the train.py file in https://github.com/vanzytay/pytorch_sentiment_rnn. I have followed all the steps in the README up to this point. What do you think should be fixed?

When submitting a bug report, please include the following information (where relevant):

  • OS: CentOS Linux release 7.4.1708 (Core)
  • PyTorch version: 0.3.1.post2
  • How you installed PyTorch (conda, pip, source): conda install -c pytorch pytorch
  • Python version: Python 2.7.14 |Anaconda custom (64-bit)| (default, Dec 7 2017, 17:05:42)
  • CUDA/cuDNN version: CUDA Version 8.0.61
  • GPU models and configuration: GP102 [GeForce GTX 1080 Ti], driver=nvidia latency=0
  • GCC version (if compiling from source): [GCC 7.2.0] on linux2

zou3519 (Contributor) commented Mar 8, 2018

What this says is that p.grad is None. It's possible that p (whatever it is) wasn't used in the gradient computation, or that no backward pass was run.
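
For illustration, here is a minimal sketch (not from the thread, written against a recent PyTorch API) of how a parameter ends up with a None grad when it doesn't participate in the loss:

    import torch
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.used = nn.Linear(4, 1)
            self.unused = nn.Linear(4, 1)  # never called in forward()

        def forward(self, x):
            return self.used(x)

    net = Net()
    net(torch.randn(2, 4)).sum().backward()

    print(net.used.weight.grad is None)    # False: populated by backward()
    print(net.unused.weight.grad is None)  # True: p.grad.data here raises AttributeError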

monajalal (Author) commented:

Well, I understand that; however, this seems like a problem with PyTorch, since people who used the repo following the provided commands didn't hit this error. It's possibly a recent change in PyTorch.

zou3519 (Contributor) commented Mar 9, 2018

The last commit in that repo is from Jan 24, 2017. PyTorch has definitely changed a lot since then. If you have specific questions about how to use PyTorch, please ask on our forums: https://discuss.pytorch.org/

soumith (Member) commented Mar 9, 2018

Closed via @zou3519's comment.

soumith closed this as completed Mar 9, 2018
apaszke reopened this Mar 10, 2018
apaszke (Contributor) commented Mar 10, 2018

I think the error is still legitimate. We should handle None .grad attributes correctly in clip_grad_norm (by treating their norm as 0). Right now we fail with the posted stack trace.
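
As a hypothetical sketch of that behavior (not the actual PyTorch source), parameters whose .grad is None would simply be skipped, which is the same as counting their norm as 0:

    def clip_grad_norm_none_safe(parameters, max_norm, norm_type=2):
        # Sketch only: skip parameters whose .grad is None instead of
        # raising AttributeError; a missing grad contributes norm 0.
        grads = [p.grad.data for p in parameters if p.grad is not None]
        total_norm = 0.0
        for g in grads:
            total_norm += g.norm(norm_type) ** norm_type
        total_norm = total_norm ** (1.0 / norm_type)
        clip_coef = float(max_norm) / (total_norm + 1e-6)
        if clip_coef < 1:
            for g in grads:
                g.mul_(clip_coef)
        return total_norm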

apaszke changed the title from "modulenorm = p.grad.data.norm() AttributeError: 'NoneType' object has no attribute 'data'" to "Handle None gradients in nn.utils.clip_grad_norm" on Mar 10, 2018
zou3519 (Contributor) commented Mar 12, 2018

@apaszke I think we do handle None .grad attributes correctly in clip_grad_norm: https://github.com/pytorch/pytorch/blob/master/torch/nn/utils/clip_grad.py#L18.

The traceback @monajalal posted shows that the code uses its own clip_gradient function:

Traceback (most recent call last):
  File "train.py", line 343, in <module>
    exp.train()
  File "train.py", line 326, in train
    loss = self.train_batch(i)
  File "train.py", line 303, in train_batch
    coeff = clip_gradient(self.mdl, self.args.clip_norm)
  File "train.py", line 35, in clip_gradient
    modulenorm = p.grad.data.norm()

@monajalal If you replace clip_gradient with torch.nn.utils.clip_grad_norm, this particular error should go away.
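
Concretely, something along these lines in train.py (a sketch; self.mdl and self.args.clip_norm are the names from the posted traceback):

    from torch.nn.utils import clip_grad_norm

    # before: coeff = clip_gradient(self.mdl, self.args.clip_norm)
    # clip_grad_norm skips parameters whose grad is None, so the
    # AttributeError goes away; it returns the total norm of the grads.
    total_norm = clip_grad_norm(self.mdl.parameters(), self.args.clip_norm)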

soumith closed this as completed Mar 12, 2018