NaN gradient when using BatchNormalization with sequential input #5377
Comments
I have found that the problem lies with Theano 0.9.0b1. In the Theano backend, Keras calls the following batch normalization functions in Theano, if they are available:
or the following functions, implemented in Keras (backend/theano_backend.py), if Theano does not provide the functions above:
Theano 0.9.0b1 provides these functions, but it seems that they produce NaN gradients unless the "axis" argument is 1. Switching back to Theano 0.8.2, which does not provide these functions, makes Keras use its own implementation, and solves the problem.
The issue in Theano was closed. So I think this issue can also be closed.
Yes. I resolved it by disabling fastmath.
@MaigoAkisame I have exactly the same NaN problem when using batch normalization layers with the TensorFlow backend. Could you please reopen this issue?
@fengwang OK, I'm reopening it.
I was training a neural network that works with sequential data. It consisted of convolutional and recurrent layers, as well as BatchNormalization layers in between. When I trained the model, I got NaN loss on the first minibatch. A closer inspection revealed that the gradient started becoming NaN for the gamma parameter of the topmost BatchNormalization layer.
I have stripped all the convolutional and recurrent layers off the model, and ended up with the following minimal code that reproduces the problem:
The output is:
The gradient for the gamma parameter is NaN.
By the way, if I do not use sequential data -- i.e. replace the input_shape of the BatchNormalization layer with (1,), and change the shape of x and y to (40, 1) -- the NaN gradient problem disappears.
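For reference, here is a minimal pure-Python sketch of what a finite gamma gradient looks like for sequential input. It is not the Keras or Theano implementation. It assumes a single feature, a hypothetical sequence length of 5, statistics taken over the batch and time axes, a mean-squared-error loss, and epsilon = 1e-3; all function names are illustrative only.

```python
import math
import random


def batchnorm_forward(x, gamma, beta, eps=1e-3):
    """Batch-normalize x, a nested list indexed [batch][time] with one feature.

    Assumption: mean and variance are taken over both the batch and time axes.
    """
    flat = [v for seq in x for v in seq]
    n = len(flat)
    mean = sum(flat) / n
    var = sum((v - mean) ** 2 for v in flat) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [[(v - mean) * inv_std for v in seq] for seq in x]
    y = [[gamma * v + beta for v in seq] for seq in x_hat]
    return y, x_hat


def mse(y, target):
    """Mean squared error over all batch and time positions."""
    n = sum(len(seq) for seq in y)
    return sum((a - t) ** 2
               for ys, ts in zip(y, target)
               for a, t in zip(ys, ts)) / n


def grad_gamma(x, target, gamma, beta):
    """Analytic dL/dgamma = sum over positions of dL/dy * x_hat."""
    y, x_hat = batchnorm_forward(x, gamma, beta)
    n = sum(len(seq) for seq in y)
    return sum(2.0 * (a - t) / n * h
               for ys, ts, hs in zip(y, target, x_hat)
               for a, t, h in zip(ys, ts, hs))


def numeric_grad_gamma(x, target, gamma, beta, h=1e-5):
    """Central finite difference, used only as a cross-check."""
    loss_plus = mse(batchnorm_forward(x, gamma + h, beta)[0], target)
    loss_minus = mse(batchnorm_forward(x, gamma - h, beta)[0], target)
    return (loss_plus - loss_minus) / (2.0 * h)


random.seed(0)
batch, steps = 40, 5  # steps = 5 is a hypothetical sequence length
x = [[random.gauss(0.0, 1.0) for _ in range(steps)] for _ in range(batch)]
target = [[random.gauss(0.0, 1.0) for _ in range(steps)] for _ in range(batch)]

g = grad_gamma(x, target, gamma=1.0, beta=0.0)
num = numeric_grad_gamma(x, target, gamma=1.0, beta=0.0)
print("analytic:", g, "numeric:", num, "finite:", math.isfinite(g))
```

With well-behaved input like this, the analytic and finite-difference gradients should agree and be finite, so a NaN in the same setting points at the backend's batch normalization code rather than at the math.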
My questions:
I am using Keras 1.2.1, Theano 0.9.0b1 (0.9.0beta1.dev-e5d51daf5ac03bfd8bd076075ee587311dff6f48), and CUDA 8.0.