ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval. #12521

Closed
Utsav-Patel opened this issue Mar 20, 2019 · 10 comments

Comments

@Utsav-Patel

I asked my question on StackOverflow. Link.

I tried to make a custom layer using Keras. I only want to implement the following two lines of code in the call function, and the layer should be trainable.

AV = K.dot(A, Vin)
Vout = K.dot(AV, W)

The dimensions of A, Vin, and W are (n, n), (?, n, c), and (c, f) respectively.
I would like to train my network on the MNIST or CIFAR-10 dataset.
Sharky said in her answer that it depends on the dataset and data shapes.
I don't understand exactly what the problem is here.
Could someone please help me overcome it?
Thank you.

@cottrell

cottrell commented Apr 8, 2019

Which version are you on? I am hitting this on some generic math manipulations in TF 2.0. I think it is a weird error message.

@luozhouyang

@cottrell same issue

@abaxi

abaxi commented May 29, 2019

@Utsav-Patel, @luozhouyang, @cottrell

Does your code have any weights that were defined but left unused? That may be the reason for the error. My guess is that since such a weight is not being used, its gradient cannot be computed w.r.t. the loss, so the gradient is None.

This is more difficult to identify if your layer inherits from another layer: calling the super constructor will add weights that you probably don't use, in which case don't call super().

I've coded out an example to show this in action (tf version 1.13.1, keras 2.2.4). Comment out the line

 v = v+K.dot(x, self.kernelB)       ### comment out this line to get NONE gradient error

inside call() to get the error. If that line is commented out, self.kernelB is never used, and Keras raises the error.

from keras import backend as K
from keras.layers import Layer, Activation
from keras.engine.base_layer import InputSpec
import numpy as np
from keras.models import Sequential

class CustomDense(Layer):

    def __init__(self, units, bias_constraint=None, **kwargs):
        
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] =  (kwargs.pop('input_dim'),)
        
        super(CustomDense, self).__init__(**kwargs)
        self.num_outputs = units
        self.input_spec = InputSpec(min_ndim=2)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernelA = self.add_weight(name='kernelA',
                                       shape=(input_shape[1], self.num_outputs),
                                       initializer='uniform')

        ##This weight is defined here, but its usage can
        ##be controlled by commenting out a line in call
        self.kernelB = self.add_weight(name='kernelB',
                                       shape=(input_shape[1], self.num_outputs),
                                       initializer='uniform')
        
        self.built = True
        super(CustomDense, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        v = K.dot(x, self.kernelA)
        v = v + K.dot(x, self.kernelB)     ### comment out this line to get the None-gradient error
        return v
        
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.num_outputs)

if __name__ == '__main__':
    n_units = in_dim = 10
    test = np.random.random((100,in_dim))
    model = Sequential()
    layer = CustomDense(units=n_units, input_dim=in_dim)
    model.add(layer)
    model.add(Activation("elu"))
    model.compile("adam", "mae")
    model.fit(test, test)

@nbro

nbro commented Oct 24, 2019

@abaxi I could reproduce this error with another example similar to yours. You can find the example here: https://stackoverflow.com/a/58533503/3924118. Just remove the usage of shared_variable in the call method.
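For reference, here is a minimal sketch along the same lines with a subclassed tf.keras model (the model and variable names are made up, not taken from the linked answer): a trainable variable that call() never touches has no path to the loss, and depending on the TF/Keras version this surfaces as the ValueError in the title or as a "Gradients do not exist for variables" warning.

import numpy as np
import tensorflow as tf

class ModelWithUnusedVariable(tf.keras.Model):  # hypothetical name
    def __init__(self):
        super(ModelWithUnusedVariable, self).__init__()
        self.dense = tf.keras.layers.Dense(4)
        # Trainable variable that call() never uses, so it is disconnected from the loss.
        self.unused_variable = tf.Variable(1.0, trainable=True)

    def call(self, inputs):
        return self.dense(inputs)

model = ModelWithUnusedVariable()
model.compile(optimizer='adam', loss='mse')
model.fit(np.random.random((8, 3)), np.random.random((8, 4)))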

@giridhar-pamisetty

@Utsav-Patel This error arises when some of the weights in your model are not used, so their gradient w.r.t. the loss is undefined. Make sure you use all of the model's weights to overcome this error.

@chizala

chizala commented Mar 31, 2020

@Utsav-Patel This error arises when some of the weights in your model are not used, so their gradient w.r.t. the loss is undefined. Make sure you use all of the model's weights to overcome this error.

How do you ensure that all weights in the model are used?

@giridhar-pamisetty

In my case, I just used the leftover weights by multiplying them by 0, so that all weights are covered. This solved the issue.
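A minimal sketch of that workaround in the same standalone-Keras style as the example above (the PartiallyUsedDense layer and its weight names are made up): the otherwise-unused kernel is folded into the output with a zero coefficient, so it stays connected to the loss and gets a zero gradient instead of None.

from keras import backend as K
from keras.layers import Layer

class PartiallyUsedDense(Layer):
    def __init__(self, units, **kwargs):
        super(PartiallyUsedDense, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.kernelA = self.add_weight(name='kernelA',
                                       shape=(input_shape[1], self.units),
                                       initializer='uniform')
        # Weight that the output does not really depend on.
        self.kernelB = self.add_weight(name='kernelB',
                                       shape=(input_shape[1], self.units),
                                       initializer='uniform')
        super(PartiallyUsedDense, self).build(input_shape)

    def call(self, x):
        # The zero coefficient keeps kernelB connected to the loss,
        # so its gradient is a zero tensor instead of None.
        return K.dot(x, self.kernelA) + 0.0 * K.dot(x, self.kernelB)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)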

@chizala

chizala commented Mar 31, 2020

In my case, I just used the leftover weights by multiplying them by 0, so that all weights are covered. This solved the issue.

Thank you, sir.

@jaswanthbjk

@giridhar-pamisetty

Can you please suggest a way to check for unused weights?

@giridhar-pamisetty

I was using only a part of the hidden-node weights to calculate the output. After getting this error, I multiplied the remaining hidden-node weights by zero, so all the weights are covered.
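A minimal sketch of one way to check for unused weights (an assumed helper for Keras 2.x on the TF 1.x backend; find_unused_weights is not a Keras API): ask the backend for the gradient of the compiled loss with respect to every trainable weight and report the ones that come back as None.

from keras import backend as K

def find_unused_weights(model):
    # Trainable weights whose symbolic gradient w.r.t. the loss is None are unused.
    grads = K.gradients(model.total_loss, model.trainable_weights)
    return [w.name for w, g in zip(model.trainable_weights, grads) if g is None]

# Usage, after model.compile(...):
#   print(find_unused_weights(model))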
