
backend argmax has none for gradients. Can you even define one? #11157

Closed
lcukerd opened this issue Sep 16, 2018 · 21 comments
Labels
type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@lcukerd

lcukerd commented Sep 16, 2018

I am using keras.backend.argmax() in a Lambda layer. The model compiles fine but throws an error during fit().

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My model:

from keras.models import Model
from keras.layers import Input, Dense, Embedding, LSTM, Dropout, Lambda
from keras import backend as K

latent_dim = 512
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
# The two Lambda layers below are where the gradient breaks: argmax is not differentiable.
encoder_outputs = Lambda(K.argmax, arguments={'axis': -1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype': 'float32'})(encoder_outputs)

encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')

decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()

Model summary for easy visualization:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 32)                0         
_________________________________________________________________
embedding_13 (Embedding)     (None, 32, 512)           2018816   
_________________________________________________________________
lstm_19 (LSTM)               (None, 32, 512)           2099200   
_________________________________________________________________
dropout_10 (Dropout)         (None, 32, 512)           0         
_________________________________________________________________
dense_19 (Dense)             (None, 32, 3943)          2022759   
_________________________________________________________________
lambda_5 (Lambda)            (None, 32)                0         
_________________________________________________________________
lambda_6 (Lambda)            (None, 32)                0         
_________________________________________________________________
dense_20 (Dense)             (None, 501)               16533     
_________________________________________________________________
embedding_14 (Embedding)     (None, 501, 512)          2018816   
_________________________________________________________________
lstm_20 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
lstm_21 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
dropout_11 (Dropout)         (None, 501, 512)          0         
_________________________________________________________________
dense_21 (Dense)             (None, 501, 3943)         2022759   
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________

I googled for a solution, but almost all results were about a faulty model. Some recommended not using the functions that cause the problem. However, as you can see, I cannot create this model without K.argmax (if you know any other way, do tell me).

Also, how can you even define the gradient of argmax?
I am guessing it's an issue in Keras; if not, please tell me how to define its gradient.

@gabrieldemarmiesse
Contributor

Hello! The argmax function has no gradient. Or more precisely, argmax is piecewise constant, so its gradient is zero everywhere it is defined. This is not specific to Keras. It's the same in all deep learning frameworks, because this is the mathematical nature of argmax.

If you wish to create your own operation, with a custom gradient, you need to access the backend directly and create a new op. But most of the time, it's not a walk in the park. See https://www.tensorflow.org/extend/adding_an_op
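
For illustration only (a rough sketch, not an endorsement): with the TensorFlow backend, tf.custom_gradient lets you attach a surrogate gradient to a hard argmax, in the style of a straight-through estimator. The surrogate below is an arbitrary choice for demonstration; whether any surrogate makes sense depends entirely on your model.

import tensorflow as tf

@tf.custom_gradient
def straight_through_argmax(x):
    # Forward pass: hard argmax over the last axis, cast to float so it can
    # keep flowing through the graph.
    y = tf.cast(tf.argmax(x, axis=-1), tf.float32)

    def grad(dy):
        # Surrogate backward pass: broadcast the incoming gradient back over
        # the reduced axis instead of returning the true (zero) gradient.
        # Purely illustrative.
        return tf.expand_dims(dy, -1) * tf.ones_like(x)

    return y, grad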

@gabrieldemarmiesse gabrieldemarmiesse added the type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. label Sep 16, 2018
@lcukerd
Author

lcukerd commented Sep 16, 2018

Yes, I know argmax has no gradient. But the error is clearly asking me to define one for argmax. How do I fix this error, then?

@gabrieldemarmiesse
Contributor

The error message is maybe not clear. It's saying that you should only use backend functions that have a gradient, so something other than argmax. The message is not saying that you should define argmax's gradient. Maybe the message is not explicit enough.

@lcukerd
Author

lcukerd commented Sep 16, 2018

Okay. So is there any alternative to argmax (as my model cannot work without one) that I can use?

Btw, why does the backend have an argmax function if we can't use it in a model?

@gabrieldemarmiesse
Contributor

I don't know of any alternative to argmax; I've never worked with a model requiring one.

Argmax is there for operations where the gradient is not needed, for example when computing a metric.

I suppose you can try to use argmax from TensorFlow directly and see if you get the error. But you must know what you are doing, because if there is no error, it is implied that the gradient is null (like tf.round).
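
To make the caveat concrete, roughly (a sketch assuming the TensorFlow backend and the encoder_outputs tensor from the model above): wrapping tf.argmax in a Lambda layer may build, and training may even run without an error, but no gradient flows back through it, so every layer upstream of it stops learning.

import tensorflow as tf
from keras.layers import Lambda

# Illustrative only: tf.argmax inside a Lambda layer. Even if Keras does not
# raise the "None for gradient" error, the gradient through this layer is
# zero, so the encoder layers before it will not be trained.
hard_ids = Lambda(lambda t: tf.cast(tf.argmax(t, axis=-1), tf.float32))(encoder_outputs)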

@lcukerd
Author

lcukerd commented Sep 16, 2018

Okay, thanks for helping me out. I will give TensorFlow a go.

I will leave this issue open for a day and wait for someone who knows an alternative to argmax. I hope no one has a problem with that (if so, feel free to close it).

@lcukerd lcukerd closed this as completed Sep 18, 2018
@MansiAgarwal11

MansiAgarwal11 commented Oct 14, 2018

Did you find any solution to this? @lcukerd

@lcukerd
Author

lcukerd commented Oct 14, 2018

@MansiAgarwal11 Yes, I did. You have to use Keras inside a TensorFlow model. For training, you have to define a loss function like in this article. The model shown in the article will still work even if you include argmax in it. You should be able to do this using only Keras, but I haven't tried it yet.
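
The article isn't reproduced here, but the general pattern I mean looks roughly like the sketch below (assumed TF 1.x APIs, not the article's actual code): Keras layers build the graph while the loss and training op are defined in raw TensorFlow, so Keras's own gradient check in fit() never runs, and TensorFlow simply skips variables that receive no gradient.

import tensorflow as tf
from keras.layers import Dense

# Sketch: Keras layers inside a plain TensorFlow training loop.
x = tf.placeholder(tf.float32, shape=(None, 32))
y = tf.placeholder(tf.float32, shape=(None, 10))

hidden = Dense(64, activation='relu')(x)
logits = Dense(10)(hidden)  # linear layer producing logits

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)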

@MansiAgarwal11

But if there is no gradient for the argmax function, how does the model backpropagate?

@lcukerd
Author

lcukerd commented Oct 15, 2018

I am not sure myself, but I think the TensorFlow code is written to bypass it in a clever way. Probably someone from the TensorFlow team can clear this up? Btw, did your model converge?

@MansiAgarwal11

I didn't make use of argmax and came up with a different loss function for my problem.

@mycal-tucker

FYI, in my experience with a different TensorFlow function that didn't have a gradient, I found that I could run and train the model without any errors, but because there was no gradient, no actual learning was taking place. It's something to look out for if you try to use argmax.
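
One rough way to check for this (a sketch assuming a compiled Keras model on the TensorFlow backend, not code posted in this thread): ask the backend for the gradients directly and see which ones come back as None.

import keras.backend as K

# Sanity check: gradients of the compiled model's loss w.r.t. its trainable
# weights. A None entry means no gradient reaches that weight, i.e. the
# corresponding layer never learns.
grads = K.gradients(model.total_loss, model.trainable_weights)
for weight, grad in zip(model.trainable_weights, grads):
    print(weight.name, 'gradient is None' if grad is None else 'ok')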

@sunwei317

I have the same problem. There is no problem at all during training and evaluation, and saving the model to H5 works. However, when loading the saved model, the error message pops up: ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Do you have any idea how to fix this issue? Otherwise, the model cannot be used for prediction. Thank you.

@mycal-tucker

^ You're saying that you can train a model successfully with argmax? That surprises me. What I was trying to say in my earlier comment is that you can sometimes run training with ops that don't have a gradient and no errors will be thrown, but your model won't actually get better.

How confident are you that the model you're training is actually improving as you train it?

@sunwei317

I monitored the precision, recall and accuracy while training, and the model was getting better. If the model is saved with model.save, then the error above appears with load_model. However, if the model is saved with model.to_json and model.save_weights, then everything is fine when loading the saved model.

@mycal-tucker

Well, thanks for the update, but you've stumped me. I don't understand:

  1. How you're training the model with argmax in the cost function and
  2. How to solve the question you're actually asking about loading the weights again.

Sorry I couldn't be more help.

@e4exp

e4exp commented Jun 22, 2019

Gumbel-softmax may solve the problem of argmax:
http://anotherdatum.com/gumbel-gan.html
And this describes another way to get around it (though I currently get an error with it):
https://stackoverflow.com/questions/46926809/getting-around-tf-argmax-which-is-not-differentiable
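
Roughly, the Gumbel-softmax idea looks like this (TF 1.x API assumed; just a sketch, see the links above for proper treatments): perturb the logits with Gumbel noise and take a low-temperature softmax, which approximates a one-hot argmax while staying differentiable.

import tensorflow as tf

def gumbel_softmax_sample(logits, temperature=0.5):
    # Sample Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = tf.random_uniform(tf.shape(logits), minval=1e-8, maxval=1.0)
    gumbel_noise = -tf.log(-tf.log(u))
    # Softmax over the perturbed logits: approaches a one-hot vector at the
    # argmax as temperature -> 0, but remains differentiable throughout.
    return tf.nn.softmax((logits + gumbel_noise) / temperature)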

@mycal-tucker

Yeah, or the SeqGAN-style idea of policy-gradient updates: https://arxiv.org/abs/1609.05473

@chikubee

I faced the same problem with a GPU runtime. With the runtime set to None, it seems the problem no longer persists.

@fezancs

fezancs commented Sep 25, 2019

I'm facing the same issue. I defined a new layer in a Lambda:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Can anybody help with this?

@yli192

yli192 commented Jan 2, 2020

I monitored the precision, recall and accuracy while training, and the model was getting better. If the model is saved with model.save, then the error above appears with load_model. However, if the model is saved with model.to_json and model.save_weights, then everything is fine when loading the saved model.

I implemented this solution and it worked for me.
This is all you will need:

# Save the model architecture to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

# Serialize the weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")

# Load the JSON model
from keras.models import model_from_json

json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)

# Load the weights into the new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
