
backend argmax has none for gradients. Can you even define one? #11157

Closed
lcukerd opened this issue Sep 16, 2018 · 21 comments
Labels
type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@lcukerd

lcukerd commented Sep 16, 2018

I am using keras.backend.argmax() in a Lambda layer. The model compiles fine but throws an error during fit().

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My model:

from keras.models import Model
from keras.layers import Input, Dense, Embedding, LSTM, Dropout, Lambda
from keras import backend as K

latent_dim = 512
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
# The two Lambda layers below are where the gradient breaks: argmax is not differentiable.
encoder_outputs = Lambda(K.argmax, arguments={'axis': -1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype': 'float32'})(encoder_outputs)

encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')

decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()

Model summary for easy visualization:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 32)                0         
_________________________________________________________________
embedding_13 (Embedding)     (None, 32, 512)           2018816   
_________________________________________________________________
lstm_19 (LSTM)               (None, 32, 512)           2099200   
_________________________________________________________________
dropout_10 (Dropout)         (None, 32, 512)           0         
_________________________________________________________________
dense_19 (Dense)             (None, 32, 3943)          2022759   
_________________________________________________________________
lambda_5 (Lambda)            (None, 32)                0         
_________________________________________________________________
lambda_6 (Lambda)            (None, 32)                0         
_________________________________________________________________
dense_20 (Dense)             (None, 501)               16533     
_________________________________________________________________
embedding_14 (Embedding)     (None, 501, 512)          2018816   
_________________________________________________________________
lstm_20 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
lstm_21 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
dropout_11 (Dropout)         (None, 501, 512)          0         
_________________________________________________________________
dense_21 (Dense)             (None, 501, 3943)         2022759   
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________

I googled for a solution, but almost all results were about a faulty model. Some recommended not using the functions that cause the problem. However, as you can see, I cannot create this model without K.argmax (if you know any other way, do tell me).

Also, how can you even define the gradient of argmax?
I am guessing it's an issue in Keras; if not, please tell me how to define its gradient.

@gabrieldemarmiesse
Contributor

Hello! The argmax function has no gradient. Or more precisely, argmax is piecewise constant, so its gradient is zero everywhere it is defined. This is not specific to Keras. It's the same in all deep learning frameworks, because this is the mathematical nature of argmax.

If you wish to create your own operation, with a custom gradient, you need to access the backend directly and create a new op. But most of the time, it's not a walk in the park. See https://www.tensorflow.org/extend/adding_an_op
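
For illustration only (a rough sketch, not an endorsement): with the TensorFlow backend, tf.custom_gradient lets you attach a surrogate gradient to a hard argmax, in the style of a straight-through estimator. The surrogate below is an arbitrary choice for demonstration; whether any surrogate makes sense depends entirely on your model.

import tensorflow as tf

@tf.custom_gradient
def straight_through_argmax(x):
    # Forward pass: hard argmax over the last axis, cast to float so it can
    # keep flowing through the graph.
    y = tf.cast(tf.argmax(x, axis=-1), tf.float32)

    def grad(dy):
        # Surrogate backward pass: broadcast the incoming gradient back over
        # the reduced axis instead of returning the true (zero) gradient.
        # Purely illustrative.
        return tf.expand_dims(dy, -1) * tf.ones_like(x)

    return y, grad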

@gabrieldemarmiesse gabrieldemarmiesse added the type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. label Sep 16, 2018
@lcukerd
Author

lcukerd commented Sep 16, 2018

Yes, I know argmax has no gradient. But the error is clearly asking me to define one for argmax. How do I fix this error, then?

@gabrieldemarmiesse
Contributor

The error message is maybe not clear. It's saying that you should only use backend functions that have a gradient, so something other than argmax. The message is not saying that you should define argmax's gradient. Maybe the message is not explicit enough.

@lcukerd
Author

lcukerd commented Sep 16, 2018

Okay. So is there any alternative to argmax (as my model cannot work without one) that I can use?

Btw, why does the backend have an argmax function if we can't use it in a model?

@gabrieldemarmiesse
Contributor

I don't know of any alternative to argmax; I've never worked with a model requiring one.

Argmax is there for operations where the gradient is not needed, for example when computing a metric.

I suppose you can try to use argmax from TensorFlow directly and see if you get the error. But you must know what you are doing, because if there is no error, it is implied that the gradient is null (like tf.round).
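
To make the caveat concrete, roughly (a sketch assuming the TensorFlow backend and the encoder_outputs tensor from the model above): wrapping tf.argmax in a Lambda layer may build, and training may even run without an error, but no gradient flows back through it, so every layer upstream of it stops learning.

import tensorflow as tf
from keras.layers import Lambda

# Illustrative only: tf.argmax inside a Lambda layer. Even if Keras does not
# raise the "None for gradient" error, the gradient through this layer is
# zero, so the encoder layers before it will not be trained.
hard_ids = Lambda(lambda t: tf.cast(tf.argmax(t, axis=-1), tf.float32))(encoder_outputs)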

@lcukerd
Author

lcukerd commented Sep 16, 2018

Okay, thanks for helping me out. I will give TensorFlow a go.

I will leave this issue open for a day and wait for someone who knows an alternative to argmax. I hope no one has a problem with that (if so, feel free to close it).

@lcukerd lcukerd closed this as completed Sep 18, 2018
@MansiAgarwal11

MansiAgarwal11 commented Oct 14, 2018

Did you find any solution to this? @lcukerd

@lcukerd
Author

lcukerd commented Oct 14, 2018

@MansiAgarwal11 Yes, I did. You have to use Keras inside a TensorFlow model. For training, you have to define a loss function like in this article. The model shown in the article will still work even if you include argmax in it. You should be able to do this using only Keras, but I haven't tried it yet.
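
The article isn't reproduced here, but the general pattern I mean looks roughly like the sketch below (assumed TF 1.x APIs, not the article's actual code): Keras layers build the graph while the loss and training op are defined in raw TensorFlow, so Keras's own gradient check in fit() never runs, and TensorFlow simply skips variables that receive no gradient.

import tensorflow as tf
from keras.layers import Dense

# Sketch: Keras layers inside a plain TensorFlow training loop.
x = tf.placeholder(tf.float32, shape=(None, 32))
y = tf.placeholder(tf.float32, shape=(None, 10))

hidden = Dense(64, activation='relu')(x)
logits = Dense(10)(hidden)  # linear layer producing logits

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)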

@MansiAgarwal11

But if there is no gradient for the argmax function, how does the model backpropagate?

@lcukerd
Author

lcukerd commented Oct 15, 2018

I am not sure myself, but I think the TensorFlow code is written to bypass it in a clever way. Probably someone from the TensorFlow team can clear this up? Btw, did your model converge?

@MansiAgarwal11

I didn't make use of argmax and came up with a different loss function for my problem.

@mycal-tucker

FYI, in my experience with a different TensorFlow function that didn't have a gradient, I found that I could run and train the model without any errors, but because there was no gradient, no actual learning was taking place. It's something to look out for if you try to use argmax.
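
One rough way to check for this (a sketch assuming a compiled Keras model on the TensorFlow backend, not code posted in this thread): ask the backend for the gradients directly and see which ones come back as None.

import keras.backend as K

# Sanity check: gradients of the compiled model's loss w.r.t. its trainable
# weights. A None entry means no gradient reaches that weight, i.e. the
# corresponding layer never learns.
grads = K.gradients(model.total_loss, model.trainable_weights)
for weight, grad in zip(model.trainable_weights, grads):
    print(weight.name, 'gradient is None' if grad is None else 'ok')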

@sunwei317

I have the same problem. There is no problem at all during training and evaluation, and saving the model to H5 works. However, when loading the saved model, the error message pops up: ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Do you have any idea how to fix this issue? Otherwise, the model cannot be used for prediction. Thank you.

@mycal-tucker

^ You're saying that you can train a model successfully with argmax? That surprises me. What I was trying to say in my earlier comment is that you can sometimes run training with ops that don't have a gradient and no errors will be thrown, but your model won't actually get better.

How confident are you that the model you're training is actually improving as you train it?

@sunwei317

I monitored the precision, recall and accuracy while training, and the model was getting better. If the model is saved with model.save, then the error above appears with load_model. However, if the model is saved with model.to_json and model.save_weights, then everything is fine when loading the saved model.

@mycal-tucker

Well, thanks for the update, but you've stumped me. I don't understand:

  1. How you're training the model with argmax in the cost function and
  2. How to solve the question you're actually asking about loading the weights again.

Sorry I couldn't be more help.

@e4exp

e4exp commented Jun 22, 2019

Gumbel-softmax may solve the problem of argmax:
http://anotherdatum.com/gumbel-gan.html
And this describes another way to get around it (though I currently get an error with it):
https://stackoverflow.com/questions/46926809/getting-around-tf-argmax-which-is-not-differentiable
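
Roughly, the Gumbel-softmax idea looks like this (TF 1.x API assumed; just a sketch, see the links above for proper treatments): perturb the logits with Gumbel noise and take a low-temperature softmax, which approximates a one-hot argmax while staying differentiable.

import tensorflow as tf

def gumbel_softmax_sample(logits, temperature=0.5):
    # Sample Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = tf.random_uniform(tf.shape(logits), minval=1e-8, maxval=1.0)
    gumbel_noise = -tf.log(-tf.log(u))
    # Softmax over the perturbed logits: approaches a one-hot vector at the
    # argmax as temperature -> 0, but remains differentiable throughout.
    return tf.nn.softmax((logits + gumbel_noise) / temperature)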

@mycal-tucker

Yeah, or the SeqGAN-style idea of policy-gradient updates: https://arxiv.org/abs/1609.05473

@chikubee

I faced the same problem with a GPU runtime. With the runtime set to None, it seems the problem no longer persists.

@fezancs

fezancs commented Sep 25, 2019

I'm facing the same issue. I defined a new layer in a Lambda:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Can anybody help with this?

@yli192

yli192 commented Jan 2, 2020

I monitored the precision, recall and accuracy while training, and the model was getting better. If the model is saved with model.save, then the error above appears with load_model. However, if the model is saved with model.to_json and model.save_weights, then everything is fine when loading the saved model.

I implemented this solution and it worked for me.
This is all you will need:

# Save the model architecture to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

# Serialize the weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")

# Load the JSON model
from keras.models import model_from_json

json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)

# Load the weights into the new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
