
How to deep control gradient back propagation with Keras #956

Closed
jerryli1981 opened this issue Nov 6, 2015 · 21 comments

Comments

@jerryli1981

Hi All,
I would like to know how to write code that runs gradient backpropagation manually, like the Lua (Torch) code below does:

local sim_grad = self.criterion:backward(output, targets[j])
local rep_grad = self.MLP:backward(rep, sim_grad)


Keras's examples show me how to construct a sequential model like the one below:
model = Sequential()
model.add(Dense(128, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))


However, that is not enough for me: I need to generate the gradients for this model myself. How can I write code to control the backward pass of a sequential model?
Thanks

@EderSantana
Contributor

Do you want to train the model, or do you need the gradients to do something else? If you want to train the model, just keep reading the docs and look at the fit method; it will calculate the gradients and train everything for you.

If you need the gradients for other things, you have to use Theano. Get the output of your model, define a cost function, and calculate the gradients with respect to each parameter. For example:

import theano.tensor as T

D = T.matrix()               # desired targets
Y = model.get_output()       # symbolic output of the model
Cost = ((D - Y) ** 2).mean()
gradients = [T.grad(Cost, p) for p in model.get_params()]
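To get numbers out of those symbolic gradients, you would still compile a Theano function. A minimal sketch continuing the snippet above (the model.get_input() call and the x_batch/d_batch arrays are assumptions for illustration):

import theano

X = model.get_input()   # symbolic model input
# Compile a function that returns the numeric gradients for a batch of data
get_gradients = theano.function([X, D], gradients, allow_input_downcast=True)
gradient_values = get_gradients(x_batch, d_batch)   # list of numpy arrays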

@jerryli1981
Author

My model is a Recursive Neural Network (RNN) + MLP. Based on your suggestion, I have two choices.
One is to focus on training the MLP and generate gradients to train the RNN.
The other is to build a sequential model that contains the RNN + MLP and train them together.
The second choice would look like this:

model.add(MyRNN)
model.add(Dense(128, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

Is that possible?

@EderSantana
Contributor

Run through and understand this example; it does something similar to what you are doing: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

@NightFury13

@EderSantana : I don't think that's what Jerry asked. Is there some way to compute the backpropagated gradients w.r.t. each hidden layer (or the input layer)? The equivalent of this in Caffe, for example, would be something like:

net.blobs[last_layer].diff[0][target_class] = 1  # set the diff of the last layer to 1 (i.e. seed the gradient for the target class)
back_pass = net.backward()
jacobian = back_pass[desired_layer].copy()       # gives the gradient w.r.t. desired_layer

@jerryli1981 : Were you able to find a way to do this?
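For reference, a rough Keras analogue of the Caffe snippet above would take the gradient of the target-class score with respect to the input via the backend's gradients function. A hedged sketch, where target_class and x_batch are illustrative and the model is assumed to be already built:

from keras import backend as K

target_class = 3                                    # illustrative class index
score = model.output[:, target_class]               # score of the target class
grads = K.gradients(score, model.input)[0]          # d(score) / d(input)
get_input_grads = K.function([model.input, K.learning_phase()], [grads])
jacobian = get_input_grads([x_batch, 0])[0]         # 0 = test mode; returns a numpy array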

@johnny5550822

@jerryli1981 Were you able to find a way to calculate the gradient at a layer? (I am also originally a Torch7 user, and it is straightforward to do there. I am not sure how to do it in Keras...)

@jemshit

jemshit commented Apr 15, 2017

I'm trying to do backpropagation with an MLP. Is there a way to run the backward pass manually in Keras (using TensorFlow)?

@hamzamerzic

@jemshit TensorFlow allows that via the opt.apply_gradients method, as shown here: https://www.tensorflow.org/api_docs/python/tf/train/Optimizer or here: https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L592
Is there a backend-agnostic way of doing this, though? @fchollet
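In the meantime, with the TensorFlow backend the pattern looks roughly like the sketch below (graph-mode TF 1.x assumed; the loss and the x_batch/y_batch arrays are illustrative):

import tensorflow as tf
from keras import backend as K

y_true = tf.placeholder(tf.float32, shape=(None, 10))     # illustrative target placeholder
loss = tf.reduce_mean(tf.square(y_true - model.output))   # example loss on the Keras model's output

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
grads_and_vars = opt.compute_gradients(loss, model.trainable_weights)
# ...inspect or modify the gradients here if needed...
train_step = opt.apply_gradients(grads_and_vars)

sess = K.get_session()
sess.run(train_step, feed_dict={model.input: x_batch,
                                y_true: y_batch,
                                K.learning_phase(): 1})    # 1 = training mode (needed if the model uses Dropout)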

@ROZBEH

ROZBEH commented Jul 4, 2017

Were you guys able to resolve this issue? I have to backpropagate the error, but at each time step the derivative is different and I have to manipulate it. How is that possible in Keras/TensorFlow?

@stale

stale bot commented Oct 2, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the stale label Oct 2, 2017
@stale stale bot closed this as completed Nov 1, 2017
@mongoose54

More or less the same question here: How can I backpropagate a specific error value in a Keras model? Thanks

@ROZBEH

ROZBEH commented Jan 16, 2018

I couldn't figure this out, so I ended up using PyTorch, which gives you this capability.

@jnhelen

jnhelen commented Oct 3, 2018

@jemshit Hi! Have you solved this problem?

@jemshit

jemshit commented Oct 3, 2018 via email

@eliethesaiyan

@jemshit, I think what @jerryli1981 meant is being able to apply a function to the gradient at each stage of the backward (or forward) pass. For example, what if you want to binarize (quantize) the gradient on each backprop step, as is widely done in quantized models?
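With the TensorFlow backend, that kind of per-step transformation can be done by rewriting the (gradient, variable) pairs before they are applied. A hedged sketch using tf.sign as a crude "binarize" function; the Keras model and a scalar loss tensor are assumed to be defined as in the earlier comments:

import tensorflow as tf

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
grads_and_vars = opt.compute_gradients(loss, model.trainable_weights)

# Replace every gradient with its sign before applying the update
binarized = [(tf.sign(g), v) for g, v in grads_and_vars if g is not None]
train_step = opt.apply_gradients(binarized)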

@birdmw

birdmw commented Jan 30, 2019

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error
from math import sqrt

model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

inputs = np.random.random((1, 8))
outputs = model.predict(inputs)
targets = np.random.random((1, 8))
rmse = sqrt(mean_squared_error(targets, outputs))
loss = losses.mean_squared_error(targets, model.output)

#  ===== Symbolic Gradient =====
gradients = k.gradients(loss, model.trainable_weights)

print("===BEFORE WALKING DOWN GRADIENT===")
print("outputs:\n", outputs)
print("targets:\n", targets)

# Begin TensorFlow
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

steps = 100  # steps of gradient descent
for s in range(steps):

    # ===== Numerical gradient =====
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})

    # Step down the gradient for each layer (subtracting the raw gradient, i.e. an implicit learning rate of 1)
    for i in range(len(model.trainable_weights)):
        sess.run(tf.assign_sub(model.trainable_weights[i], evaluated_gradients[i]))

    # Every 10 steps print the RMSE
    if s % 10 == 0:
        outputs = model.predict(inputs)
        rmse = sqrt(mean_squared_error(targets, outputs))
        print("step " + str(s) + " rmse:", rmse)

final_outputs = model.predict(inputs)
final_rmse = sqrt(mean_squared_error(targets, final_outputs))

print("===AFTER STEPPING DOWN GRADIENT===")
print("outputs:\n", outputs)
print("targets:\n", targets)

@theceday

theceday commented Apr 4, 2019

Is there any way to do this? or with tf.keras?

@maulberto3

Hi @theceday, I also need to manually backprop gradients in Keras. Did you manage?

@birdmw

birdmw commented Apr 24, 2019 via email

@maulberto3

@theceday I am in the process of doing so. I have just calculated the gradients outside the computation graph (I can see them in my terminal). Now I need to update each weight accordingly; I guess I'm doing what an optimizer normally does for you. However, as you know, RL models differ a bit from Keras's internals, so that's why I am "on foot" here. That's also why model.train_on_batch() does not fit my needs either.
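One way to do that last step with the TensorFlow backend is to feed the externally computed gradients through placeholders into apply_gradients. A hedged sketch, where numpy_grads (a list of numpy arrays matching model.trainable_weights) is illustrative:

import tensorflow as tf
from keras import backend as K

grad_placeholders = [tf.placeholder(tf.float32, shape=w.get_shape())
                     for w in model.trainable_weights]
opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
apply_op = opt.apply_gradients(list(zip(grad_placeholders, model.trainable_weights)))

sess = K.get_session()
sess.run(apply_op, feed_dict=dict(zip(grad_placeholders, numpy_grads)))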

@birdmw

birdmw commented Apr 25, 2019 via email

@theceday

theceday commented Apr 25, 2019

I am not sure everyone has the same use case, but I was trying to backprop a custom loss value (a numpy array / input tensor), and I have tried K.switch/tf.cond with no luck. As far as I understand so far, TF doesn't backprop through those separate branches.
For that to work, the loss would need to be "explicitly" defined as a loss function, so that the right operators can be used for it.

Maybe instead of using K.switch, returning a loss expression containing both tensors (the actual and the custom one) might work, but I am not sure if such an expression is allowed.

I might give this another try if I have time.

Edit: There is a change listed in the TensorFlow 2.0 alpha release notes:
"Adding clear_losses API to be able to clear losses at the end of forward pass in a custom training loop in eager."

That hints that there could be some changes in TF and tf.keras that help with this issue, but I am not sure at the moment.
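For what it's worth, in TF 2.x / tf.keras a custom training loop with tf.GradientTape gives this kind of control directly. A minimal sketch, where model, x_batch, and y_batch are illustrative:

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

with tf.GradientTape() as tape:
    predictions = model(x_batch, training=True)
    loss = loss_fn(y_batch, predictions)

grads = tape.gradient(loss, model.trainable_variables)
# grads is a plain list of tensors, so it can be inspected or modified before the update
optimizer.apply_gradients(zip(grads, model.trainable_variables))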
