
How can I use Keras optimizers for backprop-ing on my own loss functions #4746

Closed
raghakot opened this issue Dec 16, 2016 · 13 comments
Labels
type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@raghakot

I am working on guided backprop for activation maximization. Instead of implementing RMSprop, Adam, etc. myself, I want to reuse the optimizers defined in Keras.

@fchollet fchollet added the type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. label Dec 16, 2016
@fchollet
Member

You should check out the optimizer API as defined in keras/optimizers.py.

@raghakot
Author

I did.
Normally I would compute the gradients as:

grads_fn = K.gradients(loss_fn, input_tensor)[0]
loss_grads_fn = K.function([input_tensor], [loss_fn, grads_fn])

My backprop step would be:

loss, grads = loss_grads_fn([numpy_array])
numpy_array -= grads * lr

get_gradients (https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L61) seems to be called by get_updates() in Adam. Do I just call get_updates() once to build the update function? I am not sure how to use that function either. Specifically, I am confused about which parts build a function versus which functions I could pass my numpy array to in order to compute the updates.

@raghakot
Author

This is the relevant portion: https://github.com/raghakot/keras-vis/blob/master/vis/optimizer.py#L163
Instead of rolling my own custom rmsprop, it would be nicer if I used the Keras optimizers. I would appreciate it if you could look through that code and advise. It is a Keras visualization library :)

@bstriner
Contributor

You can use Keras optimizers outside of Keras if you really can't do whatever you're doing within Keras.

Yes, it is important to call get_updates() once and only once and hang on to the returned updates. For example, the Adam optimizer locally creates momentum variables in the get_updates() function. Calling get_updates() multiple times for the same set of parameters will cause chaos.

If you have some custom loss function and a list of shared variables:

updates = opt.get_updates(params, constraints, loss)
fun = K.function([input],[], updates=updates)

You're better off doing backprop on the GPU instead of back-and-forth with numpy. Store your weights as GPU variables and update them with functions. When you need the weights in numpy, use get_value and set_value.
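
For instance, a minimal sketch of that round trip (the shapes and the name W here are just illustrative, not from this thread):

import numpy as np
from keras import backend as K

# Keep the weight on the GPU as a backend variable; cross over to numpy
# only when you actually need the values.
W = K.variable(np.zeros((10, 10), dtype=np.float32))

current = K.get_value(W)        # GPU variable -> numpy array
K.set_value(W, current + 1.0)   # numpy array  -> GPU variable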

Cheers,
Ben

@raghakot
Author

Thanks. The input (model.input) has shape (?, channels, rows, cols). When I try to create the update function using:

updates = opt.get_updates([input], [], [loss_fn])

it complains about None. Any ideas on how to handle that?

@bstriner
Contributor

Please always post a stack trace or something if you have specific issues.

I put together a Gist showing how to use Keras optimizers. It should teach you the basic style of how everything goes together.

https://gist.github.com/bstriner/e1e011652b297d13b3ac3f99fd11b2bc

The standard in Keras is that model parameters are variables that live on the GPU and inputs and targets are placeholders that get passed in for each batch.

A training function is created with inputs: batch inputs, batch targets; and outputs: loss, accuracy, other metrics. The function also performs updates on the model parameters on the GPU each time it is executed.

To train, you just pass batch inputs and batch targets to the training function and print out the current loss.

At the end, if you want to get the trained parameters, use K.get_value.

from keras.optimizers import Adam
from keras import backend as K
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.metrics import categorical_accuracy
import numpy as np

# inputs and targets are placeholders
x = K.placeholder(name="x", shape=(None, 28*28))
ytrue = K.placeholder(name="y", shape=(None, 10))

# model parameters are variables
W = K.variable(np.random.random((28*28,10)).astype(np.float32))
b = K.variable(np.random.random((10,)).astype(np.float32))
params = [W, b]

# single layer model: softmax(xW+b) 
ypred = K.softmax(K.dot(x,W)+b)

# categorical cross entropy loss
loss = K.mean(K.categorical_crossentropy(ytrue, ypred),axis=None)

# categorical accuracy
accuracy = categorical_accuracy(ytrue, ypred)

# Train function
opt = Adam()
updates = opt.get_updates(params, [], loss)
train = K.function([x, ytrue],[loss, accuracy],updates=updates)

# Train the network
((xtrain, ytrain),(xtest, ytest)) = mnist.load_data()
xtrain = xtrain.reshape((-1, 28*28)) # flatten input image
ytrain = to_categorical(ytrain, 10)
for epoch in range(500):
	loss, accuracy = train([xtrain, ytrain])
	print("Epoch: {}, Loss: {}, Accuracy: {}".format(epoch, loss, accuracy))


@raghakot
Author

Thanks. The example and gist are awesome. You should perhaps add it, or reference it, somewhere in the Keras docs/examples for others.

Here is a minimal example of what's happening in my case.

from keras import backend as K
from keras.optimizers import Adam

x = K.placeholder(shape=(None, 224, 224, 3))
opt = Adam()

# Some contrived example
loss = K.square(x)

updates = opt.get_updates([x], [], [loss])
iterate = K.function([x], [], updates=updates)

This will give me TypeError: int() argument must be a string or a number, not 'NoneType', because x has None for the batch dimension.

Also, how do I add a placeholder on top of model.input? Basically, I am trying to add a proxy input placeholder on top of the pretrained Keras model so that I can perform certain input transformations on the GPU before feeding it into model.input. I tried:

proxy = K.placeholder(shape=K.int_shape(model.input))
# This was my futile attempt to connect to the existing model graph
proxy = model.input + K.variable(0.)
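
For what it's worth, here is a sketch of one way around the None batch dimension; it is an assumption, not something confirmed in this thread: make the thing being optimized a backend variable with a concrete shape (the optimizer then allocates its moment variables to match), and leave the inputs of K.function empty.

import numpy as np
from keras import backend as K
from keras.optimizers import Adam

# Optimize a concrete-shaped variable instead of a placeholder with a None batch dim.
x = K.variable(np.random.random((1, 224, 224, 3)).astype(np.float32))
loss = K.mean(K.square(x))   # contrived scalar loss, as above

opt = Adam()
updates = opt.get_updates([x], [], loss)   # same get_updates signature as used earlier in this thread
iterate = K.function([], [loss], updates=updates)

for i in range(10):
    current_loss, = iterate([])   # each call applies one Adam update to x on the GPU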

@stale stale bot added the stale label May 23, 2017
@stale stale bot closed this as completed Jun 22, 2017
@mongoose54

@bstriner I am new to Keras. In your example, how can I modify it to get the model's parameters if I have a network (e.g. VGG16) loaded through load_model()? Thanks

@bstriner
Contributor

bstriner commented Aug 8, 2017

@mongoose54 This is kind of unrelated to the OP. If you have a model, you can inspect model.layers, model.layers[2].kernel, etc. You can also just use model.weights to get all the weights.

That will give you the tensor variable, which gives you the variable name. You can get the actual value of the variable with import keras.backend as K; value = K.get_value(my_variable).
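
For example, a minimal sketch (the file name my_model.h5 is hypothetical; a model built from keras.applications works the same way):

from keras.models import load_model
from keras import backend as K

model = load_model('my_model.h5')   # hypothetical path

# List every layer and the shapes of its weights
for layer in model.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])

# Value of a single variable as a numpy array
# (assumes model.layers[1] is a layer with a kernel, e.g. Dense or Conv2D)
kernel = K.get_value(model.layers[1].kernel)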

Cheers

@mongoose54

@bstriner Sorry for placing it here.

However, I have a question related to this topic:

Let's say I have the losses explicitly defined in a numpy array: losses = [0.23, 0.432, 2.23, ...]. How can I backpropagate them to update the network's parameters?

@theceday

theceday commented Sep 6, 2018

@bstriner Thanks for such an example, but I have a weird problem.

The only notable difference from your example is:
updates = self.opt.get_updates(model.trainable_weights, [], loss_out)

The model is actually learning: the loss is going down and validation accuracy is increasing (up to 100% in some iterations), and I can save and load the model, etc.

But something is wrong: the LR is not changing.

(I changed these values just to see the change more easily, but no luck.)

self.opt = SGD(lr=1.0, decay=1e-3, momentum=0.5, nesterov=False)

K.get_value(m.opt.lr) => always outputs 1.0 in every loop; it doesn't change.
(Each call to "get_updates" (train step, not test) should change it, but it doesn't.)

Any ideas, anyone?

edit: I just added opt.lr to the outputs directly; still no change.

edit2: Adding "self.lr = lr" after the following statement in get_updates fixes this issue:
if self.initial_decay > 0:
    lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))

edit3: Since I use TF as the backend, it probably works OK because it builds up a graph, but some dependencies might not work as expected since opt.lr is not updated correctly.

Is this a bug? What am I missing here?
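
One reading of this, offered as a hedged guess rather than a confirmed answer: in the SGD code quoted above, the decayed rate is computed into a local tensor lr inside get_updates and is never written back to the opt.lr variable, so K.get_value(opt.lr) keeps reporting the initial value. The effective rate can be recomputed from the optimizer's own variables, for example:

import keras.backend as K

# Sketch: mirror the decay expression from SGD.get_updates quoted above
# (assumes opt is the SGD instance, i.e. self.opt / m.opt in the code above)
base_lr = K.get_value(opt.lr)              # stays at its initial value, e.g. 1.0
decay = K.get_value(opt.decay)
iterations = K.get_value(opt.iterations)   # incremented by the update ops each train step
effective_lr = base_lr * (1. / (1. + decay * iterations))
print(effective_lr)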

@maulberto3

maulberto3 commented Apr 22, 2019

Hi @bstriner, small question for you. Suppose I add another output head to your NN above; what would then need further adjustment?

It's just that I have a very similar NN, but as soon as I add an extra head (output) to it, I get the An operation has None for gradient. Please make sure ... error.

Everything is working fine before adding the extra output.

Working code:

class NN():
    ...
    def _build_nn(self):
        inputs = Input(shape=(self.obs_size,))
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l2')(inputs)
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l3')(x)
        actions_probs = Dense(units=self.n_actions, activation='softmax', use_bias=False, name='actions_probs')(x)
        self.nn = Model(inputs=inputs, outputs=[actions_probs])   # single output head

    def _build_train(self):
        actions_probs = self.nn.output                             # single output tensor
        actions_1hot = K.placeholder(shape=(None, self.n_actions), name='actions_1hot')
        actions_scales = K.placeholder(shape=(None,), name='actions_scales')
        actions_probs = K.sum(actions_probs * actions_1hot, axis=1)
        log_actions_probs = K.log(actions_probs)
        policy_loss = -1 * actions_scales * log_actions_probs
        policy_loss = K.mean(policy_loss)
        entropy = K.mean(-(actions_probs * log_actions_probs))
        entropy_loss = -ENTROPY_BETA * entropy
        loss = policy_loss + entropy_loss
        optim = SGD(lr=LR, decay=1e-6, momentum=0.9, nesterov=True)
        updates = optim.get_updates(params=self.nn.trainable_weights, loss=loss)
        self.custom_train = K.function(inputs=[self.nn.input, actions_1hot, actions_scales], outputs=[loss], updates=updates)

Not-working code:

    def _build_nn(self):
        inputs = Input(shape=(self.obs_size,))
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l2')(inputs)
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l3')(x)
        actions_probs = Dense(units=self.n_actions, activation='softmax', use_bias=False, name='actions_probs')(x)
        extra_head = Dense(units=1, activation='linear', use_bias=False, name='extra_head')(x)    # new head
        self.nn = Model(inputs=inputs, outputs=[actions_probs, extra_head])                       # two output heads

    def _build_train(self):
        actions_probs, _ = self.nn.output                          # two output tensors now
        actions_1hot = K.placeholder(shape=(None, self.n_actions), name='actions_1hot')
        actions_scales = K.placeholder(shape=(None,), name='actions_scales')
        actions_probs = K.sum(actions_probs * actions_1hot, axis=1)
        log_actions_probs = K.log(actions_probs)
        policy_loss = -1 * actions_scales * log_actions_probs
        policy_loss = K.mean(policy_loss)
        entropy = K.mean(-(actions_probs * log_actions_probs))
        entropy_loss = -ENTROPY_BETA * entropy
        loss = policy_loss + entropy_loss
        optim = SGD(lr=LR, decay=1e-6, momentum=0.9, nesterov=True)
        updates = optim.get_updates(params=self.nn.trainable_weights, loss=loss)
        self.custom_train = K.function(inputs=[self.nn.input, actions_1hot, actions_scales], outputs=[loss], updates=updates)

Additional info: when debugging with Python in VS Code, I can see the contents of loss being (correctly?) constructed and passed in, but I can't see inside params as well...
Any ideas?

@maulberto3

maulberto3 commented Apr 22, 2019

Good morning @bstriner. Re-reading my own question: maybe I should feed the "_" into the call to K.function, as it now needs two different y_true's, and actions_1hot is just one of the outputs...
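
A possible cause, offered as a guess rather than something confirmed in this thread: extra_head's kernel is in self.nn.trainable_weights, but the loss above never uses that output, so its gradient comes back as None and get_updates raises the error. A minimal sketch of one workaround (filtering the parameter list; the name check is an assumption based on the layer name extra_head):

# Sketch: only pass the weights the loss actually depends on, so get_updates
# never asks for a gradient through the unused extra_head output.
used_weights = [w for w in self.nn.trainable_weights
                if not w.name.startswith('extra_head')]
updates = optim.get_updates(params=used_weights, loss=loss)

Alternatively, giving the extra head its own loss term (and feeding its target through K.function) keeps every trainable weight connected to the loss.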
