
How can I use Keras optimizers for backprop-ing on my own loss functions #4746

Closed
raghakot opened this issue Dec 16, 2016 · 13 comments
Labels
type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@raghakot

I am working on guided backprop for activation maximization. Instead of implementing RMSprop, Adam, etc. myself, I want to reuse the optimizers defined in Keras.

@fchollet fchollet added the type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. label Dec 16, 2016
@fchollet
Member

You should check out the optimizer API as defined in keras/optimizers.py.

@raghakot
Author

I did.
Normally I would compute the gradients as:

grads_fn = K.gradients(loss_fn, input_tensor)[0]
loss_grads_fn = K.function([input_tensor], [loss_fn, grads_fn])

My backprop step would be:

loss, grads = loss_grads_fn([numpy_array])
numpy_array -= grads * lr

get_gradients (https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L61) seems to be called by get_updates() in Adam. Do I just call get_updates() once to build the update function? I am not sure how to use that function either. Specifically, I am confused about which parts build a function versus which functions I could pass my numpy array to in order to compute the updates.

@raghakot
Author

This is the relevant portion: https://github.com/raghakot/keras-vis/blob/master/vis/optimizer.py#L163
Instead of rolling my own custom rmsprop, it would be nicer if I used the Keras optimizers. I would appreciate it if you could look through that code and advise. It is a Keras visualization library :)

@bstriner
Contributor

You can use Keras optimizers outside of Keras if you really can't do whatever you're doing within Keras.

Yes, it is important to call get_updates() once and only once and hang on to the returned updates. For example, the Adam optimizer locally creates momentum variables in the get_updates() function. Calling get_updates() multiple times for the same set of parameters will cause chaos.

If you have some custom loss function and a list of shared variables:

updates = opt.get_updates(params, constraints, loss)
fun = K.function([input],[], updates=updates)

You're better off doing backprop on the GPU instead of back-and-forth with numpy. Store your weights as GPU variables and update them with functions. When you need the weights in numpy, use get_value and set_value.
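
For instance, a minimal sketch of that round trip (the shapes and the name W here are just illustrative, not from this thread):

import numpy as np
from keras import backend as K

# Keep the weight on the GPU as a backend variable; cross over to numpy
# only when you actually need the values.
W = K.variable(np.zeros((10, 10), dtype=np.float32))

current = K.get_value(W)        # GPU variable -> numpy array
K.set_value(W, current + 1.0)   # numpy array  -> GPU variable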

Cheers,
Ben

@raghakot
Author

Thanks. The input (model.input) has shape (?, channels, rows, cols). When I try to create the update function using:

updates = opt.get_updates([input], [], [loss_fn])

it complains about None. Any ideas on how to handle that?

@bstriner
Contributor

Please always post a stack trace or something if you have specific issues.

I put together a Gist showing how to use Keras optimizers. It should teach you the basic style of how everything goes together.

https://gist.github.com/bstriner/e1e011652b297d13b3ac3f99fd11b2bc

The standard in Keras is that model parameters are variables that live on the GPU and inputs and targets are placeholders that get passed in for each batch.

A training function is created with inputs: batch inputs, batch targets; and outputs: loss, accuracy, other metrics. The function also performs updates on the model parameters on the GPU each time it is executed.

To train, you just pass batch inputs and batch targets to the training function and print out the current loss.

At the end, if you want to get the trained parameters, use K.get_value.

from keras.optimizers import Adam
from keras import backend as K
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.metrics import categorical_accuracy
import numpy as np

# inputs and targets are placeholders
x = K.placeholder(name="x", shape=(None, 28*28))
ytrue = K.placeholder(name="y", shape=(None, 10))

# model parameters are variables
W = K.variable(np.random.random((28*28,10)).astype(np.float32))
b = K.variable(np.random.random((10,)).astype(np.float32))
params = [W, b]

# single layer model: softmax(xW+b) 
ypred = K.softmax(K.dot(x,W)+b)

# categorical cross entropy loss
loss = K.mean(K.categorical_crossentropy(ytrue, ypred),axis=None)

# categorical accuracy
accuracy = categorical_accuracy(ytrue, ypred)

# Train function
opt = Adam()
updates = opt.get_updates(params, [], loss)
train = K.function([x, ytrue],[loss, accuracy],updates=updates)

# Train the network
((xtrain, ytrain),(xtest, ytest)) = mnist.load_data()
xtrain = xtrain.reshape((-1, 28*28)) # flatten input image
ytrain = to_categorical(ytrain, 10)
for epoch in range(500):
	loss, accuracy = train([xtrain, ytrain])
	print("Epoch: {}, Loss: {}, Accuracy: {}".format(epoch, loss, accuracy))


@raghakot
Author

Thanks. The example and gist are awesome. You should perhaps add it, or reference it, somewhere in the Keras docs/examples for others.

Here is a minimal example of what's happening in my case.

from keras import backend as K
from keras.optimizers import Adam

x = K.placeholder(shape=(None, 224, 224, 3))
opt = Adam()

# Some contrived example
loss = K.square(x)

updates = opt.get_updates([x], [], [loss])
iterate = K.function([x], [], updates=updates)

This will give me TypeError: int() argument must be a string or a number, not 'NoneType', because x has None for the batch dimension.

Also, how do I add a placeholder on top of model.input? Basically, I am trying to add a proxy input placeholder on top of the pretrained Keras model so that I can perform certain input transformations on the GPU before feeding it into model.input. I tried:

proxy = K.placeholder(shape=K.int_shape(model.input))
# This was my futile attempt to connect to the existing model graph
proxy = model.input + K.variable(0.)
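
For what it's worth, here is a sketch of one way around the None batch dimension; it is an assumption, not something confirmed in this thread: make the thing being optimized a backend variable with a concrete shape (the optimizer then allocates its moment variables to match), and leave the inputs of K.function empty.

import numpy as np
from keras import backend as K
from keras.optimizers import Adam

# Optimize a concrete-shaped variable instead of a placeholder with a None batch dim.
x = K.variable(np.random.random((1, 224, 224, 3)).astype(np.float32))
loss = K.mean(K.square(x))   # contrived scalar loss, as above

opt = Adam()
updates = opt.get_updates([x], [], loss)   # same get_updates signature as used earlier in this thread
iterate = K.function([], [loss], updates=updates)

for i in range(10):
    current_loss, = iterate([])   # each call applies one Adam update to x on the GPU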

@stale stale bot added the stale label May 23, 2017
@stale stale bot closed this as completed Jun 22, 2017
@mongoose54

@bstriner I am new to Keras. In your example, how can I modify it to get the model's parameters if I have a network (e.g. VGG16) loaded through load_model()? Thanks

@bstriner
Contributor

bstriner commented Aug 8, 2017

@mongoose54 This is kind of unrelated to the OP. If you have a model, you can inspect model.layers, model.layers[2].kernel, etc. You can also just use model.weights to get all the weights.

That will give you the tensor variable, which gives you the variable name. You can get the actual value of the variable with import keras.backend as K; value = K.get_value(my_variable).
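
For example, a minimal sketch (the file name my_model.h5 is hypothetical; a model built from keras.applications works the same way):

from keras.models import load_model
from keras import backend as K

model = load_model('my_model.h5')   # hypothetical path

# List every layer and the shapes of its weights
for layer in model.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])

# Value of a single variable as a numpy array
# (assumes model.layers[1] is a layer with a kernel, e.g. Dense or Conv2D)
kernel = K.get_value(model.layers[1].kernel)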

Cheers

@mongoose54

@bstriner Sorry for placing it here.

However, I have a question related to this topic:

Let's say I have the losses explicitly defined in a numpy array: losses = [0.23, 0.432, 2.23, ...]. How can I backpropagate them to update the network's parameters?

@theceday

theceday commented Sep 6, 2018

@bstriner Thanks for such an example, but I have a weird problem.

The only notable difference from your example is:
updates = self.opt.get_updates(model.trainable_weights, [], loss_out)

The model is actually learning: the loss is going down and validation accuracy is increasing (up to 100% in some iterations), and I can save and load the model, etc.

But something is wrong: the LR is not changing.

(I changed these values just to see the change more easily, but no luck.)

self.opt = SGD(lr=1.0, decay=1e-3, momentum=0.5, nesterov=False)

K.get_value(m.opt.lr) => always outputs 1.0 in every loop; it doesn't change.
(Each call to "get_updates" (train step, not test) should change it, but it doesn't.)

Any ideas, anyone?

edit: I just added opt.lr to the outputs directly; still no change.

edit2: Adding "self.lr = lr" after the following statement in get_updates fixes this issue:
if self.initial_decay > 0:
    lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))

edit3: Since I use TF as the backend, it probably works OK because it builds up a graph, but some dependencies might not work as expected since opt.lr is not updated correctly.

Is this a bug? What am I missing here?
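
One reading of this, offered as a hedged guess rather than a confirmed answer: in the SGD code quoted above, the decayed rate is computed into a local tensor lr inside get_updates and is never written back to the opt.lr variable, so K.get_value(opt.lr) keeps reporting the initial value. The effective rate can be recomputed from the optimizer's own variables, for example:

import keras.backend as K

# Sketch: mirror the decay expression from SGD.get_updates quoted above
# (assumes opt is the SGD instance, i.e. self.opt / m.opt in the code above)
base_lr = K.get_value(opt.lr)              # stays at its initial value, e.g. 1.0
decay = K.get_value(opt.decay)
iterations = K.get_value(opt.iterations)   # incremented by the update ops each train step
effective_lr = base_lr * (1. / (1. + decay * iterations))
print(effective_lr)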

@maulberto3

maulberto3 commented Apr 22, 2019

Hi @bstriner, small question for you. Suppose I add another output head to your NN above; what would then need further adjustment?

It's just that I have a very similar NN, but as soon as I add an extra head (output) to it, I get the An operation has None for gradient. Please make sure ... error.

Everything is working fine before adding the extra output.

Working code:

class NN():
    ...
    def _build_nn(self):
        inputs = Input(shape=(self.obs_size,))
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l2')(inputs)
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l3')(x)
        actions_probs = Dense(units=self.n_actions, activation='softmax', use_bias=False, name='actions_probs')(x)
        self.nn = Model(inputs=inputs, outputs=[actions_probs])   # single output head

    def _build_train(self):
        actions_probs = self.nn.output                             # single output tensor
        actions_1hot = K.placeholder(shape=(None, self.n_actions), name='actions_1hot')
        actions_scales = K.placeholder(shape=(None,), name='actions_scales')
        actions_probs = K.sum(actions_probs * actions_1hot, axis=1)
        log_actions_probs = K.log(actions_probs)
        policy_loss = -1 * actions_scales * log_actions_probs
        policy_loss = K.mean(policy_loss)
        entropy = K.mean(-(actions_probs * log_actions_probs))
        entropy_loss = -ENTROPY_BETA * entropy
        loss = policy_loss + entropy_loss
        optim = SGD(lr=LR, decay=1e-6, momentum=0.9, nesterov=True)
        updates = optim.get_updates(params=self.nn.trainable_weights, loss=loss)
        self.custom_train = K.function(inputs=[self.nn.input, actions_1hot, actions_scales], outputs=[loss], updates=updates)

Not-working code:

    def _build_nn(self):
        inputs = Input(shape=(self.obs_size,))
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l2')(inputs)
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l3')(x)
        actions_probs = Dense(units=self.n_actions, activation='softmax', use_bias=False, name='actions_probs')(x)
        extra_head = Dense(units=1, activation='linear', use_bias=False, name='extra_head')(x)    # new head
        self.nn = Model(inputs=inputs, outputs=[actions_probs, extra_head])                       # two output heads

    def _build_train(self):
        actions_probs, _ = self.nn.output                          # two output tensors now
        actions_1hot = K.placeholder(shape=(None, self.n_actions), name='actions_1hot')
        actions_scales = K.placeholder(shape=(None,), name='actions_scales')
        actions_probs = K.sum(actions_probs * actions_1hot, axis=1)
        log_actions_probs = K.log(actions_probs)
        policy_loss = -1 * actions_scales * log_actions_probs
        policy_loss = K.mean(policy_loss)
        entropy = K.mean(-(actions_probs * log_actions_probs))
        entropy_loss = -ENTROPY_BETA * entropy
        loss = policy_loss + entropy_loss
        optim = SGD(lr=LR, decay=1e-6, momentum=0.9, nesterov=True)
        updates = optim.get_updates(params=self.nn.trainable_weights, loss=loss)
        self.custom_train = K.function(inputs=[self.nn.input, actions_1hot, actions_scales], outputs=[loss], updates=updates)

Additional info: when debugging with Python in VS Code, I can see the contents of loss being (correctly?) constructed and passed in, but I can't see inside params as well...
Any ideas?

@maulberto3

maulberto3 commented Apr 22, 2019

Good morning @bstriner. Re-reading my own question: maybe I should feed the "_" into the call to K.function, as it now needs two different y_true's, and actions_1hot is just one of the outputs...
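
A possible cause, offered as a guess rather than something confirmed in this thread: extra_head's kernel is in self.nn.trainable_weights, but the loss above never uses that output, so its gradient comes back as None and get_updates raises the error. A minimal sketch of one workaround (filtering the parameter list; the name check is an assumption based on the layer name extra_head):

# Sketch: only pass the weights the loss actually depends on, so get_updates
# never asks for a gradient through the unused extra_head output.
used_weights = [w for w in self.nn.trainable_weights
                if not w.name.startswith('extra_head')]
updates = optim.get_updates(params=used_weights, loss=loss)

Alternatively, giving the extra head its own loss term (and feeding its target through K.function) keeps every trainable weight connected to the loss.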
