
layer.trainable example not working #3804

Closed · 3 tasks done

ghost opened this issue Sep 18, 2016 · 6 comments

Comments

ghost commented Sep 18, 2016

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

The example at https://keras.io/getting-started/faq/#how-can-i-freeze-keras-layers does not work; it fails with both the TensorFlow and Theano backends.
Both models end up doing whatever the second layer.trainable assignment says.

Code to try:

from keras.layers import Input, Dense
from keras.models import Model
import numpy as np

x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)
frozen_model = Model(x, y)
frozen_model.compile(optimizer='rmsprop', loss='mse')
# **** both models end up doing whatever you put in the next line
layer.trainable = True
trainable_model = Model(x, y)
trainable_model.compile(optimizer='rmsprop', loss='mse')

data = labels = np.vstack((np.zeros((500,32)), np.ones((500,32))))
print(layer.get_weights()[0][0])
frozen_model.fit(data, labels)
print(layer.get_weights()[0][0])
trainable_model.fit(data, labels)
print(layer.get_weights()[0][0])
kuza55 (Contributor) commented Sep 18, 2016

The docs do point out this behaviour:

# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)

Though we probably want to update the docs since the lines at the bottom imply the opposite:

frozen_model.fit(data, labels)  # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`

I don't see a trivial solution for doing what you want, so I would suggest avoiding this pattern and just setting the flag to what you need before training each model.

ghost (Author) commented Sep 18, 2016

The docs are ok.

The lines you point out mean that, since the two models share a layer, training the non-frozen model will also change the weights seen by the frozen one. But training the frozen one should not change anything. This feature used to work properly in the past.

fchollet (Member) commented

That's a good point. The thing is that compile does not actually create the gradient updates in the current Keras version (it used to); they are created at the first call to fit. I have fixed this, so now the example in the docs does what it says it does. Update Keras.
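To illustrate the timing issue described above: the following is a toy sketch (not Keras internals, all class and attribute names are made up) of the difference between snapshotting the trainable flag at compile time, which is the fixed behavior, versus reading it lazily at the first fit, which produced the bug reported here.

```python
class ToyLayer:
    """A stand-in for a Keras layer: one weight and a trainable flag."""
    def __init__(self, w):
        self.w = w
        self.trainable = True

class ToyModel:
    """Snapshots the layer's trainable flag when compile() is called,
    mimicking the fixed Keras behavior."""
    def __init__(self, layer):
        self.layer = layer
        self._will_update = None
    def compile(self):
        # Capture the flag NOW, not at fit time.
        self._will_update = self.layer.trainable
    def fit(self):
        if self._will_update:
            self.layer.w -= 0.1  # stand-in for one gradient step

layer = ToyLayer(w=1.0)

layer.trainable = False
frozen = ToyModel(layer)
frozen.compile()          # captures trainable=False

layer.trainable = True
trainable = ToyModel(layer)
trainable.compile()       # captures trainable=True

frozen.fit()              # weight untouched: flag was False at compile time
assert abs(layer.w - 1.0) < 1e-9
trainable.fit()           # weight updated
assert abs(layer.w - 0.9) < 1e-9
```

With the pre-fix behavior (reading layer.trainable at the first fit call instead of at compile), both models would observe whatever value the flag held last, which is exactly the symptom reported at the top of this issue.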

ghost (Author) commented Sep 18, 2016

Oh, I see. Great, thanks!

ylmeng commented Dec 4, 2017

I load a pre-trained model as a layer in another model. Everything is fine if I set the pre-trained model's trainable=False. However, if I set it to True, I get a hundred lines of errors. The last chunk is this:
NotFoundError (see above for traceback): Resource __per_step_6/_tensor_arraysoutput_ta_965/N10tensorflow11TensorArrayE does not exist. [[Node: training/RMSprop/gradients/model_2/time_distributed_7/while_2/MatMul_grad/MatMul_1/StackPop/_1899 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_11685_training/RMSprop/gradients/model_2/time_distributed_7/while_2/MatMul_grad/MatMul_1/StackPop", _device="/job:localhost/replica:0/task:0/gpu:0"](training/RMSprop/gradients/model_2/time_distributed_7/while_2/MatMul_grad/MatMul_1/StackPop/_1898)]]

It seems something is missing in order to make the combined system trainable.
My last layer is a TimeDistributed(Dense). Does that cause the issue? The error message seems to suggest so.

fatih-ilhan commented

I have the same problem while loading a pre-trained model checkpoint:

[[Node: training/Adam/gradients/time_distributed_2/while_1/TensorArrayStack/TensorArrayGatherV3_grad/TensorArrayGrad/TensorArrayGradV3 = TensorArrayGradV3[_class=["loc:@time_distributed_2/while_1/TensorArray_2"], source="training/Adam/gradients", _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/Adam/gradients/time_distributed_2/while_1/TensorArrayStack/TensorArrayGatherV3_grad/TensorArrayGrad/TensorArrayGradV3/StackPopV2, training/Adam/gradients/time_distributed_2/while_1/TensorArrayStack/TensorArrayGatherV3_grad/TensorArrayGrad/TensorArrayGradV3/StackPopV2_1)]]
[[Node: training/Adam/gradients/StackPushV2_6/_868 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2377_training/Adam/gradients/StackPushV2_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Could you fix it?
