Problem with TimeDistributed() and Learning Phase #4178
Comments
I think you have incorrectly applied TimeDistributed to Dropout.
Thanks for the reply.
This is a bug in Theano when TimeDistributed is combined with a layer that uses the learning phase. Solutions: if you are trying to drop the same set of nodes for all timesteps in a sequence, simply wrapping …
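For reference, a minimal sketch of the unwrapped alternative (applying Dropout directly to the 3-D sequence output, without the TimeDistributed wrapper), assuming the Keras 1.x Sequential API used elsewhere in this thread; the layer sizes are illustrative:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dropout
import numpy as np

# Dropout applied to the whole (batch, time, features) tensor; no TimeDistributed needed.
model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), return_sequences=True))
model.add(Dropout(0.5))
model.compile(optimizer='sgd', loss='mse')

x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))
model.fit(x, y, nb_epoch=1, batch_size=100)
```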
Thanks for the pointer, @farizrahman4u. The solutions that you suggested sadly do not apply to my real use case (which I simplified for this issue). My actual goal is to have an inner model:

```python
inner_in1 = Input(batch_shape=(batch_size, n_elements, element_size), name="inner_in1")
inner_output = GRU(2, dropout_U=0.5, dropout_W=0.5, return_sequences=False, name="gru")(inner_in1)
inner_model = Model(input=inner_in1, output=inner_output, name="inner_model")
```

that I use in an outer model:

```python
outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
outer_output = TimeDistributed(inner_model, name="distr")(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()
```

You could see this as a sentence model (…). Not specifying …
@farizrahman4u as I reported in #4182 …
@sjebbara There is no reason for you to provide the … Try:

```python
outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
TimeDistributedModel = TimeDistributed(inner_model, name="distr")
TimeDistributedModel.build((None,) + outer_in1._keras_shape[1:])
TimeDistributedModel.build = lambda *_: None
outer_output = TimeDistributedModel(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()
```
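A quick, hedged way to inspect whether the learning phase actually made it into the outer graph; `outer_model` and `outer_output` are the variables from the snippet above, and the flags are the ones discussed further down in this thread:

```python
# Inspect the learning-phase flags; see the later comments about _uses_learning_phase.
print(outer_model.uses_learning_phase)
print(getattr(outer_output, '_uses_learning_phase', False))
```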
Similarly @eyaler:

```python
model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None
model.add(dropout)
model.add(...)
model.add(...)
```
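For completeness, a hedged sketch of how the `model.add(...)` placeholders above could be filled in and the model trained; the `TimeDistributed(Dense(...))` layer and the training call are illustrative assumptions, not part of the original comment:

```python
from keras.layers import Dense  # in addition to the layers used above

model.add(TimeDistributed(Dense(10)))
model.compile(optimizer='sgd', loss='mse')

x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))
model.fit(x, y, nb_epoch=1, batch_size=100, shuffle=False)  # stateful LSTM: keep batch order
```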
Thanks @farizrahman4u!
Got it!

```python
from keras.models import Sequential, Model
x = np.zeros((100, 20, 10))
model = Sequential()
input = Input(batch_shape=(100, 20, 10))
…
```
I think I misunderstood the reshape-based implementation. I was just about to point out why reshaping makes no sense with a distributed RNN layer, but then the pieces fell together 😆. So the solution is simply to leave …
Hello! Is there any update on this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I was having a similar issue with Tensorflow. Whenever I used the TimeDistributed wrapper on a model containing layers that used the learning phase, the resulting tensor would have the property _uses_learning_phase = False. This meant that when I created a final model containing that tensor, the model's _uses_learning_phase would incorrectly be set to False. In the case below, my intermediate_model had a Dropout layer; before passing it through the wrapper, intermediate_model.uses_learning_phase=True.
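A hedged reconstruction of the situation described above (hypothetical layer sizes; the manual flag assignment at the end is an assumption about what the workaround looked like):

```python
from keras.models import Model
from keras.layers import Input, Dense, Dropout, TimeDistributed

# Inner model that uses the learning phase via Dropout.
inner_in = Input(shape=(16,))
inner_out = Dropout(0.5)(Dense(16)(inner_in))
intermediate_model = Model(inner_in, inner_out)
print(intermediate_model.uses_learning_phase)            # True

# Wrapping it with TimeDistributed loses the flag on the resulting tensor (the bug described above).
outer_in = Input(shape=(5, 16))
wrapped = TimeDistributed(intermediate_model)(outer_in)
print(getattr(wrapped, '_uses_learning_phase', False))   # False

# Assumed workaround: restore the flag by hand before building the final model.
wrapped._uses_learning_phase = True
final_model = Model(outer_in, wrapped)
print(final_model.uses_learning_phase)                   # True
```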
@eyaler I tried with a Dense layer instead of an LSTM; I get an error at:

```python
dropout.build((None,) + a._keras_shape[1:])
```

The other thing I tried was to have a Dense layer as input to a Dropout layer:

```python
input_1 = Input(batch_shape=(batch_size, seq_len, num_inputs))
x1 = Dense(32, activation='tanh')(input_1)
x1 = TimeDistributed(Dropout(0.5))(x1)
```

which ends with:

```
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
  [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
```

Either way causes an exception. What I want to do is sequence-to-sequence learning, and I'd like to do it with the functional API. If I understood correctly, that would be a TimeDistributed Dense layer on top of an LSTM; having dropout would be the icing on the cake, though. Like @farizrahman4u said, I'd like to drop the exact same number of nodes at every time step. Can anybody provide a pointer on how to do this with the functional API? I can't figure it out.

EDIT: I tried using `tuple(a.get_shape().as_list())[1:]` to make the snippet work:

```python
from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x, y, nb_epoch=1, batch_size=100)
```

Again it terminates with an exception, this time in the training phase:

```
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
  [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
```

EDIT: Thanks @brayan07, your workaround fixed the issue and it compiles. I don't know if the dropout is applied correctly, though.
This was the key to solving this for me, too.
(EDIT: The following issue is only a minimal example of how to produce the error. My actual goal is to use a more complicated model instead of `Dropout()` here.)

When executing the following script, a `MissingInputError` occurs (see the sketch below). This is the simplest model that produces the error (in my original architecture, I tried to distribute a more complex model). The same issue occurs when replacing the `Dropout()` layer with e.g. `GaussianNoise()` or `GRU(dropout_W=0.5)`, but not for e.g. `Dense()`. I think the error boils down to the combination of `TimeDistributed()` and any layer (or model) that uses the learning phase. Maybe there is a conceptual problem with `TimeDistributed()` and the learning phase input?

These issues seem to be somewhat related: #3834, #2609, #3686, #2391
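A minimal sketch consistent with this description (hypothetical shapes; Keras 1.x functional API with the Theano backend assumed):

```python
from keras.models import Model
from keras.layers import Input, Dropout, TimeDistributed
import numpy as np

x = np.zeros((4, 5, 16))

inputs = Input(batch_shape=(4, 5, 16))
outputs = TimeDistributed(Dropout(0.5))(inputs)   # Dropout uses the learning phase

model = Model(input=inputs, output=outputs)
model.compile(optimizer="sgd", loss="mse")
model.predict(x)   # raises the MissingInputError described above on the Theano backend
```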
The full stack trace is this:
Please make sure that the boxes below are checked before you submit your issue. Thank you!

- `pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps`
- `pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps`