
Problem with TimeDistributed() and Learning Phase #4178

Closed

sjebbara opened this issue Oct 25, 2016 · 16 comments

@sjebbara commented Oct 25, 2016

(EDIT: The following issue is only a minimal example of how to produce the error. My actual goal is to use a more complicated model instead of Dropout() here.)

When executing the following script a MissingInputError occurs:

from keras.models import Model
from keras.layers import Input, TimeDistributed, Dropout

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = TimeDistributed(Dropout(0.5))(in1)

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()

This is the simplest model that produces the error (in my original architecture, I tried to distribute a more complex model). The same issue occurs when replacing the Dropout() layer with e.g. GaussianNoise() or GRU(dropout_W=0.5), but not with e.g. Dense(). I think the error boils down to the combination of TimeDistributed() and any layer (or model) that uses the learning phase.

Maybe there is a conceptual problem with TimeDistributed() and the learning phase input?

These issues seem to be somewhat related: #3834, #2609, #3686, #2391

The full stack trace is this:

... 
  File "/homes/sjebbara/git/keras-original/keras/engine/training.py", line 752, in _make_predict_function
    **kwargs)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 787, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 773, in __init__
    **kwargs)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1776, in orig_function
    output_keys=output_keys).create(
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1430, in __init__
    accept_inplace)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 176, in std_fgraph
    update_mapping=update_mapping)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 180, in __init__
    self.__import_r__(output, reason="init")
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 351, in __import_r__
    self.__import__(variable.owner, reason=reason)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 396, in __import__
    variable=r)
theano.gof.fg.MissingInputError: An input of the graph, used to compute Shape(<TensorType(float32, matrix)>), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.

Backtrace when the variable is created:
  File "/homes/sjebbara/PyCharmProjects/NeuralSentiment/src/Test2.py", line 5, in <module>
    out1 = TimeDistributed(Dropout(0.5))(in1)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 149, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/homes/sjebbara/git/keras-original/keras/layers/wrappers.py", line 131, in call
    initial_states=[], input_length=input_length, unroll=unroll)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 947, in rnn
    go_backwards=go_backwards)

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
@kudkudak (Contributor) commented:

I think you have incorrectly applied TimeDistributed to Dropout.

TimeDistributed(Dropout(0.5))(in1) should be TimeDistributed(Dropout(0.5)(in1))(in1)

@sjebbara (Author) commented Oct 25, 2016

Thanks for the reply.
I am quite sure that the TimeDistributed() layer expects a Layer object and not a tensor (which Dropout(0.5)(in1) would return).
Also, when changing
out1 = TimeDistributed(Dropout(0.5))(in1) to
out1 = TimeDistributed(Dense(10))(in1)
everything works fine.

@farizrahman4u (Contributor) commented Oct 25, 2016

This is a bug in Theano when RandomStreams are present inside a scan op. See: https://groups.google.com/forum/#!topic/theano-users/8diyZjq6ngc

Solutions:

  • Don't provide batch_size, or
  • Don't use TimeDistributed over Dropout. TimeDistributed(Dropout(0.5))(x) and Dropout(0.5)(x) are equivalent.

If you are trying to drop the same set of nodes for all timesteps in a sequence, simply wrapping in TimeDistributed will not do the job. See my solution at #3995
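
For reference, a minimal sketch of the second workaround applied to the repro at the top of this thread (Keras 1.x API as used throughout; an illustration of the equivalence claim, not code from the issue):

from keras.models import Model
from keras.layers import Input, Dropout

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = Dropout(0.5)(in1)  # Dropout is element-wise, so no TimeDistributed wrapper is needed

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()  # no scan op is built, so the Theano RandomStreams bug is not triggered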

@sjebbara (Author) commented:

Thanks for the pointer, @farizrahman4u.

The solutions that you suggested sadly do not apply to my real use case (which I simplified for this Issue). My actual goal is to have an inner model:

inner_in1 = Input(batch_shape=(batch_size, n_elements, element_size), name="inner_in1")
inner_output = GRU(2, dropout_U=0.5, dropout_W=0.5, return_sequences=False, name="gru")(inner_in1)
inner_model = Model(input=inner_in1, output=inner_output, name="inner_model")

that I use in a TimeDistributed() layer inside an outer model:

outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
outer_output = TimeDistributed(inner_model, name="distr")(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()

You could see this as a sentence model (inner_model) that I apply to each sentence in a document (outer_model). In this setup, the error appears when using dropout_W or dropout_U in the GRU.

Leaving batch_size unspecified is not possible here, since these lines in the TimeDistributed() layer wouldn't make much sense with an RNN.

@eyaler commented Oct 26, 2016

@farizrahman4u, as I reported in #4182:

  1. batch_size is required when using a stateful RNN.
  2. What I want from TimeDistributed(Dropout) is not the same dropout nodes for every timestep, but having every timestep drop exactly x% of nodes. Without TimeDistributed you would get different fractions for different timesteps.

@farizrahman4u (Contributor) commented Oct 26, 2016

@sjebbara There is no reason for you to provide the batch_size unless you have a stateful RNN. Both the rnn-based and reshape-based TimeDistributed implementations are strictly mathematically equivalent (the reshape-based implementation being faster). If you still want to specify batch_size, here you go:

outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
TimeDistributedModel = TimeDistributed(inner_model, name="distr")
# Build the wrapper with the batch dimension set to None so it takes the
# reshape-based code path, then make build a no-op so __call__ doesn't
# rebuild it with the static batch size.
TimeDistributedModel.build((None,) + outer_in1._keras_shape[1:])
TimeDistributedModel.build = lambda *_: None
outer_output = TimeDistributedModel(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()

@farizrahman4u (Contributor) commented Oct 26, 2016

Similarly @eyaler,
to drop the exact number of nodes at every time step (when batch_size has to be provided because of a stateful RNN):

model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))

# Same trick as above: build with the batch dimension set to None so the
# wrapper uses the reshape-based implementation, then make build a no-op.
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None

model.add(dropout)

model.add(...)
model.add(...)

@eyaler commented Oct 26, 2016

Thanks @farizrahman4u!

  1. If reshape is faster, why isn't it also used when batch_size is given?
  2. How would your solution look using the functional API? My attempt failed on assert_input_compatibility(x).

@farizrahman4u (Contributor) commented Oct 26, 2016

  1. If batch size is given, then it is possible that the layer being wrapped is a stateful RNN (or any layer which requires a static batch size). Since the reshape method messes with the batch dimension, we go for the rnn method instead.
  2. Maybe you forgot return_sequences=True.

@eyaler commented Oct 26, 2016

got it!

from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x=np.zeros((100,20,10))
y=np.zeros((100,20,10))

model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None
model.add(dropout)
model.compile(optimizer='sgd', loss='mse')
model.fit(x,y,nb_epoch=1,batch_size=100)

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + a.keras_shape[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)

@sjebbara (Author) commented:

I think I misunderstood the reshape-based implementation. I was just about to point out why reshaping makes no sense with a distributed RNN layer, but then the pieces fell together 😆.

So the solution is simply to leave batch_size undefined?!
I will try that tomorrow.
Thanks all!
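
For completeness, a minimal sketch of the first workaround applied to the original repro (Keras 1.x API; an illustration of leaving the batch size undefined, not code from the thread):

from keras.models import Model
from keras.layers import Input, TimeDistributed, Dropout

in1 = Input(shape=(8, 6), name="in1")  # no batch_shape, so the batch size stays undefined
out1 = TimeDistributed(Dropout(0.5))(in1)

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()  # the wrapper can use its reshape-based path, no scan op involved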

@tati- commented Apr 20, 2017

Hello!

Is there any update on this?
By the way, for me it works with the TensorFlow backend, but not with the Theano one...

stale bot added the stale label Jul 19, 2017
stale bot commented Jul 19, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot closed this as completed Aug 18, 2017
@brayan07 commented Oct 4, 2017

I was having a similar issue with TensorFlow. Whenever I used the TimeDistributed wrapper on a model containing layers that used the learning phase, the resulting tensor would have the property _uses_learning_phase = False. This meant that when I created a final model containing that tensor, the model's _uses_learning_phase would incorrectly be set to False.

In the case below, my intermediate_model had a Dropout layer; before passing it through the wrapper, intermediate_model.uses_learning_phase=True.

# intermediate_model (defined elsewhere) contains a Dropout layer, so it uses the learning phase.
input_scan = Input(shape=(ANGLES, FINAL_WIDTH, FINAL_HEIGHT//2, CHANNELS))
# Time distributed model
sequenced_model = TimeDistributed(intermediate_model)(input_scan)

sequenced_model._uses_learning_phase = True  # Manually setting the tensor's property fixed the issue.

out = GlobalAveragePooling1D()(sequenced_model)
# Complete model
model = Model(input_scan, out)

@Nimi42 commented Jan 10, 2018

@eyaler
I can't get your functional example to work.

I tried with a Dense layer instead of an LSTM. I get an error that says:
Tensors don't have keras_shape.

dropout.build((None,) + a.keras_shape[1:])

The other thing I tried was to have a Dense layer as the input to a Dropout layer wrapped by a TimeDistributed layer.

input_1 = Input(batch_shape=(batch_size, seq_len, num_inputs))

x1 = Dense(32, activation='tanh')(input_1)
x1 = TimeDistributed(Dropout(0.5))(x1)

which ends with:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
	 [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Either way causes an exception.

What I want to do is sequence-to-sequence learning, and I'd like to do it with the functional API.

That would be a TimeDistributed Dense layer on top of an LSTM, if I understood correctly, and that works.

Having dropout would be the icing on the cake though.

Like @farizrahman4u said, I'd like to drop the exact same number of nodes at every time step with a stateful RNN.

Can anybody provide a pointer on how to do this with the functional API? I can't figure out this build magic.


EDIT!:

I tried using

tuple(a.get_shape().as_list())[1:]

to make the snippet work.

from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x=np.zeros((100,20,10))
y=np.zeros((100,20,10))

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)

Again it terminates with an exception, this time in the training phase:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
	 [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

EDIT!:

Thanks @brayan07,

your workaround fixed the issue and it compiles. I don't know if the dropout is applied correctly, though.
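
For what it's worth, a sketch of how @brayan07's workaround might look when applied to the functional snippet above (TensorFlow backend, Keras 1.x-era API as used in this thread):

from keras.models import Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))

inp = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(inp)

# Build trick from earlier in the thread: batch dimension None -> reshape-based TimeDistributed
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)

# @brayan07's workaround: mark the output tensor as depending on the learning phase
output._uses_learning_phase = True

fmodel = Model(inp, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x, y, nb_epoch=1, batch_size=100)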

@davideboschetto commented:

sequenced_model._uses_learning_phase = True #Manually setting the tensor's property fixed the issue.

This was the key to solving this for me, too. The model contained in the TimeDistributed was indeed not training without this.
