
Problem with TimeDistributed() and Learning Phase #4178

Closed

sjebbara opened this issue Oct 25, 2016 · 16 comments

@sjebbara commented Oct 25, 2016

(EDIT: The following issue is only a minimal example of how to produce the error. My actual goal is to use a more complicated model instead of Dropout() here.)

When executing the following script a MissingInputError occurs:

from keras.models import Model
from keras.layers import Input, TimeDistributed, Dropout

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = TimeDistributed(Dropout(0.5))(in1)

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()

This is the simplest model that produces the error (in my original architecture, I tried to distribute a more complex model). The same issue occurs when replacing the Dropout() layer with e.g. GaussianNoise() or GRU(dropout_W=0.5), but not with e.g. Dense(). I think the error boils down to the combination of TimeDistributed() and any layer (or model) that uses the learning phase.

Maybe there is a conceptual problem with TimeDistributed() and the learning phase input?

These issues seem to be somewhat related: #3834, #2609, #3686, #2391

The full stack trace is this:

... 
  File "/homes/sjebbara/git/keras-original/keras/engine/training.py", line 752, in _make_predict_function
    **kwargs)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 787, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 773, in __init__
    **kwargs)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1776, in orig_function
    output_keys=output_keys).create(
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1430, in __init__
    accept_inplace)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 176, in std_fgraph
    update_mapping=update_mapping)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 180, in __init__
    self.__import_r__(output, reason="init")
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 351, in __import_r__
    self.__import__(variable.owner, reason=reason)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 396, in __import__
    variable=r)
theano.gof.fg.MissingInputError: An input of the graph, used to compute Shape(<TensorType(float32, matrix)>), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.

Backtrace when the variable is created:
  File "/homes/sjebbara/PyCharmProjects/NeuralSentiment/src/Test2.py", line 5, in <module>
    out1 = TimeDistributed(Dropout(0.5))(in1)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 149, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/homes/sjebbara/git/keras-original/keras/layers/wrappers.py", line 131, in call
    initial_states=[], input_length=input_length, unroll=unroll)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 947, in rnn
    go_backwards=go_backwards)

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
@kudkudak (Contributor) commented:

I think you have incorrectly applied TimeDistributed to Dropout.

TimeDistributed(Dropout(0.5))(in1) should be TimeDistributed(Dropout(0.5)(in1))(in1)

@sjebbara (Author) commented Oct 25, 2016

Thanks for the reply.
I am quite sure that the TimeDistributed() layer expects a Layer object and not a tensor (which Dropout(0.5)(in1) would return).
Also, when changing
out1 = TimeDistributed(Dropout(0.5))(in1) to
out1 = TimeDistributed(Dense(10))(in1)
everything works fine.

@farizrahman4u (Contributor) commented Oct 25, 2016

This is a bug in Theano when RandomStreams are present inside a scan op. See: https://groups.google.com/forum/#!topic/theano-users/8diyZjq6ngc

Solutions:

  • Don't provide batch_size, or
  • Don't use TimeDistributed over Dropout. TimeDistributed(Dropout(0.5))(x) and Dropout(0.5)(x) are equivalent.

If you are trying to drop the same set of nodes for all timesteps in a sequence, simply wrapping in TimeDistributed will not do the job. See my solution at #3995
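
For reference, a minimal sketch of the second workaround applied to the repro at the top of this thread (Keras 1.x API as used throughout; an illustration of the equivalence claim, not code from the issue):

from keras.models import Model
from keras.layers import Input, Dropout

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = Dropout(0.5)(in1)  # Dropout is element-wise, so no TimeDistributed wrapper is needed

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()  # no scan op is built, so the Theano RandomStreams bug is not triggered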

@sjebbara (Author) commented:

Thanks for the pointer, @farizrahman4u.

The solutions that you suggested sadly do not apply to my real use case (which I simplified for this Issue). My actual goal is to have an inner model:

inner_in1 = Input(batch_shape=(batch_size, n_elements, element_size), name="inner_in1")
inner_output = GRU(2, dropout_U=0.5, dropout_W=0.5, return_sequences=False, name="gru")(inner_in1)
inner_model = Model(input=inner_in1, output=inner_output, name="inner_model")

that I use in a TimeDistributed() layer inside an outer model:

outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
outer_output = TimeDistributed(inner_model, name="distr")(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()

You could see this as a sentence model (inner_model) that I apply to each sentence in a document (outer_model). In this setup, the error appears when using dropout_W or dropout_U in the GRU.

Leaving batch_size unspecified is not possible here, since these lines in the TimeDistributed() layer wouldn't make much sense with an RNN.

@eyaler commented Oct 26, 2016

@farizrahman4u, as I reported in #4182:

  1. batch_size is required when using a stateful RNN.
  2. What I want from TimeDistributed(Dropout) is not the same dropout nodes for every timestep, but having every timestep drop exactly x% of nodes. Without TimeDistributed you would get different fractions for different timesteps.

@farizrahman4u (Contributor) commented Oct 26, 2016

@sjebbara There is no reason for you to provide the batch_size unless you have a stateful RNN. Both the rnn-based and reshape-based TimeDistributed implementations are strictly mathematically equivalent (the reshape-based implementation being faster). If you still want to specify batch_size, here you go:

outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
TimeDistributedModel = TimeDistributed(inner_model, name="distr")
# Build the wrapper with the batch dimension set to None so it takes the
# reshape-based code path, then make build a no-op so __call__ doesn't
# rebuild it with the static batch size.
TimeDistributedModel.build((None,) + outer_in1._keras_shape[1:])
TimeDistributedModel.build = lambda *_: None
outer_output = TimeDistributedModel(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()

@farizrahman4u (Contributor) commented Oct 26, 2016

Similarly @eyaler,
to drop the exact number of nodes at every time step (when batch_size has to be provided because of a stateful RNN):

model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))

# Same trick as above: build with the batch dimension set to None so the
# wrapper uses the reshape-based implementation, then make build a no-op.
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None

model.add(dropout)

model.add(...)
model.add(...)

@eyaler commented Oct 26, 2016

Thanks @farizrahman4u!

  1. If reshape is faster, why isn't it also used when batch_size is given?
  2. How would your solution look using the functional API? My attempt failed on assert_input_compatibility(x).

@farizrahman4u (Contributor) commented Oct 26, 2016

  1. If batch size is given, then it is possible that the layer being wrapped is a stateful RNN (or any layer which requires a static batch size). Since the reshape method messes with the batch dimension, we go for the rnn method instead.
  2. Maybe you forgot return_sequences=True.

@eyaler commented Oct 26, 2016

got it!

from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x=np.zeros((100,20,10))
y=np.zeros((100,20,10))

model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None
model.add(dropout)
model.compile(optimizer='sgd', loss='mse')
model.fit(x,y,nb_epoch=1,batch_size=100)

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + a.keras_shape[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)

@sjebbara (Author) commented:

I think I misunderstood the reshape-based implementation. I was just about to point out why reshaping makes no sense with a distributed RNN layer, but then the pieces fell together 😆.

So the solution is simply to leave batch_size undefined?!
I will try that tomorrow.
Thanks all!
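
For completeness, a minimal sketch of the first workaround applied to the original repro (Keras 1.x API; an illustration of leaving the batch size undefined, not code from the thread):

from keras.models import Model
from keras.layers import Input, TimeDistributed, Dropout

in1 = Input(shape=(8, 6), name="in1")  # no batch_shape, so the batch size stays undefined
out1 = TimeDistributed(Dropout(0.5))(in1)

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()  # the wrapper can use its reshape-based path, no scan op involved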

@tati- commented Apr 20, 2017

Hello!

Is there any update on this?
By the way, for me it works with the TensorFlow backend, but not with the Theano one...

stale bot added the stale label Jul 19, 2017
stale bot commented Jul 19, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot closed this as completed Aug 18, 2017
@brayan07 commented Oct 4, 2017

I was having a similar issue with TensorFlow. Whenever I used the TimeDistributed wrapper on a model containing layers that used the learning phase, the resulting tensor would have the property _uses_learning_phase = False. This meant that when I created a final model containing that tensor, the model's _uses_learning_phase would incorrectly be set to False.

In the case below, my intermediate_model had a Dropout layer; before passing it through the wrapper, intermediate_model.uses_learning_phase=True.

# intermediate_model (defined elsewhere) contains a Dropout layer, so it uses the learning phase.
input_scan = Input(shape=(ANGLES, FINAL_WIDTH, FINAL_HEIGHT//2, CHANNELS))
# Time distributed model
sequenced_model = TimeDistributed(intermediate_model)(input_scan)

sequenced_model._uses_learning_phase = True  # Manually setting the tensor's property fixed the issue.

out = GlobalAveragePooling1D()(sequenced_model)
# Complete model
model = Model(input_scan, out)

@Nimi42 commented Jan 10, 2018

@eyaler
I can't get your functional example to work.

I tried with a Dense layer instead of an LSTM. I get an error that says:
Tensors don't have keras_shape.

dropout.build((None,) + a.keras_shape[1:])

The other thing I tried was to have a Dense layer as the input to a Dropout layer wrapped by a TimeDistributed layer.

input_1 = Input(batch_shape=(batch_size, seq_len, num_inputs))

x1 = Dense(32, activation='tanh')(input_1)
x1 = TimeDistributed(Dropout(0.5))(x1)

which ends with:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
	 [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Either way causes an exception.

What I want to do is sequence-to-sequence learning, and I'd like to do it with the functional API.

That would be a TimeDistributed Dense layer on top of an LSTM, if I understood correctly, and that works.

Having dropout would be the icing on the cake though.

Like @farizrahman4u said, I'd like to drop the exact same number of nodes at every time step with a stateful RNN.

Can anybody provide a pointer on how to do this with the functional API? I can't figure out this build magic.


EDIT!:

I tried using

tuple(a.get_shape().as_list())[1:]

to make the snippet work.

from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x=np.zeros((100,20,10))
y=np.zeros((100,20,10))

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)

Again it terminates with an exception, this time in the training phase:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
	 [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

EDIT!:

Thanks @brayan07,

your workaround fixed the issue and it compiles. I don't know if the dropout is applied correctly, though.
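
For what it's worth, a sketch of how @brayan07's workaround might look when applied to the functional snippet above (TensorFlow backend, Keras 1.x-era API as used in this thread):

from keras.models import Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))

inp = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(inp)

# Build trick from earlier in the thread: batch dimension None -> reshape-based TimeDistributed
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)

# @brayan07's workaround: mark the output tensor as depending on the learning phase
output._uses_learning_phase = True

fmodel = Model(inp, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x, y, nb_epoch=1, batch_size=100)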

@davideboschetto commented:

sequenced_model._uses_learning_phase = True #Manually setting the tensor's property fixed the issue.

This was the key to solving this for me, too. The model contained in the TimeDistributed was indeed not training without this.
