
Fix Specifying Initial States of RNN Layers #5795

Merged · 20 commits · Apr 25, 2017

Conversation

Joshua-Chin (Contributor)

I have begun to review the changes made in da21c15. There appear to be a number of issues:

  • If initial_states is a list of tensors, it will not have the attribute _keras_history, and will be treated as a numerical value.
  • If initial_states is passed as a keyword argument to call, it will be ignored / overwritten.
  • state_spec may not be defined by the time it is used (state_spec is defined in build).
  • In reset_states, there is a check if input_spec is not None. However, input_spec is never None, because it is defined in __init__ to be InputSpec(ndim=3).
  • There is an inconsistency between using the singular and plural of initial_state and initial_states.
  • the signature for reset_states(state_values) is inconsistent with the signature for set_weights(weights)
  • test_specify_initial_states does not check if initial_states is part of the computational graph.

This commit does the following to fix those issues:

  • If initial_states is passed to __call__, it is always treated as a tensor / list of tensors. If a user wants to specify the state numerically, they should pass a numpy array to reset_states.
  • The layer will be built in __call__ before state_spec is used.
  • The extraneous check of input_spec is removed.
  • The plural form initial_states is used throughout the code and documentation.
  • The signature for reset_states(state_values) is changed to reset_states(states).
  • test_specify_states explicitly checks if initial_states is added to the computational graph.
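The routing these fixes describe can be sketched in plain Python. Everything below is a simplified stand-in for the real Recurrent layer (class name, method bodies, and the string "tensors" are illustrative, not Keras code):

```python
class FakeRNN:
    """Toy stand-in for a stateful RNN layer, illustrating the split above."""

    def __init__(self):
        self.states = None

    def reset_states(self, states=None):
        # Numeric initial values (e.g. numpy arrays) belong here.
        self.states = states

    def __call__(self, inputs, initial_state=None):
        if initial_state is None:
            return [inputs]
        if not isinstance(initial_state, (list, tuple)):
            initial_state = [initial_state]
        # Symbolic tensors passed to __call__ join the graph inputs,
        # so the initial state becomes part of the computational graph.
        return [inputs] + list(initial_state)


rnn = FakeRNN()
rnn.reset_states([[0.0, 0.0]])                       # numeric: via reset_states
full_inputs = rnn('x', initial_state=['h0', 'c0'])   # symbolic: via __call__
print(full_inputs)  # ['x', 'h0', 'c0']
```

The key point of the fix is that __call__ never tries to interpret the state numerically; the two entry points have disjoint responsibilities.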

#5738

@fchollet (Member) commented Mar 16, 2017

If initial_states is a list of tensors, it will not have the attribute _keras_history, and will be treated as a numerical value.

The proper behavior in that case would be to check if all tensors in the list are Keras tensors. If it's mixed, raise an exception. If they're all Keras tensors, add them to the inputs. Else, use initial_state.

I am confused by what you are referring to as "numerical values". There are no numerical values involved (e.g. Numpy tensors), only symbolic tensors (e.g. TF tensors). Can you clarify?
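The rule described above can be sketched using `_keras_history` as the marker that distinguishes Keras tensors. The stub class and helper below are hypothetical illustrations of the check, not the actual Keras implementation:

```python
class KerasTensorStub:
    """Stand-in for a Keras tensor; real ones carry a _keras_history attribute."""
    _keras_history = ()


def route_initial_state(initial_state):
    """All Keras tensors -> graph inputs; none -> keyword argument; mixed -> error."""
    states = initial_state if isinstance(initial_state, (list, tuple)) else [initial_state]
    flags = [hasattr(s, '_keras_history') for s in states]
    if all(flags):
        return 'inputs'   # append the states to the layer's graph inputs
    if not any(flags):
        return 'kwargs'   # pass initial_state through as a plain keyword argument
    raise ValueError('initial_state mixes Keras and non-Keras tensors')


print(route_initial_state([KerasTensorStub(), KerasTensorStub()]))  # inputs
print(route_initial_state(['raw_backend_tensor']))                  # kwargs
```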

@fchollet (Member)

The plural form initial_states is used throughout the code and documentation.

In the API we should use initial_state everywhere, for API consistency with TF. In the code we can use whatever.

@Joshua-Chin (Contributor, Author)

The proper behavior in that case would be to check if all tensors in the list are Keras tensors. If it's mixed, raise an exception. If they're all Keras tensors, add them to the inputs. Else, use initial_state.

It is unclear to me why we are checking if they are Keras tensors in the first place. If initial_state does not depend on the input, it must be a constant and should be specified in reset_states. In addition, it may cause issues for people using Keras as a layer library over TF / Theano.

I am confused by what you are referring to as "numerical values". There are no numerical values involved (e.g. Numpy tensors), only symbolic tensors (e.g. TF tensors). Can you clarify?

My apologies, I misspoke. In the current code, if initial_state is a list of tensors, it won't be treated as numerical values.
However, the value of initial_state will still be ignored: the code block at lines 263-269 in Recurrent.call always overwrites the initial_state keyword argument.

In the API we should use initial_state everywhere, for API consistency with TF. In the code we can use whatever.

I am fine with using initial_state everywhere in the API. However, I feel that we should change the code in recurrent.py to always use initial_state. Switching between initial_state and initial_states adds an unnecessary overhead.

@fchollet (Member)

If initial_state does not depend on the input, it must be a constant and should be specified in reset_states. In addition, it may cause issues for people using Keras as a layer library over TF / Theano.

A non-Keras tensor set as initial state will generally not be a constant. It will simply be a non-Keras tensor, dependent or not on the underlying model's inputs.

else:
    kwargs['initial_state'] = initial_state

# We need to build the layer so that state_spec exists.
fchollet (Member):

All of this should be delegated to the parent's __call__.

Joshua-Chin (Contributor, Author):

Currently, state_spec is defined in Recurrent.build. When initial_state is passed, we need to build the layer so that we can use state_spec.

# Compute the full inputs, including state
if not isinstance(initial_state, (list, tuple)):
    initial_state = [initial_state]
inputs = [inputs] + list(initial_state)
fchollet (Member):

There is no longer any check that the initial states are Keras tensors, which will cause the model construction to fail when using non-Keras tensors.

Joshua-Chin (Contributor, Author):

For every other type of layer, model construction fails when using non-Keras tensors. Why are we making a special exception for the initial states of RNNs?

fchollet (Member):

We're not making an exception, RNN layers will behave like every other layer. We're just building an API. The API is that RNN(inputs, initial_state=x) should work as a way to set initial state tensors, independently of the value of x (Keras tensor or not). It's actually very simple to set up, via a switch between inputs and layer keyword arguments.

@Joshua-Chin (Contributor, Author)

The latest commit moves the definition of state_spec to __init__, so we don't have redundant code in Recurrent.__call__ to build the layer. In addition, it properly handles the case where initial_state may be a list of Keras tensors.

@fchollet (Member)

Any volunteers to review this PR?

@smodlich

Any updates on this? Setting the initial state seems to be an important component of any viable seq2seq model.

@farizrahman4u (Contributor)

Looks good. Please merge if no other issues.

@AMabona commented Apr 4, 2017

Just a heads up, this breaks because of masking in a seq2seq model with the tensorflow backend. This is because _collect_previous_masks returns a list (with the decoder RNN mask and None). This is ultimately passed to the tensorflow backend RNN function which throws an error because it expects a tensor as the mask, but gets a list.

Something similar will happen whenever you pass a tf tensor as the initial state and either the initial state or the RNN has a mask (so it would also affect image captioning, RNN VAEs etc).

In gratuitous detail:

  1. In Recurrent.__call__, the initial_state is added to inputs and Layer.__call__ is called on it.
  2. In Layer.__call__, previous_mask is computed, it's a list containing the decoder RNN mask and initial_state's mask. Recurrent.call is called with this mask.
  3. Recurrent.call calls K.rnn with the mask which falls over because it's a list.
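One plausible shape of a fix for the failure sequence above is to collapse the list-valued mask before it reaches the backend, since K.rnn expects a single mask tensor. The helper below is a sketch of that idea using plain strings as stand-ins for mask tensors; it is not the fix that was actually merged:

```python
def normalize_mask(mask):
    """Collapse a list-valued mask into a single value, dropping None entries."""
    if not isinstance(mask, list):
        return mask
    masks = [m for m in mask if m is not None]
    if not masks:
        return None
    if len(masks) == 1:
        return masks[0]
    raise ValueError('Expected at most one non-None mask, got %d' % len(masks))


# The seq2seq case described above: [decoder_mask, None] -> decoder_mask.
print(normalize_mask(['decoder_mask', None]))  # decoder_mask
print(normalize_mask(None))                    # None
```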

@farizrahman4u (Contributor)

@fchollet I think it would be nice to have Keras topology handle optional inputs. A first step would be to make it so that the input spec for layers with multiple inputs is not a list, but a single InputSpec object (with additional attributes num_inputs, num_optional_inputs).

@farizrahman4u (Contributor)

@Joshua-Chin I have sent you a pull request to fix the masking issue mentioned by @AMabona.

@fchollet (Member) commented Apr 4, 2017

@farizrahman4u currently only this layer will require optional inputs, so an initial ad-hoc system is fine in this case. Later we can use what we learned from this ad-hoc implementation to write a more general system that can apply to all layers.

@fchollet (Member)

Could someone please review this PR?

@farizrahman4u (Contributor)

@fchollet Should reset_states be renamed to reset_state (with backward compatibility) for consistency?

@Joshua-Chin (Contributor, Author)

@farizrahman4u @fchollet What's the status of the review?

@farizrahman4u (Contributor)

I think we are good... fix the typo, though.

@fchollet (Member) left a review comment:

Looks good to me

' non-Keras tensors')

if is_keras_tensor:

fchollet (Member):

Unnecessary blank line

@fchollet fchollet merged commit 365f621 into keras-team:master Apr 25, 2017