
Fix Specifying Initial States of RNN Layers #5795

Merged · 20 commits · Apr 25, 2017

Conversation

Joshua-Chin (Contributor)

I have begun to review the changes made in da21c15. There appear to be a number of issues:

  • If initial_states is a list of tensors, it will not have the attribute _keras_history, and will be treated as a numerical value.
  • If initial_states is passed as a keyword argument to call, it will be ignored / overwritten.
  • state_spec may not be defined by the time it is used (state_spec is defined in build).
  • In reset_states, there is a check if input_spec is not None. However, input_spec is never None, because it is defined in __init__ to be InputSpec(ndim=3).
  • There is an inconsistency between using the singular and plural of initial_state and initial_states.
  • the signature for reset_states(state_values) is inconsistent with the signature for set_weights(weights)
  • test_specify_initial_states does not check if initial_states is part of the computational graph.

This commit does the following to fix those issues:

  • If initial_states is passed to __call__, it is always treated as a tensor / list of tensors. If a user wants to specify the state numerically, they should pass a numpy array to reset_states.
  • The layer will be built in __call__ before state_spec is used.
  • The extraneous check of input_spec is removed.
  • The plural form initial_states is used throughout the code and documentation.
  • The signature for reset_states(state_values) is changed to reset_states(states).
  • test_specify_states explicitly checks if initial_states is added to the computational graph.
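The routing these fixes describe can be sketched in plain Python. Everything below is a simplified stand-in for the real Recurrent layer (class name, method bodies, and the string "tensors" are illustrative, not Keras code):

```python
class FakeRNN:
    """Toy stand-in for a stateful RNN layer, illustrating the split above."""

    def __init__(self):
        self.states = None

    def reset_states(self, states=None):
        # Numeric initial values (e.g. numpy arrays) belong here.
        self.states = states

    def __call__(self, inputs, initial_state=None):
        if initial_state is None:
            return [inputs]
        if not isinstance(initial_state, (list, tuple)):
            initial_state = [initial_state]
        # Symbolic tensors passed to __call__ join the graph inputs,
        # so the initial state becomes part of the computational graph.
        return [inputs] + list(initial_state)


rnn = FakeRNN()
rnn.reset_states([[0.0, 0.0]])                       # numeric: via reset_states
full_inputs = rnn('x', initial_state=['h0', 'c0'])   # symbolic: via __call__
print(full_inputs)  # ['x', 'h0', 'c0']
```

The key point of the fix is that __call__ never tries to interpret the state numerically; the two entry points have disjoint responsibilities.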

#5738

@fchollet (Member) commented Mar 16, 2017

If initial_states is a list of tensors, it will not have the attribute _keras_history, and will be treated as a numerical value.

The proper behavior in that case would be to check if all tensors in the list are Keras tensors. If it's mixed, raise an exception. If they're all Keras tensors, add them to the inputs. Else, use initial_state.

I am confused by what you are referring to as "numerical values". There are no numerical values involved (e.g. Numpy tensors), only symbolic tensors (e.g. TF tensors). Can you clarify?
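The rule described above can be sketched using `_keras_history` as the marker that distinguishes Keras tensors. The stub class and helper below are hypothetical illustrations of the check, not the actual Keras implementation:

```python
class KerasTensorStub:
    """Stand-in for a Keras tensor; real ones carry a _keras_history attribute."""
    _keras_history = ()


def route_initial_state(initial_state):
    """All Keras tensors -> graph inputs; none -> keyword argument; mixed -> error."""
    states = initial_state if isinstance(initial_state, (list, tuple)) else [initial_state]
    flags = [hasattr(s, '_keras_history') for s in states]
    if all(flags):
        return 'inputs'   # append the states to the layer's graph inputs
    if not any(flags):
        return 'kwargs'   # pass initial_state through as a plain keyword argument
    raise ValueError('initial_state mixes Keras and non-Keras tensors')


print(route_initial_state([KerasTensorStub(), KerasTensorStub()]))  # inputs
print(route_initial_state(['raw_backend_tensor']))                  # kwargs
```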

@fchollet (Member)

The plural form initial_states is used throughout the code and documentation.

In the API we should use initial_state everywhere, for API consistency with TF. In the code we can use whatever.

@Joshua-Chin (Contributor, Author)

The proper behavior in that case would be to check if all tensors in the list are Keras tensors. If it's mixed, raise an exception. If they're all Keras tensors, add them to the inputs. Else, use initial_state.

It is unclear to me why we are checking if they are Keras tensors in the first place. If initial_state does not depend on the input, it must be a constant and should be specified in reset_states. In addition, it may cause issues for people using Keras as a layer library over TF / Theano.

I am confused by what you are referring to as "numerical values". There are no numerical values involved (e.g. Numpy tensors), only symbolic tensors (e.g. TF tensors). Can you clarify?

My apologies, I misspoke. In the current code, if initial_state is a list of tensors, it won't be treated as numerical values.
However, the value of initial_state will still be ignored: the code block at lines 263-269 in Recurrent.call always overwrites the initial_state keyword argument.

In the API we should use initial_state everywhere, for API consistency with TF. In the code we can use whatever.

I am fine with using initial_state everywhere in the API. However, I feel that we should change the code in recurrent.py to always use initial_state. Switching between initial_state and initial_states adds an unnecessary overhead.

@fchollet (Member)

If initial_state does not depend on the input, it must be a constant and should be specified in reset_states. In addition, it may cause issues for people using Keras as a layer library over TF / Theano.

A non-Keras tensor set as initial state will generally not be a constant. It will simply be a non-Keras tensor, dependent or not on the underlying model's inputs.

else:
    kwargs['initial_state'] = initial_state

# We need to build the layer so that state_spec exists.
fchollet (Member):

All of this should be delegated to the parent's __call__.

Joshua-Chin (Contributor, Author):

Currently, state_spec is defined in Recurrent.build. When initial_state is passed, we need to build the layer so that we can use state_spec.

# Compute the full inputs, including state
if not isinstance(initial_state, (list, tuple)):
    initial_state = [initial_state]
inputs = [inputs] + list(initial_state)
fchollet (Member):

There is no longer any check that the initial states are Keras tensors, which will cause the model construction to fail when using non-Keras tensors.

Joshua-Chin (Contributor, Author):

For every other type of layer, model construction fails when using non-Keras tensors. Why are we making a special exception for the initial states of RNNs?

fchollet (Member):

We're not making an exception, RNN layers will behave like every other layer. We're just building an API. The API is that RNN(inputs, initial_state=x) should work as a way to set initial state tensors, independently of the value of x (Keras tensor or not). It's actually very simple to set up, via a switch between inputs and layer keyword arguments.

@Joshua-Chin (Contributor, Author)

The latest commit moves the definition of state_spec to __init__, so we don't have redundant code in Recurrent.__call__ to build the layer. In addition, it properly handles the case where initial_state may be a list of Keras tensors.

@fchollet (Member)

Any volunteers to review this PR?

@smodlich

Any updates on this? Setting the initial state seems to be an important component of any viable seq2seq model.

@farizrahman4u (Contributor)

Looks good. Please merge if no other issues.

@AMabona commented Apr 4, 2017

Just a heads up, this breaks because of masking in a seq2seq model with the tensorflow backend. This is because _collect_previous_masks returns a list (with the decoder RNN mask and None). This is ultimately passed to the tensorflow backend RNN function which throws an error because it expects a tensor as the mask, but gets a list.

Something similar will happen whenever you pass a tf tensor as the initial state and either the initial state or the RNN has a mask (so it would also affect image captioning, RNN VAEs etc).

In gratuitous detail:

  1. In Recurrent.__call__, the initial_state is added to inputs and Layer.__call__ is called on it.
  2. In Layer.__call__, previous_mask is computed, it's a list containing the decoder RNN mask and initial_state's mask. Recurrent.call is called with this mask.
  3. Recurrent.call calls K.rnn with the mask which falls over because it's a list.
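One plausible shape of a fix for the failure sequence above is to collapse the list-valued mask before it reaches the backend, since K.rnn expects a single mask tensor. The helper below is a sketch of that idea using plain strings as stand-ins for mask tensors; it is not the fix that was actually merged:

```python
def normalize_mask(mask):
    """Collapse a list-valued mask into a single value, dropping None entries."""
    if not isinstance(mask, list):
        return mask
    masks = [m for m in mask if m is not None]
    if not masks:
        return None
    if len(masks) == 1:
        return masks[0]
    raise ValueError('Expected at most one non-None mask, got %d' % len(masks))


# The seq2seq case described above: [decoder_mask, None] -> decoder_mask.
print(normalize_mask(['decoder_mask', None]))  # decoder_mask
print(normalize_mask(None))                    # None
```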

@farizrahman4u (Contributor)

@fchollet I think it would be nice to have Keras topology handle optional inputs. A first step would be to make it so that the input spec for layers with multiple inputs is not a list, but a single InputSpec object (with additional attributes num_inputs, num_optional_inputs).

@farizrahman4u (Contributor)

@Joshua-Chin I have sent you a pull request to fix the masking issue mentioned by @AMabona.

@fchollet (Member) commented Apr 4, 2017

@farizrahman4u currently only this layer will require optional inputs, so an initial ad-hoc system is fine in this case. Later we can use what we learned from this ad-hoc implementation to write a more general system that can apply to all layers.

@fchollet (Member)

Could someone please review this PR?

@farizrahman4u (Contributor)

@fchollet Should reset_states be renamed to reset_state (with backward compatibility) for consistency?

@Joshua-Chin (Contributor, Author)

@farizrahman4u @fchollet What's the status of the review?

@farizrahman4u (Contributor)

I think we are good... fix the typo, though.

@fchollet (Member) left a review comment:

Looks good to me

' non-Keras tensors')

if is_keras_tensor:

fchollet (Member):

Unnecessary blank line

@fchollet fchollet merged commit 365f621 into keras-team:master Apr 25, 2017