ValueError: Exception encountered when calling RNN.call() - Undefined shapes are not supported #19482
Yes, that's right. The second model would return 4 state tensors (2 per cell).
No, states are initialized at zero by each cell. To get a non-zero state you would have to pass the initial state when calling the layer.
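A minimal sketch of that behavior with a hand-rolled toy cell (plain NumPy, not the Keras implementation): omitting the initial state is equivalent to starting from zeros, and anything else has to be passed in explicitly.

```python
import numpy as np

def run_cell(seq, w_x, w_h, initial_state=None):
    """Toy recurrent cell (not Keras internals): state = tanh(x @ w_x + state @ w_h)."""
    state = np.zeros(w_h.shape[0]) if initial_state is None else initial_state
    for x in seq:
        state = np.tanh(x @ w_x + state @ w_h)
    return state

rng = np.random.default_rng(0)
w_x = rng.standard_normal((3, 4))
w_h = rng.standard_normal((4, 4))
seq = rng.standard_normal((5, 3))  # 5 timesteps, 3 features

# Omitting initial_state is the same as passing explicit zeros...
default = run_cell(seq, w_x, w_h)
explicit_zeros = run_cell(seq, w_x, w_h, initial_state=np.zeros(4))
print(np.allclose(default, explicit_zeros))  # True

# ...while a non-zero starting state is the caller's responsibility
warm = run_cell(seq, w_x, w_h, initial_state=np.ones(4))
```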
Yes, that's actually a bug. I've fixed it at HEAD. Check that it works for you. Note that since your layer returns …

```python
import keras

inputs = keras.Input(shape=(5, 10))
outputs, cell_1_states, cell_2_states = keras.layers.RNN(
    [keras.layers.LSTMCell(10), keras.layers.LSTMCell(10)],
    return_state=True,
)(inputs)
model = keras.Model(inputs, [outputs] + cell_1_states + cell_2_states)
model.summary()
```
Hi @fchollet, thanks for the quick fix but I can't find it at … So:

```python
import keras
from keras.layers import RNN, LSTM, LSTMCell

inputs = keras.Input(shape=(5, 10))
first_lstm_layer_out, *cell_states = LSTM(10, return_sequences=True, return_state=True)(inputs)
second_lstm_layer_out = LSTM(10)(first_lstm_layer_out, initial_state=cell_states)
model_a = keras.Model(inputs, second_lstm_layer_out)
model_a.summary()

inputs = keras.Input(shape=(5, 10))
stacked_lstm_outputs = RNN([LSTMCell(10), LSTMCell(10)])(inputs)
model_b = keras.Model(inputs, stacked_lstm_outputs)
model_b.summary()
```
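Both snippets should report the same trainable parameter count: an LSTM layer (or LSTMCell) with `units` units over `input_dim` input features has 4 gates, each with an input kernel, a recurrent kernel, and a bias. A quick arithmetic check in plain Python (no Keras needed):

```python
def lstm_param_count(input_dim, units):
    # 4 gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias (units)
    return 4 * (input_dim * units + units * units + units)

# Both models: two LSTM(10) layers/cells, each seeing 10 features per step
per_layer = lstm_param_count(10, 10)
total = 2 * per_layer
print(per_layer, total)  # 840 1680
```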
Yes, there's only one gotcha: Functional model inputs/outputs must be flat structures, and here …
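Concretely (with plain Python lists standing in for tensors): a stacked-cell RNN with `return_state=True` hands back something like an output plus one `[h, c]` pair per cell, and `keras.Model` wants its outputs as one flat list, which is what the `[outputs] + cell_1_states + cell_2_states` concatenation in the fix above achieves.

```python
# Stand-ins for the tensors an RNN over two LSTMCells would return
outputs = "outputs"
cell_1_states = ["h1", "c1"]
cell_2_states = ["h2", "c2"]

# Nested structure: not accepted as Functional model outputs
nested = [outputs, cell_1_states, cell_2_states]

# Flat structure: what keras.Model expects
flat = [outputs] + cell_1_states + cell_2_states
print(flat)  # ['outputs', 'h1', 'c1', 'h2', 'c2']
```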
Hi everyone,

I have updated Keras to the latest release (3.2.1). Looking at this snippet: if I print the summaries without `return_state=True`, the two models have the same number of parameters, but with the flag I get this error. Is this a Keras issue?

From what I understand, the main difference should be that `model_b` returns the states of both LSTM layers while `model_a` returns only the final ones (as expected). But in the stacked implementation of `model_b`, are the states of the first layer used to initialize the states of the second one?

Thanks for your support again.
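On that last question: as noted earlier in the thread, each cell keeps its own state (initialized at zero); what the stack shares is the per-timestep output, i.e. cell 1's output at step t becomes cell 2's input at step t. A toy sketch with hand-rolled cells (plain NumPy, not Keras internals):

```python
import numpy as np

def step(x, state, w_x, w_h):
    # toy recurrent cell: new_state = tanh(x @ w_x + state @ w_h)
    return np.tanh(x @ w_x + state @ w_h)

rng = np.random.default_rng(1)
seq = rng.standard_normal((5, 3))  # 5 timesteps, 3 features
w_x1, w_h1 = rng.standard_normal((3, 4)), rng.standard_normal((4, 4))
w_x2, w_h2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

state1 = np.zeros(4)  # each cell starts from its own zero state
state2 = np.zeros(4)  # cell 1's state is never copied into cell 2
for x in seq:
    state1 = step(x, state1, w_x1, w_h1)       # cell 1 consumes the input
    state2 = step(state1, state2, w_x2, w_h2)  # cell 2 consumes cell 1's output

print(state2.shape)  # final output of the stack
```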