Models do not propagate symbolic masks when called with symbolic inputs. #18417
That's a private API, so consistency with tf.keras is not a concern here -- what did you need the symbolic masks for? They don't even exist in the functional API; they're only computed at runtime.
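For context on what is being computed here, mask derivation can be sketched in plain NumPy. This is an illustrative stand-in, not the actual Keras implementation: a layer like Embedding with mask_zero=True treats token id 0 as padding and derives a boolean mask from the integer inputs at call time.

```python
import numpy as np

# Illustrative stand-in (not the Keras code) for how a mask_zero-style
# mask is derived from integer token ids at call time.
def embedding_mask(token_ids):
    # True for real tokens, False for padding (token id 0).
    return token_ids != 0

tokens = np.array([[1, 1, 0]])
mask = embedding_mask(tokens)
print(mask)  # [[ True  True False]]
```

The question in this issue is whether that mask also travels along symbolic (functional-API) tensors when a whole Model is called on them, not just at eager runtime.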
@fchollet this is relevant when building compound models, e.g. a transformer with separate encoder and decoder submodels. Minimal example demonstrating a non-private difference:

import keras_core as keras
inp = keras.Input((3,))
x = keras.layers.Embedding(10, 100, mask_zero=True)(inp)
x = keras.layers.Conv1D(3, 3)(x)
model = keras.Model(inp, x)
z1 = model(inp)**2
model1 = keras.Model(inp, z1)
z2 = x**2
model2 = keras.Model(inp, z2)
This is obviously a highly contrived example, but there's a realistic example here, which is a port of the keras-nlp spanish-to-english translation transformer example to keras_core. Also, I'm not sure what you mean by "they don't even exist in the functional API". The result of calling a mask-generating layer (e.g. Embedding with mask_zero=True) on a symbolic input does carry a mask.
I don't understand the nature of the difference by reading the code. Can you explain?
@fchollet apologies, I'm making a mess of this by trying to produce a minimal example - the above code does indeed behave identically. Below is the most minimal example I can get that illustrates a difference; apologies for not being able to simplify further. Note this may mean the error comes from keras_nlp rather than keras_core itself.

import os
os.environ["KERAS_BACKEND"] = "tensorflow" # ensure keras imports are consistent
os.environ["CUDA_VISIBLE_DEVICES"] = "" # I have other models training...
import keras_core as keras
import keras_nlp
import numpy as np
ENG_VOCAB_SIZE = 3
SPA_VOCAB_SIZE = 11
MAX_SEQUENCE_LENGTH = 5
EMBED_DIM = 2
INTERMEDIATE_DIM = 7
NUM_HEADS = 1
def build_without_component_models():
encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = keras_nlp.layers.TokenAndPositionEmbedding(
vocabulary_size=ENG_VOCAB_SIZE,
sequence_length=MAX_SEQUENCE_LENGTH,
embedding_dim=EMBED_DIM,
mask_zero=True,
)(encoder_inputs)
encoder_outputs = keras_nlp.layers.TransformerEncoder(
intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS
)(x)
# Decoder
decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
x = keras_nlp.layers.TokenAndPositionEmbedding(
vocabulary_size=SPA_VOCAB_SIZE,
sequence_length=MAX_SEQUENCE_LENGTH,
embedding_dim=EMBED_DIM,
mask_zero=True,
)(decoder_inputs)
x = keras_nlp.layers.TransformerDecoder(
intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS
)(decoder_sequence=x, encoder_sequence=encoder_outputs)
x = keras.layers.Dropout(0.5)(x)
decoder_outputs = keras.layers.Dense(SPA_VOCAB_SIZE, activation="softmax")(x)
transformer = keras.Model(
(encoder_inputs, decoder_inputs),
decoder_outputs,
name="transformer",
)
transformer.summary()
return transformer
def build_with_component_models():
encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = keras_nlp.layers.TokenAndPositionEmbedding(
vocabulary_size=ENG_VOCAB_SIZE,
sequence_length=MAX_SEQUENCE_LENGTH,
embedding_dim=EMBED_DIM,
mask_zero=True,
)(encoder_inputs)
encoder_outputs = keras_nlp.layers.TransformerEncoder(
intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS
)(x)
# Decoder
decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
encoded_seq_inputs = keras.Input(
shape=(None, EMBED_DIM), name="decoder_state_inputs"
)
x = keras_nlp.layers.TokenAndPositionEmbedding(
vocabulary_size=SPA_VOCAB_SIZE,
sequence_length=MAX_SEQUENCE_LENGTH,
embedding_dim=EMBED_DIM,
mask_zero=True,
)(decoder_inputs)
x = keras_nlp.layers.TransformerDecoder(
intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS
)(decoder_sequence=x, encoder_sequence=encoded_seq_inputs)
x = keras.layers.Dropout(0.5)(x)
decoder_outputs = keras.layers.Dense(SPA_VOCAB_SIZE, activation="softmax")(x)
decoder = keras.Model(
(decoder_inputs, encoded_seq_inputs),
decoder_outputs,
)
decoder_outputs = decoder([decoder_inputs, encoder_outputs])
transformer = keras.Model(
(encoder_inputs, decoder_inputs),
decoder_outputs,
name="transformer",
)
transformer.summary()
return transformer
model = build_with_component_models()
# model = build_without_component_models()
encoder_inputs = np.array([[1, 1, 1, 2, 0]])
decoder_inputs = np.array([[1, 2, 4, 3, 0]])
print(model((encoder_inputs, decoder_inputs)))

The code runs fine using build_without_component_models.
Maybe the issue can be traced to the warnings, but the fact that it behaves differently compared to the non-component-model implementation seems fishy. Note that even the non-error-raising implementation trains differently compared to what happens under tf.keras.
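For intuition on why a dropped mask changes behavior here, a sketch under the assumption of standard padding-mask handling (not the actual keras_nlp implementation): the TransformerDecoder's cross-attention relies on the encoder's padding mask to ignore padded source positions, so if wrapping the decoder in a Model loses that mask, attention silently attends to padding.

```python
import numpy as np

# Sketch of a cross-attention padding mask (assumed standard behavior,
# not the keras_nlp code): padded encoder positions (token id 0) should
# be excluded from attention over the source sequence.
def cross_attention_padding_mask(encoder_token_ids):
    # Shape (batch, 1, source_len), broadcastable over target positions.
    return (encoder_token_ids != 0)[:, None, :]

enc = np.array([[1, 1, 1, 2, 0]])  # same encoder input as above
pad_mask = cross_attention_padding_mask(enc)
print(pad_mask.shape)  # (1, 1, 5)
```

With the mask lost, the final position (the padding token) would contribute to attention scores, which is consistent with the two builds training differently.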
Here's a more minimal example without using keras_nlp:

import numpy as np
## use either of the following pairs of imports
# from tensorflow import keras
# from tensorflow import logical_and
import keras_core as keras
from keras_core.ops import logical_and
class MySum(keras.layers.Layer):
def __init__(self):
super().__init__()
self.supports_masking = True
def call(self, inputs, mask=None):
a, b = inputs
return a + b
def compute_output_shape(self, input_shape):
a, b = input_shape
assert a == b
return a
def compute_mask(self, inputs, previous_mask):
if previous_mask is None:
return None
a, b = previous_mask
return logical_and(a, b)
embedding = keras.layers.Embedding(3, 5, mask_zero=True)
inp = keras.Input((3,))
out = embedding(inp)
# embedding_model is a Model wrapper around just embedding layer
embedding_model = keras.Model(inp, out)
# construct a model without model components
out1 = embedding(inp)
s = MySum()((out, out1))
model1 = keras.Model(inp, s)
# construct a model with model components
out2 = embedding_model(inp)
s = MySum()((out, out2)) # <- error here with keras_core implementation
model2 = keras.Model(inp, s)
x = np.array([[1, 1, 0]], dtype="int64")
print(embedding(x))
print(model1(x))
print(model2(x))
Thanks for the code snippet. Are you sure this is the same issue, though? This is a known behavior difference in Keras Core: it does not allow implicit mask propagation through layers that take multiple inputs as a nested structure (list/tuple). The workaround is to accept each input and its mask as explicit arguments:

def call(self, a, b, mask_a=None, mask_b=None):

We can also look at lifting the limitation entirely, though that would require quite a bit of work.
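A plain-NumPy sketch of that workaround applied to the MySum example above (a stand-in function, not a real Keras layer; the point is that the per-input masks arrive as explicit arguments instead of relying on implicit propagation):

```python
import numpy as np

# Stand-in (not a Keras layer) for MySum rewritten per the suggested
# workaround: masks are passed explicitly rather than propagated implicitly.
def my_sum(a, b, mask_a=None, mask_b=None):
    out = a + b
    if mask_a is None or mask_b is None:
        return out, None
    # Combine the per-input masks, mirroring compute_mask above.
    return out, np.logical_and(mask_a, mask_b)

a = np.ones((1, 3))
mask = np.array([[True, True, False]])
out, combined = my_sum(a, a, mask_a=mask, mask_b=mask)
print(combined)  # [[ True  True False]]
```

The trade-off is that every caller must now thread masks through by hand, which is exactly the bookkeeping that implicit propagation normally hides.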
I'm fairly confident it's the same issue. Notwithstanding the workaround, it's surprising to have an error raised as the result of wrapping a layer in a Model.
Describe the bug
Models (both functional and sequential) do not propagate symbolic masks when called with symbolic inputs.
Example:
Expected behavior
To be consistent with tf.keras.
Additional context
This came up in the keras-nlp example porting issue.