
Function merge_padding_and_attention_mask does not return an output with the desired shape when both padding and attention masks are given #783

Description

@abodinier

Describe the bug
Function merge_padding_and_attention_mask does not return an output with the desired shape when both padding and attention masks are given.
(The original report included a screenshot of a short reproduction; see "To Reproduce" below.)
In that example, the output should have shape (10, 20, 20); instead, it is (10, 10, 20, 20).

This is because tf.newaxis is applied twice: once at line 70 (https://github.com/keras-team/keras-nlp/blob/1eeec3b6d5de3796d1dbc28c4930c8189bac80c3/keras_nlp/layers/transformer_layer_utils.py#L70) and again at line 77 (https://github.com/keras-team/keras-nlp/blob/1eeec3b6d5de3796d1dbc28c4930c8189bac80c3/keras_nlp/layers/transformer_layer_utils.py#L77).

A fix, to me, would be to return tf.minimum(mask, attention_mask) instead of tf.minimum(mask[:, tf.newaxis, :], attention_mask).
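For illustration, here is a small standalone sketch (a paraphrase of the shape handling, not the library code itself; the exact casting in the library may differ) showing why the second tf.newaxis produces a 4D result and why dropping it gives the expected 3D mask:

```python
import tensorflow as tf

def merged_mask_sketch(padding_mask, attention_mask):
    # padding_mask: (batch_size, T) -> add one broadcast axis, as line 70 does.
    mask = tf.cast(padding_mask, "int32")[:, tf.newaxis, :]  # (batch_size, 1, T)
    attention_mask = tf.cast(attention_mask, "int32")        # (batch_size, S, T)
    # Current behavior (line 77): the second tf.newaxis turns mask into
    # (batch_size, 1, 1, T), which broadcasts against (batch_size, S, T)
    # into a 4D (batch_size, batch_size, S, T) tensor.
    current = tf.minimum(mask[:, tf.newaxis, :], attention_mask)
    # Proposed fix: mask already has its broadcast axis, so
    # (batch_size, 1, T) min (batch_size, S, T) -> (batch_size, S, T).
    proposed = tf.minimum(mask, attention_mask)
    return current, proposed

current, proposed = merged_mask_sketch(
    tf.ones((10, 20), dtype="bool"), tf.ones((10, 20, 20), dtype="bool")
)
print(current.shape)   # (10, 10, 20, 20)
print(proposed.shape)  # (10, 20, 20)
```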

To Reproduce
See the snippet below (the original report included a screenshot of an equivalent reproduction).
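A minimal sketch, assuming batch_size = 10, a sequence length of 20 (so S = T = 20, matching the shapes described above), and a hypothetical feature dimension of 64 for the inputs:

```python
import tensorflow as tf
from keras_nlp.layers.transformer_layer_utils import (
    merge_padding_and_attention_mask,
)

batch_size, seq_len = 10, 20
# Hypothetical inputs; only the leading (batch, sequence) dims matter here.
inputs = tf.random.uniform((batch_size, seq_len, 64))
padding_mask = tf.ones((batch_size, seq_len), dtype="bool")             # (batch, T)
attention_mask = tf.ones((batch_size, seq_len, seq_len), dtype="bool")  # (batch, S, T)

merged = merge_padding_and_attention_mask(inputs, padding_mask, attention_mask)
print(merged.shape)  # (10, 10, 20, 20), expected (10, 20, 20)
```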

Expected behavior
The function should return a tensor of shape (batch_size, S, T).

Additional context

Would you like to help us fix it?
Yes I would!
