
Function merge_padding_and_attention_mask does not return an output with the desired shape when both padding and attention masks are given #783

Description

@abodinier

Describe the bug
Function merge_padding_and_attention_mask does not return an output with the desired shape when both padding and attention masks are given.
(The original report included a screenshot of a short reproduction; see "To Reproduce" below.)
In that example, the output should have shape (10, 20, 20); instead, it is (10, 10, 20, 20).

This is because tf.newaxis is applied twice: once at line 70 (https://github.com/keras-team/keras-nlp/blob/1eeec3b6d5de3796d1dbc28c4930c8189bac80c3/keras_nlp/layers/transformer_layer_utils.py#L70) and again at line 77 (https://github.com/keras-team/keras-nlp/blob/1eeec3b6d5de3796d1dbc28c4930c8189bac80c3/keras_nlp/layers/transformer_layer_utils.py#L77).

A fix, to me, would be to return tf.minimum(mask, attention_mask) instead of tf.minimum(mask[:, tf.newaxis, :], attention_mask).
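For illustration, here is a small standalone sketch (a paraphrase of the shape handling, not the library code itself; the exact casting in the library may differ) showing why the second tf.newaxis produces a 4D result and why dropping it gives the expected 3D mask:

```python
import tensorflow as tf

def merged_mask_sketch(padding_mask, attention_mask):
    # padding_mask: (batch_size, T) -> add one broadcast axis, as line 70 does.
    mask = tf.cast(padding_mask, "int32")[:, tf.newaxis, :]  # (batch_size, 1, T)
    attention_mask = tf.cast(attention_mask, "int32")        # (batch_size, S, T)
    # Current behavior (line 77): the second tf.newaxis turns mask into
    # (batch_size, 1, 1, T), which broadcasts against (batch_size, S, T)
    # into a 4D (batch_size, batch_size, S, T) tensor.
    current = tf.minimum(mask[:, tf.newaxis, :], attention_mask)
    # Proposed fix: mask already has its broadcast axis, so
    # (batch_size, 1, T) min (batch_size, S, T) -> (batch_size, S, T).
    proposed = tf.minimum(mask, attention_mask)
    return current, proposed

current, proposed = merged_mask_sketch(
    tf.ones((10, 20), dtype="bool"), tf.ones((10, 20, 20), dtype="bool")
)
print(current.shape)   # (10, 10, 20, 20)
print(proposed.shape)  # (10, 20, 20)
```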

To Reproduce
See the snippet below (the original report included a screenshot of an equivalent reproduction).
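A minimal sketch, assuming batch_size = 10, a sequence length of 20 (so S = T = 20, matching the shapes described above), and a hypothetical feature dimension of 64 for the inputs:

```python
import tensorflow as tf
from keras_nlp.layers.transformer_layer_utils import (
    merge_padding_and_attention_mask,
)

batch_size, seq_len = 10, 20
# Hypothetical inputs; only the leading (batch, sequence) dims matter here.
inputs = tf.random.uniform((batch_size, seq_len, 64))
padding_mask = tf.ones((batch_size, seq_len), dtype="bool")             # (batch, T)
attention_mask = tf.ones((batch_size, seq_len, seq_len), dtype="bool")  # (batch, S, T)

merged = merge_padding_and_attention_mask(inputs, padding_mask, attention_mask)
print(merged.shape)  # (10, 10, 20, 20), expected (10, 20, 20)
```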

Expected behavior
The function should return a tensor of shape (batch_size, S, T).

Additional context

Would you like to help us fix it?
Yes I would!
