Issue 182: Modified TransformerDecoder with optional parameter #217
Conversation
…d and edited tests
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Thanks for the PR! Dropped some comments.
)

if encoder_sequence is not None:
    # Encoder-decoder attention.
Move this comment over `self._encoder_decoder_attention_layer`.
        self._feedforward_layernorm,
    )
else:
    # Skip Encoder-Decoder attention, Feedforward.
This is a bit confusing - the comma "," could suggest Feedforward is skipped as well. Maybe just say "# Skip Encoder-Decoder attention if no encoder_sequence is provided."?
output = decoder(decoder_input)
model = keras.Model(
    inputs=decoder_input,
    outputs = output,
remove the space surrounding "=" => outputs=output
    use_causal_mask=True,
)

def test_valid_call_without_encoder_with_mask(self):
We can delete this test case because it is covered by test_valid_call_with_mask
Right now there are two things that indicate decoder-only, and they can conflict. The first is the `decoder_only` attribute that is passed in at initialization. The second is implicit in the optional `encoder_sequence` parameter. These are the current behaviors for these two things:
I added this comment to the docstring. Let me know if this is enough to explain and whether it is intuitive, or if I should make any changes, thanks!
@jessechancy Yea, we should throw an explicit error message to our users if the two places contradict:
@mattdangerw Does this look good to you?
Example for TransformerDecoder usage:
…hether encoder_sequence input is received
    self._feedforward_layernorm,
)

if self._encoder_decoder_attention_layer is None:
One minor comment. It might be nice if you rename the self_attended variable to attention_output, and do something like this.
attention_output = self._add_and_norm(...)
if encoder_sequence is not None:
    ... cross attention ...
    attention_output = self._add_and_norm(...)
feed_forward_output = self._feed_forward(attention_output)
return self._add_and_norm(...)
So basically bring this back to the single return statement. As a reader, that would make it much clearer how the computation is flowing overall with and without encoder_sequence.
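For concreteness, a minimal sketch of that single-return structure for the layer's `call` method, assuming the helper names from the comment above (`_add_and_norm`, `_feed_forward`) with an assumed signature of `_add_and_norm(residual_input, layer_output, norm_layer)`, and attribute names from elsewhere in this PR; the real implementation may differ:

```python
# Sketch only: helper and attribute names follow this PR's discussion and are
# not guaranteed to match the final code exactly.
def call(self, decoder_sequence, encoder_sequence=None):
    # Self-attention over the decoder sequence (causal mask always applied).
    attention_output = self._self_attention_layer(
        decoder_sequence, decoder_sequence, use_causal_mask=True
    )
    attention_output = self._add_and_norm(
        decoder_sequence, attention_output, self._self_attention_layernorm
    )
    if encoder_sequence is not None:
        # Cross-attention, only when an encoder sequence is provided.
        cross_attended = self._cross_attention_layer(
            attention_output, encoder_sequence
        )
        attention_output = self._add_and_norm(
            attention_output, cross_attended, self._cross_attention_layernorm
        )
    # Feedforward block, then a single return for both code paths.
    feed_forward_output = self._feed_forward(attention_output)
    return self._add_and_norm(
        attention_output, feed_forward_output, self._feedforward_layernorm
    )
```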
Thanks! This looks great. Left a few comments, mostly minor
[guide](https://keras.io/guides/understanding_masking_and_padding/)
for more details.
If decoder_only is set to True, the encoder layer would not be built,
We have removed this argument. We should remove docs too.
We should update the class-level docs with a few things:
- In the second paragraph about masking. Add as a first sentence, "This layer will always apply a causal mask to the decoder attention layer."
- Add a new paragraph. Some suggested text below:
This layer can be called with either one or two inputs as follows:
- `layer(decoder_sequence)`: no cross-attention will be built into the decoder
block. This is useful when building a "decoder-only" transformer such as GPT-2.
- `layer(decoder_sequence, encoder_sequence)`: cross-attention will be built into
the encoder block. This is useful when building an "encoder-decoder" transformer,
such as the original transformer model described in Attention is All You Need.
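As a usage sketch of the two calling conventions above (argument values and shapes are illustrative only, and the two paths use separate layer instances since the number of call arguments should not change between calls):

```python
import numpy as np
import keras_nlp

decoder_sequence = np.random.uniform(size=(2, 10, 32))  # (batch, dec_len, hidden)
encoder_sequence = np.random.uniform(size=(2, 12, 32))  # (batch, enc_len, hidden)

# Decoder-only call: no cross-attention is built into the block.
decoder_only_layer = keras_nlp.layers.TransformerDecoder(
    intermediate_dim=64, num_heads=4
)
decoder_only_output = decoder_only_layer(decoder_sequence)

# Encoder-decoder call: cross-attention is built into the decoder block.
cross_attention_layer = keras_nlp.layers.TransformerDecoder(
    intermediate_dim=64, num_heads=4
)
cross_attention_output = cross_attention_layer(decoder_sequence, encoder_sequence)
```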
defaults to "zeros". The bias initializer for | ||
the dense and multiheaded attention layers. | ||
name: string, defaults to None. The name of the layer. | ||
decoder_only: bool, defaults to False. If True, only the decoder layers |
remove
self.supports_masking = True

def _build(self, input_shape):
def _build(self, input_shape, cross_attention):
Maybe `include_cross_attention`, so it is more obvious this is a boolean value?
raise ValueError(
    f"The number of call arguments to "
    f"`keras_nlp.layers.TransformerDecoder` should not change."
    f"\nUse `layer(decoder_sequence, encoder_sequence)` to "
Remove all the `\n` in both error messages.
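For illustration, a sketch of what the first message could read like with the `\n` removed (note the trailing space after "change." so the concatenated pieces stay readable); the text after "to " is cut off in the quoted diff, so the final string below is only a hypothetical placeholder:

```python
# Sketch only: the last string is a hypothetical placeholder for the part of
# the message that is not visible in the quoted diff.
raise ValueError(
    "The number of call arguments to "
    "`keras_nlp.layers.TransformerDecoder` should not change. "
    "Use `layer(decoder_sequence, encoder_sequence)` to "
    "call the layer with cross-attention."  # hypothetical placeholder
)
```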
    encoder_sequence, encoder_padding_mask, encoder_attention_mask
)
# Encoder-decoder attention.
encoder_decoder_attended = self._encoder_decoder_attention_layer(
Let's clean up some variable names:
- `_encoder_decoder_attention_layer` -> `_cross_attention_layer`
- `_enc_dec_attentiondropout` -> `_cross_attention_dropout`
- `_enc_dec_attention_layernorm` -> `_cross_attention_layernorm`
- `encoder_decoder_attended` -> `cross_attended`
output = decoder(encoder_input, decoder_input)
output = decoder(decoder_input, encoder_input)
# should raise ValueError if encoder_input is not provided
try:
Remove the try block. You can add a separate test for these using `self.assertRaises(ValueError)`. There are other examples in this test file.
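A minimal sketch of such a test using `self.assertRaises(ValueError)`; the test name, layer arguments, and tensor shapes here are illustrative, not taken from the PR:

```python
import tensorflow as tf
import keras_nlp


class TransformerDecoderTest(tf.test.TestCase):
    def test_error_when_cross_attention_input_is_dropped(self):
        # Build the layer with cross-attention, then call it without an
        # encoder sequence; per this PR, that should raise a ValueError.
        decoder = keras_nlp.layers.TransformerDecoder(
            intermediate_dim=4, num_heads=2
        )
        decoder_input = tf.random.uniform(shape=(2, 4, 6))
        encoder_input = tf.random.uniform(shape=(2, 4, 6))
        decoder(decoder_input, encoder_input)
        with self.assertRaises(ValueError):
            decoder(decoder_input)
```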
    use_causal_mask=True,
output = decoder(decoder_input)
# should raise ValueError if encoder_input is provided
try:
Same here, remove the try/except.
self.assertGreater(len(grad), 1)
optimizer.apply_gradients(zip(grad, model.trainable_variables))

def test_one_training_step_of_transformer_without_encoder(self):
`without_encoder` -> `without_cross_attention`, here and elsewhere.
model_output = model(decoder_sequence)
loaded_model_output = loaded_model(decoder_sequence)
self.assertAllClose(model_output, loaded_model_output)
Remove extra newlines
LGTM! Thanks.
One nit, and I think there are some format issues still.
decoder block. This is useful when building a "decoder-only"
transformer such as GPT-2.
`layer(decoder_sequence, encoder_sequence)`: cross-attention will be
built into the encoder block. This is useful when building an
encoder block -> decoder block
Made `encoder_sequence` an optional parameter and added testing for this change.