
Conversation

@abheesht17 (Collaborator) commented Mar 15, 2022

Resolves #40.

I've implemented the FNet Encoder Layer. I'll add unit tests once the code has been finalised. @mattdangerw, @chenmoneygithub, please go through it. Thanks!

Here is the link to the Colab notebook where I do a forward pass: https://colab.research.google.com/drive/1ldWMSqw8q8qwndHUxtuILxowRRFiBTbT?usp=sharing

I'm planning to add an example for this (text classification): I'll stack 2-3 encoder layers and train the model.
Caveat: in the paper, the authors mention that they used the SentencePiece tokenizer, which hasn't been implemented in keras-nlp yet. So, should I add the example later in a separate PR? Or should I just use SentencePiece from TF Text?

@abheesht17 abheesht17 changed the title Fnet encoder Add FNet Encoder Layer Mar 15, 2022
@abheesht17 (Collaborator, Author):

Hey, @mattdangerw. Any thoughts on this?

@chenmoneygithub (Contributor) left a comment:

Thanks for the PR!

Dropped some comments; please also add unit tests for the layer's functionality. Thanks!

if not self._built:
    self._build(inputs.shape)

# Apply fourier transform on the input. Note: We don't have padding
Contributor:

I want to understand this a bit more: are you saying the original paper does not do padding, and takes in a RaggedTensor as input? Thanks!

@abheesht17 (Collaborator, Author) commented Mar 20, 2022:

So, as far as I understand, they do pad the sequences so that every sequence is of the same length. See this: https://github.com/google-research/google-research/blob/master/f_net/layers.py#L106. The expected shape is (bsz, max_seq_length, hidden_dim).

However, the padding mask is not used at all. They delete it here: https://github.com/google-research/google-research/blob/master/f_net/layers.py#L114. So, the mixing is done over all the tokens, including the padding tokens. See the input pipeline here: https://github.com/google-research/google-research/blob/master/f_net/input_pipeline.py#L100 and https://github.com/google-research/google-research/blob/master/f_net/input_pipeline.py#L107-L109.

My guess would be that Flax/JAX probably doesn't have an equivalent of RaggedTensor, so they had to pad the sequences in order to be able to batch them. As to why they did not mask the pad tokens, I'm not sure.

Edit: I've mailed the authors of FNet: Mixing Tokens with Fourier Transforms asking them about why the padding mask is deleted.
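To make the unmasked mixing concrete, here is a minimal numpy sketch (toy shapes, not the actual keras-nlp code) of the Fourier sublayer described above:

```python
import numpy as np

# Toy batch (hypothetical shapes): 2 sequences of length 4, hidden dim 8.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8))

# FNet's mixing sublayer: a 2D FFT over the sequence and hidden axes,
# keeping only the real part. No padding mask is involved, so padding
# positions get mixed in exactly like any other token.
mixed = np.real(np.fft.fft2(x, axes=(-2, -1)))

print(mixed.shape)  # (2, 4, 8), same shape as the input
```

Because the FFT mixes every position, changing a single (padding) position changes the output at all positions, which is exactly why the masking question comes up.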

@abheesht17 (Collaborator, Author) commented Mar 22, 2022:

I received a reply from the first author, James Lee-Thorp. This is the gist of the message:

For FNet, padding tokens are applied, but padding tokens are not masked. There is a possibility that the model may become sensitive to samples with varying numbers of padding tokens... but in practice, the authors did not notice any major performance issues. So, I think we can keep it the same; we don't need to apply padding masks.

If we do want to apply a padding mask, we should only apply it after the embedding layer, and not after every Fourier sublayer. This is because if we apply it after every Fourier sublayer, we will end up zeroing out some frequencies which is counter-productive.
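If we did want the mask, a minimal numpy sketch of that suggestion (hypothetical toy shapes, not the actual layer) would look like:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 8))  # (batch, seq_len, hidden_dim)

# Hypothetical padding mask: the last position in the sequence is padding.
padding_mask = np.array([[1.0, 1.0, 1.0, 0.0]])

# Zero out the padding embeddings once, right after the embedding layer...
x = x * padding_mask[:, :, None]

# ...then run the Fourier sublayers unmasked. Re-applying the mask after
# each Fourier sublayer would instead zero out frequency components,
# which is the counter-productive case described above.
mixed = np.real(np.fft.fft2(x, axes=(-2, -1)))
print(mixed.shape)  # (1, 4, 8)
```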

I'll add a more detailed comment on Line 129; I haven't properly conveyed the meaning with the current comment.

@mattdangerw (Member):

Actually, if it's OK, maybe wait until this Tuesday to fully flesh this out? We have an internal sync meeting then, and will be discussing where we want to set our bar (e.g. with citations) for inclusion in this repo.

I'm pretty sure we will want this, but if you wait till then, we could give a more confident green light. Thanks for your patience! We really do appreciate the contributions!

@abheesht17 (Collaborator, Author):

Sure! Thank you :)

@abheesht17 (Collaborator, Author) left a comment:

Addressed the first round of comments. Thank you, @chenmoneygithub, for reviewing the code :)


@mattdangerw (Member):

Thanks for patience @abheesht17! We discussed in our meeting and confirmed we would like to add this.

Please do go ahead and add tests. Also, if you could add initializer arguments similar to #50, that would be great.

Thank you!

@abheesht17 (Collaborator, Author) commented Mar 22, 2022:

Awesome, @mattdangerw! Thank you :). I'll add the tests and the initialiser arguments.

One more thing - could you and @chenmoneygithub please give #43 (comment) and #43 (comment) a look so that we can have closure on the padding mask issue?

Should I also add a text classification example (stacks of FNet Encoder layers, with a dense layer on top), or should we leave that to a future PR?

@abheesht17 (Collaborator, Author) commented Mar 22, 2022:

@mattdangerw, @chenmoneygithub, I've done the following:

  • Added unit tests (following the ones for TransformerEncoder).
  • Added kernel and bias initialisers.

Let me know if it's okay. Thanks!

@fchollet (Collaborator) left a comment:

Thanks for the PR!

self.bias_initializer = keras.initializers.get(bias_initializer)
self._built = False

def _build(self, input_shape):
Collaborator:

This should just be the built-in build()

@abheesht17 (Collaborator, Author) commented Mar 23, 2022:

Ah, overriding the build() function in tf.keras.layers.Layer? Done! Should I make the same changes to the TransformerEncoder class?
https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/layers/transformer_encoder.py#L83

Contributor:

The default build() only works if your call() method takes one input; TransformerEncoder takes multiple inputs, so the default build() cannot be used there.

@abheesht17 (Collaborator, Author):

Ohh, I see. Thanks! :)
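As a minimal sketch of the pattern (a toy layer, not FNetEncoder itself): when call() takes a single input, Keras invokes the built-in build(input_shape) automatically on the first call, so no manual self._built bookkeeping is needed:

```python
import tensorflow as tf
from tensorflow import keras

class ToyDense(keras.layers.Layer):
    """Toy layer that overrides the built-in build() method."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Called by Keras with the shape of the first input it sees.
        self.kernel = self.add_weight(
            name="kernel", shape=(input_shape[-1], self.units)
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

layer = ToyDense(3)
out = layer(tf.zeros((2, 5)))  # build() runs here, then call()
print(out.shape)  # (2, 3)
```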

)
self._output_dropout = keras.layers.Dropout(rate=self.dropout)

def _fourier_transform(self, input):
Collaborator:

In-line it in call

@abheesht17 (Collaborator, Author):

Done!

mixing_output = tf.math.real(tf.signal.fft2d(input))
return mixing_output

def _add_and_norm(self, input1, input2, norm_layer):
Collaborator:

In-line it in call

@abheesht17 (Collaborator, Author):

Done!

def _add_and_norm(self, input1, input2, norm_layer):
return norm_layer(input1 + input2)

def _feed_forward(self, input):
Collaborator:

In-line it in call

@abheesht17 (Collaborator, Author):

Done!


def __init__(
self,
intermediate_dim=3072,
Collaborator:

Where does this value come from?

Member:

Let's remove the default. intermediate_dim should be a required argument to pass in.

Contributor:

At the component level, let's not set the default; we will pass this value when we are building the actual model.

@abheesht17 (Collaborator, Author):

Ah, alright. I've removed the default value!

def __init__(
self,
intermediate_dim=3072,
dropout=0.1,
Collaborator:

Should default to 0

Member:

Right, but we are not shipping a model architecture here; we are shipping a reusable layer. I would be inclined to use the same defaults as the TransformerEncoder block for most of the args here, so this is essentially a swap-in replacement. Having each encoder block we ship use different defaults everywhere would be a lot to track, and would lead to bad usability.

@abheesht17 (Collaborator, Author):

Right. I'll set the defaults to the ones used by TransformerEncoder 👍🏼

intermediate_dim=3072,
dropout=0.1,
activation="gelu",
layer_norm_epsilon=1e-12,
Collaborator:

This seems way too small

dropout=0.1,
activation="gelu",
layer_norm_epsilon=1e-12,
kernel_initializer="glorot_uniform",
Collaborator:

Is this choice of initializer the best practice for this layer?

@abheesht17 (Collaborator, Author):

FNet repo: https://github.com/google-research/google-research/blob/6fd7a1a0872e9d27c1f1764836bbf9048a7903e7/f_net/models.py#L39-L42

I believe it is tf.keras.initializers.RandomNormal(mean=0.0, stddev=2e-2) in Keras. Will change it to this.

Intermediate Dense Layer:

kernel_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2e-2)
bias_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2e-2)

Output Dense Layer:

kernel_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2e-2)
bias_initializer = "zeros" # They don't specify the bias initializer in the output dense layer. The default in Flax is "zeros".

https://github.com/google-research/google-research/blob/master/f_net/layers.py#L73-L81

Should I set it as tf.keras.initializers.RandomNormal(mean=0.0, stddev=2e-2) for all, or should I separate the bias initializer for the output dense layer and set that to "zeros"?
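For illustration, a quick sketch of the second option with stand-in Dense layers (the names and dims here are hypothetical, not from the PR):

```python
from tensorflow import keras

# Kernel/bias initializer from the FNet reference config.
init = keras.initializers.RandomNormal(mean=0.0, stddev=2e-2)

# Intermediate dense layer: RandomNormal for both kernel and bias.
intermediate_dense = keras.layers.Dense(
    8, kernel_initializer=init, bias_initializer=init
)

# Output dense layer: RandomNormal kernel, "zeros" bias, matching
# the Flax default noted above.
output_dense = keras.layers.Dense(
    4, kernel_initializer=init, bias_initializer="zeros"
)

output_dense.build((None, 8))
print(output_dense.bias.numpy().sum())  # 0.0
```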

@abheesht17 (Collaborator, Author) left a comment:

Thank you for the review comments, @fchollet, @mattdangerw! :)

Summary:

  • For all argument-related comments, I've attached links to the config present in the official FNet code. What do you think I should do: keep the same defaults as given in the FNet config, or change them to the defaults given in TransformerEncoder?
  • Please see the comment on initializers. I've changed everything to RandomNormal with a standard deviation of 2e-2.
  • For the comments on making the functions in-line, I had followed what's been done in the TransformerEncoder class. I've now changed it to what @fchollet mentioned in the comments. Should I make the same changes in TransformerEncoder and TransformerDecoder?


x = self._output_dense(x)
return self._output_dropout(x)

# Apply fourier transform on the input.
Contributor:

I would prefer putting these comments into the class's docstring, since the line mixing_output = _fourier_transform(inputs) does not convey this much information.

Additionally, this code reference is error-prone, because the line number is subject to change. Let's delete it, thanks!

@abheesht17 (Collaborator, Author):

Done!

@abheesht17 (Collaborator, Author) left a comment:

Thank you for the comments, @chenmoneygithub! I've addressed them.


@mattdangerw (Member) left a comment:

Left some small comments. Once we resolve the default argument discussion this looks good to me.

A Tensor of the same shape as the `inputs`.
"""

def _fourier_transform(input):
Member:

I don't think there's any reason to keep these inner functions private with underscores; they are local to the call function.

You only need the underscore for self._x to communicate that it's private.

@abheesht17 (Collaborator, Author):

Makes sense. Changed!

# Apply fourier transform on the input.
mixing_output = _fourier_transform(inputs)

# LayerNorm layer.
Member:

# Remove the `# LayerNorm layer.` and `# Feedforward layer.` comments; the function names communicate that fine.

@abheesht17 (Collaborator, Author):

👍🏼

@abheesht17 (Collaborator, Author) left a comment:

Thank you, @mattdangerw, for the review! I've changed all defaults to what they are in TransformerEncoder (including the initialisers).


@mattdangerw (Member) left a comment:

Thanks! This looks good to me.

@abheesht17 (Collaborator, Author):

Thank you, @mattdangerw! 🥳

Args:
intermediate_dim: int, defaults to 3072. The hidden size of feedforward
network.
dropout: float, defaults to 0.1. The dropout value, applied in the
@mattdangerw (Member) commented Mar 25, 2022:

Oops, just saw this. Update the default in the docstrings too (or maybe just drop them); when we generate docs, we can include the default from the function signature.

@abheesht17 (Collaborator, Author):

Ah, my bad. I've fixed it 👍🏼. Thanks for noticing!

@fchollet (Collaborator) left a comment:

API LGTM.

@mattdangerw mattdangerw merged commit b17ca27 into keras-team:master Mar 28, 2022
adhadse pushed a commit to adhadse/keras-nlp that referenced this pull request Sep 17, 2022
* Add rough code for FNet Encoder

* Format code

* Minor doc-string changes

* Format __init__.py

* Address review comments - 1

* Add detailed comment about padding masks

* Add kernel and bias initialisers

* Add unit tests for the layer

* Address review comments - 2

* Address review comments - 3

* Address review comments - 4

* Minor change

* Correct doc-string
Successfully merging this pull request may close these issues: Add FNet Encoder Layer