
Conversation

@abheesht17 (Collaborator) commented Jun 10, 2022

Resolves #220

@mattdangerw (Member)

@abheesht17 I will review line by line later, but one spot was missed where the functionality is broken.

We need to ensure that the end token will show up even if the sequence is longer than sequence length. The start and end tokens should show up in the output for all sequences if they are set. Right now I think an overlong sequence will cause the layer to truncate the end token away.

You might be able to do this with regular slicing. Something like...

if self.end_value is not None:
    end_token_id_tensor = tf.fill((batch_size, 1), self.end_value)
    # Trim first to leave room, so the end token is never truncated away.
    inputs = inputs[..., : self.sequence_length - 1]
    inputs = tf.concat([inputs, end_token_id_tensor], axis=-1)

We should update our unit testing so we check this case somewhere too!
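As a plain-Python sketch of the case worth testing (the function name here is hypothetical, not the layer's API): trimming before appending guarantees the end token survives an overlong input.

```python
def append_end_token(tokens, sequence_length, end_value):
    # Trim first so there is always room for the end token, then append it.
    return tokens[: sequence_length - 1] + [end_value]

# Overlong input: the tail is dropped, but the end token is kept.
append_end_token([5, 6, 7, 8, 9], sequence_length=4, end_value=2)
# -> [5, 6, 7, 2]
```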

@mattdangerw (Member) left a comment

Looks good! Just a few minor comments. The big thing to fix here is still adding an end token when input length > sequence length.

class StartEndPacker(keras.layers.Layer):
    """Adds start and end tokens to a sequence and pads to a fixed length.

    If inputs are batched, inputs should be a `tf.RaggedTensor` with shape
Member:

Might be good to add a short intro paragraph.

This layer is useful when tokenizing inputs for tasks like translation, where each sequence should be marked with a start and end marker. It should be called after tokenization. The layer will first trim inputs to fit, then add start/end tokens, and finally pad, if necessary, to `sequence_length`.
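A plain-Python sketch of that order of operations (trim, add markers, pad). This is an illustration only, not the layer's TF implementation; `pad_value=0` and always-present markers are assumptions here.

```python
def pack(tokens, sequence_length, start_value, end_value, pad_value=0):
    # 1. Trim to leave room for the start and end markers.
    tokens = tokens[: sequence_length - 2]
    # 2. Add the markers.
    tokens = [start_value] + tokens + [end_value]
    # 3. Pad up to sequence_length if needed.
    return tokens + [pad_value] * (sequence_length - len(tokens))

pack([5, 6, 7], sequence_length=6, start_value=1, end_value=2)
# -> [1, 5, 6, 7, 2, 0]
```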

Collaborator Author:

Right. Thanks! Pretty much copied it verbatim :)

        self.pad_value = pad_value

    def call(self, inputs):
        input_is_tensor = isinstance(inputs, tf.Tensor)
Member:

Before this, we should probably do the same:

if not isinstance(inputs, (tf.Tensor, tf.RaggedTensor)):
    inputs = tf.convert_to_tensor(inputs)

That will allow us to support things like numpy or list inputs. Useful for demos.

Collaborator Author:

👍🏼 . I've removed the else block below:

        else:
            raise ValueError(
                "Input must be of type `tf.Tensor` or `tf.RaggedTensor`, "
                f"but got {type(inputs)}"
            )

Contributor:

One minor thing: I would rename this to `input_is_dense` to match `input_is_ragged`.

Collaborator Author:

Done!

        if input_is_tensor:
            if inputs.shape.rank != 1:
                raise ValueError(
                    "Input dense tensor must be of rank 1. "
Member:

I think it would be more helpful to say....

Input must be either dense and rank 1 or ragged and rank 2. Received dense input with rank={...}

        elif input_is_ragged:
            if inputs.shape.rank != 2:
                raise ValueError(
                    "Input ragged tensor must be of rank 2. "
Member:

Similar edit here

        ]
        self.assertAllEqual(output, expected_output)

    def test_functional_model(self):
Member:

Add a config test.

    def test_dense_input_error(self):
        input_data = tf.constant([[5, 6, 7]])
        start_end_packer = StartEndPacker(sequence_length=5)
        with self.assertRaises(ValueError):
Contributor:

Please include the error message (just the prefix works) to ensure we are capturing the right error.

Collaborator Author:

Done! I've reported the whole error message. Hope that's okay?

    def test_ragged_input_error(self):
        input_data = tf.ragged.constant([[[5, 6, 7], [8, 9, 10, 11]]])
        start_end_packer = StartEndPacker(sequence_length=5)
        with self.assertRaises(ValueError):
Contributor:

Same here

@abheesht17 (Collaborator Author) left a comment

@mattdangerw, addressed your comments. Thanks for the review!

@mattdangerw (Member) left a comment

LGTM! Just pushed a few copy edits.

@mattdangerw (Member) left a comment

One more bug actually!

                f"rank={inputs.shape.rank}"
            )

        batch_size = inputs.shape[0]
Member:

Actually, one more issue here. We need to support the case where the static batch size is `None`. I think you will need to call `tf.shape(inputs)[0]` to get the dynamic batch size.

We should also add another unit test using `tf.data`'s `map()`, calling `batch()` on the dataset before applying the layer. That should catch this bug.
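A minimal sketch of the failure mode and fix. After `batch()`, the static batch dimension is `None`, so `inputs.shape[0]` cannot build the end-token tensor; `tf.shape(inputs)[0]` returns it dynamically. `add_end_token` is a hypothetical stand-in, not the PR's code.

```python
import tensorflow as tf

# Four length-3 sequences, batched by tf.data: static batch size becomes None.
ds = tf.data.Dataset.from_tensor_slices(tf.zeros((4, 3), tf.int32)).batch(2)

def add_end_token(inputs, end_value=2):
    # Dynamic batch size; works even when the static shape is None.
    batch_size = tf.shape(inputs)[0]
    end_tokens = tf.fill((batch_size, 1), end_value)
    return tf.concat([inputs, end_tokens], axis=-1)

ds = ds.map(add_end_token)  # Traced in graph mode, like the reported bug.
```

Using `inputs.shape[0]` here would fail at trace time, since `tf.fill` cannot build a shape from `None`.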

@abheesht17 (Collaborator Author)

> One more bug actually!

Fixed!

@chenmoneygithub (Contributor) left a comment

LGTM! One minor comment about variable naming.

@mattdangerw (Member) left a comment

Thank you! LGTM!

@mattdangerw mattdangerw merged commit 3c89e51 into keras-team:master Jun 16, 2022
Merging this pull request closes: Add a start and end token packer layer (#220)