Stripping the MASK token #876

TheAthleticCoder · 2023-03-18T20:26:54Z

Resolves #829
I hope this PR resolves the issue, @mattdangerw and @abheesht17!
I would like to finish this one so that I can move on to the issue about speeding up testing for the DeBERTa model.

mattdangerw · 2023-03-20T23:31:10Z

keras_nlp/models/deberta_v3/deberta_v3_tokenizer.py


+    def detokenize(self, ids):
+        blank_token_id = self.token_to_id("")
+        ids = tf.where(ids == self.mask_token_id, blank_token_id, ids)


We make frequent use of tf.ragged.boolean_mask to do stuff like this.

Would ids = tf.ragged.boolean_mask(ids, tf.not_equal(ids, self.mask_token_id)) work?

Also please add a unit test to verify this behavior.

TheAthleticCoder · 2023-03-21

@mattdangerw I have made the requested changes. A few things to note:

In tf.ragged.boolean_mask(ids, tf.not_equal(ids, self.mask_token_id)), I set default_value to the blank token.

I also added a simple unit test.

Let me know if any further changes are needed. Thank you!

TheAthleticCoder · 2023-03-21

Judging by the failing checks below, this is not the proper fix. I am going to try my original method together with the added unit test to see whether that works correctly.

mattdangerw · 2023-03-22

I think you just don't need the blank ids at all anymore. There is no need for a default value: essentially, what you are doing is passing a mask of all locations that should be kept to the boolean mask function.

Did you try the ids = tf.ragged.boolean_mask(ids, tf.not_equal(ids, self.mask_token_id)) line? Remove blank_token_id entirely.
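To make the difference between the two approaches in this thread concrete, here is a small plain-Python sketch (no TensorFlow needed to run it). The ids and helper names are invented for illustration and do not come from the DeBERTa vocabulary. The second helper only mimics what `tf.ragged.boolean_mask(ids, tf.not_equal(ids, self.mask_token_id))` is expected to do on ragged rows; it is not the library implementation.

```python
# Hypothetical token ids, chosen only for this illustration.
MASK_ID = 4
BLANK_ID = 0


def replace_mask(rows, mask_id=MASK_ID, blank_id=BLANK_ID):
    """Mimics the tf.where approach: swap every mask id for a blank id."""
    return [[blank_id if tok == mask_id else tok for tok in row] for row in rows]


def drop_mask(rows, mask_id=MASK_ID):
    """Mimics the boolean-mask approach: keep only positions where id != mask_id."""
    return [[tok for tok in row if tok != mask_id] for row in rows]


batch = [[7, 4, 9], [4, 4, 5, 6]]
replaced = replace_mask(batch)  # row lengths unchanged, blank ids left behind
dropped = drop_mask(batch)      # each row shrinks on its own, so the result is ragged
print(replaced)
print(dropped)
```

Because the keep-mask already states which positions survive, nothing is left to fill in, which is why the dropping approach has no use for a default value.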

abheesht17 · 2023-03-21T18:06:18Z

keras_nlp/models/deberta_v3/deberta_v3_tokenizer.py

+    def detokenize(self, ids):
+        blank_token_id = self.token_to_id("")
+        mask = tf.not_equal(ids, self.mask_token_id)
+        ids = tf.ragged.boolean_mask(ids, mask, default_value=blank_token_id)


@TheAthleticCoder, your tests are failing because default_value is not a valid argument: https://www.tensorflow.org/api_docs/python/tf/ragged/boolean_mask. Please remove default_value and trigger the tests again.
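The failure mode described here can be reproduced without TensorFlow. The sketch below uses a hypothetical stand-in function that copies only the documented parameter list `(data, mask, name=None)`; the function body and the sample ids are assumptions made for illustration. Passing a keyword that is not in the signature makes Python raise a `TypeError`.

```python
def boolean_mask_stand_in(data, mask, name=None):
    """Plain-Python stand-in with the documented parameter list (data, mask, name)."""
    return [
        [tok for tok, keep in zip(row, keep_row) if keep]
        for row, keep_row in zip(data, mask)
    ]


ids = [[7, 4, 9]]                 # hypothetical ids; 4 plays the role of the mask id
keep = [[True, False, True]]      # True marks positions to keep

try:
    # default_value is not a parameter of the function, so this call fails.
    boolean_mask_stand_in(ids, keep, default_value=0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Without the extra keyword the call succeeds and drops the masked position.
kept = boolean_mask_stand_in(ids, keep)
print(error_message)
print(kept)
```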

mattdangerw

LGTM!

* Stripping the MASK token
* Stripping the MASK token
* added unit test
* fixed detokenize function
* check if test unit is correct
* changed MASK token index
* trial using tokenizer mask id
* using tf ragged boolean mask
* reformatted the prev commit

TheAthleticCoder added 2 commits March 19, 2023 01:51

Stripping the MASK token

8b8bc19

Stripping the MASK token

c991718

mattdangerw requested changes Mar 20, 2023

added unit test

c95e0a9

abheesht17 reviewed Mar 21, 2023

TheAthleticCoder added 4 commits March 21, 2023 23:50

fixed detokenize function

40231bd

check if test unit is correct

6152965

changed MASK token index

408f1b5

trial using tokenizer mask id

7bc5d18

mattdangerw assigned mattdangerw and abheesht17 Mar 22, 2023

TheAthleticCoder added 2 commits March 23, 2023 04:38

using tf ragged boolean mask

ae43476

reformatted the prev commit

1bbb9ed

mattdangerw approved these changes Mar 22, 2023

mattdangerw merged commit b35e83f into keras-team:master Mar 23, 2023

TheAthleticCoder deleted the issue829 branch March 23, 2023 07:01
