Stop generation once end_token_id is seen #769
Conversation
No unit tests added because the user-side behavior is unchanged. Also, I will mark this as ready after merging the cache PR.
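For readers skimming the thread, the core idea can be sketched roughly as follows. This is a minimal, hedged sketch of exiting the decoding loop early once every sequence has produced the end token, not the code in this PR; `prompt`, `end_token_id`, `current_index`, and `max_length` follow names visible in the diff below, while `next_token_fn` and everything else is hypothetical.

```python
import tensorflow as tf

def generate(next_token_fn, prompt, end_token_id, current_index, max_length):
    """Decode into a pre-padded `prompt` of shape [batch, max_length] and exit
    the loop early once every sequence has produced `end_token_id`.
    A rough sketch of the idea only, not this PR's implementation."""
    batch_size = tf.shape(prompt)[0]
    end_token_id = tf.cast(end_token_id, prompt.dtype)
    # True for sequences that have not emitted the end token yet.
    alive = tf.ones([batch_size], dtype=tf.bool)

    def cond(prompt, alive, index):
        # Stop when every sequence has seen the end token, or at max_length.
        return tf.reduce_any(alive) & (index < max_length)

    def body(prompt, alive, index):
        # `next_token_fn` stands in for the model call + sampling step.
        next_token = next_token_fn(prompt, index)  # shape [batch]
        # Finished sequences keep emitting the end token.
        next_token = tf.where(alive, next_token, end_token_id)
        alive = alive & (next_token != end_token_id)
        # Write the sampled token into column `index` of the padded prompt.
        column_mask = tf.one_hot(index, max_length, dtype=prompt.dtype)
        prompt = prompt * (1 - column_mask) + next_token[:, tf.newaxis] * column_mask
        return prompt, alive, index + 1

    prompt, _, _ = tf.while_loop(cond, body, (prompt, alive, current_index))
    return prompt
```

The point is only that the loop's `cond` can track a per-sequence "alive" flag, so decoding terminates as soon as every sequence in the batch has emitted the end token.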
Force-pushed from ccc4116 to 8d4a8a8
One meta comment on approach.
Another thing we should keep in mind is that this is, I think, precisely where left padding will become more efficient.
If you left pad your prompts originally, you start generating for all prompts right away, so it's more likely they will all "die" earlier. If you right pad, you might not even start generating on some sequences in the batch for a while.
Not something we need to solve on this PR, but something to think about!
@mattdangerw Yeah, I actually tried left padding earlier. My finding: left padding does not work well with GPT2 and other models that use absolute positional embeddings. In my experiment, the generated text becomes chaotic when left padding is applied.
I actually think this would be a bug in the attention mask and position embedding setup (both of which are complex in the left-pad setup!). But if you do everything correctly, the computation is exactly the same as I understand it (e.g. greedy search output will be one-to-one identical). I can try to put together a colab with huggingface showing this.
One thing that might be worth noting in the left-pad setup is that you really need to switch to a gather op for the position embedding, because your indices for the position embedding start varying per sample. But overall, I am totally down to look at that as a follow-up. Just wanted to point out a place where we are starting to leave performance on the table.
@mattdangerw Yeah, I also suspected my code was buggy, and it was based off a much earlier version, so it's definitely worth a second trial.
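As a side note on the gather point above, here is a minimal sketch of per-sample position ids under left padding, assuming a `[batch, seq_len]` padding mask with 1 for real tokens and 0 for padding. The helper names are illustrative, not keras-nlp APIs.

```python
import tensorflow as tf

def left_pad_position_ids(padding_mask):
    """padding_mask: [batch, seq_len], 1 for real tokens, 0 for left padding."""
    padding_mask = tf.cast(padding_mask, tf.int32)
    # Cumulative count of real tokens gives 1-based positions; subtract 1 and
    # clamp so the padded slots all map to position 0 (they are masked anyway).
    position_ids = tf.cumsum(padding_mask, axis=-1) - 1
    return tf.maximum(position_ids, 0)

def embed_positions(position_embedding_matrix, padding_mask):
    """position_embedding_matrix: [max_length, hidden_dim]."""
    position_ids = left_pad_position_ids(padding_mask)
    # Gather per-sample positions instead of slicing a contiguous 0..seq_len-1 range.
    return tf.gather(position_embedding_matrix, position_ids)

# Example: two sequences of lengths 2 and 4, left padded to length 4.
mask = tf.constant([[0, 0, 1, 1],
                    [1, 1, 1, 1]])
print(left_pad_position_ids(mask))
# [[0 0 0 1]
#  [0 1 2 3]]
```

Because the start index differs per sample, a fixed slice of the position embedding table no longer works; the gather indexed by these per-sample ids is what keeps absolute position embeddings consistent under left padding.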
Force-pushed from 8d4a8a8 to e292f53
This would need tests too before we land.
output_dim=self.feature_size,
),
keras.layers.Dense(self.vocab_size),
keras.layers.Softmax(),
Why change this?
It's because we now set from_logits=True by default (it was False earlier for mysterious reasons), so I am aligning the unit test with it.
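For illustration, aligning a test model with `from_logits=True` typically means ending at the vocabulary projection and applying softmax only where probabilities are explicitly needed. A hedged sketch, not the actual test in this PR; the sizes and layers are made up.

```python
import tensorflow as tf
from tensorflow import keras

vocab_size, feature_size = 10, 16

# With from_logits=True the model should output raw logits: it ends at the
# Dense projection over the vocabulary, with no trailing Softmax layer.
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, feature_size),
    keras.layers.Dense(vocab_size),  # logits, no Softmax
])

loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

tokens = tf.constant([[1, 2, 3]])
labels = tf.constant([[2, 3, 4]])
value = loss(labels, model(tokens))

# If probabilities are ever needed (e.g. for sampling), apply the softmax
# explicitly instead of baking it into the model:
probs = tf.nn.softmax(model(tokens), axis=-1)
```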
LGTM! approving!
Note that GitHub Actions appears to be down, so make sure to test any changes locally! Left some comments about some weird XLA compilation errors I was seeing.
# The index of the last non-padding token in prompt. Since all sequences
# are aligned to the right side, the index is the same for all.
current_index = max_length - num_steps
original_padding_mask = tf.cast(tf.identity(mask), dtype=tf.int32)
we compute this twice (above as well), should we just pass it through?
It's fairly cheap to compute. I was doing it this way because `sample`
already has a confusing arg list, and I would like to keep it shorter (though not by much...).
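For illustration only, and only a guess at the intent rather than the PR's actual logic: one way a precomputed `original_padding_mask` can feed the stopping check is to ignore end tokens that were already part of the prompt, so that only *generated* end tokens count. Everything besides `end_token_id` and `original_padding_mask` here is an assumption.

```python
import tensorflow as tf

def sequences_finished(prompt, original_padding_mask, end_token_id):
    """Return a [batch] bool tensor: True where a sequence has produced
    `end_token_id` in a generated position, i.e. outside the original prompt."""
    end_token_seen = tf.cast(tf.equal(prompt, end_token_id), tf.int32)
    # Positions covered by the original prompt should not count towards
    # the stopping criterion.
    generated_positions = 1 - original_padding_mask
    return tf.reduce_any(
        tf.cast(end_token_seen * generated_positions, tf.bool), axis=-1
    )
```

`tf.reduce_all(sequences_finished(...))` could then serve as the loop's early-exit condition.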
Resolves #749