Add contrastive sampler #896

chenmoneygithub · 2023-03-21T08:40:30Z

This turned out to require a lot of design work, so I am taking over the work myself.

To play with the API, use the code below:

!pip install -q -U git+https://github.com/chenmoneygithub/keras-nlp.git@contrastive

import keras_nlp

gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
gpt2_lm.compile(run_eagerly=False, jit_compile=True, sampler="contrastive")

print(
    gpt2_lm.generate(
        ["that's weird", "that's even weirder"],
        max_length=10,
    )
)

mattdangerw

Very cool that this is working! We talked offline; for now I'm just submitting the comments from our live discussion.

Let's try to make it so that the model code does not need to special case a sampling strategy, and the sampler code does not need to assume anything about the structure of the passed state.
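To make the goal above concrete, here is a minimal sketch of a sampler that treats the model's state as an opaque value. The `next_fn` signature, `greedy_generate`, and `toy_next` are hypothetical names used only for illustration, not the KerasNLP API or the code in this PR.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical signature: next_fn(tokens, state, index) -> (logits, hidden, state).
NextFn = Callable[[List[int], Any, int], Tuple[List[float], List[float], Any]]


def greedy_generate(
    next_fn: NextFn, prompt: List[int], state: Any, max_length: int
) -> List[int]:
    """Greedy decoding that never looks inside `state`."""
    tokens = list(prompt)
    for index in range(len(prompt) - 1, max_length - 1):
        # The sampler only threads `state` back to the model on the next call.
        logits, _hidden, state = next_fn(tokens, state, index)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


def toy_next(tokens, state, index):
    """Stand-in for a model: always favors (token at `index` + 1) % 4."""
    vocab_size = 4
    target = (tokens[index] + 1) % vocab_size
    logits = [1.0 if i == target else 0.0 for i in range(vocab_size)]
    # Always returned; greedy ignores it, a contrastive sampler could read it.
    hidden = [float(tokens[index])]
    # The state here is just a call counter, but it could be any structure.
    return logits, hidden, (state or 0) + 1


result = greedy_generate(toy_next, prompt=[0], state=None, max_length=5)
print(result)
```

Because the model always returns the hidden state and the sampler decides whether to use it, neither side needs a special case for contrastive search in this sketch.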

keras_nlp/models/gpt2/gpt2_causal_lm.py

keras_nlp/samplers/contrastive_sampler.py

mattdangerw · 2023-03-22T01:13:48Z

Also, let's benchmark this! I think there are two performance questions of interest.

First, we should make sure our implementation of contrastive search is competitive with what else is out there.
Second, we should make sure that always returning the hidden state from the model does not slow down our existing samplers ("top_p", "top_k", etc.). I'm pretty sure this will be fine, as the unused hidden_state that we return should "compile out" when we trace.

mattdangerw · 2023-03-22T01:28:59Z

Here's a colab that might be useful for benchmarking: https://colab.research.google.com/gist/mattdangerw/b31acb4b33ac13e6403fb043c9162657/rough-generation-benchmarking.ipynb
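For reference, a rough timing harness for this kind of comparison could look like the sketch below. It is not taken from the colab; `time_calls` and its parameters are hypothetical. The warm-up call is there on the assumption that the first `generate` call includes tracing and compilation, which should not count toward the measured time.

```python
import time


def time_calls(fn, repeats=25, warmup=1):
    """Return the total wall-clock seconds for `repeats` calls of `fn`."""
    for _ in range(warmup):
        fn()  # warm-up: excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


# Usage with a trivial stand-in for a generate() call.
calls = []
elapsed = time_calls(lambda: calls.append(1), repeats=25, warmup=1)
print(f"{elapsed:.4f}s for 25 runs")
```

The same harness can wrap each sampler in turn, so that every configuration is measured with the same prompt, length, and number of runs.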

chenmoneygithub · 2023-03-26T13:58:01Z

@mattdangerw From the benchmark: we are faster OvO. The timings I got from your colab:

Our contrastive search: 55.54s for 25 runs.
HuggingFace: 62.98s for 25 runs.

I cannot share a GitHub gist while I am in China, but here is how I set HF to use contrastive search:

generation_kwargs = {"penalty_alpha": 0.2, "max_new_tokens": 256, "top_k": 5}
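As a rough illustration of what `penalty_alpha` and `top_k` control, the sketch below applies the commonly described contrastive search scoring rule: among the `top_k` most probable tokens, pick the one that maximizes `(1 - alpha) * probability - alpha * (max cosine similarity to earlier hidden states)`. This is a plain-Python sketch under that assumption, with hypothetical names (`contrastive_pick`, `candidate_hidden`, `context_hidden`); it is neither the implementation in this PR nor HuggingFace's.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def contrastive_pick(
    probs: List[float],
    candidate_hidden: List[List[float]],
    context_hidden: List[List[float]],
    alpha: float,
    top_k: int,
) -> int:
    """Return the token id with the best contrastive score.

    probs: next-token probabilities, indexed by token id.
    candidate_hidden: hidden vector the model would produce for each token id.
    context_hidden: hidden vectors of the tokens seen so far.
    """
    # Keep only the top_k most probable tokens as candidates.
    order = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)
    best_token, best_score = -1, -math.inf
    for token in order[:top_k]:
        # Penalty: similarity to the most similar earlier token.
        penalty = max(cosine(candidate_hidden[token], h) for h in context_hidden)
        score = (1.0 - alpha) * probs[token] - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


probs = [0.5, 0.4, 0.1]
candidate_hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_hidden = [[1.0, 0.0]]

with_penalty = contrastive_pick(probs, candidate_hidden, context_hidden, 0.2, 2)
without_penalty = contrastive_pick(probs, candidate_hidden, context_hidden, 0.0, 2)
print(with_penalty, without_penalty)
```

In this toy example the most probable token repeats the context's hidden state exactly, so a nonzero `alpha` steers the choice to the second candidate, while `alpha = 0` reduces to greedy selection.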

I am marking this as ready for review.

keras_nlp/models/gpt2/gpt2_causal_lm.py

keras_nlp/samplers/sampler.py

keras_nlp/samplers/contrastive_sampler.py

chenmoneygithub · 2023-03-31T04:33:12Z

/gcbrun

keras_nlp/models/gpt2/gpt2_causal_lm.py

keras_nlp/samplers/beam_sampler_test.py

keras_nlp/samplers/contrastive_sampler.py

keras_nlp/samplers/contrastive_sampler_test.py

keras_nlp/samplers/sampler.py

mattdangerw

I still want to step through the main logic a bit more, but submitting these comments, as I am almost out of time for the day!

keras_nlp/models/gpt2/gpt2_causal_lm.py

keras_nlp/samplers/beam_sampler.py

keras_nlp/samplers/beam_sampler_test.py

keras_nlp/samplers/contrastive_sampler_test.py

keras_nlp/samplers/contrastive_sampler.py

keras_nlp/samplers/greedy_sampler_test.py

keras_nlp/samplers/contrastive_sampler.py

mattdangerw

A few more comments as I dig in here!

keras_nlp/models/gpt2/gpt2_causal_lm.py

keras_nlp/samplers/contrastive_sampler_test.py

keras_nlp/samplers/contrastive_sampler.py

mattdangerw · 2023-04-05T23:42:24Z

OK! Finally looked through this in more detail! Great work, this is quite a readable implementation of a complex sampler.

The one thing I found a bit confusing was the indexing here. I actually think we may want to mix up the way we count our indices across all samplers, so that loop index == cache index == index of the token being fed in. Then for this contrastive search we would not need to feed in an "out of bounds" looking index on the final pass (currently the final index is 30 if max_length is 30, which just seems weird to me).

I think an update like this could make our overall code more readable. Here's an example commit -> mattdangerw@3b079b7

No need to do this on this PR, we can do this as a follow up, but what do you think?
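A toy loop showing the indexing convention described above (loop index == cache index == index of the token being fed in) might look like the following. It is only a sketch with hypothetical names (`generate`, `next_token_fn`), not the linked commit or the sampler code in this PR.

```python
def generate(prompt, max_length, next_token_fn):
    """Generate up to max_length tokens, recording every index fed to the model."""
    tokens = list(prompt) + [None] * (max_length - len(prompt))
    # Assume the cache was already seeded for all prompt tokens except the last.
    cache = list(prompt[:-1]) + [None] * (max_length - len(prompt) + 1)
    fed_indices = []
    # `index` is the position of the token being fed in; it is also the
    # cache slot that this step fills.
    for index in range(len(prompt) - 1, max_length - 1):
        cache[index] = tokens[index]
        fed_indices.append(index)
        # The prediction made at `index` is written to position `index + 1`.
        tokens[index + 1] = next_token_fn(tokens[index])
    return tokens, fed_indices


tokens, fed_indices = generate([5, 6, 7], max_length=30, next_token_fn=lambda t: t + 1)
print(max(fed_indices), len(tokens))
```

With this convention every index passed to the model in the toy loop stays within `[0, max_length)`.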

chenmoneygithub · 2023-04-07T01:12:13Z

/gcbrun

mattdangerw

Looks good! Just a last few comments!

Any ideas on how we can validate this is actually working correctly?

keras_nlp/samplers/beam_sampler.py

keras_nlp/samplers/contrastive_sampler.py

keras_nlp/samplers/contrastive_sampler_test.py

keras_nlp/samplers/greedy_sampler.py

keras_nlp/samplers/top_k_sampler.py

keras_nlp/samplers/top_p_sampler.py

keras_nlp/samplers/contrastive_sampler.py

chenmoneygithub · 2023-04-07T21:57:42Z

/gcbrun

chenmoneygithub · 2023-04-07T23:13:22Z

/gcbrun

chenmoneygithub marked this pull request as draft March 21, 2023 08:40

kanpuriyanawab mentioned this pull request Mar 21, 2023

Add contrastive search to our Sampler collection #644

Closed

mattdangerw requested changes Mar 22, 2023

chenmoneygithub force-pushed the contrastive branch from 429f0b6 to 78a6dd0 Compare March 22, 2023 11:58

chenmoneygithub marked this pull request as ready for review March 26, 2023 13:58

mattdangerw self-assigned this Mar 29, 2023

mattdangerw requested changes Mar 30, 2023

chenmoneygithub force-pushed the contrastive branch from 9796e51 to 2f37199 Compare March 31, 2023 04:32

mattdangerw requested changes Mar 31, 2023

chenmoneygithub force-pushed the contrastive branch from c0eaf4e to 47ad7a2 Compare April 4, 2023 04:33

chenmoneygithub requested a review from mattdangerw April 4, 2023 04:34

mattdangerw mentioned this pull request Apr 4, 2023

Remove the old sampler utilities #948

Merged

mattdangerw requested changes Apr 5, 2023

chenmoneygithub and others added 6 commits April 5, 2023 18:21

rebase

4ddd679

better!

479c87f

better!

291eee5

renaming

8d57bd4

even better style

3fe775e

address comments

cb085a3

chenmoneygithub force-pushed the contrastive branch from 6b10de7 to 2bda2d7 Compare April 6, 2023 01:23

fix comments

e5877b3

chenmoneygithub force-pushed the contrastive branch from 2bda2d7 to e5877b3 Compare April 6, 2023 01:24

chenmoneygithub added 2 commits April 6, 2023 14:54

small fix

95a86e5

Merge branch 'master' into contrastive

77c8800

merge master

379d243

mattdangerw approved these changes Apr 7, 2023

one more pass

27c8ae1

fix tests

4eb17c4

chenmoneygithub merged commit 12549ce into keras-team:master Apr 7, 2023
