Add stop sequence to text generation pipeline #18444
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hey @Narsil. I've managed to get this working for greedy decoding and multinomial sampling. For beam search, what would be the best approach to deal with a `stop_sequence`? I've assumed that if a `stop_sequence` appears in any of the beams, we stop the generation process. Should we instead wait until each beam reaches the `stop_sequence` or another stopping criterion before stopping generation?
LGTM.
I pinged other maintainers to get advice.
The main thing is that EOS is already handled without a stopping criterion, so I don't know if we should add the new `StoppingCriteria`.
Also, we should add some simple tests. Ideally, just set up a random model from `hf-internal-testing`, generate 5 tokens, look at the results, and use token 3 as the new `eos_token_id`; decode it to get it as a string, then rerun generation with `generate(..., stop_sequence='xx')` and verify we stopped at token 3. (We can leave the first steps in with checks, just so readers of the test understand why we're supposed to stop at token 3.)
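A minimal sketch of that test flow, assuming a tiny causal-LM checkpoint and greedy decoding (the model id, prompt, and the use of `eos_token_id` for the final check are illustrative stand-ins, not the PR's final code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed test checkpoint; any tiny causal LM from hf-internal-testing would do.
model_id = "hf-internal-testing/tiny-random-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
new_tokens = output[0, inputs["input_ids"].shape[1]:]

# Use the third generated token as the stop point and decode it to a string.
stop_token_id = int(new_tokens[2])
stop_string = tokenizer.decode([stop_token_id])

# Rerun generation; with this PR, the same check would go through the
# pipeline's stop_sequence kwarg instead of a raw eos_token_id.
output2 = model.generate(**inputs, max_new_tokens=5, do_sample=False, eos_token_id=stop_token_id)
assert output2.shape[1] <= inputs["input_ids"].shape[1] + 3
```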
```python
                stop_sequence
            )
            if len(stop_sequence_ids) > 3:
                warnings.warn(f"Stopping on a multiple token sequence is not yet supported on transformers. The first token of the stop sequence will be used as the stop sequence string in the interim.")
```
Great message.
```python
        if stop_sequence is not None:
            stop_sequence_ids = self.tokenizer.encode(
                stop_sequence
            )
            if len(stop_sequence_ids) > 1:
                warnings.warn(f"Stopping on a multiple token sequence is not yet supported on transformers. The first token of the stop sequence will be used as the stop sequence string in the interim.")
            generate_kwargs["eos_token_id"] = stop_sequence_ids[0]
```
I think implementing it either in `generate` or here is enough. We shouldn't try to implement it everywhere.
@gante @patrickvonplaten Are you ok with it being directly included in `generate`? (Otherwise we can keep it just in the pipeline.)
```python
    @add_start_docstrings(STOPPING_CRITERIA_INPUTS_DOCSTRING)
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return sum([self.eos_token_id in i for i in input_ids]) == input_ids.shape[0]
```
I think this line is wrong and the following one is correct, no?
Hey @Narsil, thanks for the helpful comments. And apologies for the few quality errors; I was planning on addressing those after we decided on the strategy for `eos_token_id`.
Here specifically I had 2 different returns because I was experimenting with different approaches to stopping a beam search: one stops if any of the beams reaches the `eos_token_id`, the other waits for all of them to reach the `eos_token_id`.
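Those two semantics amount to an any/all check over the batch of beams; a hedged sketch, not the PR's exact code:

```python
import torch

def any_beam_done(input_ids: torch.LongTensor, eos_token_id: int) -> bool:
    # Stop as soon as one sequence in the batch contains the eos token.
    return any(eos_token_id in seq for seq in input_ids)

def all_beams_done(input_ids: torch.LongTensor, eos_token_id: int) -> bool:
    # Stop only once every sequence in the batch contains the eos token.
    return all(eos_token_id in seq for seq in input_ids)
```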
Oh I see. I think beam search should wait on all beams being "done" (and you keep the eos-terminated ones as long as their score allows). I haven't touched the beam search code enough to be sure how to handle that.
Also, I think you only need to check the final tokens to avoid looping over the whole `input_ids`:
`input_ids[:, -1] == eos_token_id`
no?
It's a bit more complex than that -- we can only check new tokens, but different batch sequences may generate `eos_token_id` at different times. The generate functions use an auxiliary variable to keep track of which members of the batch have already finished, see here.
In any case, I'd recommend doing this change in a separate PR :) In addition to adding the stopping criteria, we would also need to remove the existing equivalent code from `generate`.
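For reference, a simplified sketch of that bookkeeping pattern (the names approximate the library's `unfinished_sequences` variable; this is not the exact `generate` code):

```python
import torch

batch_size, eos_token_id = 4, 2
# 1 = still generating, 0 = already produced eos at some earlier step.
unfinished_sequences = torch.ones(batch_size, dtype=torch.long)

for step in range(10):  # stand-in for the real decoding loop
    next_tokens = torch.randint(0, 5, (batch_size,))  # stand-in for the model's picks
    # Once a sequence emits eos, mark it finished for all remaining steps.
    unfinished_sequences = unfinished_sequences * (next_tokens != eos_token_id).long()
    if unfinished_sequences.max() == 0:  # every sequence in the batch is done
        break
```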
@KMFODA I think I will let others comment on the best way to do this in `generate`.
For the tests, removing the breakpoint should help; that should do the trick for code quality too.
@Narsil @KMFODA I'm in favor of moving it to a `StoppingCriteria`. It is already implemented on the multiple generation strategies (e.g. here for greedy search). Also, the existing implementation is different from the current PR -- the existing implementation only checks whether the newly generated token is the `eos_token_id`.
Thanks @Narsil @gante. Okay, so for the sake of deploying iteratively I've removed the `stop_sequence` logic from the pipeline and moved it into `generate`. I've added a test for the `stop_sequence` stopping criteria.
This looks good to me.
I think the test needs to change if it's going to be in the `generate` testing files, to only use `generate`.
We should implement `stop_sequence` only once (probably in `generate`), but we could have 2 tests if you want to test the full pipeline too (probably in `tests/pipelines/test_pipelines_text_generation.py`, for instance).
```python
def test_stop_sequence_stopping_criteria(self):
    prompt = """Hello I believe in"""
    generator = pipeline("text-generation", model="hf-internal-testing/tiny-random-bart")
    output = generator(prompt, stop_sequence=" number")
    self.assertEqual(output[0]["generated_text"].split()[-1], "number")
```
Suggested change:

```python
def test_stop_sequence_stopping_criteria(self):
    prompt = """Hello I believe in"""
    generator = pipeline("text-generation", model="hf-internal-testing/tiny-random-bart")
    output = generator(prompt)
    self.assertEqual(output, [{'generated_text': 'Hello I believe in in in number number number number number number number number number'}])
    output = generator(prompt, stop_sequence=" number")
    self.assertEqual(output, [{'generated_text': 'Hello I believe in in in number'}])
```
I think this formulation conveys the intent of the test a tiny bit better.
Also, since you were only testing the last generated token, if we deactivated the whole option your test would still pass, since the model just generates `number` all the time.
Wdyt?
This makes sense. I'll change it to that.
If we were to move `stop_sequence` into `generate`, wouldn't we need access to the tokenizer there to encode the stop string?
You're entirely right, oversight on my part. Sorry, I failed to see that.
No problem, I've just moved the `stop_sequence` back to the pipeline function and added the tests you requested in the pipelines testing file.
When I was playing with the `stop_sequence`, though, I found that sometimes when I add a specific `stop_sequence` the output changes and avoids mentioning the word entirely. I don't have live examples now, but I just wanted to check whether this is normal behaviour. If not, I can find examples on public models and share them in a different issue.
LGTM!
src/transformers/generation_utils.py (outdated)
```diff
@@ -1063,7 +1063,7 @@ def generate(
     exponential_decay_length_penalty (`tuple(int, float)`, *optional*, defaults to `model.config.exponential_decay_length_penalty`):
         This Tuple adds an exponentially increasing length penalty, after a certain amount of tokens have been
         generated. The tuple shall consist of: `(start_index, decay_factor)` where `start_index` indicates
-        where penalty starts and `decay_factor` represents the factor of exponential decay
+        where penalty starts and `decay_factor` represents the factor of exponential decays
```
I think without an `s` is actually better, no? There's only a single decay.
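As an aside, the documented `(start_index, decay_factor)` behaviour can be illustrated with a small sketch (assuming the factor multiplies the eos score for each token past `start_index`; the library's actual processor may differ in detail):

```python
# Hedged illustration of an exponential length penalty of the documented
# (start_index, decay_factor) form; not the library's exact implementation.
def length_penalty_multiplier(cur_len: int, start_index: int, decay_factor: float) -> float:
    if cur_len <= start_index:
        return 1.0  # no penalty before the start index
    # The penalty grows exponentially with each token generated past start_index.
    return decay_factor ** (cur_len - start_index)

# e.g. (start_index=10, decay_factor=1.5): at length 12 the eos score is scaled by 1.5**2
print(length_penalty_multiplier(12, 10, 1.5))  # 2.25
```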
```diff
@@ -147,6 +147,24 @@ def get_test_pipeline(self, model, tokenizer, feature_extractor):
     text_generator = TextGenerationPipeline(model=model, tokenizer=tokenizer)
     return text_generator, ["This is a test", "Another test"]

+    def test_stop_sequence_stopping_criteria(self):
+        prompt = """Hello I believe in"""
+        text_generator = pipeline("text-generation", model="hf-internal-testing/tiny-random-bart")
```
bart is a seq2seq model, so it will fail. You can use https://huggingface.co/hf-internal-testing/tiny-random-gpt2 instead, I think.
^ +1
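The suggested swap is a one-line change in the test; the rest stays the same (tiny-random-gpt2 is a causal LM, so the text-generation pipeline can run it):

```python
from transformers import pipeline

# Use a tiny causal-LM checkpoint instead of a seq2seq (bart) one.
text_generator = pipeline("text-generation", model="hf-internal-testing/tiny-random-gpt2")
```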
Other than the two comments I added and the failing tests, LGTM as well 👍
```diff
@@ -107,6 +107,24 @@ def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
     return time.time() - self.initial_timestamp > self.max_time


+class EndOfStringCriteria(StoppingCriteria):
```
Since it is not used anywhere, I'd suggest adding this class in a follow-up PR, where we implement it and use it instead of the current logic for the eos token :)
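For the follow-up, such a criterion could plausibly look like the sketch below, building on the last-token check discussed earlier in the thread (a hedged sketch, not the class as committed in this PR):

```python
import torch
from transformers import StoppingCriteria

class EndOfStringCriteria(StoppingCriteria):
    """Hedged sketch: stop once every sequence in the batch has emitted the
    eos token, checking only the newly generated (final) token each step."""

    def __init__(self, eos_token_id: int):
        self.eos_token_id = eos_token_id
        self.done = None  # lazily sized to the batch on the first call

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        if self.done is None:
            self.done = torch.zeros(input_ids.shape[0], dtype=torch.bool, device=input_ids.device)
        # Only inspect the last token, as suggested, instead of the whole input_ids.
        self.done |= input_ids[:, -1] == self.eos_token_id
        return bool(self.done.all())
```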
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@KMFODA I think your PR is almost ready to be merged! Would you like to try to fix the final problems and apply the review suggestions? :-)
Hey @patrickvonplaten. My apologies, I was out sick over the past month. I've worked on the suggestions now. Hopefully this should be good to merge, but if not let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! @gante I let you merge the PR :-)
I'm happy with the PR, except for the `EndOfStringCriteria` class. @KMFODA can you remove it for now, and perhaps reintroduce it in a follow-up PR (with use cases)? :)
Hi @gante, yes of course. I had removed it locally but somehow the changes didn't push through with one of the commits. Force-pushed the change now. Hopefully that looks good now :).
What does this PR do?
As per the conversation in #17562, creating this draft PR to add a stop_sequence option to text generation pipelines.
Who can review?
@Narsil