Enable greedy sampling #70

aashiqmuhamed · 2023-05-24T06:31:25Z

This CR enables greedy sampling in model.generate on XLA devices such as Trainium and TPU. This addresses issues such as huggingface/transformers#18661 and huggingface/transformers#12322.

The implementation is inspired by the corresponding TensorFlow generate function in transformers. The CR uses conditional statements to support greedy sampling and also implements an XLA-compatible KV cache.
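
As a rough illustration of the description above (not the code in this CR), the sketch below shows greedy decoding into a buffer of fixed length, which matters on XLA because a change in tensor shape triggers recompilation. Plain Python lists stand in for tensors, and `PAD_ID`, `EOS_ID`, `greedy_generate`, and `toy_logits` are hypothetical names chosen for the example. The real implementation may handle stopping and caching differently.

```python
from typing import Callable, List

PAD_ID = 0  # hypothetical pad token id
EOS_ID = 1  # hypothetical end-of-sequence token id


def argmax(scores: List[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_generate(
    next_token_logits: Callable[[List[int], int], List[float]],
    prompt: List[int],
    max_length: int,
) -> List[int]:
    """Greedy decoding into a buffer whose length never changes."""
    # Pre-allocate the full output so every step sees the same shape.
    buffer = prompt + [PAD_ID] * (max_length - len(prompt))
    finished = 0  # 0 while decoding, 1 once EOS has been emitted

    for cur_len in range(len(prompt), max_length):
        token = argmax(next_token_logits(buffer, cur_len))
        # Select instead of branching: after EOS, only padding is written.
        token = token * (1 - finished) + PAD_ID * finished
        buffer[cur_len] = token
        finished = max(finished, int(token == EOS_ID))
    return buffer


def toy_logits(buffer: List[int], cur_len: int) -> List[float]:
    """Toy stand-in for a model: prefers (last token + 1), then EOS."""
    vocab_size = 5
    last = buffer[cur_len - 1]
    target = last + 1 if last + 1 < vocab_size else EOS_ID
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_logits, prompt=[2], max_length=8))
# [2, 3, 4, 1, 0, 0, 0, 0]
```

The output always has `max_length` entries: positions after the end-of-sequence token are filled with padding rather than being cut off.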

aashiqmuhamed · 2023-05-24T06:32:27Z

@michaelbenayoun Could you please take a look when you get a chance?

optimum/neuron/generation/utils.py

tests/test_generate.py

michaelbenayoun · 2023-05-24T09:35:42Z

tests/test_generate.py

+
+    return np.array(results)
+
+


Parameterize this test to test all the generative models we support.

We are testing sampling for GPT and BART models at the moment. In the revision, I've currently included t5-small, and will include more models over the next few weeks.

Where is it tested for GPT and BART?
Anyways, alright let's do that!

GPT and BART will be committed after we merge this CR. We are currently fixing a few bugs.
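
One possible shape for the parameterized test requested above, sketched with the standard library's `unittest.subTest`. Only `t5-small` is named in this thread; the other entries in `MODEL_NAMES` are placeholders, and `reference_greedy` and `xla_greedy` are hypothetical stand-ins for the two generate calls being compared. What the actual test in tests/test_generate.py compares is not shown here, so treat this as a structural sketch only.

```python
import unittest

# Only "t5-small" is named in the thread; the other entries are placeholders.
MODEL_NAMES = ["t5-small", "placeholder-gpt-checkpoint", "placeholder-bart-checkpoint"]


def reference_greedy(model_name: str, max_length: int) -> list:
    """Hypothetical stand-in for a reference generate call."""
    return [len(model_name) % 7] * max_length


def xla_greedy(model_name: str, max_length: int) -> list:
    """Hypothetical stand-in for the generate call under test."""
    return [len(model_name) % 7] * max_length


class GreedyGenerateTest(unittest.TestCase):
    def test_matches_reference_for_every_model(self):
        for name in MODEL_NAMES:
            # subTest reports each model separately instead of
            # stopping at the first failing one.
            with self.subTest(model=name):
                self.assertEqual(xla_greedy(name, 8), reference_greedy(name, 8))


# Run the test case programmatically so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GreedyGenerateTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a model then only requires appending its name to the list.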

bocchris-aws · 2023-05-30T14:14:43Z

optimum/neuron/generation/utils.py

+                next_token_logits = outputs.logits[:, -1, :]
+
+            # pre-process distribution
+            next_tokens_scores = logits_processor(input_ids, next_token_logits)


In the non-XLA version, input_ids has shape bs x seq_length. With the padding we introduced here, the dimensions change. Also, we don't know which exact processor is being used or whether it is supported by XLA. I don't think we can expect every logits processor to be XLA-compatible, so we should probably compute this on CPU.

Suggested change

-            next_tokens_scores = logits_processor(input_ids, next_token_logits)
+            if is_torch_tpu_available():
+                input_ids_ = input_ids.to('cpu')[:, :seq_length]
+                next_token_logits_ = next_token_logits.to('cpu')
+                next_tokens_scores = logits_processor(input_ids_, next_token_logits_)
+                next_tokens_scores = next_tokens_scores.to(input_ids.device)
+            else:
+                next_tokens_scores = logits_processor(input_ids, next_token_logits)

@aashiqmuhamed want to commit this?

Yes, I'm committing without the is_torch_tpu_available() check, since we expect to run on Trainium by default.
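
To see why slicing input_ids to seq_length matters before calling a logits processor, here is a small sketch with plain lists standing in for tensors. `repetition_penalty` is a toy processor written for this example (it is not the transformers implementation), and `PAD_ID` is a hypothetical pad token id. The device transfer from the suggestion is not modelled.

```python
from typing import List

PAD_ID = 0  # hypothetical pad token id


def repetition_penalty(
    input_ids: List[List[int]],
    scores: List[List[float]],
    penalty: float = 2.0,
) -> List[List[float]]:
    """Toy processor: divide the score of every token already in input_ids."""
    out = []
    for row_ids, row_scores in zip(input_ids, scores):
        seen = set(row_ids)
        out.append(
            [s / penalty if tok in seen else s for tok, s in enumerate(row_scores)]
        )
    return out


seq_length = 2
# One sequence of true length 2, padded to a fixed length of 5.
padded_ids = [[3, 4, PAD_ID, PAD_ID, PAD_ID]]
logits = [[8.0, 1.0, 2.0, 4.0, 6.0]]

# Passing the padded buffer penalizes PAD_ID although it was never generated.
with_padding = repetition_penalty(padded_ids, logits)

# Slicing to the true length first, as in the suggestion, avoids that.
trimmed_ids = [row[:seq_length] for row in padded_ids]
without_padding = repetition_penalty(trimmed_ids, logits)

print(with_padding)     # [[4.0, 1.0, 2.0, 2.0, 3.0]]
print(without_padding)  # [[8.0, 1.0, 2.0, 2.0, 3.0]]
```

Any processor that inspects input_ids can be affected in this way, which is one reason to hand it the unpadded view.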

michaelbenayoun

Some tests are failing, left comments to fix that.

michaelbenayoun · 2023-06-02T08:46:27Z

optimum/neuron/generation/utils.py

+
+import torch
+import torch.distributed as dist
+import torch_xla.core.xla_model as xm


This causes an import error.
Can you do:

from ..utils import is_torch_xla_available

if is_torch_xla_available():
    import torch_xla.core.xla_model

Basically we want to be able to import and test code on regular machines when we do not need all of this.

Got it, is the optimum neuron library designed for both CPU and Trainium?

Only Trainium, but the point is to be able to run non-Trainium-dependent tests on regular machines. Without this, the tests will fail if torch_xla is not installed, even when we want to test something unrelated.
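
A self-contained sketch of the guarded-import pattern discussed above. The library's own `is_torch_xla_available` lives in its utils module and may be implemented differently; the version here, based on `importlib.util.find_spec`, is an assumption for illustration, and `xla_device_or_none` is a hypothetical helper.

```python
import importlib.util


def is_torch_xla_available() -> bool:
    """True if the torch_xla package can be located, without importing it."""
    return importlib.util.find_spec("torch_xla") is not None


if is_torch_xla_available():
    import torch_xla.core.xla_model as xm
else:
    xm = None  # XLA-dependent code paths must check for this


def xla_device_or_none():
    """Hypothetical helper: return the XLA device, or None without torch_xla."""
    if xm is None:
        return None
    return xm.xla_device()
```

With this guard, importing the module succeeds on a machine without torch_xla, and only the code paths that actually need XLA have to handle the missing dependency.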

michaelbenayoun

LGTM!
Will merge if the tests pass.

HuggingFaceDocBuilderDev · 2023-06-05T08:50:26Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

michaelbenayoun · 2023-06-05T11:54:00Z

The EC2 runners cannot be used because they need secrets from this repo, which you do not have on your fork...
We can skip them for now.

Can you run the following command please:

make style

This will fix the styling error you currently have.

Enable greedy sampling

16edd63

michaelbenayoun reviewed May 24, 2023

Refactored based on comments in revision 1

de9f4de

michaelbenayoun reviewed May 25, 2023

philschmid mentioned this pull request May 26, 2023

Add greedy-sampling to GenerationMixin #77

Closed

bocchris-aws reviewed Jun 1, 2023

michaelbenayoun reviewed Jun 2, 2023

aashiqmuhamed and others added 3 commits June 3, 2023 07:31

Merge branch 'huggingface:main' into main

f299fb7

Supporting decoder model sampling

7704dbe

Addressed comments in rev2

0d16bc6

michaelbenayoun approved these changes Jun 5, 2023

aashiqmuhamed and others added 3 commits June 5, 2023 19:32

Merge branch 'huggingface:main' into main

94c90fe

Formatted files with make style

d60d3e7

Reformatting after rebase

cc878dc

michaelbenayoun merged commit 7bc8e9b into huggingface:main Jun 5, 2023
6 of 10 checks passed
