Generate: work around PT multinomial sampling 0 probability tokens
#23088
What does this PR do?
Fixes #22979
As raised in this transformers issue and this pytorch issue, `multinomial` can erroneously pick 0 probability tokens. According to the reports and my own observations, the error is much more likely on CPU. There is a high chance that a token with `-inf` logits gets selected: in a simple example with `top_k=40`, it happens 0.158% of the time on CPU -- i.e. a sequence of 500 newly generated tokens has a ~50% chance of containing at least one token that shouldn't be there.

This PR adds a quick-and-dirty workaround while the PT team works on the issue: at each sampling step, pick 5 candidates and keep the first valid one. Assuming independence, the probability of one or more forbidden tokens in the example above drops to ~5e-10%.
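Below is a minimal sketch of the idea, not the exact diff in this PR: draw several candidates per step with `torch.multinomial` and keep the first one whose probability is nonzero. The `safe_multinomial_sample` name, the `replacement=True` argument, and the argmax fallback are illustrative choices, not taken from this PR.

```python
import torch


def safe_multinomial_sample(probs: torch.Tensor, num_candidates: int = 5) -> torch.Tensor:
    """Sample one token id per row while avoiding tokens with 0 probability.

    probs: (batch_size, vocab_size) probabilities (each row sums to 1).
    Returns a (batch_size, 1) tensor of token ids.
    """
    # Draw several candidates per row instead of a single one. replacement=True keeps the
    # call valid even when fewer than `num_candidates` tokens have nonzero probability.
    candidates = torch.multinomial(probs, num_samples=num_candidates, replacement=True)
    # A candidate is valid if its probability is strictly positive.
    valid = torch.gather(probs, 1, candidates) > 0
    # Position of the first valid candidate per row; invalid positions are pushed past the end.
    positions = torch.arange(num_candidates, device=probs.device).expand_as(candidates)
    first_valid = torch.where(valid, positions, torch.full_like(positions, num_candidates))
    first_valid = first_valid.min(dim=1, keepdim=True).values.clamp(max=num_candidates - 1)
    next_tokens = torch.gather(candidates, 1, first_valid)
    # Fallback (illustrative assumption): if every candidate had 0 probability, take the most likely token.
    all_invalid = ~valid.any(dim=1, keepdim=True)
    return torch.where(all_invalid, probs.argmax(dim=1, keepdim=True), next_tokens)
```

The only way a forbidden token can still slip through the "keep the first valid candidate" scheme is if all 5 candidates at a step have 0 probability, which is what drives the per-sequence failure rate from ~50% down to the ~5e-10% quoted above.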
Runtime overhead: with `distilgpt2`, a small model where operations outside the model account for a meaningful share of the runtime, generation got 2% slower on GPU (RTX 3090) and 1% slower on CPU (Ryzen 9 5950X). On larger models, the slowdown becomes negligible.