Some inferences take forever to complete #450
Comments
Thank you so much for the detailed report! Will come back to you shortly.
These timing results contain significant non-inference setup steps (e.g. …).
Yes indeed!
It would still be nice to have results without that setup in the loop, and to use cProfile to understand which step "gets stuck". To get to similar experimental conditions I would also use the …
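The suggestion above can be sketched as follows. This is a minimal, hypothetical example (the original code snippet is not shown in the thread): `run_inference` stands in for whatever generation call is being measured, setup is kept outside the timed region, and `cProfile` is used to see where time actually goes.

```python
import cProfile
import pstats
import time

def run_inference():
    # Stand-in for the real generation call, e.g. generator(prompt);
    # here it just sleeps to simulate work.
    time.sleep(0.01)

# Measure only the inference call, with model loading / FSM compilation
# and other setup done before this point.
start = time.perf_counter()
run_inference()
elapsed = time.perf_counter() - start
print(f"inference took {elapsed:.3f}s")

# Profile a single call to find which step dominates.
profiler = cProfile.Profile()
profiler.enable()
run_inference()
profiler.disable()
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)  # show the five most expensive calls
```

Comparing the profile of a fast run against a slow one makes it clear whether time is spent in setup (compilation, tokenizer work) or in the token-generation loop itself.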
Please try …
And pass …. Additionally, you will need to set ….
With these changes your script works for me and doesn't have any slow or failed inference.
Fixes #839 #908 #690 #450

## Problem

A major problem, especially with smaller language models, is repetition. For example, say a model is generating JSON and must provide 12 space tokens of indentation. Often the model will assign a high probability to a 13th space token, do the same for a 14th, and then enter an infinite space-generation loop. This NLG problem has been known for half a decade, but it only has mitigations (mirostat, repetition penalty, using hundreds of billions of weights, etc.), no absolute solutions, except for **structured generation**.

## Solution

For structured JSON generation, we set a sane default whitespace pattern of `r"[ ]?"`. This removes all newlines and indentation: it disallows any syntactic whitespace beyond a single space separator. Users can still set the `whitespace_pattern=` argument if they want different behavior.
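The effect of the default can be illustrated with plain regular expressions, independent of the Outlines API. This is a hypothetical sketch: `object_regex` is a made-up helper that builds a pattern for a one-key JSON object, with the whitespace pattern injected wherever JSON allows whitespace. Swapping the permissive `\s*` for the new default `[ ]?` is what rules out runaway indentation.

```python
import re

def object_regex(ws: str) -> str:
    # Pattern for {"name": "<value>"}, with `ws` allowed around punctuation.
    return rf'\{{{ws}"name"{ws}:{ws}"[a-z]+"{ws}\}}'

permissive = re.compile(object_regex(r"\s*"))   # any run of whitespace
strict = re.compile(object_regex(r"[ ]?"))      # at most one space (new default)

pretty = '{\n            "name": "x"\n}'        # indented, multi-space output
flat = '{ "name": "x" }'                        # single-space output

assert permissive.fullmatch(pretty) is not None  # old pattern accepts indentation
assert strict.fullmatch(flat) is not None        # new default accepts single spaces
assert strict.fullmatch(pretty) is None          # new default rejects space runs
```

Because the generation FSM is built from the pattern, a model constrained by `[ ]?` simply cannot emit a 13th space: the loop described above becomes unreachable rather than merely penalized.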
Issue description
The issue was raised by other people on Discord too.
To quote one of them:
their screenshot
Repro
I made a reproduction code snippet that can run in Google Colab (w/ free T4 GPU):
💻 Code snippet
📃 Output
💥 Exceptions raised
Results
Outlines/Python version information: