[Feature Request] OpenAI-compatible `stop` param #1731
The `v1/completions` endpoint does not support the `stop` param.
Thank you for your feedback. I will take a look at …

I forgot to mention, but …

ref vLLM …

Also, with vLLM, if …

I.e. indicating which stop string caused the stop. This is a handy feature, and nice for compatibility with vLLM (ease of transition), but not strictly necessary if …
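For illustration, once the stop string is included in the output (vLLM-style `include_stop_str_in_output`), a client can usually work out which stop string fired without a dedicated response field. The helper below is a hypothetical sketch, not something provided by LMDeploy or vLLM:

```python
# Hypothetical client-side helper: given generated text that ends with the stop
# string, report which of the requested stop strings caused the stop.
def matched_stop(text: str, stop_strings: list[str]) -> str | None:
    for s in stop_strings:
        if text.endswith(s):
            return s
    return None

# Example: the text ends with a double newline, so that stop string is reported.
print(repr(matched_stop("First paragraph.\n\n", ["\n\n", "###"])))  # '\n\n'
```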
For others who are hitting this issue, but who desperately want to use LMDeploy, you can of course remove the …

In LMDeploy, the word in the …
Using the latest official Docker image, `openmmlab/lmdeploy:v0.4.2`, I served a Llama 2 model and sent a request with the `stop` parameter of the `/v1/completions` endpoint set to `["\n\n"]`. But the generation didn't stop at a double newline. It generated lots of paragraphs with double newlines between them and kept going until it reached the maximum generation length.

I then saw in the docs that the `stop` param "Only accepts stop words that are encoded to one token index."

Not being able to stop at something simple like `"\n\n"` in a Llama 2 model is a pretty serious flaw that makes it hard to use this in a production setting. It would be great if the `stop` param were compatible with OpenAI.
(Also, while I'm here, it would also be very useful to have an `include_stop_str_in_output` option like in vLLM.)

Thanks!
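As an aside on the documented single-token restriction quoted above, a rough check with the Hugging Face `transformers` tokenizer (the model id below is an assumption, and gated access to the Llama 2 tokenizer is required) suggests why `"\n\n"` doesn't qualify:

```python
# Rough illustration: "\n\n" is expected to encode to more than one token id for
# the Llama 2 tokenizer, so it cannot satisfy a single-token stop-word restriction.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ids = tokenizer.encode("\n\n", add_special_tokens=False)
print(ids)  # more than one id means "\n\n" is rejected under the current rule
```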