
[Feature Request] OpenAI-compatible stop param #1731

Open
josephrocca opened this issue Jun 7, 2024 · 7 comments

josephrocca commented Jun 7, 2024

Using the latest official Docker image, openmmlab/lmdeploy:v0.4.2, I served a Llama 2 model and sent a request to the /v1/completions endpoint with the stop parameter set to ["\n\n"]. Generation did not stop at a double newline: the model produced many paragraphs separated by double newlines and kept going until it hit the maximum generation length.
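
For reference, the request looked roughly like this (a sketch; the host/port and model name are placeholders for my setup):

import requests

# POST to the OpenAI-compatible completions endpoint served by LMDeploy.
resp = requests.post(
    "http://localhost:23333/v1/completions",  # placeholder host/port
    json={
        "model": "llama2",                    # placeholder model name
        "prompt": "Write a short essay about the ocean.",
        "max_tokens": 512,
        "stop": ["\n\n"],                     # expected to stop at the first blank line
    },
)
print(resp.json()["choices"][0]["text"])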

I then saw in the docs that the stop param "Only accepts stop words that are encoded to one token index."

Not being able to stop at something simple like "\n\n" in a Llama 2 model is a pretty serious flaw that makes it hard to use this in a production setting. It would be great if the stop param were compatible with OpenAI.

(While I'm here: it would also be very useful to have an include_stop_str_in_output option like vLLM's.)

Thanks!

@josephrocca josephrocca changed the title [Bug] v1/completions endpoint does not support stop param [Feature Request] OpenAI-compatible stop param Jun 7, 2024
irexyc (Collaborator) commented Jun 7, 2024

Thank you for your feedback. I will take a look at include_stop_str_in_output. Since it is not clear whether OpenAI uses the 'stop str' or the 'stop token id', I will look into the behavior of their APIs with and without streaming.

josephrocca (Author) commented Jun 7, 2024

it is not clear whether OpenAI uses the 'stop str' or the 'stop token id'

I forgot to mention: ["the"] works correctly as a stop param, so the fact that ["\n\n"] does not suggests this issue is related to exact token matching/alignment.
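
A quick way to see the difference (a sketch using the Hugging Face tokenizer; the exact token ids depend on your tokenizer files):

from transformers import AutoTokenizer

# Placeholder path: use whatever Llama 2 tokenizer you serve the model with.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

for word in ["the", "\n\n"]:
    ids = tokenizer.encode(word, add_special_tokens=False)
    print(repr(word), "->", ids, f"({len(ids)} token(s))")

If "\n\n" encodes to more than one token id while "the" encodes to one, that would line up with the single-token restriction mentioned in the docs.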

zhyncs (Collaborator) commented Jun 8, 2024

ref vLLM include_stop_str_in_output
https://github.com/vllm-project/vllm/blob/c96fc067479453b02e92d9378eeeaebb6b3816de/vllm/sampling_params.py#L176-L183
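
For reference, in vLLM it is just a flag on SamplingParams (a minimal sketch; the model path is a placeholder):

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
params = SamplingParams(
    max_tokens=256,
    stop=["\n\n"],
    include_stop_str_in_output=True,  # keep the matched stop string in the output text
)
outputs = llm.generate(["Write a short essay about the ocean."], params)
print(outputs[0].outputs[0].text)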

josephrocca (Author) commented Jun 8, 2024

Also, with vLLM, if finish_reason is "stop" and it was caused by, for example, a stop parameter like stop: ["\n"], then the EventStream message JSON also includes stop_reason, like this:

...
"finish_reason":"stop",
"stop_reason":"\n",
...

That is, it indicates which stop string caused the stop. This is a handy feature, and nice for compatibility with vLLM (it eases the transition), but not strictly necessary if the include_stop_str_in_output feature is implemented.

josephrocca (Author) commented Jun 8, 2024

For others who are hitting this issue but still want to use LMDeploy: you can of course omit the stop parameter, check the full generated text for your stop strings each time a new token arrives, and manually abort the request once one of them is detected. That's my current workaround.
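
Roughly like this (a sketch using the openai Python client against LMDeploy's OpenAI-compatible server; the base_url and model name are placeholders for my setup):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
stop_strings = ["\n\n"]

stream = client.completions.create(
    model="llama2",  # placeholder model name
    prompt="Write a short essay about the ocean.",
    max_tokens=512,
    stream=True,
)

text = ""
for chunk in stream:
    text += chunk.choices[0].text
    hits = [text.find(s) for s in stop_strings if s in text]
    if hits:
        text = text[: min(hits)]  # cut at the earliest stop string
        stream.close()            # abort the request early
        break

print(text)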

lvhan028 (Collaborator) commented Jun 9, 2024

In LMDeploy, each word in the stop_words list is expected to be tokenized to ONE token id.
Words that tokenize into multiple tokens are not supported as stop words yet.
We have plans to resolve this, but it will take a while.
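
Until then, a caller-side option is to check candidate stop words against the tokenizer before sending them (a sketch; the helper and the tokenizer path below are illustrative, not part of LMDeploy):

from transformers import AutoTokenizer

def usable_stop_words(words, tokenizer_name="meta-llama/Llama-2-7b-hf"):
    # Keep only stop words that encode to a single token id, i.e. the ones
    # the current restriction will actually honor.
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    kept = []
    for w in words:
        n = len(tok.encode(w, add_special_tokens=False))
        if n == 1:
            kept.append(w)
        else:
            print(f"skipping {w!r}: encodes to {n} tokens")
    return kept

print(usable_stop_words(["the", "\n\n"]))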
