
[Feature Request] OpenAI-compatible stop param #1731

Open
josephrocca opened this issue Jun 7, 2024 · 7 comments

josephrocca commented Jun 7, 2024

Using the latest official Docker image, openmmlab/lmdeploy:v0.4.2, I served a Llama 2 model and sent a request to the /v1/completions endpoint with the stop parameter set to ["\n\n"]. Generation did not stop at a double newline: the model produced many paragraphs separated by double newlines and kept going until it hit the maximum generation length.
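
For reference, the request looked roughly like this (a sketch; the host/port and model name are placeholders for my setup):

import requests

# POST to the OpenAI-compatible completions endpoint served by LMDeploy.
resp = requests.post(
    "http://localhost:23333/v1/completions",  # placeholder host/port
    json={
        "model": "llama2",                    # placeholder model name
        "prompt": "Write a short essay about the ocean.",
        "max_tokens": 512,
        "stop": ["\n\n"],                     # expected to stop at the first blank line
    },
)
print(resp.json()["choices"][0]["text"])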

I then saw in the docs that the stop param "Only accepts stop words that are encoded to one token index."

Not being able to stop at something simple like "\n\n" in a Llama 2 model is a pretty serious flaw that makes it hard to use this in a production setting. It would be great if the stop param were compatible with OpenAI.

(While I'm here: it would also be very useful to have an include_stop_str_in_output option like vLLM's.)

Thanks!

@josephrocca josephrocca changed the title [Bug] v1/completions endpoint does not support stop param [Feature Request] OpenAI-compatible stop param Jun 7, 2024
irexyc (Collaborator) commented Jun 7, 2024

Thank you for your feedback. I will take a look at include_stop_str_in_output. Since it is not clear whether OpenAI uses the 'stop str' or the 'stop token id', I will look into the behavior of their APIs with and without streaming.

josephrocca (Author) commented Jun 7, 2024

it is not clear whether OpenAI uses the 'stop str' or the 'stop token id'

I forgot to mention: ["the"] works correctly as a stop param, so the fact that ["\n\n"] does not suggests this issue is related to exact token matching/alignment.
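
A quick way to see the difference (a sketch using the Hugging Face tokenizer; the exact token ids depend on your tokenizer files):

from transformers import AutoTokenizer

# Placeholder path: use whatever Llama 2 tokenizer you serve the model with.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

for word in ["the", "\n\n"]:
    ids = tokenizer.encode(word, add_special_tokens=False)
    print(repr(word), "->", ids, f"({len(ids)} token(s))")

If "\n\n" encodes to more than one token id while "the" encodes to one, that would line up with the single-token restriction mentioned in the docs.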

zhyncs (Collaborator) commented Jun 8, 2024

ref vLLM include_stop_str_in_output
https://github.com/vllm-project/vllm/blob/c96fc067479453b02e92d9378eeeaebb6b3816de/vllm/sampling_params.py#L176-L183
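
For reference, in vLLM it is just a flag on SamplingParams (a minimal sketch; the model path is a placeholder):

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
params = SamplingParams(
    max_tokens=256,
    stop=["\n\n"],
    include_stop_str_in_output=True,  # keep the matched stop string in the output text
)
outputs = llm.generate(["Write a short essay about the ocean."], params)
print(outputs[0].outputs[0].text)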

josephrocca (Author) commented Jun 8, 2024

Also, with vLLM, if finish_reason is "stop" and it was caused by, for example, a stop parameter like stop: ["\n"], then the EventStream message JSON also includes stop_reason, like this:

...
"finish_reason":"stop",
"stop_reason":"\n",
...

That is, it indicates which stop string caused the stop. This is a handy feature, and nice for compatibility with vLLM (it eases the transition), but not strictly necessary if the include_stop_str_in_output feature is implemented.

josephrocca (Author) commented Jun 8, 2024

For others who are hitting this issue but still want to use LMDeploy: you can of course omit the stop parameter, check the full generated text for your stop strings each time a new token arrives, and manually abort the request once one of them is detected. That's my current workaround.
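
Roughly like this (a sketch using the openai Python client against LMDeploy's OpenAI-compatible server; the base_url and model name are placeholders for my setup):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
stop_strings = ["\n\n"]

stream = client.completions.create(
    model="llama2",  # placeholder model name
    prompt="Write a short essay about the ocean.",
    max_tokens=512,
    stream=True,
)

text = ""
for chunk in stream:
    text += chunk.choices[0].text
    hits = [text.find(s) for s in stop_strings if s in text]
    if hits:
        text = text[: min(hits)]  # cut at the earliest stop string
        stream.close()            # abort the request early
        break

print(text)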

lvhan028 (Collaborator) commented Jun 9, 2024

In LMDeploy, each word in the stop_words list is expected to be tokenized to ONE token id.
Words that tokenize into multiple tokens are not supported as stop words yet.
We have plans to resolve this, but it will take a while.
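
Until then, a caller-side option is to check candidate stop words against the tokenizer before sending them (a sketch; the helper and the tokenizer path below are illustrative, not part of LMDeploy):

from transformers import AutoTokenizer

def usable_stop_words(words, tokenizer_name="meta-llama/Llama-2-7b-hf"):
    # Keep only stop words that encode to a single token id, i.e. the ones
    # the current restriction will actually honor.
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    kept = []
    for w in words:
        n = len(tok.encode(w, add_special_tokens=False))
        if n == 1:
            kept.append(w)
        else:
            print(f"skipping {w!r}: encodes to {n} tokens")
    return kept

print(usable_stop_words(["the", "\n\n"]))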
