Add /v1/completions endpoint (OpenAI legacy completions API) to transformers serve#44558
rain-1 wants to merge 9 commits into huggingface:main
Adds support for the legacy text completions endpoint, which accepts a free-form text prompt (no chat template) and returns generated text in `choices[].text`. Supports both streaming and non-streaming modes, `suffix` for fill-in-the-middle insertion, and proper `finish_reason` detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
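To make the request and response shapes described in the commit message concrete, here is a minimal sketch. It assumes the endpoint follows OpenAI's legacy completions schema (request fields `model`, `prompt`, `suffix`, `max_tokens`, `stream`; response fields `choices[].text` and `choices[].finish_reason`) and that `text` holds only the newly generated text, not an echo of the prompt. The helper names are hypothetical and are not part of transformers.

```python
def build_completion_payload(model, prompt, suffix=None, max_tokens=32, stream=False):
    """Build a legacy /v1/completions request body using OpenAI field names."""
    payload = {
        "model": model,
        "prompt": prompt,          # raw text; no chat template is applied
        "max_tokens": max_tokens,
        "stream": stream,
    }
    if suffix is not None:
        # Fill-in-the-middle: the model generates text meant to sit
        # between `prompt` and `suffix`.
        payload["suffix"] = suffix
    return payload


def read_completion(response):
    """Return (text, finish_reason) from a non-streaming response dict.

    Under OpenAI semantics, finish_reason is "stop" when generation ended
    naturally (EOS or a stop sequence) and "length" when max_tokens was hit.
    """
    choice = response["choices"][0]
    return choice["text"], choice["finish_reason"]


def assemble_document(prompt, response, suffix=""):
    """Splice the generated text between prompt and suffix."""
    text, _ = read_completion(response)
    return prompt + text + suffix
```

For fill-in-the-middle, the full document is `prompt + text + suffix`; without a suffix, `assemble_document` simply appends the continuation to the prompt.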
Added documentation for the new API endpoint.
Thanks for the PR, I'll take a look!
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Thanks for those improvements, @stevhliu. I've committed them all.
Hey @rain-1, we're doing significant changes to the structure of serve at this time with @SunMarc, sorry your PR is getting delayed. Btw, what prohibits you from updating to a newer, maintained API? I'm not sure if it's transformers serve's place to introduce deprecated API endpoints instead of supporting the existing/future ones.
@LysandreJik No worries. I'll hold off on other contributions until this is done.
Do any of these support text continuation? I'm working with pretrained LLMs. They do not have a chat template, so I need an API endpoint that takes the input "Once upon a time, " and outputs "Once upon a time, there was a princess who".
OpenAI deprecated this endpoint because they cannot safely serve base models to customers as part of their product; it's hard to do safety tuning on those. Regardless, LLM development still has a pretraining phase. Most tools rely on a chat template, which is only trained into the model during post-training.
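The "Once upon a time, " example maps directly onto a legacy completions request. The sketch below uses only the Python standard library and is hedged: the base URL (`http://localhost:8000/v1`, assumed to be the default address of `transformers serve`) and the model ID are placeholders for your own setup, and `continuation_request` / `continue_text` are hypothetical helpers, not transformers APIs.

```python
import json
import urllib.request

# Placeholders (assumptions): adjust to wherever `transformers serve` is
# running and to the base model you want to continue text with.
BASE_URL = "http://localhost:8000/v1"
MODEL = "openai-community/gpt2"


def continuation_request(prompt, max_tokens=20):
    """Build (but do not send) a POST request to the legacy completions endpoint."""
    body = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def continue_text(prompt, max_tokens=20):
    """Send the request and return the prompt followed by the model's continuation."""
    with urllib.request.urlopen(continuation_request(prompt, max_tokens)) as resp:
        data = json.load(resp)
    # Assumed OpenAI semantics: choices[0]["text"] holds only the new text.
    return prompt + data["choices"][0]["text"]
```

With a server running, `continue_text("Once upon a time, ")` would return the prompt followed by whatever the base model generates; no chat template is involved at any point.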
What does this PR do?
Hello, my motivation for this:
I work with base models. I often need to continue text documents, so I need the OpenAI /v1/completions endpoint, which uses an LLM to continue text. This applies to base models (pretrained foundation models) that have not been post-trained as instruct models to follow a chat template.
The 'transformers' tool is amazingly useful for bringing up a quick API endpoint, and I would love for it to support continuing a prompt! That's why I worked with Claude Code/Opus 4.6 to produce this PR.
Note: While this is a legacy feature of the OpenAI API (they have moved away from providing base model support), vLLM still supports it. It's very useful for research and any other work with pretrained models.
Here's an example script that tests the functionality.
So I run
transformers serve
then run that script, and I get the result:
So this implementation is working well for my purposes.
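The script and its output are not preserved on this page. As a hypothetical stand-in (not the author's script), here is how a client might check the streaming mode mentioned in the commit message. It assumes the server emits OpenAI-style server-sent events: one `data: {...}` line per chunk, each carrying `choices[].text` and a `finish_reason` that stays null until the last chunk, with the stream ending in `data: [DONE]`.

```python
import json


def iter_stream_chunks(lines):
    """Yield (text, finish_reason) pairs from OpenAI-style SSE lines.

    Accepts bytes or str lines so it can consume an HTTP response directly.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            yield choice.get("text", ""), choice.get("finish_reason")


def collect_stream(lines):
    """Concatenate streamed text and keep the last non-null finish_reason."""
    parts, finish_reason = [], None
    for text, reason in iter_stream_chunks(lines):
        parts.append(text)
        if reason is not None:
            finish_reason = reason
    return "".join(parts), finish_reason
```

With a running server, the lines can come straight from `urllib.request.urlopen(...)` on a request whose body sets `"stream": true`, since the response object iterates line by line as bytes; that is why `iter_stream_chunks` decodes bytes.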
Fixes # (issue)
Before submitting
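Did you read the contributor guideline, Pull Request section? Yes.
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. No, this PR is the first contact.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings. Yes, the API endpoint is documented.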
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Based on git blame, I'm tagging @LysandreJik.