
Add /v1/completions endpoint (OpenAI legacy completions API) to transformers serve#44558

Open
rain-1 wants to merge 9 commits into huggingface:main from rain-1:main

Conversation


@rain-1 rain-1 commented Mar 10, 2026

Adds support for the legacy text completions endpoint, which accepts a freeform text prompt (no chat template) and returns generated text in choices[].text. Supports both streaming and non-streaming modes, suffix for fill-in-the-middle insertion, and proper finish_reason detection.
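For context on the finish_reason detection mentioned above: in the legacy completions API, finish_reason is "stop" when generation hit an end-of-text token, "length" when it exhausted max_tokens, and null while a streamed completion is still in progress. A minimal sketch of that decision (hypothetical helper name, not the PR's actual code):

```python
def pick_finish_reason(hit_eos: bool, generated_tokens: int, max_tokens: int):
    """Mirror the legacy completions semantics: "stop" for a natural
    end-of-text, "length" when the token budget ran out, and None
    while a streamed completion is still in progress."""
    if hit_eos:
        return "stop"
    if generated_tokens >= max_tokens:
        return "length"
    return None

print(pick_finish_reason(True, 5, 20))    # natural end-of-text -> "stop"
print(pick_finish_reason(False, 20, 20))  # budget exhausted -> "length"
```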

What does this PR do?

Hello, my motivation for this:

I work with base models. I often need to continue text documents, so I need the OpenAI /v1/completions endpoint, which uses an LLM to continue text. This applies to base models/pretrained foundation models, which have not been post-trained as instruct models to follow a chat template.

The 'transformers' tool is amazingly useful for bringing up a quick API endpoint, and I would love for it to support the ability to continue a prompt! That's why I worked with Claude Code/Opus 4.6 to produce this PR.

Note: While this is a legacy feature of the OpenAI API (they have moved away from providing base model support), vLLM still supports it. It's very useful for research or any work with pretrained models.

Here's an example script that tests the functionality:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Non-streaming
resp = client.completions.create(
    model="Qwen/Qwen3.5-0.8B-Base",
    prompt="The capital of France is",
    max_tokens=20
)
for i, choice in enumerate(resp.choices):
    print(i)
    print("-")
    print(choice.text)
    print("\n\n")

# Streaming
for chunk in client.completions.create(
    model="Qwen/Qwen3.5-0.8B-Base",
    prompt="Once upon a time",
    max_tokens=50,
    stream=True,
):
    print(chunk.choices[0].text, end="", flush=True)

So I run transformers serve, then run that script, and get this result:

0
-
 Paris, and the capital of the United States is Washington, D.C. The capital of the United



, in a magical land of science, there was a very special kind of animal called a jellyfish. This jellyfish had a really interesting way of living.

So this implementation is working well for my purposes.
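For reference, the streaming mode of this endpoint emits server-sent events: each data line carries a JSON object with "object": "text_completion" and the new text fragment in choices[0].text, and the stream ends with a data: [DONE] sentinel. A minimal sketch of how a server might serialize such chunks (hypothetical helper name and model id, not the PR's actual code):

```python
import json

def sse_chunk(completion_id: str, model: str, text: str, finish_reason=None) -> str:
    """Serialize one legacy-completions streaming chunk as an SSE data line."""
    payload = {
        "id": completion_id,
        "object": "text_completion",
        "model": model,
        "choices": [{"index": 0, "text": text, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"

# A stream is a sequence of chunks followed by the [DONE] sentinel.
stream = [
    sse_chunk("cmpl-1", "my-base-model", "Once"),
    sse_chunk("cmpl-1", "my-base-model", " upon a time", finish_reason="length"),
    "data: [DONE]\n\n",
]
```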

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). No, this provides a new feature for the transformers command line tool
  • Did you read the contributor guideline,
    Pull Request section? yes
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case. no, this PR is the first contact
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings. Yes, API endpoint documented
  • Did you write any new necessary tests? Yes, we added a test for this feature, and also tested the functionality with an independent Python script

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Based on git blame, I tag @LysandreJik

rain-1 and others added 2 commits March 10, 2026 06:58
…sformers serve`

Adds support for the legacy text completions endpoint, which accepts a
freeform text prompt (no chat template) and returns generated text in
choices[].text. Supports both streaming and non-streaming modes, suffix
for fill-in-the-middle insertion, and proper finish_reason detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rain-1
Author

rain-1 commented Mar 10, 2026

Added documentation for the new API endpoint

@LysandreJik
Member

Thanks for the PR, I'll take a look!

Member

@stevhliu stevhliu left a comment


the docs side lgtm, thanks!

rain-1 and others added 3 commits March 11, 2026 09:28
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@rain-1
Author

rain-1 commented Mar 11, 2026

Thanks for those improvements @stevhliu, I've committed them all

@LysandreJik
Member

Hey @rain-1, we're making significant changes to the structure of serve at this time with @SunMarc, sorry your PR is getting delayed.

Btw, what prohibits you from updating to a newer, maintained API? I'm not sure if it's transformers serve's place to introduce deprecated API endpoints instead of supporting the existing/future ones

@rain-1
Author

rain-1 commented Mar 22, 2026

@LysandreJik No worries. I'll hold off on other contributions until this is done.

newer, maintained API

Do any of these support text continuation?

I'm working with pretrained LLMs. They do not have a chat template, so I need an API endpoint that does input "Once upon a time, " -> output "Once upon a time, there was a princess who"

I'm not sure if it's transformers serve's place to introduce deprecated API endpoints

OpenAI deprecated this endpoint because they cannot safely serve base models to customers as part of their product. It's hard to do safety tuning on those.

Regardless of that, LLM development still has the pretraining phase.

Most tools use a chat template, which is only trained into the model in the post-training phase.

