Propagate the model loading from transformers serve to chat#44758

Merged
SunMarc merged 8 commits into main from transformers-chat-serve-progress
Mar 19, 2026

Conversation


@LysandreJik LysandreJik commented Mar 16, 2026

Provides nicer feedback while transformers chat loads a model, instead of hanging.

Screen.Recording.2026-03-17.at.00.00.52.mov

Adds a POST /load_model endpoint to transformers serve that streams model loading progress via Server-Sent Events (SSE). Previously, transformers chat would hang with no feedback while downloading and loading a model — now users see real-time progress for each phase.

transformers chat calls this endpoint on startup and renders the progress with Rich progress bars.

The /load_model SSE protocol

The endpoint uses a simple, implementation-agnostic SSE protocol designed to be easy to consume from any language or framework. No Python or library-specific details leak into the wire format.

Request:

POST /load_model
Content-Type: application/json

{"model": "HuggingFaceTB/SmolLM2-135M-Instruct"}

Response: text/event-stream — each frame is data: <json>\n\n

Every event has status and model. Three possible statuses:

Status    Terminal?  Extra fields                 Description
loading   No         stage, optionally progress   Something is being loaded
ready     Yes        cached: bool                 Model is available
error     Yes        message: str                 Loading failed

The download stage is skipped when files are already cached locally. Progress events include {"current": int, "total": int}.
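For illustration, a frame of this protocol could be produced as follows (a sketch, not the actual server code; `sse_frame` is a hypothetical helper):

```python
import json

def sse_frame(event: dict) -> str:
    # One SSE frame per event: "data: <json>\n\n", as described above.
    return f"data: {json.dumps(event)}\n\n"

# Event shapes follow the table: every event carries "status" and "model".
loading = {
    "status": "loading",
    "model": "org/model@main",
    "stage": "download",
    "progress": {"current": 1024, "total": 4096},
}
frame = sse_frame(loading)
```

Decoding is the inverse: strip the `data: ` prefix and `json.loads` the remainder of the line.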

Example stream (fresh load with download):

data: {"status": "loading", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "stage": "processor"}
data: {"status": "loading", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "stage": "config"}
data: {"status": "loading", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "stage": "download", "progress": {"current": 67020052, "total": 269060552}}
data: {"status": "loading", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "stage": "download", "progress": {"current": 269060552, "total": 269060552}}
data: {"status": "loading", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "stage": "weights", "progress": {"current": 136, "total": 272}}
data: {"status": "loading", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "stage": "weights", "progress": {"current": 272, "total": 272}}
data: {"status": "ready", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "cached": false}

When the model is already in memory, a single event is returned:

data: {"status": "ready", "model": "HuggingFaceTB/SmolLM2-135M-Instruct@main", "cached": true}
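A minimal client for this stream (independent of the chat.py implementation; `iter_events` is a hypothetical helper) just decodes `data:` lines and stops at a terminal status:

```python
import json
from typing import Iterable, Iterator

def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    # Decode "data: <json>" frames; stop after a terminal status.
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines between frames
        event = json.loads(line[len("data: "):])
        yield event
        if event["status"] in ("ready", "error"):
            break

# Against a live server this could be wired up with, e.g., requests:
#   resp = requests.post(url + "/load_model", json={"model": model_id}, stream=True)
#   for event in iter_events(resp.iter_lines(decode_unicode=True)):
#       ...
```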

Integration in chat.py

The RichInterface.print_model_load() method in chat.py consumes this SSE stream and renders it with Rich progress bars. It's intentionally kept simple as a reference implementation — the client just reads events sequentially and reacts to each status/stage:

  • processor / config → status text
  • download → byte-progress bar with speed and ETA
  • weights → item-progress bar
  • ready → done message
  • error → raise
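The per-status handling above can be sketched as a plain dispatcher (a simplified, Rich-free stand-in; `describe_event` is illustrative, not the actual print_model_load code):

```python
def describe_event(event: dict) -> str:
    # Map one /load_model event to a one-line status description.
    status = event["status"]
    if status == "error":
        raise RuntimeError(event["message"])  # terminal failure
    if status == "ready":
        return "already cached" if event["cached"] else "loaded"
    # status == "loading": render the stage, with progress when present
    stage = event["stage"]
    if "progress" in event:
        p = event["progress"]
        return f"{stage}: {p['current']}/{p['total']}"
    return f"{stage}..."
```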

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@SunMarc SunMarc left a comment


Thanks! Just a few nits.

@LysandreJik LysandreJik force-pushed the transformers-chat-serve-progress branch from ebd54fa to decc747 Compare March 18, 2026 11:18
@LysandreJik LysandreJik marked this pull request as ready for review March 18, 2026 11:52

@SunMarc SunMarc left a comment


Thanks a lot, just a few nits and we can merge!


  def load_model_and_processor(
-     self, model_id_and_revision: str
+     self, model_id_and_revision: str, progress_callback: Callable[[dict], None] | None = None
Member


Should we also propagate to load_audio_model_and_processor?

Member Author


I would be down for us to do that, but maybe in a follow-up? It's not a hugely used path for now, and no client would benefit from it at this time.

Comment on lines 2160 to +2161
  def load_model_and_processor(
-     self, model_id_and_revision: str
+     self, model_id_and_revision: str, progress_callback: Callable[[dict], None] | None = None
Member


Can you add a docstring for progress_callback?

}

progress = Progress(
    TextColumn("[bold]{task.description}", table_column=Column(width=50, no_wrap=True)),
Member


This is cropping the description sometimes when it is too long. Maybe we shouldn't put the model name in the description

Member Author


I put the model name in the description only if we have the space to show it; otherwise I remove it.


@SunMarc SunMarc left a comment


Thanks for this clean PR!

Comment on lines +1007 to +1008

def test_concurrent_load_same_model(self):
Member


Nice, we should add more concurrent tests in general for our endpoints, to see if they behave correctly. I will probably do that in the refactor.
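The property a concurrent-load test checks (several simultaneous requests for the same model should trigger only one underlying load) can be sketched independently of the server (all names hypothetical):

```python
import threading

class ModelRegistry:
    # Deduplicate concurrent loads: the first caller loads, others wait.
    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._lock = threading.Lock()
        self._models = {}     # model_id -> loaded model
        self._in_flight = {}  # model_id -> Event set when loading finishes

    def load(self, model_id):
        with self._lock:
            if model_id in self._models:
                return self._models[model_id]
            if model_id in self._in_flight:
                done, is_loader = self._in_flight[model_id], False
            else:
                done = self._in_flight[model_id] = threading.Event()
                is_loader = True
        if is_loader:
            model = self._load_fn(model_id)  # slow: download + load weights
            with self._lock:
                self._models[model_id] = model
                del self._in_flight[model_id]
            done.set()
            return model
        done.wait()  # another request is already loading this model
        with self._lock:
            return self._models[model_id]
```

A real implementation would also need to release waiters when the load fails; this sketch omits error handling for brevity.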

@SunMarc SunMarc added this pull request to the merge queue Mar 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 19, 2026
@SunMarc SunMarc added this pull request to the merge queue Mar 19, 2026
Merged via the queue into main with commit e94695e Mar 19, 2026
29 checks passed
@SunMarc SunMarc deleted the transformers-chat-serve-progress branch March 19, 2026 17:20