Propagate the model loading from transformers serve to chat #44758
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from ebd54fa to decc747
SunMarc left a comment:

Thanks a lot, just a few nits and we can merge!
  def load_model_and_processor(
-     self, model_id_and_revision: str
+     self, model_id_and_revision: str, progress_callback: Callable[[dict], None] | None = None
Should we also propagate to `load_audio_model_and_processor`?
I would be down for us to do that, but maybe in a follow-up? It's not a heavily used path for now, and no client would benefit from it at this time.
  def load_model_and_processor(
-     self, model_id_and_revision: str
+     self, model_id_and_revision: str, progress_callback: Callable[[dict], None] | None = None
Can you add a docstring for `progress_callback`?
src/transformers/cli/chat.py (outdated)
  }

  progress = Progress(
      TextColumn("[bold]{task.description}", table_column=Column(width=50, no_wrap=True)),
This sometimes crops the description when it is too long. Maybe we shouldn't put the model name in the description.
I now put the model name in the description only if there is space to show it; otherwise I remove it.
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
  def test_concurrent_load_same_model(self):
Nice. We should add more concurrency tests in general for our endpoints to see if they behave correctly. I will probably do that in the refactor.
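A hedged sketch of the shape such a concurrency test could take. The `DummyServer` class, its lock, and the "load at most once" behavior are assumptions used to make the test self-contained; the real test exercises the actual serve backend:

```python
import threading


class DummyServer:
    """Hypothetical stand-in for the serve backend: loads a model at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.load_count = 0
        self.loaded = None

    def load_model(self, model_id):
        # Serialize concurrent callers so the expensive load runs only once.
        with self._lock:
            if self.loaded != model_id:
                self.load_count += 1  # simulate the expensive load
                self.loaded = model_id
            return self.loaded


def test_concurrent_load_same_model():
    server = DummyServer()
    threads = [
        threading.Thread(target=server.load_model, args=("gpt2",)) for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All callers see the same model, and it was loaded exactly once.
    assert server.loaded == "gpt2"
    assert server.load_count == 1
```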
Provides nicer feedback when `transformers chat` loads a model, instead of hanging.

[Screen recording: Screen.Recording.2026-03-17.at.00.00.52.mov]
Adds a `POST /load_model` endpoint to `transformers serve` that streams model loading progress via Server-Sent Events (SSE). Previously, `transformers chat` would hang with no feedback while downloading and loading a model; now users see real-time progress for each phase. `transformers chat` calls this endpoint on startup and renders the progress with Rich progress bars.

The `/load_model` SSE protocol

The endpoint uses a simple, implementation-agnostic SSE protocol designed to be easy to consume from any language or framework. No Python or library-specific details leak into the wire format.
Request:

Response: `text/event-stream`; each frame is `data: <json>\n\n`.

Every event has `status` and `model`. Three possible statuses. The `download` stage is skipped when files are already cached locally. Progress events include `{"current": int, "total": int}`.

Example stream (fresh load with download):
When the model is already in memory, a single event is returned:
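A minimal sketch of consuming this wire format from Python, relying only on the protocol described above (`data: <json>\n\n` frames). The `iter_sse_events` helper is illustrative, not part of transformers:

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from `data: <json>` SSE frames.

    `lines` is any iterable of text lines (e.g. a streaming HTTP
    response iterated line by line). Non-data fields and the blank
    lines that terminate frames are skipped.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


# A cached-model stream collapses to a single `ready` event.
stream = ['data: {"status": "ready", "model": "gpt2"}', ""]
events = list(iter_sse_events(stream))
# events == [{"status": "ready", "model": "gpt2"}]
```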
Integration in chat.py

The `RichInterface.print_model_load()` method in `chat.py` consumes this SSE stream and renders it with Rich progress bars. It's intentionally kept simple as a reference implementation: the client just reads events sequentially and reacts to each status/stage:

- processor/config → status text
- download → byte-progress bar with speed and ETA
- weights → item-progress bar
- ready → done message
- error → raise
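The sequential read-and-dispatch loop described above can be sketched without Rich as follows. The function name, the event field names beyond `status`/`model`/`current`/`total`, and the handler behavior are assumptions; the real implementation is `RichInterface.print_model_load()`:

```python
def render_load_events(events, on_status=print):
    """Dispatch each decoded SSE event to a simple per-status handler.

    `events` is an iterable of event dicts. Returns the final event so
    callers can inspect the `ready` payload.
    """
    last = None
    for event in events:
        status = event.get("status")
        if status == "error":
            # Surface server-side failures to the caller.
            raise RuntimeError(event.get("message", "model load failed"))
        if status in ("processor", "config"):
            on_status(f"Loading {status} for {event['model']}...")
        elif status in ("download", "weights"):
            # Progress events carry current/total counters.
            on_status(f"{status}: {event.get('current', 0)}/{event.get('total', '?')}")
        elif status == "ready":
            on_status(f"{event['model']} is ready")
        last = event
    return last
```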