feat: stream model conversion by shikaku2 · Pull Request #1581 · leejet/stable-diffusion.cpp

shikaku2 · 2026-05-29T19:19:39Z

Split out from draft PR #1573.

Summary

Changes --convert to stream converted tensors instead of allocating the entire converted model in one ggml_context before writing the output file.

This PR intentionally only covers the regular conversion memory/threading path. RMSE-guided conversion is not included here and will be handled separately after this is reviewed.

What changed

- Collect output tensor metadata first without loading tensor data.
- Write GGUF or safetensors metadata/header up front.
- Load, convert, and write tensors in batches instead of keeping every converted tensor resident until the end.
- Parallelize tensor loading/conversion within each batch.
- Cap each batch by output tensor bytes, so large tensors still stream with bounded peak memory while smaller tensors can use available CPU threads.
- Reuse the existing convert(input_path, vae_path, output_path, output_type, tensor_type_rules, convert_name) API and CLI behavior.
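The byte-capped batching described above can be sketched roughly as follows. This is a minimal illustration, assuming the output size of every tensor is already known from the metadata pass and that tensors are batched in file order; `make_batches` is a hypothetical name, not a function from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: group tensor indices into batches whose total output
// size stays under max_batch_bytes. A tensor larger than the cap gets a batch
// of its own, so it still streams instead of failing.
std::vector<std::vector<size_t>> make_batches(const std::vector<uint64_t>& out_bytes,
                                              uint64_t max_batch_bytes) {
    std::vector<std::vector<size_t>> batches;
    std::vector<size_t> current;
    uint64_t current_bytes = 0;
    for (size_t i = 0; i < out_bytes.size(); ++i) {
        // Close the current batch if adding this tensor would exceed the cap.
        if (!current.empty() && current_bytes + out_bytes[i] > max_batch_bytes) {
            batches.push_back(std::move(current));
            current.clear();
            current_bytes = 0;
        }
        current.push_back(i);
        current_bytes += out_bytes[i];
    }
    if (!current.empty()) {
        batches.push_back(std::move(current));
    }
    return batches;
}
```

With this grouping, the converted data held per batch is bounded by the larger of the cap and the biggest single tensor; how many threads can actually work in parallel then depends on how many tensors fit under the cap.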

What is not included

- No RMSE option or RMSE type selection.
- No AIO/separate text encoder/diffusion/VAE packaging changes.
- No --lazy-load runtime behavior changes.

Validation

- cmake --build build -j16
- git diff --check
- Tiny safetensors -> GGUF conversion: build/bin/sd-cli -M convert -m /tmp/sdcpp-convert-tiny.safetensors -o /tmp/sdcpp-convert-tiny-final.gguf --type f16
- Full SD3.5 Medium conversion: time build/bin/sd-cli -M convert -m /home/aaron/models/sd3.5-medium/sd3.5_medium.safetensors -o /tmp/sd3.5_medium_streaming_convert.gguf
  - Output: /tmp/sd3.5_medium_streaming_convert.gguf, 4.8G
  - Completed successfully in about 3.7s wall time on my machine

Notes

This is a draft because the new streaming writer path should get review and broader testing across output formats and platforms before being marked ready.

wbruna

First, about the coding style: this places format-specific logic inside convert.cpp. The format-specific code should go into the appropriate files inside model_io/, probably with a separate "write tensor" function per file type. Note that you should also avoid opening and closing the model files for each tensor, so some kind of "opened model file" abstraction will probably be needed. A "read the tensor at the specified offset" abstraction would probably make sense, too.
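As a rough illustration of the "opened model file" and "read at offset" abstractions suggested above, a minimal sketch could look like the following. `OpenedModelFile` and `read_at` are hypothetical names, not existing stable-diffusion.cpp APIs, and the sketch assumes one instance per worker thread, since a single std::ifstream's read position cannot be shared safely between threads.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch of an "opened model file": the file is opened once and
// tensors are read by absolute byte offset, instead of reopening it per tensor.
class OpenedModelFile {
public:
    explicit OpenedModelFile(const std::string& path)
        : in_(path, std::ios::binary) {
        if (!in_) {
            throw std::runtime_error("cannot open " + path);
        }
    }

    // Read `size` bytes starting at absolute `offset` into `dst`.
    void read_at(uint64_t offset, uint64_t size, std::vector<uint8_t>& dst) {
        dst.resize(size);
        in_.clear();  // reset EOF/fail bits left by a previous read
        in_.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
        in_.read(reinterpret_cast<char*>(dst.data()), static_cast<std::streamsize>(size));
        if (!in_) {
            throw std::runtime_error("short read");
        }
    }

private:
    std::ifstream in_;  // stays open for the lifetime of the object
};
```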

I tried this for a .safetensors -> Q4_K .gguf conversion. On my machine, it never managed to saturate all CPU cores, so it was much slower than the normal conversion (around 1/2 to 1/3 of the speed). I/O didn't seem to be the bottleneck: system and wait times remained low.

Looking at the code, my guess would be the batching calculation: this behavior would make sense if, for some reason, it consistently used only 1 or 2 threads (by the way, the number of threads should also respect the --threads parameter). The batching division also looks suboptimal: you split the work between threads, then stop everything, write everything, and then start the threads again. So you are not allowing the conversion and the writing to overlap; plus, a thread that finishes much sooner than the others stays idle until the next batch.
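To see why a per-batch barrier leaves threads idle, the following toy cost model compares the two schedules. It assumes known per-tensor conversion costs, ignores write time, and models fixed batching simply as groups of `n_threads` tensors; both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Fixed batching: tensors are processed in groups of n_threads, and each group
// waits for its slowest tensor (a barrier) before the next group starts.
double fixed_batch_makespan(const std::vector<double>& cost, size_t n_threads) {
    double total = 0.0;
    for (size_t i = 0; i < cost.size(); i += n_threads) {
        size_t end = std::min(cost.size(), i + n_threads);
        total += *std::max_element(cost.begin() + i, cost.begin() + end);
    }
    return total;
}

// Pipeline: each thread takes the next tensor as soon as it becomes free
// (greedy list scheduling), so no thread waits on a barrier.
double pipeline_makespan(const std::vector<double>& cost, size_t n_threads) {
    // Min-heap of the times at which each thread becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (size_t t = 0; t < n_threads; ++t) {
        free_at.push(0.0);
    }
    double makespan = 0.0;
    for (double c : cost) {
        double start = free_at.top();
        free_at.pop();
        double finish = start + c;
        makespan = std::max(makespan, finish);
        free_at.push(finish);
    }
    return makespan;
}
```

For costs {4, 1, 1, 1, 4, 1, 1, 1} on 2 threads, the fixed batches take 4 + 1 + 4 + 1 = 10 time units, while the pull-based schedule finishes in 7, which equals the total work divided by the thread count.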

I would avoid the fixed batching and use a true pipeline instead: either n read+convert threads + 1 write thread, or n read+convert+write threads, enforcing the memory budget with a condition variable. I would bet on the second option: if writing is the bottleneck, you'd naturally parallelize it as well.
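A minimal sketch of the second option (n read+convert+write threads sharing a memory budget) could look like this. It assumes the budget only accounts for each tensor's output bytes; `MemoryBudget`, `run_pipeline`, and the `process_tensor` callback are hypothetical names, and in a real converter `n_threads` would come from --threads.

```cpp
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Byte budget shared by worker threads: acquire() blocks until `bytes` fit in
// the remaining budget; release() returns them and wakes waiting workers.
class MemoryBudget {
public:
    explicit MemoryBudget(uint64_t limit) : limit_(limit) {}

    void acquire(uint64_t bytes) {
        bytes = std::min(bytes, limit_);  // oversized tensors run alone
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return used_ + bytes <= limit_; });
        used_ += bytes;
    }

    void release(uint64_t bytes) {
        bytes = std::min(bytes, limit_);
        {
            std::lock_guard<std::mutex> lock(mu_);
            used_ -= bytes;
        }
        cv_.notify_all();
    }

private:
    const uint64_t limit_;
    uint64_t used_ = 0;
    std::mutex mu_;
    std::condition_variable cv_;
};

// Workers pull tensor indices from a shared counter (no per-batch barrier) and
// run read+convert+write for each tensor while holding its share of the budget.
void run_pipeline(const std::vector<uint64_t>& out_bytes, uint64_t budget_bytes,
                  unsigned n_threads, const std::function<void(size_t)>& process_tensor) {
    MemoryBudget budget(budget_bytes);
    std::atomic<size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&] {
            for (size_t i = next.fetch_add(1); i < out_bytes.size(); i = next.fetch_add(1)) {
                budget.acquire(out_bytes[i]);
                process_tensor(i);  // hypothetical read + convert + write step
                budget.release(out_bytes[i]);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

A worker only waits while it holds no budget, and every worker that holds budget eventually releases it, so this scheme cannot deadlock. An exception thrown by `process_tensor` would leak its share of the budget here, so a fuller version would release it through a RAII guard.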

Note that you are not forced to write sequentially, either: you have offsets for each tensor, so each one can be written as soon as it is ready, with each thread using its own open file object (I'd recommend preallocating the file at the beginning, to give the filesystem a better chance of avoiding fragmentation). An out-of-order approach could also help with models that have huge tensors, since you can try to overlap them with smaller ones.
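The out-of-order write idea can be sketched with POSIX calls: compute every tensor's offset up front, preallocate the file with posix_fallocate, and write each tensor with pwrite as soon as it is converted. This sketch is POSIX-only and the helper names are hypothetical; the 32-byte alignment used in the example matches GGUF's default `general.alignment`, while other formats have their own layout rules.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <unistd.h>
#include <vector>

uint64_t align_up(uint64_t x, uint64_t alignment) {
    return (x + alignment - 1) / alignment * alignment;
}

// Offsets of each tensor relative to the start of the tensor data section;
// *total_out receives the size of the whole data section.
std::vector<uint64_t> tensor_offsets(const std::vector<uint64_t>& sizes, uint64_t alignment,
                                     uint64_t* total_out) {
    std::vector<uint64_t> offsets;
    uint64_t pos = 0;
    for (uint64_t s : sizes) {
        pos = align_up(pos, alignment);
        offsets.push_back(pos);
        pos += s;
    }
    *total_out = pos;
    return offsets;
}

// Reserve `total` (> 0) bytes up front so the filesystem can allocate space in one go.
void preallocate(const std::string& path, uint64_t total) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) throw std::runtime_error("open failed");
    int rc = ::posix_fallocate(fd, 0, static_cast<off_t>(total));  // returns an error code
    ::close(fd);
    if (rc != 0) throw std::runtime_error("posix_fallocate failed");
}

// pwrite takes an explicit offset and does not move the file position, so
// concurrent calls on disjoint ranges do not interfere with each other.
void write_at(int fd, uint64_t offset, const std::vector<uint8_t>& data) {
    size_t done = 0;
    while (done < data.size()) {
        ssize_t n = ::pwrite(fd, data.data() + done, data.size() - done,
                             static_cast<off_t>(offset + done));
        if (n <= 0) throw std::runtime_error("pwrite failed");
        done += static_cast<size_t>(n);
    }
}
```

Because pwrite never touches a shared file position, per-thread descriptors are optional with this API; they matter more with stream-based objects such as FILE* or std::ofstream, which keep their own position.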

feat: stream model conversion · 1fbb26b

wbruna suggested changes May 30, 2026

shikaku2 wants to merge 1 commit into leejet:master from shikaku2:feat/streaming-convert

shikaku2 commented May 29, 2026

wbruna left a comment

2 participants
