RFC: ggml-bridge — Standardized Tensor Exchange with llama.cpp #1642
Replies: 3 comments
-
Found this:
-
Slop. Also no. Don't create a new file format. Just use existing GGUF. You can convey all info via KV and tensor names.
-
Fair point on both counts. You're right that GGUF covers the format need. We could update the proposal to use GGUF with a naming convention for bridge tensors instead of inventing a new format. However, I thought a slim format would be straightforward. The core idea behind this: I'm building image generation pipelines, and IMHO there is way too much model-specific code. Would that be something worth discussing?
-
Note
This RFC is cross-posted. The primary discussion is at:
ggml-org/llama.cpp#24538
RFC: ggml-bridge — Standardized Tensor Exchange Between llama.cpp and stable-diffusion.cpp
Authors: [TBD]
Status: Draft
Target: ggml-org/llama.cpp, leejet/stable-diffusion.cpp
Date: June 2026
Abstract
We propose ggml-bridge, a lightweight specification and library for exchanging intermediate tensor data (embeddings, conditioning vectors) between ggml-based inference tools, primarily llama.cpp and stable-diffusion.cpp.
This enables a UNIX-philosophy approach to multimodal AI: each binary does one thing well, and a standardized tensor pipe connects them.
Problem Statement
The Duplication Problem
stable-diffusion.cpp currently reimplements transformer inference for text and vision encoders that llama.cpp already handles, often better:
This creates several problems:
The Multimodal Gap
Modern image generation models are increasingly multi-model pipelines:
Each pipeline combines a transformer encoder with a diffusion backbone. Today, sd.cpp must implement both internally. With ggml-bridge, the split becomes natural:
Proposed Solution
Architecture
Build Modes for sd.cpp
A key concern is that sd.cpp must remain standalone-capable. We propose three compile-time build modes, selected via CMake, so the codebase can be cleanly separated without losing any capability:
STANDALONE (default — today's behavior)
Nothing changes. All internal encoders compiled in. No bridge dependency.
BRIDGED (slim — needs external llama.cpp)
Internal encoders stripped out. Conditioning must come via bridge files or SHM. Smallest possible binary, focused purely on diffusion inference.
JOINT (Mixture-of-Experts binary — standalone + bridge)
Statically links llama.cpp as the encoder backend. Single binary, fully standalone, but internally uses the clean bridge architecture. The bridge becomes an in-process function call — zero IPC overhead.
This is the best of both worlds: clean separation of concerns internally, single-file deployment externally.
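One way to picture the "bridge as an in-process function call" idea is a small provider interface that the diffusion side calls without knowing where the tensor came from. The sketch below is only an illustration under assumptions: `cond_provider`, `cond_tensor`, `inproc_get`, and `cond_sum` are hypothetical names, and the in-process backend fills dummy values where a JOINT build would invoke the encoder.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: every name in this sketch is illustrative. */
typedef struct {
    const float *data;   /* borrowed pointer to conditioning values */
    size_t       count;  /* number of floats */
} cond_tensor;

typedef struct cond_provider {
    /* Looks up a named tensor; returns 0 on success, -1 if unknown. */
    int (*get)(struct cond_provider *self, const char *name, cond_tensor *out);
    void *state;         /* backend-specific data */
} cond_provider;

/* In-process backend (JOINT mode): stand-in for a direct encoder call. */
typedef struct {
    float embedding[4];
} inproc_state;

static int inproc_get(cond_provider *self, const char *name, cond_tensor *out) {
    inproc_state *st = self->state;
    if (strcmp(name, "clip_l_hidden_states") != 0) return -1;
    /* A real build would run the text encoder here; this fills dummy values. */
    for (size_t i = 0; i < 4; i++) st->embedding[i] = (float)i * 0.5f;
    out->data  = st->embedding;
    out->count = 4;
    return 0;
}

/* The consumer depends only on the interface, not on the backend.
 * Returns 0 and stores the sum of the tensor values, or -1 on lookup failure. */
static int cond_sum(cond_provider *p, const char *name, float *sum) {
    cond_tensor t;
    if (p->get(p, name, &t) != 0) return -1;
    float s = 0.0f;
    for (size_t i = 0; i < t.count; i++) s += t.data[i];
    *sum = s;
    return 0;
}
```

A BRIDGED build could supply a second `get` implementation that reads the same named tensor from a bridge file or SHM segment, leaving `cond_sum` and any other consumer unchanged.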
Code Separation
The build mode controls which code path is compiled:
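As a rough sketch of what compile-time selection could look like, the following uses preprocessor macros to pick the conditioning source. The macro names (`SD_BUILD_MODE_BRIDGED`, `SD_BUILD_MODE_JOINT`, `SD_BUILD_MODE_STANDALONE`) and the `cond_source` enum are assumptions made for illustration; they are not existing sd.cpp options.

```c
/* Hypothetical mode macros: a build would define at most one of
 * SD_BUILD_MODE_BRIDGED or SD_BUILD_MODE_JOINT (e.g. via -D...).
 * With neither defined, the sketch falls back to STANDALONE. */
#if defined(SD_BUILD_MODE_BRIDGED) && defined(SD_BUILD_MODE_JOINT)
#error "Select only one build mode"
#endif
#if !defined(SD_BUILD_MODE_BRIDGED) && !defined(SD_BUILD_MODE_JOINT)
#define SD_BUILD_MODE_STANDALONE 1
#endif

typedef enum {
    COND_INTERNAL,       /* STANDALONE: built-in encoders */
    COND_BRIDGE_IPC,     /* BRIDGED: bridge file or SHM */
    COND_BRIDGE_INPROC   /* JOINT: linked llama.cpp, in-process call */
} cond_source;

/* Reports where conditioning tensors come from in this build. */
static cond_source conditioning_source(void) {
#if defined(SD_BUILD_MODE_BRIDGED)
    return COND_BRIDGE_IPC;
#elif defined(SD_BUILD_MODE_JOINT)
    return COND_BRIDGE_INPROC;
#else
    return COND_INTERNAL;
#endif
}

/* Human-readable label, e.g. for a --version banner. */
static const char *conditioning_source_name(cond_source s) {
    switch (s) {
    case COND_INTERNAL:      return "internal encoders";
    case COND_BRIDGE_IPC:    return "bridge (file or SHM)";
    case COND_BRIDGE_INPROC: return "bridge (in-process)";
    }
    return "unknown";
}
```

Keeping the mode check in one function like this means the rest of the code can branch on a value instead of scattering `#if` blocks, which is one possible way to keep the three modes cleanly separated.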
Over time, the STANDALONE code paths can be deprecated without breaking anything: the JOINT mode provides identical functionality with better optimization.
File Format: .ggmlb (ggml bridge)
A minimal binary format for exchanging named tensors between processes. Designed to be:
Note
This is intentionally simpler than GGUF. GGUF is a model storage format with rich metadata. .ggmlb is an IPC format: it carries only the tensors needed for one inference step.
Transport Layer: File + SHM
The bridge supports two transport mechanisms with the same ggmlb format:
File: .ggmlb file on disk
SHM: shm_open / Win32 CreateFileMapping
Both transports mmap the same header + tensor layout. The only difference is the open call:
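A minimal POSIX-only sketch of that difference is shown here, assuming a reader that accepts either a file path or a `shm://name` URI. The helper names `bridge_map`, `bridge_map_open`, and `bridge_map_close` are hypothetical, and the sketch does not parse any header; it only shows that everything after the open call is shared.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type, not part of any existing API. */
typedef struct {
    const void *data;  /* start of the mapped header + tensor bytes */
    size_t      size;  /* mapping length in bytes */
} bridge_map;

/* Maps a bridge buffer read-only from a file path or a "shm://name" URI.
 * Returns 0 on success, -1 on failure. */
static int bridge_map_open(const char *uri, bridge_map *out) {
    static const char prefix[] = "shm://";
    int fd;

    if (strncmp(uri, prefix, sizeof prefix - 1) == 0) {
        /* POSIX shared-memory object names start with a single '/'. */
        char name[256] = "/";
        const char *rest = uri + sizeof prefix - 1;
        if (strlen(rest) == 0 || strlen(rest) > sizeof name - 2) return -1;
        strcat(name, rest);
        fd = shm_open(name, O_RDONLY, 0);
    } else {
        fd = open(uri, O_RDONLY);
    }
    if (fd < 0) return -1;

    /* From here on, both transports run the same code. */
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Unmaps the buffer and resets the handle. */
static void bridge_map_close(bridge_map *m) {
    if (m->data) munmap((void *)m->data, m->size);
    m->data = NULL;
    m->size = 0;
}
```

On older glibc versions `shm_open` may require linking with `-lrt`. A Windows port would replace the two open calls with `CreateFile` or `CreateFileMapping` and the `mmap` call with `MapViewOfFile`.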
The CLI uses a shm:// prefix to select the transport:
Note
On POSIX, shm_open() returns a file descriptor that supports mmap(), so the reader/writer code is nearly identical for both transports. On Windows, CreateFileMapping with INVALID_HANDLE_VALUE provides equivalent functionality.
CLI Integration
llama.cpp: --export-bridge
sd.cpp: --bridge-conditioning
    # Generate image using pre-computed conditioning
    sd-cli --model ideogram4-dit.gguf \
      --bridge-conditioning clip_cond.ggmlb \
      --bridge-conditioning vision_cond.ggmlb \
      --output result.png
Combined pipeline (shell)
Use Cases
1. Ideogram 4 Character Reference (currently impossible in sd.cpp)
2. FLUX.2 with better T5 encoding
3. Audio-to-Image (future)
4. Batch processing with cached encodings
Benefits
For llama.cpp / Hugging Face
For sd.cpp / leejet
For the ecosystem
Implementation Roadmap
Phase 1: Minimal POC (weeks)
ggmlb reader/writer as a standalone C library (~500 lines)
--export-bridge for CLIP text embeddings
--bridge-conditioning that reads .ggmlb instead of running internal CLIP
Phase 2: Multi-encoder support (months)
Phase 3: Vision encoders (months)
Alternatives Considered
Open Questions
Important
Tensor naming convention: Should bridge files use standardized names (e.g., clip_l_hidden_states, t5_encoder_output) or model-specific names? A registry of standard names would improve interoperability.
Important
JOINT mode linking: Should the JOINT binary link llama.cpp statically or dynamically? Static linking produces a single file but increases binary size. Dynamic linking (libllama.so) allows shared updates but adds a deployment dependency.
References