RFC: ggml-bridge — Standardized Tensor Exchange between llama.cpp and stable-diffusion.cpp #24538
martinbu69 started this conversation in Ideas
RFC: ggml-bridge — Standardized Tensor Exchange Between llama.cpp and stable-diffusion.cpp
Authors: [TBD]
Status: Draft
Target: huggingface/llama.cpp, leejet/stable-diffusion.cpp
Date: June 2026
Abstract
We propose ggml-bridge, a lightweight specification and library for exchanging intermediate tensor data (embeddings, conditioning vectors) between ggml-based inference tools, primarily llama.cpp and stable-diffusion.cpp. This enables a UNIX-philosophy approach to multimodal AI: each binary does one thing well, and a standardized tensor pipe connects them.
Problem Statement
The Duplication Problem
stable-diffusion.cpp currently reimplements transformer inference for text and vision encoders that llama.cpp already handles, often better. This creates several problems:
The Multimodal Gap
Modern image generation models are increasingly multi-model pipelines:
Each pipeline combines a transformer encoder with a diffusion backbone. Today, sd.cpp must implement both internally. With ggml-bridge, the split becomes natural:
Proposed Solution
Architecture
Build Modes for sd.cpp
A key concern is that sd.cpp must remain standalone-capable. We propose three compile-time build modes via CMake, so the codebase can be cleanly separated without losing any capability:
STANDALONE (default — today's behavior)
Nothing changes. All internal encoders compiled in. No bridge dependency.
BRIDGED (slim — needs external llama.cpp)
Internal encoders stripped out. Conditioning must come via bridge files or SHM. Smallest possible binary, focused purely on diffusion inference.
JOINT (combined binary: standalone + bridge)
Statically links llama.cpp as the encoder backend. Single binary, fully standalone, but internally uses the clean bridge architecture. The bridge becomes an in-process function call, so there is no IPC overhead.
This is the best of both worlds: clean separation of concerns internally, single-file deployment externally.
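The JOINT idea, that the bridge collapses into an in-process call, can be illustrated with a small provider interface. This is a minimal sketch under assumptions: bridge_source, inproc_tensor, and conditioning_sum are hypothetical names, and conditioning_sum merely stands in for whatever the diffusion step does with the tensor. The diffusion side asks for a named tensor through a function pointer; the JOINT provider hands back a pointer to memory the encoder already filled, with no serialization or copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer-side interface: the diffusion code asks for a
 * named conditioning tensor without knowing where it comes from. */
typedef struct {
    /* Returns 0 and a borrowed pointer on success, -1 if unavailable. */
    int  (*get)(void *ctx, const char *name, const float **data, size_t *n);
    void  *ctx;
} bridge_source;

/* JOINT-style provider: the encoder output already lives in this
 * process, so "transport" is just handing out a pointer. */
typedef struct {
    const char  *name;
    const float *data;
    size_t       n;
} inproc_tensor;

static int inproc_get(void *ctx, const char *name, const float **data, size_t *n) {
    const inproc_tensor *t = ctx;
    if (strcmp(t->name, name) != 0) return -1;
    *data = t->data;   /* borrowed: no copy, no serialization */
    *n    = t->n;
    return 0;
}

bridge_source bridge_source_inproc(inproc_tensor *t) {
    bridge_source s = { inproc_get, t };
    return s;
}

/* Stand-in consumer. A BRIDGED build would pass a provider backed by a
 * mapped .ggmlb file instead; this function would not change. */
float conditioning_sum(const bridge_source *src, const char *name) {
    const float *d = NULL;
    size_t n = 0;
    assert(src != NULL && src->get != NULL);
    if (src->get(src->ctx, name, &d, &n) != 0) return 0.0f;
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) acc += d[i];
    return acc;
}
```

The point of the indirection is that the diffusion code is written once against bridge_source, while the provider behind it differs per build mode.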
Code Separation
The build mode controls which code path is compiled:
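A minimal C sketch of such compile-time selection, assuming a hypothetical SD_BRIDGE_MODE macro that a CMake option would pass as -DSD_BRIDGE_MODE=0, 1, or 2. All macro and function names here are illustrative, not existing sd.cpp identifiers.

```c
#include <assert.h>   /* C11 static_assert */

/* Hypothetical mode identifiers, e.g. set via -DSD_BRIDGE_MODE=1. */
#define SD_MODE_STANDALONE 0
#define SD_MODE_BRIDGED    1
#define SD_MODE_JOINT      2

#ifndef SD_BRIDGE_MODE
#define SD_BRIDGE_MODE SD_MODE_STANDALONE   /* default: today's behavior */
#endif

/* Reject unknown modes at compile time. */
static_assert(SD_BRIDGE_MODE >= SD_MODE_STANDALONE && SD_BRIDGE_MODE <= SD_MODE_JOINT,
              "SD_BRIDGE_MODE must be 0 (STANDALONE), 1 (BRIDGED) or 2 (JOINT)");

const char *sd_build_mode_name(void) {
#if SD_BRIDGE_MODE == SD_MODE_STANDALONE
    return "STANDALONE";   /* internal encoders compiled in, no bridge */
#elif SD_BRIDGE_MODE == SD_MODE_BRIDGED
    return "BRIDGED";      /* encoders stripped, conditioning via bridge */
#else
    return "JOINT";        /* llama.cpp linked in, bridge is a function call */
#endif
}

/* Can this binary turn a prompt into conditioning by itself?
 * STANDALONE uses its own encoders; JOINT uses the linked llama.cpp. */
int sd_can_encode_prompts(void) {
    return SD_BRIDGE_MODE != SD_MODE_BRIDGED;
}

/* Does conditioning flow through the bridge interface? */
int sd_uses_bridge_path(void) {
    return SD_BRIDGE_MODE != SD_MODE_STANDALONE;
}
```

With no define, the default build reports STANDALONE and never touches the bridge path. Building with -DSD_BRIDGE_MODE=1 would compile only the BRIDGED branches, and the static_assert rejects any other value before code generation.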
Over time, the STANDALONE code paths can be deprecated without breaking anything: the JOINT mode provides identical functionality with better optimization.
File Format: .ggmlb (ggml bridge)
A minimal binary format for exchanging named tensors between processes. Designed to be:
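To make the idea concrete, here is one possible layout as a C sketch. Everything about the bytes is an assumption, not part of the proposal: the "GGMB" magic, the 16-byte header, the 120-byte tensor entries with a 64-byte name field, absolute offsets, 32-byte data alignment, and a little-endian host. Only F32 tensors are written to keep it short; ggmlb_write, ggmlb_find, and the struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: [header 16 B][n_tensors x entry 120 B][data, 32-byte aligned].
 * Offsets are absolute, so a mapped file or SHM segment can be read in place. */
#define GGMLB_ALIGN    32u
#define GGMLB_NAME_MAX 64
#define GGMLB_VERSION  1u

typedef struct {
    char     magic[4];              /* "GGMB" */
    uint32_t version, n_tensors, reserved;
} ggmlb_header;

typedef struct {
    char     name[GGMLB_NAME_MAX];  /* NUL-terminated */
    uint32_t type;                  /* element type; 0 = F32 here */
    uint32_t n_dims;                /* 1..4 */
    uint64_t ne[4];                 /* elements per dim, unused dims = 1 */
    uint64_t offset;                /* absolute byte offset of the data */
    uint64_t nbytes;
} ggmlb_entry;

typedef struct {                    /* one F32 tensor to serialize */
    const char  *name;
    uint32_t     n_dims;
    uint64_t     ne[4];
    const float *data;
} ggmlb_f32_tensor;

static uint64_t ggmlb_align(uint64_t x) {
    return (x + GGMLB_ALIGN - 1) / GGMLB_ALIGN * GGMLB_ALIGN;
}

static uint64_t ggmlb_nbytes(const ggmlb_f32_tensor *t) {
    return t->ne[0] * t->ne[1] * t->ne[2] * t->ne[3] * sizeof(float);
}

/* Writes tensors into buf and returns the bytes used, or 0 if a name is
 * too long or cap is too small. buf == NULL only computes the size. */
size_t ggmlb_write(uint8_t *buf, size_t cap, const ggmlb_f32_tensor *ts, uint32_t n) {
    const uint64_t table_end = sizeof(ggmlb_header) + (uint64_t)n * sizeof(ggmlb_entry);
    uint64_t total = table_end;
    for (uint32_t i = 0; i < n; i++) {
        assert(ts[i].n_dims >= 1 && ts[i].n_dims <= 4);
        if (strlen(ts[i].name) >= GGMLB_NAME_MAX) return 0;
        total = ggmlb_align(total) + ggmlb_nbytes(&ts[i]);
    }
    if (buf == NULL) return (size_t)total;
    if (total > cap) return 0;

    memset(buf, 0, (size_t)total);                 /* zero all padding */
    ggmlb_header h = { {'G', 'G', 'M', 'B'}, GGMLB_VERSION, n, 0 };
    memcpy(buf, &h, sizeof h);

    uint64_t off = table_end;
    for (uint32_t i = 0; i < n; i++) {
        ggmlb_entry e;
        memset(&e, 0, sizeof e);
        strcpy(e.name, ts[i].name);                /* length checked above */
        e.n_dims = ts[i].n_dims;
        memcpy(e.ne, ts[i].ne, sizeof e.ne);
        off      = ggmlb_align(off);
        e.offset = off;
        e.nbytes = ggmlb_nbytes(&ts[i]);
        memcpy(buf + sizeof h + (size_t)i * sizeof e, &e, sizeof e);
        memcpy(buf + off, ts[i].data, (size_t)e.nbytes);
        off += e.nbytes;
    }
    return (size_t)total;
}

/* Finds a tensor by name: returns a pointer into buf (no copy) and fills
 * *out, or NULL if absent, malformed, or out of bounds. */
const void *ggmlb_find(const uint8_t *buf, size_t len, const char *name, ggmlb_entry *out) {
    ggmlb_header h;
    if (len < sizeof h) return NULL;
    memcpy(&h, buf, sizeof h);
    if (memcmp(h.magic, "GGMB", 4) != 0 || h.version != GGMLB_VERSION) return NULL;
    if ((len - sizeof h) / sizeof(ggmlb_entry) < h.n_tensors) return NULL;
    for (uint32_t i = 0; i < h.n_tensors; i++) {
        ggmlb_entry e;
        memcpy(&e, buf + sizeof h + (size_t)i * sizeof e, sizeof e);
        if (strncmp(e.name, name, GGMLB_NAME_MAX) != 0) continue;
        if (e.offset > len || e.nbytes > len - e.offset) return NULL;
        *out = e;
        return buf + e.offset;
    }
    return NULL;
}
```

Absolute offsets and aligned data mean a reader can point straight into an mmap'ed region (which is page-aligned) instead of copying, which is the property an IPC format like this cares about most.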
Note
This is intentionally simpler than GGUF. GGUF is a model storage format with rich metadata. .ggmlb is an IPC format: it carries only the tensors needed for one inference step.
Transport Layer: File + SHM
The bridge supports two transport mechanisms with the same ggmlb format:
File: .ggmlb file on disk
SHM: shm_open / Win32 CreateFileMapping
Both transports mmap the same header + tensor layout. The only difference is the open call:
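A POSIX-only sketch of that shared path, assuming that a shm://name URI maps to the POSIX shared-memory object "/name" (the naming rule is an assumption, as are bridge_open, bridge_close, and bridge_view). Only the open call branches on the prefix; fstat and a read-only mmap are common to both transports. Windows would use CreateFileMapping and MapViewOfFile instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read-only view of a .ggmlb payload, whichever transport it came from. */
typedef struct {
    void  *data;
    size_t size;
} bridge_view;

/* Maps "path/to/file.ggmlb" or "shm://name" read-only.
 * Returns 0 on success, -1 on any error. */
int bridge_open(const char *uri, bridge_view *v) {
    static const char prefix[] = "shm://";
    const size_t plen = sizeof prefix - 1;
    int fd;
    assert(uri != NULL && v != NULL);

    if (strncmp(uri, prefix, plen) == 0) {
        char name[256];
        /* Assumed rule: shm://foo -> POSIX shm object "/foo". */
        if (strlen(uri + plen) + 2 > sizeof name) return -1;
        name[0] = '/';
        strcpy(name + 1, uri + plen);
        fd = shm_open(name, O_RDONLY, 0);
    } else {
        fd = open(uri, O_RDONLY);
    }
    if (fd < 0) return -1;

    /* From here on, both transports are handled identically. */
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    v->data = p;
    v->size = (size_t)st.st_size;
    return 0;
}

void bridge_close(bridge_view *v) {
    if (v->data != NULL) munmap(v->data, v->size);
    v->data = NULL;
    v->size = 0;
}
```

A writer would mirror this with O_CREAT, ftruncate to the payload size, and a PROT_WRITE mapping. On older glibc versions shm_open may require linking with -lrt.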
The CLI uses a shm:// prefix to select the transport:
Note
On POSIX, shm_open() returns a file descriptor that supports mmap(), so the reader/writer code is nearly identical for both transports. On Windows, CreateFileMapping with INVALID_HANDLE_VALUE provides equivalent functionality.
CLI Integration
llama.cpp: --export-bridge
sd.cpp: --bridge-conditioning
```shell
# Generate image using pre-computed conditioning
sd-cli --model ideogram4-dit.gguf \
  --bridge-conditioning clip_cond.ggmlb \
  --bridge-conditioning vision_cond.ggmlb \
  --output result.png
```
Combined pipeline (shell)
Use Cases
1. Ideogram 4 Character Reference (currently impossible in sd.cpp)
2. FLUX.2 with better T5 encoding
3. Audio-to-Image (future)
4. Batch processing with cached encodings
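For the batch-processing case, one simple scheme (an assumption, not part of the proposal) is to key cached .ggmlb files by a hash of the encoder model ID and the prompt, so a repeated prompt skips the encoder entirely. The sketch uses 64-bit FNV-1a; bridge_cache_path and the dir/<hash>.ggmlb layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a over n bytes, continuing from hash h. */
static uint64_t fnv1a(uint64_t h, const void *p, size_t n) {
    const unsigned char *s = p;
    for (size_t i = 0; i < n; i++) {
        h ^= s[i];
        h *= 0x100000001b3ULL;          /* FNV 64-bit prime */
    }
    return h;
}

/* Builds "<dir>/<16 hex digits>.ggmlb" for one (encoder model, prompt)
 * pair. A NUL separator keeps ("ab","c") and ("a","bc") distinct.
 * Returns 0 on success, -1 if out is too small. */
int bridge_cache_path(char *out, size_t cap, const char *dir,
                      const char *model_id, const char *prompt) {
    assert(out != NULL && dir != NULL && model_id != NULL && prompt != NULL);
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV 64-bit offset basis */
    h = fnv1a(h, model_id, strlen(model_id));
    h = fnv1a(h, "\0", 1);
    h = fnv1a(h, prompt, strlen(prompt));
    int n = snprintf(out, cap, "%s/%016llx.ggmlb", dir, (unsigned long long)h);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

A batch driver would compute the path, reuse the file if it exists, and otherwise run the llama.cpp export once and store the result there. FNV-1a is fast but not collision-resistant, so a real cache might store the full key inside the file and check it on load.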
Benefits
For llama.cpp / Hugging Face
For sd.cpp / leejet
For the ecosystem
Implementation Roadmap
Phase 1: Minimal POC (weeks)
- ggmlb reader/writer as a standalone C library (~500 lines)
- --export-bridge for CLIP text embeddings
- --bridge-conditioning that reads .ggmlb instead of running internal CLIP
Phase 2: Multi-encoder support (months)
Phase 3: Vision encoders (months)
Alternatives Considered
Open Questions
Important
Tensor naming convention: Should bridge files use standardized names (e.g., clip_l_hidden_states, t5_encoder_output) or model-specific names? A registry of standard names would improve interoperability.
Important
JOINT mode linking: Should the JOINT binary link llama.cpp statically or dynamically? Static linking produces a single file but increases binary size. Dynamic linking (libllama.so) allows shared updates but adds a deployment dependency.
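On the naming question, a registry of standard names could start as a static table that consumers check before using a tensor. A minimal sketch under stated assumptions: the two names are the examples given above, the expected hidden sizes assume CLIP ViT-L/14 (768) and T5-XXL (4096), and bridge_std_name, bridge_lookup_std_name, and bridge_check_tensor are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical registry entry: a standardized tensor name plus the
 * hidden size (ggml ne[0], the innermost dimension) a consumer expects. */
typedef struct {
    const char *name;
    uint64_t    hidden;
} bridge_std_name;

/* Illustrative entries only; sizes assume CLIP ViT-L/14 and T5-XXL. */
static const bridge_std_name BRIDGE_STD_NAMES[] = {
    { "clip_l_hidden_states", 768  },
    { "t5_encoder_output",    4096 },
};

const bridge_std_name *bridge_lookup_std_name(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof BRIDGE_STD_NAMES / sizeof BRIDGE_STD_NAMES[0]; i++)
        if (strcmp(BRIDGE_STD_NAMES[i].name, name) == 0)
            return &BRIDGE_STD_NAMES[i];
    return NULL;
}

/* 1 = registered name with the expected hidden size,
 * 0 = registered name with a mismatched size,
 * -1 = not registered (a model-specific name). */
int bridge_check_tensor(const char *name, uint64_t ne0) {
    const bridge_std_name *s = bridge_lookup_std_name(name);
    if (s == NULL) return -1;
    return s->hidden == ne0 ? 1 : 0;
}
```

Returning -1 rather than failing for unknown names would let standardized and model-specific names coexist while the registry grows.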