Detailed Description
I managed to get the quantized Z-ControlNet (Z-Image Turbo) loading on the Vulkan backend by manually patching control.hpp. I'm sharing the tensor geometry I discovered in case it helps implement proper dynamic support:
- Architecture: It uses a 2-1 embedder mapping (64 -> 3840).
- Modulation: It uses the Z-Image 4x scheme (input 256 -> output 15360), not the standard Flux 6x.
- Layers: 6 control layers, 2 refiner layers.
- Bias: Must be explicitly disabled (`false`) for all Linear layers, or the loader fails to find the tensors.
Alternatives you considered
- Using the standard `VERSION_FLUX` loader: I initially attempted to map the tensors to the existing Flux definition. This failed because standard Flux ControlNets expect a Modulation/AdaLN layer output of `6 * hidden_size` (23040), whereas this Z-Image variant uses a compressed `4 * hidden_size` (15360) scheme.
- Using img2img only: I considered bypassing ControlNet and relying solely on high-strength image-to-image. While this worked for general composition, it lacked the precise structural guidance required, necessitating a working ControlNet implementation.
- Python/Torch inference (ComfyUI/Forge): I evaluated running the `.safetensors` version via Python backends. However, given my hardware constraints (AMD 5700 XT, 8 GB VRAM), the overhead was too high. The stable-diffusion.cpp Vulkan backend was the only viable path for acceptable inference speeds with this specific architecture.
Additional context
This works on an AMD 5700 XT (8GB) with Vulkan.
Update
I managed to load the model by patching control.hpp, but inference then crashes or OOMs on 8 GB cards: the Vulkan backend appears to dequantize the ControlNet weights to FP16, blowing up VRAM usage. We need a way to keep the ControlNet quantized during inference.