Closed #1214 by mistake.
I've been testing FLUX.2-klein in both the 4B and 9B variants, and generation speed is unusually slow (I expect generation to be slower than PyTorch, but not by this much).
Here are some of the results:
FLUX.2-klein-4B: 1024x1024, 4 steps, cfg 1
ComfyUI: 5s
stable-diffusion.cpp: 25s (tested with Vulkan and ROCm)
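For reference, the invocation was along these lines; the paths and model filename are placeholders, and any extra flags for the text encoder / VAE are omitted since they depend on how the model files are split:

```sh
# Rough sketch of the tested settings (placeholder paths, not the exact command):
./sd --diffusion-model ./models/flux.2-klein-4b.gguf \
     -p "a photo of a cat" \
     -W 1024 -H 1024 --steps 4 --cfg-scale 1.0
```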