I'm having this issue on Mac when using sd with Flash Attention enabled:
[DEBUG] ggml_extend.hpp:599 - clip compute buffer size: 9.88 MB
[DEBUG] stable-diffusion.cpp:441 - computing condition graph completed, taking 631 ms
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 1570 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 485932809
[DEBUG] ggml_extend.hpp:599 - unet compute buffer size: 728.52 MB
|> | 0/8 - 0.00it/sGGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
Abort trap: 6
Here is how I build the binary:
mkdir build
cd build
cmake .. -DSD_FLASH_ATTN=ON
cmake --build . --config Release --clean-first
Here is a test to reproduce it:
./bin/sd \
-m ../../data/downloaded/models/realvisxlV30Turbo_v30TurboBakedvae.safetensors \
--vae ../../data/downloaded/vae/sdxl_vae.safetensors \
-p "Photo of a girl,cinematic film still,super saiyan, full plate armor" \
-n "ugly, deformed, noisy, blurry, low contrast, text, 3d, cgi, render, fake, anime, open mouth, big forehead, long neck" \
-o ../../data/samples/output/test001_sdxl.png \
--steps 8 --cfg-scale 2.0 -s 1850492235 -v -H 1536 -W 1152
The checkpoint used for it: https://civitai.com/models/139562/realvisxl-v30-turbo
The result should be something like this:
