This looks like an issue with group_norm on the Metal backend; I haven't tried other backends. Generating at 640x640 aborts with a GGML_ASSERT:
./bin/sd -m models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors -p "a cat" --steps 2 -H 640 -W 640
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Max
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading 'stable-diffusion.cpp/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
[INFO ] stable-diffusion.cpp:142 - loading model from 'models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors'
[INFO ] model.cpp:676 - load models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:164 - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:170 - Stable Diffusion weight type: f32
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 469.45 MiB, ( 471.33 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 2155.34 MiB, ( 2626.67 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 94.47 MiB, ( 2721.14 / 21845.34)
[INFO ] stable-diffusion.cpp:306 - total params memory size = 1408.32MB (clip 469.44MB, unet 2155.33MB, vae 94.47MB, controlnet 0.00MB)
[INFO ] stable-diffusion.cpp:310 - loading model from 'models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors' completed, taking 1.03s
[INFO ] stable-diffusion.cpp:327 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1374 - apply_loras completed, taking 0.00s
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 1.41 MiB, ( 2722.55 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 1.41 MiB, ( 2722.55 / 21845.34)
[INFO ] stable-diffusion.cpp:1413 - get_learned_condition completed, taking 69 ms
[INFO ] stable-diffusion.cpp:1429 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1433 - generating image: 1/1 - seed 42
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 1320.61 MiB, ( 3572.30 / 21845.34)
GGML_ASSERT: stable-diffusion.cpp/ggml/src/ggml-metal.m:2034: ne00 % 4 == 0
GGML_ASSERT: stable-diffusion.cpp/ggml/src/ggml-metal.m:2034: ne00 % 4 == 0
zsh: abort ./bin/sd -m -p "a cat" --steps 2 -H 640 -W 640
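
A rough sketch of why 640x640 might trip the `ne00 % 4 == 0` check. It rests on assumptions that are not in the log above: that the asserted `ne00` is the feature-map width seen by group_norm, that the VAE downsamples by 8, and that the SD 1.x UNet then halves the feature map three more times. The helper names are made up for illustration.

```python
# Assumptions (not taken from the log): VAE factor 8, three UNet downsamples,
# and ne00 being the feature-map width passed to group_norm.
def unet_widths(image_width, vae_factor=8, num_downsamples=3):
    """Return the assumed feature-map width at each UNet resolution level."""
    w = image_width // vae_factor
    widths = [w]
    for _ in range(num_downsamples):
        w = (w + 1) // 2  # stride-2 3x3 conv with padding 1 rounds up
        widths.append(w)
    return widths


def widths_failing_assert(image_width, multiple=4):
    """Return the widths that would violate a `width % multiple == 0` check."""
    return [w for w in unet_widths(image_width) if w % multiple != 0]


for size in (512, 640):
    print(size, unet_widths(size), widths_failing_assert(size))
```

Under those assumptions, 512 gives widths 64, 32, 16, 8 (all multiples of 4), while 640 gives 80, 40, 20, 10, and the width of 10 at the deepest level is not a multiple of 4. If that is what is happening, it would explain why the default 512x512 works and 640x640 aborts, but I have not confirmed it against the kernel code.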