
[Bug] AVX512 false detection #1343

@AdeoNeriola

Git commit

d6dd6d7

Operating System & Version

Windows 10 22H2

GGML backends

CUDA

Command-line arguments used

.\sd-cli.exe --diffusion-model "W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf" --vae "W:\Z-Image-Turbo\ae.safetensors" --llm "W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf" -H 1280 -W 960 --cfg-scale 1.0 --steps 10 --diffusion-fa --offload-to-cpu -p "fantasy forest" -o "./o2.png"

Steps to reproduce

[DEBUG] main.cpp:516 - System Info:
SSE3 = 1 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |

What you expected to happen

Picture generation with CPU model offloading

What actually happened

The program crashed with exception code 0xc000001d (STATUS_ILLEGAL_INSTRUCTION, i.e. an illegal instruction).

Logs / error messages / stack trace

[DEBUG] stable-diffusion.cpp:173 - Using CUDA backend
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:78 - Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:267 - loading diffusion model from 'W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf'
[INFO ] model.cpp:366 - load W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf using gguf format
[DEBUG] model.cpp:412 - init from 'W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf'
[INFO ] stable-diffusion.cpp:314 - loading llm from 'W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf'
[INFO ] model.cpp:366 - load W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf using gguf format
[DEBUG] model.cpp:412 - init from 'W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf'
[INFO ] stable-diffusion.cpp:328 - loading vae from 'W:\Z-Image-Turbo\ae.safetensors'
[INFO ] model.cpp:369 - load W:\Z-Image-Turbo\ae.safetensors using safetensors format
[DEBUG] model.cpp:503 - init from 'W:\Z-Image-Turbo\ae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:345 - Version: Z-Image
[INFO ] stable-diffusion.cpp:373 - Weight type stat: f32: 634 | q6_K: 433 | bf16: 28
[INFO ] stable-diffusion.cpp:374 - Conditioner weight type stat: f32: 145 | q6_K: 253
[INFO ] stable-diffusion.cpp:375 - Diffusion model weight type stat: f32: 245 | q6_K: 180 | bf16: 28
[INFO ] stable-diffusion.cpp:376 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:378 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:286 - merges size 151387
[DEBUG] llm.hpp:318 - vocab size: 151669
PS W:\sd-master-d6dd6d7-bin-win-cuda12-x64>

Additional context / environment details

The same crash happens with any CPU-related option, e.g. --clip-on-cpu.
My CPU is a Xeon E5-2666 v3 (Haswell), which supports AVX and AVX2 but not AVX-512.
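
For reference, here is a minimal sketch of how AVX-512F support could be checked at runtime and compared with what the binary was compiled for. It is not taken from stable-diffusion.cpp or ggml: all function names below are hypothetical, and whether the "System Info" line above reflects compile-time flags or a runtime probe is an assumption that has not been verified here. The sketch reads CPUID leaf 7 (EBX bit 16 = AVX512F) and also checks via XGETBV that the OS saves the opmask and ZMM register state.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <intrin.h>
#  include <immintrin.h>
#  define SKETCH_X86 1
#elif defined(__x86_64__) || defined(__i386__)
#  include <cpuid.h>
#  define SKETCH_X86 1
#else
#  define SKETCH_X86 0
#endif

struct CpuidRegs { uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0; };

// Run CPUID for (leaf, subleaf); returns false if the leaf is unavailable.
static bool query_cpuid(uint32_t leaf, uint32_t subleaf, CpuidRegs &r) {
#if SKETCH_X86 && defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 0);                       // leaf 0: EAX = highest basic leaf
    if (static_cast<uint32_t>(regs[0]) < leaf) return false;
    __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
    r.eax = static_cast<uint32_t>(regs[0]);
    r.ebx = static_cast<uint32_t>(regs[1]);
    r.ecx = static_cast<uint32_t>(regs[2]);
    r.edx = static_cast<uint32_t>(regs[3]);
    return true;
#elif SKETCH_X86
    return __get_cpuid_count(leaf, subleaf, &r.eax, &r.ebx, &r.ecx, &r.edx) != 0;
#else
    (void)leaf; (void)subleaf; (void)r;
    return false;                           // not an x86 target
#endif
}

// Read XCR0. Only call this after confirming OSXSAVE (CPUID.1:ECX bit 27).
static uint64_t read_xcr0() {
#if SKETCH_X86 && defined(_MSC_VER)
    return _xgetbv(0);
#elif SKETCH_X86
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<uint64_t>(hi) << 32) | lo;
#else
    return 0;
#endif
}

// CPUID.(EAX=7,ECX=0):EBX bit 16 is the AVX512F feature flag.
bool cpu_reports_avx512f(uint32_t leaf7_ebx) {
    return ((leaf7_ebx >> 16) & 1u) != 0;
}

// XCR0 bits 1,2 (SSE, AVX) and 5,6,7 (opmask, ZMM_Hi256, Hi16_ZMM) must be set.
bool os_enables_zmm_state(uint64_t xcr0) {
    return (xcr0 & 0xE6u) == 0xE6u;
}

// True only if both the CPU and the OS support AVX-512F on this machine.
bool runtime_has_avx512f() {
    CpuidRegs leaf1, leaf7;
    if (!query_cpuid(1, 0, leaf1) || !query_cpuid(7, 0, leaf7)) return false;
    const bool osxsave = ((leaf1.ecx >> 27) & 1u) != 0;
    if (!osxsave) return false;
    return cpu_reports_avx512f(leaf7.ebx) && os_enables_zmm_state(read_xcr0());
}

// True if this translation unit was compiled with AVX-512F enabled.
bool compiled_with_avx512f() {
#if defined(__AVX512F__)
    return true;
#else
    return false;
#endif
}
```

On a CPU without AVX-512, such as the Xeon E5-2666 v3 above, `runtime_has_avx512f()` would be expected to return false. If `compiled_with_avx512f()` returns true in the same build, the code was compiled for instructions this CPU cannot execute, which would be consistent with an illegal-instruction crash on CPU code paths. Note that `compiled_with_avx512f()` only describes the translation unit it is compiled in.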
