[Bug] AVX512 false detection #1343

### Git commit

d6dd6d7b555c233bb9bc9f20b4751eb8c9269743

### Operating System & Version

Windows 10 22H2

### GGML backends

CUDA

### Command-line arguments used

.\sd-cli.exe --diffusion-model "W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf" --vae "W:\Z-Image-Turbo\ae.safetensors" --llm "W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf" -H 1280 -W 960 --cfg-scale 1.0 --steps 10 --diffusion-fa --offload-to-cpu -p "fantasy forest" -o "./o2.png"

### Steps to reproduce

[DEBUG] main.cpp:516  - System Info:
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 1 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 |

### What you expected to happen

Picture generation with CPU model offloading

### What actually happened

The program crashed with exception code 0xc000001d (STATUS_ILLEGAL_INSTRUCTION, i.e. an illegal instruction).

### Logs / error messages / stack trace

[DEBUG] stable-diffusion.cpp:173  - Using CUDA backend
[INFO ] ggml_extend.hpp:78   - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:78   -   Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:267  - loading diffusion model from 'W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf'
[INFO ] model.cpp:366  - load W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf using gguf format
[DEBUG] model.cpp:412  - init from 'W:\Z-Image-Turbo\z_image_turbo-Q6_K.gguf'
[INFO ] stable-diffusion.cpp:314  - loading llm from 'W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf'
[INFO ] model.cpp:366  - load W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf using gguf format
[DEBUG] model.cpp:412  - init from 'W:\Z-Image-Turbo\Qwen3-4B-Instruct-2507-Q6_K.gguf'
[INFO ] stable-diffusion.cpp:328  - loading vae from 'W:\Z-Image-Turbo\ae.safetensors'
[INFO ] model.cpp:369  - load W:\Z-Image-Turbo\ae.safetensors using safetensors format
[DEBUG] model.cpp:503  - init from 'W:\Z-Image-Turbo\ae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:345  - Version: Z-Image
[INFO ] stable-diffusion.cpp:373  - Weight type stat:                      f32: 634  |    q6_K: 433  |    bf16: 28
[INFO ] stable-diffusion.cpp:374  - Conditioner weight type stat:          f32: 145  |    q6_K: 253
[INFO ] stable-diffusion.cpp:375  - Diffusion model weight type stat:      f32: 245  |    q6_K: 180  |    bf16: 28
[INFO ] stable-diffusion.cpp:376  - VAE weight type stat:                  f32: 244
[DEBUG] stable-diffusion.cpp:378  - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:286  - merges size 151387
[DEBUG] llm.hpp:318  - vocab size: 151669
PS W:\sd-master-d6dd6d7-bin-win-cuda12-x64>

### Additional context / environment details

The same crash happens with any CPU-related parameter, e.g. --clip-on-cpu.
My CPU is a Xeon E5-2666 v3, which supports AVX and AVX2 but not AVX-512, yet System Info reports AVX512 = 1.
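
For reference, here is a minimal sketch of how AVX-512F availability could be probed at run time with CPUID and XGETBV. This is not the project's detection code: the names `cpuid`, `read_xcr0`, `avx512f_usable` and `cpu_has_avx512f_at_runtime` are hypothetical. It is also only my assumption that the `AVX512 = 1` line in System Info reflects how the binary was built rather than a probe of the actual CPU.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <intrin.h>
#  include <immintrin.h>
#  define SKETCH_X86 1
#elif defined(__x86_64__) || defined(__i386__)
#  include <cpuid.h>
#  define SKETCH_X86 1
#else
#  define SKETCH_X86 0
#endif

struct CpuidRegs { uint32_t eax, ebx, ecx, edx; };

// Query one CPUID leaf/subleaf; returns all zeros if unsupported or not x86.
inline CpuidRegs cpuid(uint32_t leaf, uint32_t subleaf) {
    CpuidRegs r{0, 0, 0, 0};
#if SKETCH_X86 && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
    r.eax = static_cast<uint32_t>(regs[0]);
    r.ebx = static_cast<uint32_t>(regs[1]);
    r.ecx = static_cast<uint32_t>(regs[2]);
    r.edx = static_cast<uint32_t>(regs[3]);
#elif SKETCH_X86
    __get_cpuid_count(leaf, subleaf, &r.eax, &r.ebx, &r.ecx, &r.edx);
#else
    (void)leaf; (void)subleaf;
#endif
    return r;
}

// Read XCR0. Only call this after confirming OSXSAVE (CPUID.1:ECX bit 27),
// otherwise XGETBV itself raises an illegal-instruction fault.
inline uint64_t read_xcr0() {
#if SKETCH_X86 && defined(_MSC_VER)
    return _xgetbv(0);
#elif SKETCH_X86
    uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u));
    return (static_cast<uint64_t>(hi) << 32) | lo;
#else
    return 0;
#endif
}

// Pure decision logic, separated from the hardware queries:
//  - CPUID.1:ECX bit 27         -> OS has enabled XSAVE/XGETBV
//  - CPUID.(7,0):EBX bit 16     -> CPU implements AVX-512F
//  - XCR0 bits 1,2 (SSE/AVX) and 5,6,7 (opmask, ZMM state) -> OS saves the state
inline bool avx512f_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0) {
    const bool osxsave = ((leaf1_ecx >> 27) & 1u) != 0;
    const bool avx512f = ((leaf7_ebx >> 16) & 1u) != 0;
    const uint64_t need = 0xE6;  // bits 1, 2, 5, 6, 7
    return osxsave && avx512f && (xcr0 & need) == need;
}

inline bool cpu_has_avx512f_at_runtime() {
    const CpuidRegs l1 = cpuid(1, 0);
    const CpuidRegs l7 = cpuid(7, 0);
    const bool osxsave = ((l1.ecx >> 27) & 1u) != 0;
    const uint64_t xcr0 = osxsave ? read_xcr0() : 0;
    return avx512f_usable(l1.ecx, l7.ebx, xcr0);
}
```

On a CPU without AVX-512, such as the one described above, `cpu_has_avx512f_at_runtime()` should return false. A check along these lines before dispatching to AVX-512 code paths would let the program fall back to AVX2 instead of hitting an illegal instruction.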
