…tive)

ggml's CMake defaults GGML_NATIVE=ON, which adds -march=native and silently enables every extension the build runner's CPU advertises. Recent Azure ubuntu-latest hosts are Xeon Platinum 8370C (AVX-512 capable), so the master-593 release of libstable-diffusion.so was built with 5,396 zmm / 1,582 k-mask AVX-512 instructions and SIGILLs immediately on any consumer or CI host without AVX-512.

Pin the Linux x86_64 cpu and ROCm builds to GGML_NATIVE=OFF + GGML_AVX2=ON + GGML_FMA=ON + GGML_F16C=ON, a portable Haswell-era baseline that downstream consumers (Lemonade, end users) can rely on. The Windows AVX2 cpu build and Windows ROCm builds already use GGML_NATIVE=OFF; this brings Linux into line.

Verified by inspecting libstable-diffusion.so:

- master-569 (no AVX-512): 0 zmm refs, 0 k-mask refs
- master-593 (AVX-512): 5,396 zmm refs, 1,582 k-mask refs

The crash this fixes manifests in lemonade-sdk/lemonade CI as sd-server "exit code: -1" within 200ms on the GitHub-hosted ollama test job, while the self-hosted ROCm runner (full AVX-512) succeeds.
Geramy added a commit to Geramy/lemonade that referenced this pull request on May 6, 2026:
The master-593 release of sd-cpp was built with -march=native on a runner with AVX-512, baking 5,396 zmm + 1,582 k-mask AVX-512 instructions into libstable-diffusion.so. That binary SIGILLed on AVX-512-less GitHub-hosted ubuntu-latest runners — surfaced as Test ollama (ubuntu-latest) failing test_022_generate_image_output with sd-server "exit code: -1" within 200ms. master-596 is the first release after lemonade-sdk/stable-diffusion.cpp#3, which pins the Linux x86_64 cpu and rocm builds to GGML_NATIVE=OFF + AVX2/FMA/F16C. Verified: 0 zmm refs, 0 k-mask refs, 12,720 ymm refs.
What
The Linux x86_64 `ubuntu-latest-cmake` (cpu) and `ubuntu-latest-rocm` jobs both rely on ggml's default `GGML_NATIVE=ON`, which appends `-march=native` and silently enables every CPU extension the build runner advertises. On recent Azure-backed GitHub-hosted runners (Xeon Platinum 8370C with AVX-512), this bakes AVX-512 instructions directly into `libstable-diffusion.so`, and the resulting binary dies with SIGILL at first use on any host without AVX-512.
This PR pins the two Linux jobs to a portable Haswell-era baseline: `GGML_NATIVE=OFF`, `GGML_AVX2=ON`, `GGML_FMA=ON`, and `GGML_F16C=ON`.
The Windows AVX2 cpu build and both Windows ROCm builds already use `GGML_NATIVE=OFF`; this just brings Linux into line.
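For anyone reproducing the pinned build locally, here is a minimal sketch of how the configure command could be assembled. The four `GGML_*` values and `SD_HIPBLAS` are the ones named in this PR; the helper `cmake_configure_args`, the `PORTABLE_X86_64_FLAGS` name, and the source/build directory arguments are hypothetical and not part of the workflow files.

```python
# Hypothetical helper: only the flag names and values come from this PR.
PORTABLE_X86_64_FLAGS = {
    "GGML_NATIVE": "OFF",  # do not add -march=native
    "GGML_AVX2": "ON",
    "GGML_FMA": "ON",
    "GGML_F16C": "ON",
}


def cmake_configure_args(source_dir, build_dir, extra_defs=None):
    """Return a cmake configure argv with the portable baseline applied.

    extra_defs lets a caller add backend switches (for example
    {"SD_HIPBLAS": "ON"} for the ROCm job) on top of the baseline.
    """
    defs = dict(PORTABLE_X86_64_FLAGS)
    defs.update(extra_defs or {})
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    args += [f"-D{name}={value}" for name, value in defs.items()]
    return args
```

The returned list can be passed to `subprocess.run(..., check=True)` or joined into a single workflow step. Keeping the baseline in one mapping means the cpu and ROCm jobs cannot drift apart.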
Why
Concretely: in lemonade-sdk/lemonade PR #1777, the `Test ollama (ubuntu-latest)` job in run 25415331776 crashes with `sd-server process has terminated with exit code: -1` ~200 ms after start. The same binary on the self-hosted `[Linux, rocm]` runner (AMD Ryzen, full AVX-512) generates an image successfully.
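To tell which side of this split a given Linux host falls on, one option is to read the `flags` line of `/proc/cpuinfo`. The sketch below assumes the usual x86 flag names (`avx2`, `fma`, `f16c`, `avx512f`); the function names are illustrative only.

```python
# Baseline extensions the pinned build requires (Linux /proc/cpuinfo names).
BASELINE_FLAGS = {"avx2", "fma", "f16c"}


def parse_cpu_flags(cpuinfo_text):
    """Return the CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Every core repeats the same line, so the first one is enough.
            return set(value.split())
    return set()


def classify_host(flags):
    """Report whether a host can run the baseline and an AVX-512 build."""
    return {
        "baseline_ok": BASELINE_FLAGS <= flags,
        # avx512f is the foundation subset; a binary using zmm registers
        # needs at least this flag.
        "has_avx512f": "avx512f" in flags,
    }
```

On a Linux runner, `classify_host(parse_cpu_flags(open("/proc/cpuinfo").read()))` reporting `has_avx512f` as false would be consistent with the SIGILL described above when loading an AVX-512 build.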
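Confirmed via objdump on the released `libstable-diffusion.so`: master-569 has 0 zmm refs and 0 k-mask refs, while master-593 has 5,396 zmm refs and 1,582 k-mask refs.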
So master-569 happens to be portable (it was built on a runner without AVX-512 that day), while master-593 is not. Pinning the flags removes the runner-of-the-day lottery.
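A check like this can be scripted so that future releases are verified before publishing. The sketch below counts disassembly lines that mention zmm, k-mask, or ymm registers in `objdump -d` output (AT&T syntax, GNU objdump's default on x86). It is an assumption that this matches how the numbers in this PR were counted; the exact objdump invocation is not recorded here, so totals may differ.

```python
import re
import subprocess

# AT&T-syntax register names as printed by GNU objdump on x86.
PATTERNS = {
    "zmm": re.compile(r"%zmm\d+"),     # 512-bit vector registers (AVX-512)
    "kmask": re.compile(r"%k[0-7]\b"),  # AVX-512 opmask registers k0..k7
    "ymm": re.compile(r"%ymm\d+"),     # 256-bit vector registers (AVX/AVX2)
}


def count_register_refs(disassembly):
    """Count disassembly lines that mention each register class."""
    counts = {name: 0 for name in PATTERNS}
    for line in disassembly.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def disassemble(path):
    """Return `objdump -d` output for a binary (requires binutils)."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A release gate could call `count_register_refs(disassemble("libstable-diffusion.so"))` and fail the job when `zmm` or `kmask` is non-zero.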
Side effect
Starting with the next tag after this PR, the Linux cpu and rocm release artifacts will be ~1-2% smaller and ~2-5% slower on AVX-512 hosts. That is the right trade for a binary that has to run on consumer hardware and CI runners.
The Linux rocm build (`-DSD_HIPBLAS=ON`) still uses CPU-side ggml ops for parts of the pipeline (CLIP encoding, etc.), so the same flag set applies to it.