
ci: pin Linux x86_64 cpu+rocm builds to AVX2+FMA+F16C (drop -march=native)#3

Merged: Geramy merged 1 commit into `lemonade` from `geramy/portable-linux-cpu-build` on May 6, 2026

Geramy commented on May 6, 2026

What

Linux x86_64 `ubuntu-latest-cmake` (cpu) and `ubuntu-latest-rocm` jobs both rely on ggml's default `GGML_NATIVE=ON`, which appends `-march=native` and silently enables every CPU extension the build runner advertises. On recent Azure GitHub-hosted runners (Xeon Platinum 8370C with AVX-512), this bakes AVX-512 ops directly into `libstable-diffusion.so`, and the resulting binary SIGILLs at first use on any host without AVX-512.

This PR pins the two Linux jobs to a portable Haswell-era baseline:

-DGGML_NATIVE=OFF
-DGGML_AVX2=ON
-DGGML_FMA=ON
-DGGML_F16C=ON

The Windows AVX2 cpu build and both Windows ROCm builds already use `GGML_NATIVE=OFF`; this just brings Linux into line.
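To check whether a given Linux host meets this baseline, one option is to look at the `flags` line of `/proc/cpuinfo`. The sketch below is illustrative only: the helper name `missing_baseline_flags` is hypothetical, and it assumes the kernel's usual flag spellings (`avx2`, `fma`, `f16c`) correspond to the three `GGML_*` options above.

```python
# Flags assumed to correspond to GGML_AVX2 / GGML_FMA / GGML_F16C
# as they are spelled in /proc/cpuinfo on Linux x86_64.
REQUIRED_FLAGS = {"avx2", "fma", "f16c"}


def missing_baseline_flags(cpuinfo_text: str) -> set[str]:
    """Return the baseline flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return REQUIRED_FLAGS - set(value.split())
    # No 'flags' line found (e.g. a non-x86 host): report everything missing.
    return set(REQUIRED_FLAGS)
```

Passing the contents of `/proc/cpuinfo` to the function and getting an empty set back would suggest the host can run a binary built with the pinned flags.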

Why

Concretely: in lemonade-sdk/lemonade PR #1777, the `Test ollama (ubuntu-latest)` job in run 25415331776 crashes with `sd-server process has terminated with exit code: -1` ~200 ms after start. The same binary on the self-hosted `[Linux, rocm]` runner (AMD Ryzen, full AVX-512) generates an image successfully.
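When reproducing a crash like this by hand, it can help to launch the binary from Python and decode the return code, since on POSIX a negative `subprocess` return code means the child was killed by that signal number. This is a minimal sketch with a hypothetical helper name. The `exit code: -1` quoted above is what the test harness reports and is not necessarily derived this way.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess-style return code on POSIX."""
    if returncode < 0:
        # Negative values mean the child was terminated by signal -returncode.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"
```

A process that hits an unsupported instruction would typically show up here as `killed by SIGILL`.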

Confirmed via objdump on the released `libstable-diffusion.so`:

| Tag | zmm refs | k-mask refs |
| --- | --- | --- |
| master-569-ab6afe8 | 0 | 0 |
| master-593-7f65f2a | 5,396 | 1,582 |

So master-569 happens to be portable (it was built on a non-AVX-512 runner of the day), while master-593 is not. Pinning the flags removes the runner-of-the-day lottery.
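A check of this kind can be approximated by counting register references in disassembly text. The sketch below is not the exact method used for the numbers above, which the PR does not spell out. It assumes AT&T-syntax output (the `objdump -d` default) and counts every occurrence, so its totals may differ from the figures quoted here. The function name is hypothetical.

```python
import re

# AT&T syntax writes registers with a leading '%', e.g. %zmm0 or %k1.
ZMM_RE = re.compile(r"%zmm\d+\b")
KMASK_RE = re.compile(r"%k[0-7]\b")


def count_avx512_refs(disassembly: str) -> dict[str, int]:
    """Count zmm and opmask (k0-k7) register occurrences in disassembly text."""
    return {
        "zmm": len(ZMM_RE.findall(disassembly)),
        "kmask": len(KMASK_RE.findall(disassembly)),
    }
```

Feeding it the output of `objdump -d libstable-diffusion.so` should give zero for both counters on a build without AVX-512 code.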

Side effect

Linux cpu/rocm release artifacts, starting with the first tag after this PR, will be ~1-2% smaller and ~2-5% slower on AVX-512 hosts. That's the right trade for a binary that has to run on consumer hardware and CI runners.

The Linux ROCm build (`-DSD_HIPBLAS=ON`) still uses CPU-side ggml ops for parts of the pipeline (CLIP encoding, etc.), so the same flag set applies to it.

…tive)

ggml's CMake defaults GGML_NATIVE=ON, which adds -march=native and silently
enables every extension the build runner's CPU advertises. Recent Azure
ubuntu-latest hosts are Xeon Platinum 8370C (AVX-512 capable), so the
master-593 release of libstable-diffusion.so was built with 5,396 zmm /
1,582 k-mask AVX-512 instructions and SIGILLs immediately on any consumer
or CI host without AVX-512.

Pin the Linux x86_64 cpu and ROCm builds to GGML_NATIVE=OFF +
GGML_AVX2=ON + GGML_FMA=ON + GGML_F16C=ON — a portable Haswell-era
baseline that downstream consumers (Lemonade, end users) can rely on.

The Windows AVX2 cpu build and Windows ROCm builds already use
GGML_NATIVE=OFF; this brings Linux into line.

Verified by inspecting libstable-diffusion.so:
  master-569 (no AVX-512): 0 zmm refs, 0 k-mask refs
  master-593 (AVX-512):    5,396 zmm refs, 1,582 k-mask refs

The crash this fixes manifests in lemonade-sdk/lemonade CI as
sd-server "exit code: -1" within 200ms on the GitHub-hosted ollama
test job, while the self-hosted ROCm runner (full AVX-512) succeeds.
@Geramy Geramy merged commit b6f38cd into lemonade May 6, 2026
Geramy added a commit to Geramy/lemonade that referenced this pull request May 6, 2026
The master-593 release of sd-cpp was built with -march=native on a runner
with AVX-512, baking 5,396 zmm + 1,582 k-mask AVX-512 instructions into
libstable-diffusion.so. That binary SIGILLed on AVX-512-less GitHub-hosted
ubuntu-latest runners — surfaced as Test ollama (ubuntu-latest) failing
test_022_generate_image_output with sd-server "exit code: -1" within 200ms.

master-596 is the first release after lemonade-sdk/stable-diffusion.cpp#3,
which pins the Linux x86_64 cpu and rocm builds to GGML_NATIVE=OFF +
AVX2/FMA/F16C. Verified: 0 zmm refs, 0 k-mask refs, 12,720 ymm refs.