
[test] chore(turboquant): bump fork pin to rebase/upstream-sync-april-2026 #9493

Closed

mudler wants to merge 1 commit into master from bump/turboquant-upstream-sync-april-2026

Conversation

@mudler (Owner) commented Apr 22, 2026

Move the TurboQuant llama.cpp fork pin from feature/turboquant-kv-cache (627ebbc6) to rebase/upstream-sync-april-2026 (7f320bb8), picking up the upstream-sync work on the fork.

Testing TheTom/llama-cpp-turboquant#101

cc @TheTom

Move the TurboQuant llama.cpp fork pin from feature/turboquant-kv-cache
(627ebbc6) to rebase/upstream-sync-april-2026 (7f320bb8), picking up the
upstream-sync work on the fork.

Assisted-by: Claude:claude-opus-4-7
@apollosenvy commented:

Pulled and sanity-checked on AMD / ROCm 7.2.1 (7900 XTX, gfx1100, wave32). HIP path looks good on the new pin.

Build

cmake -B build-hip -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build-hip -j 12

Clean build with zero errors; the only warning is a benign -Wmissing-prototypes in ggml-turbo-quant.c. The three fattn-vec-instance-f16-turbo{2,3,4}_0.cu template instances added in b8b1d49b3 link in correctly, so no more undefined references on ROCm.

Smoke tests (Qwen3-8B-Q4_K_M, -fa on, -ngl 99)

| K type | V type | pp t/s | tg t/s | VRAM used | Output   |
|--------|--------|--------|--------|-----------|----------|
| q8_0   | turbo4 | 223.3  | 89.9   | ~12.2 GiB | coherent |
| f16    | turbo3 | 228.8  | 90.9   | ~11.9 GiB | coherent |

Both ran to the -n limit with sensible completions. No OOM, no crash, no hipError, no NaNs.
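For the record, the exact command line wasn't posted above; a plausible reconstruction, assuming the fork keeps stock llama.cpp flags (`-ctk`/`-ctv` for cache types) and registers `turbo3`/`turbo4` as V-cache types, would look like:

```sh
# Hypothetical invocation for the q8_0-K / turbo4-V row; model path,
# prompt, and -n value are placeholders, not from the report.
./build-hip/bin/llama-cli -m Qwen3-8B-Q4_K_M.gguf \
  -ngl 99 -fa on -ctk q8_0 -ctv turbo4 \
  -n 256 -p "Explain KV-cache quantization in one paragraph."
```

The f16-K / turbo3-V row would swap in `-ctk f16 -ctv turbo3`.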

What this covers from PR #101's community-testing checklist

  • AMD HIP build with variadic shfl macros
  • OOM fix (d7b533446 / 4d754604e) validated: turbo3 + turbo4 V-cache both fit and decode on 24 GiB
  • F16-K + TURBO-V dispatch case (58bbe5518) exercised cleanly
  • turbo V unpad gate on V type (156592051) behaves correctly with turbo V + f16 K

Not covered here: gfx1200 (RX 9060 XT), head_dim > 256 (Gemma 4 full-attention), multi-GPU, prefill-heavy workloads against long contexts.
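On the "both fit on 24 GiB" point, a back-of-envelope KV-cache sizing helps. Every number below is an assumption for illustration only: Qwen3-8B-like dimensions (36 layers, 8 KV heads, head_dim 128), a 32k context, q8_0 at its usual 8.5 bits per element, and a hypothetical ~4.5 bits per element for the fork's turbo4 V-cache type.

```python
# Back-of-envelope KV-cache sizing. Model dims and the turbo4 bit-width
# are illustrative assumptions, not measured values from the fork.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, k_bits, v_bits):
    """Total K+V cache size in GiB for one sequence."""
    elems_per_side = n_layers * n_ctx * n_kv_heads * head_dim
    total_bytes = elems_per_side * (k_bits + v_bits) / 8
    return total_bytes / 2**30

size = kv_cache_gib(n_layers=36, n_kv_heads=8, head_dim=128,
                    n_ctx=32768, k_bits=8.5, v_bits=4.5)
print(f"{size:.2f} GiB")  # roughly 1.8 GiB of KV cache
```

Under those assumptions the cache itself is small next to the ~5 GiB of Q4_K_M weights, which is consistent with the ~12 GiB totals in the table (the remainder being weights plus compute buffers).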

CI note

The only pull_request failure on this PR is backend-jobs (cublas, 13, ...), which timed out at the 6 h GitHub Actions wall-clock ceiling inside the webui cp -rT rename into tools/grpc-server/; that is a timeout, not a build error, and it is unrelated to the pin itself. The fork's own llama.cpp CI matrix (android-arm64 / ggml-ci-* / etc.) is failing upstream-wide, not specifically on this rebase.

LGTM for the HIP slice. Ship it.

cc @TheTom @mudler

@TheTom commented Apr 23, 2026

Thank you!

@mudler (Owner, Author) commented Apr 25, 2026

Closing, as this now seems fixed and already merged in #9497. Thanks for your support!

@mudler mudler closed this Apr 25, 2026
