fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 #9688

Merged
mudler merged 1 commit into master from fix/ci-rerankers-vllm-omni-aarch64 on May 6, 2026

Conversation

@localai-bot
Collaborator

Summary

Two independent CI breakages on master, bundled here because each fix is a one-liner.

rerankers backend tests fail (cpu / cublas12)

requirements-cpu.txt and requirements-cublas12.txt pin torch==2.4.1 but leave transformers unpinned. The latest transformers (5.x) registers a custom op in transformers/integrations/moe.py:

```python
torch.library.custom_op("transformers::grouped_mm_fallback", _grouped_mm_fallback, mutates_args=())
```

`_grouped_mm_fallback`'s signature uses string-typed annotations (`'torch.Tensor'`). torch 2.4.1's `infer_schema` does not understand them and raises `ValueError: Parameter input has unsupported type torch.Tensor`. That import error prevents the gRPC server from starting, so all 5 rerankers tests fail with `Connection refused` against `127.0.0.1:50051`.
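
A minimal sketch of the failure mode (the `demo::fallback` op name and function body are hypothetical; the behavior on each torch version is as described above):

```python
import torch

# String-typed annotations, as used by transformers 5.x's moe.py.
def _fallback(input: "torch.Tensor") -> "torch.Tensor":
    return input.clone()

# On torch 2.4.1, infer_schema cannot resolve the string annotations and
# this raises at import time:
#   ValueError: Parameter input has unsupported type torch.Tensor
# With the bumped torch==2.7.1, the op registers fine.
torch.library.custom_op("demo::fallback", _fallback, mutates_args=())
```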

Fix: bump `torch==2.4.1` → `torch==2.7.1`. This matches the pin used by the transformers backend in this repo and is the most common torch version across our Python backends. transformers stays unpinned so we keep tracking upstream.
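
For concreteness, the change in both requirements-cpu.txt and requirements-cublas12.txt is just the pin bump:

```diff
-torch==2.4.1
+torch==2.7.1
```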

vllm-omni build fails on aarch64

vllm-omni's setup.py resolves dependencies dynamically and loads requirements/cuda.txt on cuda hosts, which pins fa3-fwd==0.0.3. fa3-fwd ships only manylinux_2_24_x86_64 wheels and has no source distribution, so on aarch64 (e.g. l4t13 / SBSA cu130) uv fails with:

```
Because fa3-fwd==0.0.3 has no wheels with a matching platform tag (e.g.,
`manylinux_2_39_aarch64`) and vllm-omni==... depends on fa3-fwd==0.0.3,
we can conclude that vllm-omni cannot be used.
```

Fix: in install.sh, after cloning vllm-omni, strip fa3-fwd from requirements/cuda.txt when building on aarch64 (see the sketch after the list below). This is safe because fa3-fwd is a soft runtime dep:

  • vllm_omni/diffusion/attention/backends/utils/fa.py wraps from fa3_fwd_interface import ... in try/except ImportError and falls back through FA3 source build → FA2 (flash_attn) → vLLM's wrapper.
  • vllm_omni/diffusion/attention/backends/flash_attn.py only raises if every FA backend is missing, with a message offering SDPA as a final fallback.
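
One plausible shape for the install.sh guard — the `uname -m` check, the `sed` invocation, and the clone directory name are illustrative, not necessarily what the script does verbatim:

```bash
# Hypothetical sketch: drop fa3-fwd before resolving deps on aarch64.
# fa3-fwd ships only manylinux x86_64 wheels and has no sdist, so the
# cuda profile is unsatisfiable on Jetson/SBSA unless it is removed.
if [ "$(uname -m)" = "aarch64" ]; then
    sed -i '/^fa3-fwd/d' vllm-omni/requirements/cuda.txt
fi
```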

Test plan

  • CI green on tests-apple / tests-linux for the rerankers backend
  • aarch64 build (l4t13 / SBSA) reaches the pip install -e . step for vllm-omni without resolver failure

Assisted-by: Claude:claude-opus-4-7-1m [Claude Code]

Two unrelated CI breakages bundled together since both are one-liners:

- rerankers: bump torch 2.4.1 -> 2.7.1 on cpu/cublas12. The unpinned
  transformers resolves to 5.x, whose moe.py registers a custom_op with
  string-typed `'torch.Tensor'` annotations that torch 2.4.1's
  infer_schema rejects, blocking the gRPC server from starting and
  failing all 5 backend tests with "Connection refused" on :50051.
  Matches the version used by the transformers backend.

- vllm-omni: strip fa3-fwd from the upstream requirements/cuda.txt
  before resolving on aarch64. fa3-fwd 0.0.3 ships only an
  x86_64 wheel and has no sdist, making the cuda profile unsatisfiable
  on Jetson/SBSA. fa3-fwd is a soft runtime dep; vllm-omni's
  attention backends fall back to FA2 then SDPA when it's missing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler merged commit 4e154b5 into master May 6, 2026
55 checks passed
@mudler mudler deleted the fix/ci-rerankers-vllm-omni-aarch64 branch May 6, 2026 15:07
@localai-bot localai-bot added the bug Something isn't working label May 9, 2026