fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels by localai-bot · Pull Request #9950 · mudler/LocalAI

localai-bot · 2026-05-22T19:37:29Z

Summary

Restores the L4T13 (JetPack 7 / NVIDIA Thor) backends by retiring the now-broken pypi.jetson-ai-lab.io/sbsa/cu130 mirror in favor of PyPI's official aarch64 + cu130 wheels - per the PyTorch team's April 2026 announcement.

Three logical changes, split across three commits:

| # | Commit | What |
|---|---|---|
| 1 | `fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels` | The original ABI-mismatch fix (vllm backend) |
| 2 | `refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt` | Mechanical follow-up - with `[tool.uv.sources]` gone, `pyproject.toml` is dead weight; replace with the standard `requirements-${profile}.txt` pattern |
| 3 | `fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels` | Same root cause + fix applied to the other two L4T13 backends |

The bug

vllm crashed at import on cuda13-nvidia-l4t-arm64-vllm images:

ImportError: /backends/cuda13-nvidia-l4t-arm64-vllm/venv/lib/python3.12/site-packages/vllm/_C.abi3.so:
undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

That demangles to c10::MessageLogger::MessageLogger(char const*, int, int, bool) - the torch 2.10 form. Verified by nm -D on the actual wheels from pypi.jetson-ai-lab.io/sbsa/cu130:

- vllm-0.20.0+cu130-cp312-cp312-linux_aarch64.whl: _C.abi3.so references the 4-arg (char const*, int, int, bool) signature (built against torch 2.10).
- torch-2.10.0-cp312-cp312-linux_aarch64.whl: libc10.so exports that symbol.
- torch-2.11.0-cp312-cp312-linux_aarch64.whl: replaced it with MessageLogger(c10::SourceLocation, int, bool) - old form gone.
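
For readers who want to see how the mangled name above maps to the 4-arg signature, here is a minimal sketch of a decoder for a small subset of the Itanium C++ ABI mangling scheme. It is not a general demangler (in practice `c++filt` does this job); the function name `demangle_subset` is hypothetical, and the decoder only handles nested names, `C1`/`C2` constructors, and parameter types built from `P` (pointer), `K` (const), and a few builtin type codes.

```python
# Builtin type codes from the Itanium C++ ABI that this sketch understands.
BUILTINS = {"c": "char", "i": "int", "b": "bool", "v": "void"}


def demangle_subset(symbol: str) -> str:
    """Decode a nested-name symbol such as _ZN<len><id>...C1E<params>."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...E) are supported")
    pos = 3
    parts = []
    # Nested-name components: <length><identifier>, or C1/C2 for a constructor.
    while symbol[pos] != "E":
        if symbol[pos].isdigit():
            end = pos
            while symbol[end].isdigit():
                end += 1
            length = int(symbol[pos:end])
            parts.append(symbol[end:end + length])
            pos = end + length
        elif symbol[pos:pos + 2] in ("C1", "C2"):
            parts.append(parts[-1])  # a constructor is named after its class
            pos += 2
        else:
            raise ValueError(f"unsupported component at offset {pos}")
    pos += 1  # skip the 'E' that closes the nested name

    params = []
    while pos < len(symbol):
        qualifiers = []
        while symbol[pos] in ("P", "K"):
            qualifiers.append(symbol[pos])
            pos += 1
        if symbol[pos] not in BUILTINS:
            raise ValueError(f"unsupported type code at offset {pos}")
        text = BUILTINS[symbol[pos]]
        pos += 1
        # Qualifiers are written outermost-first, so apply them right to left.
        for qualifier in reversed(qualifiers):
            text += " const" if qualifier == "K" else "*"
        params.append(text)
    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle_subset("_ZN3c1013MessageLoggerC1EPKciib"))
# prints: c10::MessageLogger::MessageLogger(char const*, int, int, bool)
```

The torch 2.11 form takes a `c10::SourceLocation` argument, which is a class type and therefore outside what this sketch can decode.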

All three L4T13 backends pointed their torch / vllm / sglang dependencies at that mirror with no version pins, so uv picked the latest torch (2.11.0) alongside the older-ABI vllm/sglang wheel, which produced the undefined symbol at import. vllm-omni hadn't been reported broken yet but carries the exact same anti-pattern.

The fix

PyPI now publishes aarch64 + cu130 manylinux wheels directly:

| package | PyPI wheel | notes |
|---|---|---|
| torch | `torch-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl` | `__version__ = '2.11.0+cu130'`, `cuda: '13.0'` |
| torchvision | `torchvision-0.26.0-cp312-cp312-manylinux_2_28_aarch64.whl` | |
| torchaudio | `torchaudio-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl` | |
| vllm | `vllm-0.20.0-cp38-abi3-manylinux_2_35_aarch64.whl` | `Requires-Dist: torch==2.11.0`, `torchvision==0.26.0`, `torchaudio==2.11.0` |
| sglang | `sglang-0.5.12-cp312-cp312-manylinux_2_34_aarch64.whl` | `Requires-Dist: torch==2.11.0`, `torchaudio==2.11.0` |

ABI verified end-to-end on the PyPI vllm wheel: _C.abi3.so references c10::MessageLogger::MessageLogger(c10::SourceLocation, int, bool) and libc10.so exports that exact symbol.
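
The `Requires-Dist` pins are what keep the resolved set consistent, so a cheap sanity check is to compare them against what actually got installed. The following is a minimal sketch under stated assumptions: the helper names `exact_pins` and `mismatched_pins` are hypothetical, only `name==version` entries are considered, and the example inputs are copied from the table above rather than read from a live environment.

```python
import re


def exact_pins(requires_dist):
    """Map package name -> version for every 'name==version' entry."""
    pins = {}
    for entry in requires_dist:
        # Drop any environment marker after ';' before matching.
        requirement = entry.split(";", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*([^\s,]+)", requirement)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def mismatched_pins(requires_dist, installed):
    """Return {name: (pinned, installed)} for every pin that is not met."""
    problems = {}
    for name, pinned in exact_pins(requires_dist).items():
        found = installed.get(name)
        # A local version label such as '+cu130' still satisfies '==2.11.0'.
        public = found.split("+", 1)[0] if found else None
        if public != pinned:
            problems[name] = (pinned, found)
    return problems


# Values taken from the table above.
vllm_requires = ["torch==2.11.0", "torchvision==0.26.0", "torchaudio==2.11.0"]
installed = {"torch": "2.11.0+cu130", "torchvision": "0.26.0", "torchaudio": "2.11.0"}
print(mismatched_pins(vllm_requires, installed))
```

In a real environment the inputs could come from `importlib.metadata.requires("vllm")` and `importlib.metadata.version("torch")` instead of the hard-coded values.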

flash-attn is dropped everywhere: PyPI has no aarch64 wheel, but vLLM 0.20+ bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel.

For sglang, the new requirements-l4t13-after.txt matches the cublas13 split (sglang[all]>=0.5.11). The [all] extra was historically dropped on aarch64 because the transitive decord dep had no aarch64 wheel; sglang has since switched to decord2 on aarch64 (which does), so [all] is now safe and unblocks Gemma 4 / MTP recipes on Jetson Thor that were previously blocked by the 0.5.1.post2 cap on the L4T mirror.

Cleanup

pyproject.toml for vllm and sglang existed only to host [tool.uv.sources]. With that block gone, those files did nothing the standard requirements-${profile}.txt flow couldn't. Replaced with the same two-file pattern every other build profile uses; the special l4t13 elif branches in both install.sh scripts vanish entirely. libbackend.sh's installRequirements already handles the requirements-install.txt build-deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and runProtogen. vllm-omni's l4t13 vllm-install branch collapses into the cublas13 branch (both now run `pip install vllm --torch-backend=auto`).
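
To make the two-file pattern concrete, here is a rough model of the pass ordering it implies. This is an illustrative assumption, not the actual logic in libbackend.sh: the function `requirement_passes` is hypothetical, and it models only the three files this PR names (the build-deps file, the profile file, and the profile's `-after` file). The real installRequirements may consider additional files or flags.

```python
def requirement_passes(profile, available):
    """Return the requirement files to install, in order, for one profile.

    `available` is the set of file names present in the backend directory.
    """
    candidates = [
        "requirements-install.txt",           # build-deps pass
        f"requirements-{profile}.txt",        # base resolve (torch and friends)
        f"requirements-{profile}-after.txt",  # runs once the torch wheel is in place
    ]
    return [name for name in candidates if name in available]


backend_files = {
    "requirements-install.txt",
    "requirements-l4t13.txt",
    "requirements-l4t13-after.txt",
    "requirements-cublas13.txt",
}
for name in requirement_passes("l4t13", backend_files):
    print(name)
```

The point of the ordering is the one stated in the commit message: the `-after` file runs after the base resolve so the cu130 torch wheel lands first.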

Test plan

- CI image build for -nvidia-l4t-cuda-13-arm64-vllm completes (installRequirements resolves torch 2.11.0 + vllm 0.20.0 from PyPI without an extra index).
- CI image build for -nvidia-l4t-cuda-13-arm64-sglang completes (torch 2.11.0 + sglang 0.5.12, [all] extra resolves with decord2 on aarch64).
- CI image build for -nvidia-l4t-cuda-13-arm64-vllm-omni completes (PyPI vllm + clone-from-source vllm-omni).
- Confirm vllm import no longer raises undefined symbol: _ZN3c1013MessageLoggerC1EPKciib on Jetson Thor.
- Confirm the backend starts via NATS install on nvidia-thor without timing out at startup.
- Smoke-test a small chat completion against vllm + sglang backends on Thor.

The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no version pins. That mirror started shipping torch 2.11.0 next to a vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10 ABI, so uv landed on the mismatched pair and vllm crashed at import:

    ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

(c10::MessageLogger's constructor signature changed between torch 2.10 and 2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so exported only the 2.11 form.)

Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130 manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose Requires-Dist locks torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set automatically, so the mirror and the [tool.uv.sources] pinning are no longer needed.

flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel, so the Dao-AILab package isn't required at runtime.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

…*.txt

pyproject.toml only existed because uv pip install -r requirements.txt doesn't honor [tool.uv.sources]. The previous commit dropped [tool.uv.sources] (PyPI now serves the aarch64 + cu130 wheels directly), so the file no longer carries any logic the requirements-*.txt path can't. Replace with the same two-file pattern every other build profile uses:

- requirements-l4t13.txt (accelerate / torch / transformers / bitsandbytes - matches cublas13's split)
- requirements-l4t13-after.txt (vllm; runs after the base resolve so the cu130 torch wheel lands first)

install.sh's whole l4t13 elif branch goes away; libbackend.sh's installRequirements already handles the requirements-install.txt build-deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the runProtogen call, so falling through to the standard else: branch produces identical install behavior with less surface area.

No functional change at install time - same wheels, same order.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

…eels

Same root cause and same fix as the vllm backend in the previous commits: the L4T13 sglang and vllm-omni backends both pulled their accelerator stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins, so they would silently land on the same torch 2.11 vs cu130-built wheel ABI mismatch the moment the mirror published an out-of-sync pair.

sglang
------
- Drop pyproject.toml + [tool.uv.sources]. The historical comment said the [all] extra was unsafe on aarch64 because of decord, but sglang 0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312 aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and stop being capped at the 0.5.1.post2 the L4T mirror shipped. That unblocks Gemma 4 / MTP recipes on Jetson Thor.
- New requirements-l4t13.txt mirrors the cublas13 split (accelerate / torch / torchvision / torchaudio / transformers), requirements-l4t13-after.txt carries sglang[all]>=0.5.11.
- install.sh's l4t13 elif branch goes away; falls through to the standard installRequirements path.

vllm-omni
---------
- requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own vllm_flash_attn fa2 + fa3 internally).
- install.sh's l4t13 vllm-install branch collapses into the cublas13 branch since both now just run `pip install vllm --torch-backend=auto` against PyPI.
- --index-strategy=unsafe-best-match is dropped from the top-level l4t13 guard; without the L4T mirror in the picture it had no purpose.

The from-source vllm-omni install on top still keeps its existing `sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround - fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the [diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist depends on scikit_build_core without declaring it in build-system.requires, so under --no-build-isolation uv can't build it from source:

    × Failed to build `xatlas==0.0.11`
    ├─▶ The build backend returned an error
    ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1)
        ModuleNotFoundError: No module named 'scikit_build_core'
    help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12) depends on `xatlas`

Upstream sglang explicitly gates st_attn and vsa on `platform_machine != aarch64` inside the same [diffusion] extra but forgot xatlas - same class of bug that bit the old decord pin.

Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base sglang.srt symbols (Engine, ServerArgs, FunctionCallParser, ReasoningParser); the [all] extras are optional accelerators not required at import time. cublas13 (x86_64) keeps [all] because xatlas has x86_64 wheels there.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
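
The marker gating described in this commit can be illustrated with a small sketch. It evaluates only the `platform_machine == "..."` and `platform_machine != "..."` marker forms by hand; the helper name `applies` is hypothetical, and the three requirement strings are simplified stand-ins for the [diffusion] extra (the package names come from the commit message, the exact upstream specifiers are not reproduced here).

```python
import re

# Matches only the two marker shapes this sketch supports.
_MACHINE_MARKER = re.compile(r"platform_machine\s*(==|!=)\s*[\"']([^\"']+)[\"']")


def applies(requirement: str, machine: str) -> bool:
    """Return True if the requirement should be installed on `machine`."""
    _, _, marker = requirement.partition(";")
    marker = marker.strip()
    if not marker:
        return True  # no marker: the dependency applies everywhere
    match = _MACHINE_MARKER.fullmatch(marker)
    if match is None:
        raise ValueError(f"unsupported marker: {marker!r}")
    operator, value = match.groups()
    return machine == value if operator == "==" else machine != value


# Simplified stand-ins: st_attn and vsa are gated, xatlas is not.
diffusion_extra = [
    'st_attn; platform_machine != "aarch64"',
    'vsa; platform_machine != "aarch64"',
    "xatlas",
]

for machine in ("x86_64", "aarch64"):
    selected = [req for req in diffusion_extra if applies(req, machine)]
    print(machine, selected)
```

On aarch64 the ungated xatlas entry is the only one left in the selection, which is why the resolver still tries to build it from source there.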

mudler added 3 commits May 22, 2026 19:32

localai-bot changed the title ~~fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels~~ fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels May 22, 2026

mudler merged commit 5cda4f1 into master May 22, 2026
70 checks passed

mudler deleted the worktree-fix-vllm-l4t13-torch-abi branch May 22, 2026 21:01

localai-bot added the bug Something isn't working label May 24, 2026

mudler merged 4 commits into master from worktree-fix-vllm-l4t13-torch-abi
