
fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels#9950

Merged
mudler merged 4 commits into master from worktree-fix-vllm-l4t13-torch-abi on May 22, 2026

@localai-bot (Collaborator) commented May 22, 2026

Summary

Restores the L4T13 (JetPack 7 / NVIDIA Thor) backends by retiring the now-broken pypi.jetson-ai-lab.io/sbsa/cu130 mirror in favor of PyPI's official aarch64 + cu130 wheels, following the PyTorch team's April 2026 announcement.

Three logical changes, split across three commits:

| # | Commit | What |
|---|--------|------|
| 1 | fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels | The original ABI-mismatch fix (vllm backend) |
| 2 | refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt | Mechanical follow-up: with [tool.uv.sources] gone, pyproject.toml is dead weight, so it is replaced with the standard requirements-${profile}.txt pattern |
| 3 | fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels | Same root cause and fix, applied to the other two L4T13 backends |

The bug

vllm crashed at import on cuda13-nvidia-l4t-arm64-vllm images:

ImportError: /backends/cuda13-nvidia-l4t-arm64-vllm/venv/lib/python3.12/site-packages/vllm/_C.abi3.so:
undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

That demangles to c10::MessageLogger::MessageLogger(char const*, int, int, bool), the torch 2.10 form. Verified with nm -D against the actual wheels from pypi.jetson-ai-lab.io/sbsa/cu130:

  • vllm-0.20.0+cu130-cp312-cp312-linux_aarch64.whl: _C.abi3.so references the 4-arg (char const*, int, int, bool) signature (built against torch 2.10).
  • torch-2.10.0-cp312-cp312-linux_aarch64.whl: libc10.so exports that symbol.
  • torch-2.11.0-cp312-cp312-linux_aarch64.whl: libc10.so replaces it with MessageLogger(c10::SourceLocation, int, bool); the old form is no longer exported.
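The nm -D comparison above can be scripted. The sketch below is an illustration, not the check the PR actually ran: it assumes you have already captured `nm -D` text for the consumer (`_C.abi3.so`) and the provider (`libc10.so`), and it reports c10 symbols the first imports but the second does not export. The sample nm lines in the usage section are invented, and the mangled spelling of the 2.11 constructor is an assumed encoding.

```python
"""Sketch: list c10 symbols a consumer library imports that a provider does not export.

Input is plain `nm -D` text, e.g. captured from `nm -D vllm/_C.abi3.so` and
`nm -D torch/lib/libc10.so`. The sample lines below are illustrative only.
"""

C10_PREFIXES = ("_ZN3c10", "_ZNK3c10")  # mangled names nested in namespace c10


def parse_nm(text: str) -> tuple[set[str], set[str]]:
    """Return (undefined, defined) dynamic symbol names from `nm -D` output."""
    undefined: set[str] = set()
    defined: set[str] = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            # Undefined symbols have no address column: "U <name>[@VERSION]".
            undefined.add(parts[1].split("@")[0])
        elif len(parts) == 3 and parts[1].isupper() and parts[1] != "U":
            # Defined global symbols: "<address> <TYPE> <name>[@@VERSION]".
            defined.add(parts[2].split("@")[0])
    return undefined, defined


def missing_c10_symbols(consumer_nm: str, provider_nm: str) -> set[str]:
    """c10 symbols the consumer needs that the provider does not export."""
    needed, _ = parse_nm(consumer_nm)
    _, exported = parse_nm(provider_nm)
    return {s for s in needed if s.startswith(C10_PREFIXES) and s not in exported}


if __name__ == "__main__":
    # Illustrative nm -D lines; the 2.11 mangled name is an assumed encoding.
    vllm_c = (
        "                 U _ZN3c1013MessageLoggerC1EPKciib\n"
        "                 U memcpy@GLIBC_2.17\n"
    )
    libc10_2_11 = "0000000000045a10 T _ZN3c1013MessageLoggerC1ENS_14SourceLocationEib\n"
    print(missing_c10_symbols(vllm_c, libc10_2_11))
    # {'_ZN3c1013MessageLoggerC1EPKciib'}
```

Filtering on the c10 prefixes keeps libc symbols such as memcpy out of the report, since those are resolved by other libraries.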

All three L4T13 backends pulled torch / vllm / sglang from that mirror without version pins, so uv picked the latest torch (2.11.0) alongside the older-ABI vllm/sglang wheel, which then failed at import with the undefined symbol above. vllm-omni hadn't been reported broken yet but carries the exact same anti-pattern.

The fix

PyPI now publishes aarch64 + cu130 manylinux wheels directly:

| package | PyPI wheel | notes |
|---------|------------|-------|
| torch | torch-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl | `__version__ = '2.11.0+cu130'`, cuda: '13.0' |
| torchvision | torchvision-0.26.0-cp312-cp312-manylinux_2_28_aarch64.whl | |
| torchaudio | torchaudio-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl | |
| vllm | vllm-0.20.0-cp38-abi3-manylinux_2_35_aarch64.whl | Requires-Dist: torch==2.11.0, torchvision==0.26.0, torchaudio==2.11.0 |
| sglang | sglang-0.5.12-cp312-cp312-manylinux_2_34_aarch64.whl | Requires-Dist: torch==2.11.0, torchaudio==2.11.0 |
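The Requires-Dist column is what makes the resolve ABI-consistent: each wheel accepts exactly one torch. A minimal sketch of that check, assuming only `==` pins and written without the `packaging` library, so it is a simplification of PEP 440 rather than a full implementation. It does follow PEP 440's rule that a pin without a local label (`torch==2.11.0`) ignores the installed version's local label (`+cu130`).

```python
"""Sketch: check installed versions against exact (==) Requires-Dist pins.

Simplified PEP 440 handling: only `name==version` pins are understood,
versions are compared as strings (so "2.11" does not equal "2.11.0" here),
and a local label on the installed version ("+cu130") is ignored when the
pin has none, as PEP 440 specifies for `==`.
"""


def _public(version: str) -> str:
    # "2.11.0+cu130" -> "2.11.0"
    return version.split("+", 1)[0]


def check_pins(requires_dist: list[str], installed: dict[str, str]) -> list[str]:
    """Return mismatch descriptions; an empty list means every exact pin holds."""
    problems: list[str] = []
    for req in requires_dist:
        spec = req.split(";", 1)[0]  # drop any environment marker
        name, sep, wanted = spec.partition("==")
        if not sep:
            continue  # not an exact pin; out of scope for this sketch
        name, wanted = name.strip().lower(), wanted.strip()
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (wants =={wanted})")
            continue
        candidate = have if "+" in wanted else _public(have)
        if candidate != wanted:
            problems.append(f"{name}: installed {have}, wants =={wanted}")
    return problems


if __name__ == "__main__":
    # Pins taken from the vllm row of the table above.
    vllm_requires = ["torch==2.11.0", "torchvision==0.26.0", "torchaudio==2.11.0"]
    good = {"torch": "2.11.0+cu130", "torchvision": "0.26.0", "torchaudio": "2.11.0"}
    print(check_pins(vllm_requires, good))  # []
    print(check_pins(vllm_requires, {**good, "torch": "2.10.0"}))
    # ['torch: installed 2.10.0, wants ==2.11.0']
```

The unpinned mirror setup had no such constraint to fail, which is why uv could pair a 2.10-built wheel with torch 2.11.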

ABI verified end-to-end on the PyPI vllm wheel: _C.abi3.so references c10::MessageLogger::MessageLogger(c10::SourceLocation, int, bool) and libc10.so exports that exact symbol.

flash-attn is dropped everywhere: PyPI has no aarch64 wheel, but vLLM 0.20+ bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel.

For sglang, the new requirements-l4t13-after.txt matches the cublas13 split (sglang[all]>=0.5.11). The [all] extra was historically dropped on aarch64 because the transitive decord dependency had no aarch64 wheel; sglang has since switched to decord2 on aarch64, which does ship aarch64 wheels, so [all] is now safe and unblocks Gemma 4 / MTP recipes on Jetson Thor that were previously blocked by the 0.5.1.post2 cap on the L4T mirror.

Cleanup

pyproject.toml for vllm and sglang existed only to host [tool.uv.sources]. With that block gone, those files did nothing the standard requirements-${profile}.txt flow couldn't. They are replaced with the same two-file pattern every other build profile uses, and the special l4t13 elif branches in both install.sh scripts disappear entirely. libbackend.sh's installRequirements already handles the requirements-install.txt build-deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and runProtogen. vllm-omni's l4t13 vllm-install branch collapses into the cublas13 branch (both now run pip install vllm --torch-backend=auto).
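The ordering the two-file pattern relies on can be modelled in a few lines. This is a hypothetical sketch, not a transcription of libbackend.sh's installRequirements: it encodes only what the PR describes (build deps first, the profile's base file next, the profile's "-after" file last so torch is in place before vllm/sglang resolves). The presence of a shared requirements.txt in the list is an assumption.

```python
"""Hypothetical model of the requirements-${profile}.txt install order.

Not a transcription of libbackend.sh's installRequirements; the inclusion of
requirements.txt is an assumption made for illustration.
"""
import tempfile
from pathlib import Path


def requirement_files(backend_dir: Path, profile: str) -> list[Path]:
    candidates = [
        "requirements-install.txt",           # build-deps pass
        "requirements.txt",                   # assumed shared base file
        f"requirements-{profile}.txt",        # e.g. accelerate / torch / transformers
        f"requirements-{profile}-after.txt",  # e.g. vllm or sglang, resolved last
    ]
    # Missing files are skipped, so a profile ships only the files it needs.
    return [backend_dir / n for n in candidates if (backend_dir / n).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("requirements-l4t13-after.txt", "requirements-l4t13.txt"):
            (root / name).write_text("")
        print([p.name for p in requirement_files(root, "l4t13")])
        # ['requirements-l4t13.txt', 'requirements-l4t13-after.txt']
```

Because the order comes from the candidate list rather than the filesystem, creating the files in any order yields the same install sequence.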

Test plan

  • CI image build for -nvidia-l4t-cuda-13-arm64-vllm completes (installRequirements resolves torch 2.11.0 + vllm 0.20.0 from PyPI without an extra index).
  • CI image build for -nvidia-l4t-cuda-13-arm64-sglang completes (torch 2.11.0 + sglang 0.5.12, [all] extra resolves with decord2 on aarch64).
  • CI image build for -nvidia-l4t-cuda-13-arm64-vllm-omni completes (PyPI vllm + clone-from-source vllm-omni).
  • Confirm vllm import no longer raises undefined symbol: _ZN3c1013MessageLoggerC1EPKciib on Jetson Thor.
  • Confirm the backend starts via NATS install on nvidia-thor without timing out at startup.
  • Smoke-test a small chat completion against vllm + sglang backends on Thor.
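The import checks in the test plan could be automated with a small probe. This is a sketch under stated assumptions, not part of the PR: the module list is chosen for illustration, and `vllm.vllm_flash_attn` is assumed to be where vLLM 0.20+ keeps its bundled flash-attention package. It separates a clean import, a missing module, and the loader's "undefined symbol" error that signals the ABI mismatch.

```python
"""Sketch: post-install import smoke check for an L4T13 backend image.

The module list is an assumption; `vllm.vllm_flash_attn` is assumed to be
the bundled flash-attention package inside the vllm wheel.
"""
import importlib


def is_abi_mismatch(message: str) -> bool:
    # e.g. ".../vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib"
    return "undefined symbol" in message


def probe(module: str) -> str:
    try:
        importlib.import_module(module)
    except ModuleNotFoundError:  # subclass of ImportError, so it must come first
        return "missing"  # note: also raised if one of the module's own deps is absent
    except ImportError as exc:
        if is_abi_mismatch(str(exc)):
            return "abi-mismatch"
        return f"import-error: {exc}"
    return "ok"


if __name__ == "__main__":
    for name in ("torch", "vllm", "vllm.vllm_flash_attn", "sglang"):
        print(f"{name}: {probe(name)}")
```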

mudler added 3 commits May 22, 2026 19:32
The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from
pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no
version pins. That mirror started shipping torch 2.11.0 next to a
vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10
ABI, so uv landed on the mismatched pair and vllm crashed at import:

  ImportError: vllm/_C.abi3.so: undefined symbol:
  _ZN3c1013MessageLoggerC1EPKciib

(c10::MessageLogger's constructor signature changed between torch 2.10 and
2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so
exported only the 2.11 form.)

Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130
manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose
Requires-Dist locks torch==2.11.0 / torchvision==0.26.0 /
torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent
set automatically, so the mirror and the [tool.uv.sources] pinning are
no longer needed.

flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but
vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the
main wheel, so the Dao-AILab package isn't required at runtime.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…*.txt

pyproject.toml only existed because uv pip install -r requirements.txt
doesn't honor [tool.uv.sources]. The previous commit dropped
[tool.uv.sources] (PyPI now serves the aarch64 + cu130 wheels directly),
so the file no longer carries any logic the requirements-*.txt path can't.

Replace with the same two-file pattern every other build profile uses:

  - requirements-l4t13.txt       (accelerate / torch / transformers /
                                  bitsandbytes - matches cublas13's split)
  - requirements-l4t13-after.txt (vllm; runs after the base resolve so
                                  the cu130 torch wheel lands first)

install.sh's whole l4t13 elif branch goes away; libbackend.sh's
installRequirements already handles the requirements-install.txt build-
deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the
runProtogen call, so falling through to the standard else: branch
produces identical install behavior with less surface area.

No functional change at install time - same wheels, same order.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…eels

Same root cause and same fix as the vllm backend in the previous commits:
the L4T13 sglang and vllm-omni backends both pulled their accelerator
stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version
pins, so they would silently land on the same torch 2.11 vs cu130-built
wheel ABI mismatch the moment the mirror published an out-of-sync pair.

sglang
------

- Drop pyproject.toml + [tool.uv.sources]. The historical comment said
  the [all] extra was unsafe on aarch64 because of decord, but sglang
  0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312
  aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin
  and stop being capped at the 0.5.1.post2 the L4T mirror shipped.
  That unblocks Gemma 4 / MTP recipes on Jetson Thor.
- New requirements-l4t13.txt mirrors the cublas13 split (accelerate /
  torch / torchvision / torchaudio / transformers), and
  requirements-l4t13-after.txt carries sglang[all]>=0.5.11.
- install.sh's l4t13 elif branch goes away; falls through to the
  standard installRequirements path.

vllm-omni
---------

- requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and
  drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its
  own vllm_flash_attn fa2 + fa3 internally).
- install.sh's l4t13 vllm-install branch collapses into the cublas13
  branch since both now just run `pip install vllm --torch-backend=auto`
  against PyPI.
- --index-strategy=unsafe-best-match is dropped from the top-level
  l4t13 guard; without the L4T mirror in the picture it had no purpose.

The from-source vllm-omni install on top still keeps its existing
`sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround -
fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn.
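For readers less familiar with the sed expression, this Python sketch performs the same filtering: it drops every line that starts with `fa3-fwd`, optional whitespace, then `==`, and keeps everything else. It is only an illustration; the install script itself keeps using sed.

```python
"""Python equivalent of the install script's sed filter, for illustration only:

    sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt
"""
import re
from pathlib import Path

# For ASCII input, \s matches what POSIX [[:space:]] matches.
FA3_PIN = re.compile(r"^fa3-fwd\s*==")


def drop_fa3_pin(text: str) -> str:
    kept = [ln for ln in text.splitlines(keepends=True) if not FA3_PIN.match(ln)]
    return "".join(kept)


def patch_in_place(path: Path) -> None:
    # Same effect as sed -i: rewrite the file without the matching lines.
    path.write_text(drop_fa3_pin(path.read_text()))


if __name__ == "__main__":
    # "fa3-fwd-extra" is a hypothetical name showing the prefix is not over-matched.
    sample = "torch==2.11.0\nfa3-fwd==0.1\nfa3-fwd-extra==1.0\n"
    print(drop_fa3_pin(sample), end="")
    # torch==2.11.0
    # fa3-fwd-extra==1.0
```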

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@localai-bot changed the title from "fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels" to "fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels" on May 22, 2026
CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the
[diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist
depends on scikit_build_core without declaring it in
build-system.requires, so under --no-build-isolation uv can't build it
from source:

    × Failed to build `xatlas==0.0.11`
    ├─▶ The build backend returned an error
    ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1)
        ModuleNotFoundError: No module named 'scikit_build_core'
    help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12)
          depends on `xatlas`

Upstream sglang explicitly gates st_attn and vsa on
`platform_machine != aarch64` inside the same [diffusion] extra but
forgot xatlas - same class of bug that bit the old decord pin.
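How a per-dependency `platform_machine` gate plays out can be shown with a small evaluator. This sketch handles only a single `platform_machine == / != "<arch>"` clause; real Requires-Dist lines also carry `extra == "..."` clauses and are evaluated with packaging.markers. The dependency list mirrors the [diffusion] situation described above but is not copied from sglang's metadata.

```python
"""Sketch: which dependencies of an extra survive a `platform_machine` gate."""
import re

_GATE = re.compile(r"""platform_machine\s*(==|!=)\s*["']([^"']+)["']""")

DIFFUSION_DEPS = [
    'st_attn; platform_machine != "aarch64"',
    'vsa; platform_machine != "aarch64"',
    "xatlas",  # no gate: selected on aarch64 too, where no wheel exists
]


def applies(requirement: str, machine: str) -> bool:
    _, _, marker = requirement.partition(";")
    marker = marker.strip()
    if not marker:
        return True  # unconditional dependency
    match = _GATE.fullmatch(marker)
    if match is None:
        raise ValueError(f"marker not supported by this sketch: {marker!r}")
    op, value = match.groups()
    return machine == value if op == "==" else machine != value


def selected(deps: list[str], machine: str) -> list[str]:
    return [d.split(";", 1)[0].strip() for d in deps if applies(d, machine)]


if __name__ == "__main__":
    print(selected(DIFFUSION_DEPS, "aarch64"))  # ['xatlas']
    print(selected(DIFFUSION_DEPS, "x86_64"))   # ['st_attn', 'vsa', 'xatlas']
```

On aarch64 the gated packages fall away, but the ungated xatlas survives and forces a source build.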

Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base
sglang.srt symbols (Engine, ServerArgs, FunctionCallParser,
ReasoningParser); the [all] extras are optional accelerators not
required at import time. cublas13 (x86_64) keeps [all] because xatlas
has x86_64 wheels there.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler merged commit 5cda4f1 into master on May 22, 2026
70 checks passed
@mudler deleted the worktree-fix-vllm-l4t13-torch-abi branch on May 22, 2026 21:01
@localai-bot added the bug (Something isn't working) label on May 24, 2026

Labels

bug Something isn't working


2 participants