fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels #9950
Merged
The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from
pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no
version pins. That mirror started shipping torch 2.11.0 next to a
vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10
ABI, so uv landed on the mismatched pair and vllm crashed at import:

    ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

(c10::MessageLogger's constructor signature changed between torch 2.10 and
2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so
exported only the 2.11 form.)

Since torch 2.11 (April 2026), PyPI publishes its own aarch64 + cu130
manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose
Requires-Dist locks torch==2.11.0 / torchvision==0.26.0 /
torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set
automatically, so the mirror and the [tool.uv.sources] pinning are no
longer needed.

flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but
vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the
main wheel, so the Dao-AILab package isn't required at runtime.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
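The consistency argument above can be checked mechanically. The following is a minimal sketch, not part of this PR: it extracts exact `==` pins from Requires-Dist style strings and compares them with installed versions. The helper names `exact_pins` and `mismatches` are hypothetical, and the comparison is a simplification of PEP 440 (it only ignores a local suffix such as `+cu130` when the pin has none).

```python
import re

# Matches "name==version" at the start of a requirement string; anything
# after the version (for example an environment marker) is ignored.
_EXACT_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([A-Za-z0-9_.+!\-]+)")


def exact_pins(requires_dist):
    """Return {package: version} for every '==' pin in the given entries."""
    pins = {}
    for entry in requires_dist:
        match = _EXACT_PIN.match(entry)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def mismatches(pins, installed):
    """Return {package: (pinned, installed)} where the versions disagree."""
    result = {}
    for name, wanted in pins.items():
        if name not in installed:
            continue
        have = installed[name]
        # Simplification of PEP 440: a pin without a local segment also
        # accepts an installed version that carries one ("2.11.0+cu130").
        comparable = have if "+" in wanted else have.split("+")[0]
        if comparable != wanted:
            result[name] = (wanted, have)
    return result


# Pins quoted in the commit message for the PyPI vllm 0.20.0 aarch64 wheel.
VLLM_REQUIRES = ["torch==2.11.0", "torchvision==0.26.0", "torchaudio==2.11.0"]
```

With these pins, an installed torch 2.10.0 would be reported as a mismatch, while `2.11.0+cu130` would pass.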
…*.txt
pyproject.toml only existed because uv pip install -r requirements.txt
doesn't honor [tool.uv.sources]. The previous commit dropped
[tool.uv.sources] (PyPI now serves the aarch64 + cu130 wheels directly),
so the file no longer carries any logic the requirements-*.txt path can't.

Replace with the same two-file pattern every other build profile uses:

- requirements-l4t13.txt (accelerate / torch / transformers /
  bitsandbytes - matches cublas13's split)
- requirements-l4t13-after.txt (vllm; runs after the base resolve so
  the cu130 torch wheel lands first)

install.sh's whole l4t13 elif branch goes away; libbackend.sh's
installRequirements already handles the requirements-install.txt
build-deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the
runProtogen call, so falling through to the standard else: branch
produces identical install behavior with less surface area.

No functional change at install time - same wheels, same order.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
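To illustrate the ordering the two-file pattern relies on, here is a small sketch. It is an assumption-laden illustration and does not reproduce libbackend.sh's installRequirements, which may consider other files; the function name `requirement_passes` is hypothetical, and the order (build deps, then base, then `-after`) is inferred from the commit message only.

```python
from pathlib import Path


def requirement_passes(backend_dir, profile):
    """List the requirements files present for a profile, in install order."""
    candidates = [
        "requirements-install.txt",           # build-deps pass
        f"requirements-{profile}.txt",        # base resolve (torch and friends)
        f"requirements-{profile}-after.txt",  # packages that must resolve last
    ]
    root = Path(backend_dir)
    # Missing files are skipped, so a backend without a build-deps file
    # simply gets two passes.
    return [name for name in candidates if (root / name).is_file()]
```

For the vllm backend with profile `l4t13`, this would yield requirements-l4t13.txt followed by requirements-l4t13-after.txt, so the cu130 torch wheel is resolved before vllm.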
…eels

Same root cause and same fix as the vllm backend in the previous commits:
the L4T13 sglang and vllm-omni backends both pulled their accelerator
stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins,
so they would silently land on the same torch 2.11 vs cu130-built wheel
ABI mismatch the moment the mirror published an out-of-sync pair.

sglang
------
- Drop pyproject.toml + [tool.uv.sources]. The historical comment said
  the [all] extra was unsafe on aarch64 because of decord, but sglang
  0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312
  aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and
  stop being capped at the 0.5.1.post2 the L4T mirror shipped. That
  unblocks Gemma 4 / MTP recipes on Jetson Thor.
- New requirements-l4t13.txt mirrors the cublas13 split (accelerate /
  torch / torchvision / torchaudio / transformers),
  requirements-l4t13-after.txt carries sglang[all]>=0.5.11.
- install.sh's l4t13 elif branch goes away; falls through to the
  standard installRequirements path.

vllm-omni
---------
- requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and
  drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own
  vllm_flash_attn fa2 + fa3 internally).
- install.sh's l4t13 vllm-install branch collapses into the cublas13
  branch since both now just run `pip install vllm --torch-backend=auto`
  against PyPI.
- --index-strategy=unsafe-best-match is dropped from the top-level l4t13
  guard; without the L4T mirror in the picture it had no purpose.

The from-source vllm-omni install on top still keeps its existing
`sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround -
fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the
[diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist
depends on scikit_build_core without declaring it in
build-system.requires, so under --no-build-isolation uv can't build it
from source:

    × Failed to build `xatlas==0.0.11`
    ├─▶ The build backend returned an error
    ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1)
        ModuleNotFoundError: No module named 'scikit_build_core'
        help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12)
        depends on `xatlas`

Upstream sglang explicitly gates st_attn and vsa on
`platform_machine != aarch64` inside the same [diffusion] extra but
forgot xatlas - same class of bug that bit the old decord pin.

Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base
sglang.srt symbols (Engine, ServerArgs, FunctionCallParser,
ReasoningParser); the [all] extras are optional accelerators not
required at import time. cublas13 (x86_64) keeps [all] because xatlas
has x86_64 wheels there.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
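The failure mode above (a dependency with no wheel for the target machine falls back to a source build) can be spotted from a release's file list before CI does. A minimal sketch, assuming you already have the filenames of a release; the function name `installable_without_build` is hypothetical and the check only looks at the platform tag, not at Python or ABI tags.

```python
def installable_without_build(filenames, machine="aarch64"):
    """True if any wheel targets `machine` or is platform-independent."""
    for name in filenames:
        if not name.endswith(".whl"):
            continue  # an sdist (.tar.gz) would need a source build
        # Wheel names end in "-{python}-{abi}-{platform}.whl".
        platform_field = name[: -len(".whl")].split("-")[-1]
        # Compressed tag sets join several platform tags with dots.
        for tag in platform_field.split("."):
            if tag == "any" or tag.endswith("_" + machine):
                return True
    return False
```

The torch wheel named in this PR, `torch-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl`, passes this check; a release that only offers x86_64 wheels plus an sdist does not.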
Summary
Restores the L4T13 (JetPack 7 / NVIDIA Thor) backends by retiring the now-broken `pypi.jetson-ai-lab.io/sbsa/cu130` mirror in favor of PyPI's official aarch64 + cu130 wheels - per the PyTorch team's April 2026 announcement.

Three logical changes, split across three commits:

- fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels
- refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt
  `[tool.uv.sources]` gone, `pyproject.toml` is dead weight; replace with the standard `requirements-${profile}.txt` pattern
- fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels

The bug
vllm crashed at import on `cuda13-nvidia-l4t-arm64-vllm` images:

    ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

That demangles to `c10::MessageLogger::MessageLogger(char const*, int, int, bool)` - the torch 2.10 form. Verified by `nm -D` on the actual wheels from `pypi.jetson-ai-lab.io/sbsa/cu130`:

- `vllm-0.20.0+cu130-cp312-cp312-linux_aarch64.whl`: `_C.abi3.so` references the 4-arg `(char const*, int, int, bool)` signature (built against torch 2.10).
- `torch-2.10.0-cp312-cp312-linux_aarch64.whl`: `libc10.so` exports that symbol.
- `torch-2.11.0-cp312-cp312-linux_aarch64.whl`: replaced it with `MessageLogger(c10::SourceLocation, int, bool)` - old form gone.

All three L4T13 backends sourced their torch / vllm / sglang from that mirror with no version pins, and uv picked the latest torch (2.11.0) next to the older-ABI vllm/sglang wheel, which produced the undefined symbol at import. vllm-omni hadn't been reported broken yet but carries the exact same anti-pattern.
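The `nm -D` comparison described above can be scripted. The sketch below is not the procedure used in this PR; it is one way to do it, assuming binutils `nm` is on the PATH and that both `.so` files have been extracted from their wheels. The function names are hypothetical.

```python
import subprocess


def nm_dynamic(path):
    """Return the dynamic symbol table of a shared object via `nm -D`."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_nm(output):
    """Split `nm -D` output into (undefined, defined) symbol-name sets."""
    undefined, defined = set(), set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "U":
            # Undefined symbols have no address column: "U name".
            undefined.add(fields[1].split("@")[0])
        elif len(fields) == 3:
            # Defined symbols look like "address type name".
            defined.add(fields[2].split("@")[0])
    return undefined, defined


def missing_symbols(extension_nm, library_nm, prefix=""):
    """Symbols the extension needs that the library does not export."""
    needed, _ = parse_nm(extension_nm)
    _, exported = parse_nm(library_nm)
    return {name for name in needed - exported if name.startswith(prefix)}
```

Calling `missing_symbols(nm_dynamic("_C.abi3.so"), nm_dynamic("libc10.so"), prefix="_ZN3c10")` on a mismatched pair would be expected to list `_ZN3c1013MessageLoggerC1EPKciib`; the prefix filter keeps unrelated libc symbols out of the report.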
The fix
PyPI now publishes aarch64 + cu130 manylinux wheels directly:

- `torch-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl`
  `__version__ = '2.11.0+cu130'`, `cuda: '13.0'`
- `torchvision-0.26.0-cp312-cp312-manylinux_2_28_aarch64.whl`
- `torchaudio-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl`
- `vllm-0.20.0-cp38-abi3-manylinux_2_35_aarch64.whl`
  `Requires-Dist: torch==2.11.0`, `torchvision==0.26.0`, `torchaudio==2.11.0`
- `sglang-0.5.12-cp312-cp312-manylinux_2_34_aarch64.whl`
  `Requires-Dist: torch==2.11.0`, `torchaudio==2.11.0`

ABI verified end-to-end on the PyPI vllm wheel: `_C.abi3.so` references `c10::MessageLogger::MessageLogger(c10::SourceLocation, int, bool)` and `libc10.so` exports that exact symbol.

`flash-attn` is dropped everywhere: PyPI has no aarch64 wheel, but vLLM 0.20+ bundles its own `vllm_flash_attn` (fa2 + fa3) inside the main wheel.

For sglang, the new `requirements-l4t13-after.txt` matches the cublas13 split (`sglang[all]>=0.5.11`). The `[all]` extra was historically dropped on aarch64 because the transitive `decord` dep had no aarch64 wheel; sglang has since switched to `decord2` on aarch64 (which does), so `[all]` is now safe and unblocks Gemma 4 / MTP recipes on Jetson Thor that were previously blocked by the `0.5.1.post2` cap on the L4T mirror.

Cleanup

`pyproject.toml` for vllm and sglang existed only to host `[tool.uv.sources]`. With that block gone, those files did nothing the standard `requirements-${profile}.txt` flow couldn't. Replaced with the same two-file pattern every other build profile uses; the special `l4t13` elif branches in both `install.sh`s vanish entirely. `libbackend.sh`'s `installRequirements` already handles the `requirements-install.txt` build-deps pass, the `C_INCLUDE_PATH` export for `PORTABLE_PYTHON`, and `runProtogen`. vllm-omni's `l4t13` vllm-install branch collapses into the `cublas13` branch (both now: `pip install vllm --torch-backend=auto`).

Test plan
- `-nvidia-l4t-cuda-13-arm64-vllm` completes (installRequirements resolves torch 2.11.0 + vllm 0.20.0 from PyPI without an extra index).
- `-nvidia-l4t-cuda-13-arm64-sglang` completes (torch 2.11.0 + sglang 0.5.12, `[all]` extra resolves with `decord2` on aarch64).
- `-nvidia-l4t-cuda-13-arm64-vllm-omni` completes (PyPI vllm + clone-from-source vllm-omni).
- `vllm` import no longer raises `undefined symbol: _ZN3c1013MessageLoggerC1EPKciib` on Jetson Thor.
- `nvidia-thor` without timing out at startup.
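The import check in the test plan could be automated with a small smoke test run inside each built image. This is a sketch only, not something this PR adds; the function name `import_failures` is hypothetical.

```python
import importlib


def import_failures(module_names):
    """Try importing each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # A shared object with an unresolved symbol also surfaces as
            # ImportError, with the "undefined symbol" text in the message.
            failures[name] = str(exc)
    return failures
```

In a vllm image, `import_failures(["torch", "vllm"])` returning an empty dict would indicate that the ABI mismatch is gone; any entry mentioning `undefined symbol` points back at a torch/vllm pairing problem.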