docs(agents): capture vllm backend lessons + runtime lib packaging by mudler · Pull Request #9333 · mudler/LocalAI

mudler · 2026-04-13T09:08:13Z

New .agents/vllm-backend.md with everything that's easy to get wrong on the vllm/vllm-omni backends:

- Use vLLM's native ToolParserManager / ReasoningParserManager; do not write regex-based parsers. Selection is explicit via Options[], defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract: Reply.message text alone won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu. Newer wheels declare torch==2.10.0+cpu, which only exists on the PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: the prebuilt wheel needs AVX-512 VNNI/BF16. The SIGILL symptom and the FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the '*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning; this is easy to miss when adding fields to schema.Message.
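
The try/except TypeError point can be illustrated with a minimal sketch. The two parser classes below are hypothetical stand-ins, not vLLM classes, and `build_parser` is an illustrative helper name rather than anything in the backend; the only thing taken from the description above is the pattern of trying `tools=` first and falling back when the concrete class rejects it.

```python
from typing import Any, Optional


class StrictParser:
    """Hypothetical stand-in: a concrete parser whose __init__ omits tools=."""

    def __init__(self, tokenizer: Any) -> None:
        self.tokenizer = tokenizer
        self.tools = None


class ToolAwareParser:
    """Hypothetical stand-in: a concrete parser that accepts tools=."""

    def __init__(self, tokenizer: Any, tools: Optional[list] = None) -> None:
        self.tokenizer = tokenizer
        self.tools = tools


def build_parser(parser_cls: type, tokenizer: Any, tools: Optional[list]) -> Any:
    """Try the richer signature first, then fall back to tokenizer-only."""
    try:
        return parser_cls(tokenizer, tools=tools)
    except TypeError:
        # The concrete class does not declare tools=; retry without it.
        return parser_cls(tokenizer)


strict = build_parser(StrictParser, "tok", [{"name": "get_weather"}])
aware = build_parser(ToolAwareParser, "tok", [{"name": "get_weather"}])
```

One caveat of this pattern: a TypeError raised for an unrelated reason inside the constructor also triggers the fallback, so the second call may fail again or mask the original error.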

Also extended .agents/adding-backends.md with a generic 'Bundling runtime shared libraries' section. It notes that Dockerfile.python is FROM scratch, that package.sh is the bundling mechanism, and that libbackend.sh adds ${EDIR}/lib to LD_LIBRARY_PATH, and it explains how to verify packaging without trusting the host (extract the image, boot it in a fresh ubuntu container).
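
As a rough illustration of the packaging check, the sketch below reports which required libraries are absent from a bundled lib directory and builds an LD_LIBRARY_PATH value with that directory first. It is not the project's package.sh or libbackend.sh; the helper names `missing_libs` and `prepend_lib_path` are hypothetical, and only the two library names come from the PR description.

```python
from pathlib import Path
from typing import Iterable, List

# Library names taken from the PR description.
REQUIRED_LIBS = ("libnuma.so.1", "libgomp.so.1")


def missing_libs(lib_dir: str, required: Iterable[str] = REQUIRED_LIBS) -> List[str]:
    """Return the required library names that are not present in lib_dir."""
    root = Path(lib_dir)
    # Path.exists() follows symlinks, so a dangling symlink counts as missing.
    return [name for name in required if not (root / name).exists()]


def prepend_lib_path(lib_dir: str, current: str = "") -> str:
    """Return an LD_LIBRARY_PATH value that searches lib_dir first."""
    return f"{lib_dir}:{current}" if current else lib_dir
```

Running `missing_libs` against the lib directory of an extracted image, inside a fresh container rather than on the build host, avoids a false pass caused by libraries that happen to be installed on the host.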

Index in AGENTS.md updated.

Description

This PR fixes #

Notes for Reviewers

Signed commits

Yes, I signed my commits.


mudler force-pushed the docs/vllm-agents-notes branch from 2e0c4bc to 486a04c April 13, 2026 09:09

mudler merged commit daa0272 into master Apr 13, 2026
16 of 18 checks passed

mudler deleted the docs/vllm-agents-notes branch April 13, 2026 09:09

localai-bot added the kind/documentation label (Improvements or additions to documentation) May 9, 2026

mudler merged 1 commit into master from docs/vllm-agents-notes
