
docs(agents): capture vllm backend lessons + runtime lib packaging#9333

Merged
mudler merged 1 commit into master from docs/vllm-agents-notes
Apr 13, 2026

Conversation


@mudler mudler commented Apr 13, 2026

Adds a new .agents/vllm-backend.md capturing everything that's easy to get wrong on the vllm/vllm-omni backends:

  • Use vLLM's native ToolParserManager / ReasoningParserManager — do not write regex-based parsers. Selection is explicit via Options[], defaults live in core/config/parser_defaults.json.
  • Concrete parsers don't always accept the tools= kwarg the abstract base declares; try/except TypeError is mandatory.
  • ChatDelta.tool_calls is the contract — Reply.message text alone won't surface tool calls in /v1/chat/completions.
  • vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu. Newer wheels declare torch==2.10.0+cpu which only exists on the PyTorch test channel and pulls an incompatible torchvision.
  • SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL symptom + FROM_SOURCE=true escape hatch are documented.
  • libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C silently fails to register torch ops if they're missing.
  • backend_hooks system: hooks_llamacpp / hooks_vllm split + the '*' / '' / named-backend keys.
  • ToProto() must serialize ToolCallID and Reasoning — easy to miss when adding fields to schema.Message.
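To illustrate the selection logic from the first bullet: explicit Options[] entries override the file defaults. The key names and JSON shape below are assumptions for illustration, not the real schema of core/config/parser_defaults.json.

```python
import json

# Hypothetical excerpt of core/config/parser_defaults.json (illustrative only).
PARSER_DEFAULTS = json.loads("""
{"tool_parser": "hermes", "reasoning_parser": "deepseek_r1"}
""")

def resolve_parsers(options, defaults=PARSER_DEFAULTS):
    # Explicit Options[] entries win; missing keys fall back to the defaults file.
    return {
        key: options.get(key) or defaults.get(key)
        for key in ("tool_parser", "reasoning_parser")
    }
```

An explicit `Options[]` entry such as `{"tool_parser": "llama3"}` then takes precedence for that key only, leaving the reasoning parser at its file default.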

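The try/except TypeError pattern from the second bullet can be sketched as below. The parser classes are stand-ins for the signature mismatch, not real vLLM parsers:

```python
class WideParser:
    # Matches the abstract base signature, including the tools= kwarg.
    def extract_tool_calls(self, output, request=None, tools=None):
        return ("wide", tools)

class NarrowParser:
    # Some concrete parsers declare a narrower signature without tools=.
    def extract_tool_calls(self, output, request=None):
        return ("narrow", None)

def call_tool_parser(parser, output, request=None, tools=None):
    # Try the full signature first; when the concrete parser does not
    # accept tools= it raises TypeError, so retry without it.
    try:
        return parser.extract_tool_calls(output, request=request, tools=tools)
    except TypeError:
        return parser.extract_tool_calls(output, request=request)
```

Without the fallback, a narrow parser would crash the request path the first time a client sends tools.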
Also extends .agents/adding-backends.md with a generic 'Bundling runtime shared libraries' section: Dockerfile.python builds FROM scratch, package.sh is the bundling mechanism, libbackend.sh adds ${EDIR}/lib to LD_LIBRARY_PATH, plus how to verify packaging without trusting the host (extract the image and boot it in a fresh Ubuntu container).
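The ${EDIR}/lib step is actually done in shell by libbackend.sh; a minimal Python sketch of the same environment mutation (function name and paths are illustrative, not from the repo):

```python
import os

def with_bundled_libs(env, edir):
    # Prepend the backend's bundled lib dir (what libbackend.sh does for
    # ${EDIR}/lib) so the loader can find bundled libnuma.so.1 / libgomp.so.1
    # before vllm._C tries to register its torch ops.
    env = dict(env)  # do not mutate the caller's environment
    lib_dir = os.path.join(edir, "lib")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env
```

Prepending (rather than appending) matters: the bundled copies must win over whatever the host happens to ship.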

Index in AGENTS.md updated.

Description

This PR fixes #

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

@mudler mudler force-pushed the docs/vllm-agents-notes branch from 2e0c4bc to 486a04c on April 13, 2026 09:09
@mudler mudler merged commit daa0272 into master Apr 13, 2026
16 of 18 checks passed
@mudler mudler deleted the docs/vllm-agents-notes branch April 13, 2026 09:09
@localai-bot localai-bot added the kind/documentation Improvements or additions to documentation label May 9, 2026