
security: bound chat-template rendering with minijinja fuel to limit DoS from untrusted model templates #129

@inureyes

Background

mlxcel renders model-supplied chat templates (tokenizer_config.json chat_template / chat_template.jinja) through minijinja in src/server/chat_template.rs. Unlike CVE-2026-5760 (an RCE in a Python framework), mlxcel's minijinja-based path cannot reach code execution (see the companion regression-test issue). The realistic residual risk of rendering an untrusted template is CPU/memory denial-of-service: a deliberately pathological template (e.g. deeply nested {% for i in range(...) %} loops, or expressions building very large strings) can consume unbounded CPU/memory during rendering.

minijinja provides Environment::set_fuel(Some(n)) to cap the number of operations executed per render, returning an error once the budget is exhausted. Today configure_environment() (src/server/chat_template.rs:586) sets no fuel budget, so rendering is unbounded. Rendering happens both at model-load time (the supports_tools probe) and per request.

Threat model / severity

  • Severity: LOW (defense-in-depth) for the current architecture. Template source comes only from (1) the model the operator loads at startup, or (2) the operator's --chat-template flag. HTTP clients cannot supply template source, and the request model field does not trigger loading of arbitrary models — model loading is operator-controlled at startup (src/server/model_worker.rs, src/server/startup.rs). Triggering this therefore requires the operator to deliberately load a malicious third-party model.
  • Severity rises to MEDIUM in a multi-tenant / hosted deployment where untrusted parties could cause arbitrary (e.g. Hugging Face) models to be loaded. This bound is the right preventive control before such a deployment.

Tasks

  • Enable the minijinja fuel cargo feature (add "fuel" to the minijinja features in Cargo.toml, alongside the existing loop_controls). set_fuel is gated behind this feature.
  • In configure_environment(), call env.set_fuel(Some(N)) with a budget large enough never to affect legitimate templates (tool-rich Qwen/Nemotron-style templates with long histories do substantial work — choose a generous bound and document the rationale in a comment).
  • Ensure fuel exhaustion surfaces as a clean rendering error (the render path already returns Result) and maps to a sensible server response — never a panic.
  • Validate no regression against real templates: run the existing audit test (cargo test --release --lib test_all_local_model_templates_render -- --ignored --nocapture) over the local models/ directory and confirm every model still renders within budget.
  • Add a unit test with a pathological template (e.g. a large nested-range loop) asserting it now errors via fuel exhaustion instead of running unboundedly.

Acceptance criteria

  • A pathological template render returns an error promptly instead of consuming unbounded CPU/memory.
  • All locally-available real model templates still render successfully (no false positives), verified via the audit test.
  • The chosen set_fuel budget and its rationale are documented in configure_environment.

References

  • CVE-2026-5760 / CERT VU#915947 (motivating context — note mlxcel is NOT RCE-vulnerable; this is DoS defense-in-depth)
  • minijinja Environment::set_fuel documentation
  • src/server/chat_template.rs:586 (configure_environment)

Companion hardening issue: #128
