Skip to content

fix: chat_template.jinja downloader allow-list and base-model UX warning#134

Merged
inureyes merged 4 commits into
mainfrom
fix/issue-132-chat-template-jinja-downloader-base-model-ux
May 28, 2026
Merged

fix: chat_template.jinja downloader allow-list and base-model UX warning#134
inureyes merged 4 commits into
mainfrom
fix/issue-132-chat-template-jinja-downloader-base-model-ux

Conversation

@inureyes
Copy link
Copy Markdown
Member

Summary

Two stacked root causes that together produced garbage output when running mlxcel run gemma-4-e4b-4bit: the downloader silently dropped chat_template.jinja (so instruction-tuned models had no working template), and the UX warning for base models gave no actionable guidance.

What changed

  • src/downloader/filters.rs — Added has_extension(base, "jinja") to the extension allow-list, immediately after the tiktoken clause. HuggingFace ships chat templates as chat_template.jinja; the previous allow-list covered every other relevant extension but missed this one, so the file was silently filtered out during snapshot download. The is_safe_relative_path guard that runs first is unchanged; no new attack surface is opened.
  • src/downloader/tests.rs — Added assert!(is_wanted_file("chat_template.jinja")) immediately after the existing chat_template.json assertion in allow_list_includes_configs_and_tokenizer_files.
  • src/commands/chat.rs — Replaced the single-line Note: eprintln with an 8-line block that explains the base-model cause, warns that responses will be incoherent, names the -it suffix convention for instruction-tuned Hub variants, and tells the user to pass --no-chat-template for silent raw-text mode or mlxcel generate -p for one-shot completion. The --no-chat-template path remains completely silent.

Side-effect: src/server/chat_template.rs fallback now fires

The loader at src/server/chat_template.rs:80-89 already had a fallback that reads chat_template.jinja from the model directory. That fallback was dead code because the file was never downloaded. Once Sub-task A lands, the fallback starts working automatically for any model that ships its template this way — no changes to that file were needed or made.

Verification

cargo check --lib --tests
# → Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 44s (0 errors)

cargo test --lib downloader::tests
# → running 46 tests
# → test result: ok. 45 passed; 0 failed; 1 ignored (live_download_smoke_test, expected)

cargo clippy --lib --tests -- -D warnings
# → Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 29s (0 warnings)

End-to-end download verification (Sub-task A.3 — actually re-downloading mlx-community/gemma-4-e4b-it-4bit, confirming chat_template.jinja lands in the model store, and confirming mlxcel run produces coherent multi-turn chat) is deferred to the reviewer / orchestrator. It requires network access and a full model download and is out of scope for this automated implementation step. Please run this verification before merging.

Closes #132

inureyes added 2 commits May 28, 2026 22:07
HuggingFace ships chat templates as chat_template.jinja, not inlined in tokenizer_config.json. The extension allow-list covered safetensors, json, tiktoken, model, and txt but not jinja, so the file was silently dropped during snapshot download. The fallback in src/server/chat_template.rs that reads chat_template.jinja was dead code for any model using this convention.

Adds has_extension(base, "jinja") after the tiktoken clause and adds assert!(is_wanted_file("chat_template.jinja")) immediately after the existing chat_template.json assertion in the test suite.
The previous single-line Note was not actionable. Users running a base (non-instruction-tuned) model got incoherent, repetitive output with no explanation of why or what to do next.

The new warning block explains that the model likely has no chat template because it is a base model, states that responses will be incoherent, names the -it suffix convention for instruction-tuned variants on the Hub, and tells the user how to proceed without the warning (--no-chat-template for silent raw-text mode, or mlxcel generate -p for one-shot completion). The --no-chat-template path remains completely silent.
@inureyes inureyes added status:review Under review type:bug Bug fixes, error corrections, or issue resolutions priority:high High priority area:models Model architectures, weights, loading, metadata area:inference Generation, sampling, decoding (incl. speculative, DRY) area:cli Command-line interface / CLI flags labels May 28, 2026
@inureyes
Copy link
Copy Markdown
Member Author

Security & Performance Review — verdict: APPROVE

Targeted review of the .jinja allow-list addition (src/downloader/filters.rs:72-75), the chat.rs warning (src/commands/chat.rs:188-198), and the regression test (src/downloader/tests.rs:173), with a focused look at trust boundaries and the minijinja sandbox.

Watchdog-safe verification

cargo check --lib --tests, cargo clippy --lib --tests -- -D warnings, and cargo test --lib downloader all clean. 107 passed, 1 ignored (live_download_smoke_test, expected), 0 failed.

Findings by severity

CRITICAL — none.

HIGH — none.

MEDIUM — none.

LOW (informational; not auto-fixed per policy):

  1. src/downloader/filters.rs:73 — compound-extension allow widens the dust-bin for spurious jinja files. A hostile repo can ship pytorch_model.bin.jinja, evil.gguf.jinja, etc. and have them written to disk. The deny-list runs before the allow-list but its has_extension checks are suffix-based, so *.bin.jinja is not denied. Impact: disk-quota waste only — the loader at src/server/chat_template.rs:87 only ever opens the exact basename chat_template.jinja, so other *.jinja files are inert. No code execution, no traversal, no symlink escape. The user already opted into this repo by typing its name; the trust model is unchanged.

  2. src/server/chat_template.rs:88 — no size cap on read_to_string(chat_template.jinja). A multi-GB chat template would be loaded fully into memory and then parsed by minijinja, exhausting RAM. Not net-new: chat_template.json/tokenizer_config.json/tokenizer.json already have the same shape (no size cap on the per-file download or on read_to_string at load time). The trust model is "user opted into this repo," so this is consistent with existing posture. If a future hardening pass adds size caps, it should cover all four files uniformly, not just .jinja.

  3. minijinja sandbox surface. The environment is constructed fresh per render with no template loader (Environment::new() default — no set_loader, no add_loader, no include_loader), no fuel limit, no recursion limit. A hostile template could craft a CPU/RAM bomb (deep {% for %} nesting, giant string replace chains). Again, the trust model is user-typed repo id; out of scope to harden here. minijinja 2.20.0 is pinned via Cargo.toml:126 with only the loop_controls feature, which does not add I/O capabilities. The unknown-method callback at src/server/chat_template.rs:616-870 adds string/dict shims only — no filesystem, no network, no exec.

  4. has_extension allocates a String per call (filters.rs:206-209). Adds one more such call per file in the allow-list path due to the new .jinja branch. For ~50-file repos this is sub-microsecond noise and was the pre-existing pattern. Could be replaced with str::ends_with_ignore_ascii_case to drop the allocation; not introduced by this PR.

Items explicitly checked and cleared

  • Path traversal via chat_template.jinja-style names: is_safe_relative_path runs first and rejects absolute paths, .., \, doubled slashes, leading/trailing /, and any non-Component::Normal segment. Unchanged by this PR. Existing regression tests at src/downloader/tests.rs:244-257 pin it.
  • Symlink / Zip Slip / TOCTOU on the write side: download_repo at src/downloader/mod.rs:685-820 canonicalizes the destination once, then canonicalizes every per-file parent and asserts starts_with(canonical_local) before writing. The tempfile is opened with O_CREAT|O_EXCL|O_NOFOLLOW on Unix (open_tempfile_no_symlink, mod.rs:490-520). Allow-list changes do not touch this layer.
  • chat.rs warning string injection: all literals, no model-path interpolation, no terminal control sequences. Clean.
  • Performance of the new branch: one additional O(1) has_extension test per file (allocation cost only). The 8-line eprintln! block fires once at REPL startup, not per turn. Negligible.

Recommendation

APPROVE for merge. Carry the LOW items as future-work suggestions; none rise to a release blocker on their own and none are introduced by this PR.

Two Fixed entries under the new ### Fixed subsection in [Unreleased]:
- chat_template.jinja now included in downloader allow-list (*.jinja extension added to is_wanted_file in filters.rs), with detail on the guard ordering that keeps the attack surface unchanged (#132, PR #134).
- mlxcel run base-model warning upgraded from a one-liner to an actionable multi-line block that names the likely cause, suggests the -it Hub variant, and documents the --no-chat-template / mlxcel generate -p escape hatches (#132, PR #134).
@inureyes
Copy link
Copy Markdown
Member Author

PR Finalization Complete

Summary

  • CHANGELOG.md: Added ### Fixed subsection under ## [Unreleased] with two entries matching the v0.1.0 prose style — one for the *.jinja allow-list fix, one for the actionable base-model warning. Both reference fix(chat): garbage output for base/non-instruct models and missing chat_template.jinja in downloader #132 and PR fix: chat_template.jinja downloader allow-list and base-model UX warning #134.
  • debian/changelog: Left untouched per spec — no unreleased stanza pattern exists in that file; it syncs from CHANGELOG.md at release-prep time.
  • docs/ko/: Not touched — directory does not exist yet; reserved for a dedicated translation PR.
  • English docs (installation.md, supported-models.md, adding-models.md, README.md): Scanned for base-model / chat-template references. None of the existing sections are invalidated or improved by this fix; no doc edits made.
  • Tests: Downloader regression test (assert!(is_wanted_file("chat_template.jinja"))) already present at src/downloader/tests.rs:173. Chat warning is inline eprintln! — not refactored to a helper (consistent with pre-existing architecture, LOW finding L2 noted by reviewer as acceptable).
  • Lint/Format: cargo check --lib --tests → 0 errors / 0 warnings (sanity gate; no Rust code changed).

Commit pushed

612f774 docs: add [Unreleased] CHANGELOG entries for PR #134

All checks passing. Ready for merge.

@inureyes inureyes added status:done Completed and removed status:review Under review labels May 28, 2026
@inureyes inureyes merged commit b976d9e into main May 28, 2026
4 checks passed
@inureyes inureyes deleted the fix/issue-132-chat-template-jinja-downloader-base-model-ux branch May 28, 2026 13:22
inureyes added a commit that referenced this pull request May 28, 2026
…136)

Models without a `chat_template` field in `tokenizer_config.json` and no `chat_template.jinja` (typically base / non-instruction-tuned models) previously fell back to bare content-only concatenation in `concat_plaintext`. With no role markers at all the model saw a flat blob of prior turns and tended to collapse into an echo loop — base models are completion models and the most natural continuation of an unstructured prompt is to parrot the user's last line indefinitely. PR #134 (issue #132) addressed this from the avoidance side by warning users that the model is likely a base variant and pointing at the `-it` counterpart; it deliberately left the fallback path itself untouched.

This complements that by improving the fallback path for users who still proceed (intentionally, or with a model that simply has no `-it` variant). The implicit "no template found" path now uses a generic `User: ... Assistant: ...` pseudo-template with a trailing `Assistant:` cue (no newline) that nudges the model to produce an assistant turn next instead of completing its own prompt with another `User:` line.

Three render paths, dispatched in `render_prompt`:

- `--no-chat-template` (explicit user opt-in) keeps the existing raw concatenation. This is the offline `generate --no-chat-template` parallel for completion-style usage and must not change.
- Chat template present: unchanged. Template render failure now falls back to the structured form rather than raw concat, since by then we already know the user is in chat mode.
- No template + not opted out: new `concat_userassistant_fallback`. Labels `user` / `assistant` / `system` explicitly; unknown roles (e.g. `tool`) are preserved verbatim with the same `Role: ` pattern instead of silently merging into the prior turn.

The upstream `processor.is_none()` warning still fires and still names base-model behavior as the underlying cause; only the closing two lines change to describe what the fallback actually does now (`--no-chat-template` becomes the documented escape hatch for raw concat).

Tests in `src/commands/chat_tests.rs`:

- `concat_plaintext_joins_turns_with_newlines` clarified as the `--no-chat-template` path
- `user_assistant_fallback_labels_all_turns_and_cues_assistant` covers the multi-turn render and pins the trailing `Assistant:` cue (no newline)
- `user_assistant_fallback_marks_unknown_roles_instead_of_dropping_them` covers the `tool`-style fallback
- `render_prompt_without_template_uses_user_assistant_fallback` covers the implicit-fallback dispatch
- `render_prompt_no_chat_template_flag_uses_raw_concatenation` pins that the explicit opt-in still gets raw concat

Closes #133
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:cli Command-line interface / CLI flags area:inference Generation, sampling, decoding (incl. speculative, DRY) area:models Model architectures, weights, loading, metadata priority:high High priority status:done Completed type:bug Bug fixes, error corrections, or issue resolutions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(chat): garbage output for base/non-instruct models and missing chat_template.jinja in downloader

1 participant