fix: chat_template.jinja downloader allow-list and base-model UX warning by inureyes · Pull Request #134 · lablup/mlxcel

inureyes · 2026-05-28T13:07:54Z

Summary

Two stacked root causes that together produced garbage output when running mlxcel run gemma-4-e4b-4bit: the downloader silently dropped chat_template.jinja (so instruction-tuned models had no working template), and the UX warning for base models gave no actionable guidance.

What changed

src/downloader/filters.rs — Added has_extension(base, "jinja") to the extension allow-list, immediately after the tiktoken clause. HuggingFace ships chat templates as chat_template.jinja; the previous allow-list covered every other relevant extension but missed this one, so the file was silently filtered out during snapshot download. The is_safe_relative_path guard that runs first is unchanged; no new attack surface is opened.
src/downloader/tests.rs — Added assert!(is_wanted_file("chat_template.jinja")) immediately after the existing chat_template.json assertion in allow_list_includes_configs_and_tokenizer_files.
src/commands/chat.rs — Replaced the single-line Note: eprintln with an 8-line block that explains the base-model cause, warns that responses will be incoherent, names the -it suffix convention for instruction-tuned Hub variants, and tells the user to pass --no-chat-template for silent raw-text mode or mlxcel generate -p for one-shot completion. The --no-chat-template path remains completely silent.

Side-effect: `src/server/chat_template.rs` fallback now fires

The loader at src/server/chat_template.rs:80-89 already had a fallback that reads chat_template.jinja from the model directory. That fallback was dead code because the file was never downloaded. Once Sub-task A lands, the fallback starts working automatically for any model that ships its template this way — no changes to that file were needed or made.

Verification

cargo check --lib --tests
# → Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 44s (0 errors)

cargo test --lib downloader::tests
# → running 46 tests
# → test result: ok. 45 passed; 0 failed; 1 ignored (live_download_smoke_test, expected)

cargo clippy --lib --tests -- -D warnings
# → Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 29s (0 warnings)

End-to-end download verification (Sub-task A.3 — actually re-downloading mlx-community/gemma-4-e4b-it-4bit, confirming chat_template.jinja lands in the model store, and confirming mlxcel run produces coherent multi-turn chat) is deferred to the reviewer / orchestrator. It requires network access and a full model download and is out of scope for this automated implementation step. Please run this verification before merging.

Closes #132

HuggingFace ships chat templates as chat_template.jinja, not inlined in tokenizer_config.json. The extension allow-list covered safetensors, json, tiktoken, model, and txt but not jinja, so the file was silently dropped during snapshot download. The fallback in src/server/chat_template.rs that reads chat_template.jinja was dead code for any model using this convention. Adds has_extension(base, "jinja") after the tiktoken clause and adds assert!(is_wanted_file("chat_template.jinja")) immediately after the existing chat_template.json assertion in the test suite.

The previous single-line Note was not actionable. Users running a base (non-instruction-tuned) model got incoherent, repetitive output with no explanation of why or what to do next. The new warning block explains that the model likely has no chat template because it is a base model, states that responses will be incoherent, names the -it suffix convention for instruction-tuned variants on the Hub, and tells the user how to proceed without the warning (--no-chat-template for silent raw-text mode, or mlxcel generate -p for one-shot completion). The --no-chat-template path remains completely silent.

inureyes · 2026-05-28T13:16:09Z

Security & Performance Review — verdict: APPROVE

Targeted review of the .jinja allow-list addition (src/downloader/filters.rs:72-75), the chat.rs warning (src/commands/chat.rs:188-198), and the regression test (src/downloader/tests.rs:173), with a focused look at trust boundaries and the minijinja sandbox.

Watchdog-safe verification

cargo check --lib --tests, cargo clippy --lib --tests -- -D warnings, and cargo test --lib downloader all clean. 107 passed, 1 ignored (live_download_smoke_test, expected), 0 failed.

Findings by severity

CRITICAL — none.

HIGH — none.

MEDIUM — none.

LOW (informational; not auto-fixed per policy):

src/downloader/filters.rs:73 — compound-extension allow widens the dust-bin for spurious jinja files. A hostile repo can ship pytorch_model.bin.jinja, evil.gguf.jinja, etc. and have them written to disk. The deny-list runs before the allow-list but its has_extension checks are suffix-based, so *.bin.jinja is not denied. Impact: disk-quota waste only — the loader at src/server/chat_template.rs:87 only ever opens the exact basename chat_template.jinja, so other *.jinja files are inert. No code execution, no traversal, no symlink escape. The user already opted into this repo by typing its name; the trust model is unchanged.
src/server/chat_template.rs:88 — no size cap on read_to_string(chat_template.jinja). A multi-GB chat template would be loaded fully into memory and then parsed by minijinja, exhausting RAM. Not net-new: chat_template.json/tokenizer_config.json/tokenizer.json already have the same shape (no size cap on the per-file download or on read_to_string at load time). The trust model is "user opted into this repo," so this is consistent with existing posture. If a future hardening pass adds size caps, it should cover all four files uniformly, not just .jinja.
minijinja sandbox surface. The environment is constructed fresh per render with no template loader (Environment::new() default — no set_loader, no add_loader, no include_loader), no fuel limit, no recursion limit. A hostile template could craft a CPU/RAM bomb (deep {% for %} nesting, giant string replace chains). Again, the trust model is user-typed repo id; out of scope to harden here. minijinja 2.20.0 is pinned via Cargo.toml:126 with only the loop_controls feature, which does not add I/O capabilities. The unknown-method callback at src/server/chat_template.rs:616-870 adds string/dict shims only — no filesystem, no network, no exec.
has_extension allocates a String per call (filters.rs:206-209). Adds one more such call per file in the allow-list path due to the new .jinja branch. For ~50-file repos this is sub-microsecond noise and was the pre-existing pattern. Could be replaced with str::ends_with_ignore_ascii_case to drop the allocation; not introduced by this PR.

Items explicitly checked and cleared

Path traversal via chat_template.jinja-style names: is_safe_relative_path runs first and rejects absolute paths, .., \, doubled slashes, leading/trailing /, and any non-Component::Normal segment. Unchanged by this PR. Existing regression tests at src/downloader/tests.rs:244-257 pin it.
Symlink / Zip Slip / TOCTOU on the write side: download_repo at src/downloader/mod.rs:685-820 canonicalizes the destination once, then canonicalizes every per-file parent and asserts starts_with(canonical_local) before writing. The tempfile is opened with O_CREAT|O_EXCL|O_NOFOLLOW on Unix (open_tempfile_no_symlink, mod.rs:490-520). Allow-list changes do not touch this layer.
chat.rs warning string injection: all literals, no model-path interpolation, no terminal control sequences. Clean.
Performance of the new branch: one additional O(1) has_extension test per file (allocation cost only). The 8-line eprintln! block fires once at REPL startup, not per turn. Negligible.

Recommendation

APPROVE for merge. Carry the LOW items as future-work suggestions; none rise to a release blocker on their own and none are introduced by this PR.

Two Fixed entries under the new ### Fixed subsection in [Unreleased]: - chat_template.jinja now included in downloader allow-list (*.jinja extension added to is_wanted_file in filters.rs), with detail on the guard ordering that keeps the attack surface unchanged (#132, PR #134). - mlxcel run base-model warning upgraded from a one-liner to an actionable multi-line block that names the likely cause, suggests the -it Hub variant, and documents the --no-chat-template / mlxcel generate -p escape hatches (#132, PR #134).

inureyes · 2026-05-28T13:18:55Z

PR Finalization Complete

Summary

CHANGELOG.md: Added ### Fixed subsection under ## [Unreleased] with two entries matching the v0.1.0 prose style — one for the *.jinja allow-list fix, one for the actionable base-model warning. Both reference fix(chat): garbage output for base/non-instruct models and missing chat_template.jinja in downloader #132 and PR fix: chat_template.jinja downloader allow-list and base-model UX warning #134.
debian/changelog: Left untouched per spec — no unreleased stanza pattern exists in that file; it syncs from CHANGELOG.md at release-prep time.
docs/ko/: Not touched — directory does not exist yet; reserved for a dedicated translation PR.
English docs (installation.md, supported-models.md, adding-models.md, README.md): Scanned for base-model / chat-template references. None of the existing sections are invalidated or improved by this fix; no doc edits made.
Tests: Downloader regression test (assert!(is_wanted_file("chat_template.jinja"))) already present at src/downloader/tests.rs:173. Chat warning is inline eprintln! — not refactored to a helper (consistent with pre-existing architecture, LOW finding L2 noted by reviewer as acceptable).
Lint/Format: cargo check --lib --tests → 0 errors / 0 warnings (sanity gate; no Rust code changed).

Commit pushed

612f774 docs: add [Unreleased] CHANGELOG entries for PR #134

All checks passing. Ready for merge.

…136) Models without a `chat_template` field in `tokenizer_config.json` and no `chat_template.jinja` (typically base / non-instruction-tuned models) previously fell back to bare content-only concatenation in `concat_plaintext`. With no role markers at all the model saw a flat blob of prior turns and tended to collapse into an echo loop — base models are completion models and the most natural continuation of an unstructured prompt is to parrot the user's last line indefinitely. PR #134 (issue #132) addressed this from the avoidance side by warning users that the model is likely a base variant and pointing at the `-it` counterpart; it deliberately left the fallback path itself untouched. This complements that by improving the fallback path for users who still proceed (intentionally, or with a model that simply has no `-it` variant). The implicit "no template found" path now uses a generic `User: ... Assistant: ...` pseudo-template with a trailing `Assistant:` cue (no newline) that nudges the model to produce an assistant turn next instead of completing its own prompt with another `User:` line. Three render paths, dispatched in `render_prompt`: - `--no-chat-template` (explicit user opt-in) keeps the existing raw concatenation. This is the offline `generate --no-chat-template` parallel for completion-style usage and must not change. - Chat template present: unchanged. Template render failure now falls back to the structured form rather than raw concat, since by then we already know the user is in chat mode. - No template + not opted out: new `concat_userassistant_fallback`. Labels `user` / `assistant` / `system` explicitly; unknown roles (e.g. `tool`) are preserved verbatim with the same `Role: ` pattern instead of silently merging into the prior turn. The upstream `processor.is_none()` warning still fires and still names base-model behavior as the underlying cause; only the closing two lines change to describe what the fallback actually does now (`--no-chat-template` becomes the documented escape hatch for raw concat). Tests in `src/commands/chat_tests.rs`: - `concat_plaintext_joins_turns_with_newlines` clarified as the `--no-chat-template` path - `user_assistant_fallback_labels_all_turns_and_cues_assistant` covers the multi-turn render and pins the trailing `Assistant:` cue (no newline) - `user_assistant_fallback_marks_unknown_roles_instead_of_dropping_them` covers the `tool`-style fallback - `render_prompt_without_template_uses_user_assistant_fallback` covers the implicit-fallback dispatch - `render_prompt_no_chat_template_flag_uses_raw_concatenation` pins that the explicit opt-in still gets raw concat Closes #133

inureyes added 2 commits May 28, 2026 22:07

inureyes added status:done Completed and removed status:review Under review labels May 28, 2026

style: apply cargo fmt to chat warning eprintln blocks

7de32af

inureyes merged commit b976d9e into main May 28, 2026
4 checks passed

inureyes deleted the fix/issue-132-chat-template-jinja-downloader-base-model-ux branch May 28, 2026 13:22

eupthere mentioned this pull request May 28, 2026

fix: improve interactive REPL fallback for models without chat_template #133

Closed

2 tasks

inureyes mentioned this pull request May 28, 2026

fix: structured User/Assistant fallback when no chat template exists #136

Merged

This was referenced May 29, 2026

fix: chat REPL repeats output from 2nd message #135

Closed

fix: make base-model warning's instruct advice family-aware #137

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: chat_template.jinja downloader allow-list and base-model UX warning#134

fix: chat_template.jinja downloader allow-list and base-model UX warning#134
inureyes merged 4 commits into
mainfrom
fix/issue-132-chat-template-jinja-downloader-base-model-ux

inureyes commented May 28, 2026

Uh oh!

inureyes commented May 28, 2026

Uh oh!

inureyes commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

inureyes commented May 28, 2026

Summary

What changed

Side-effect: src/server/chat_template.rs fallback now fires

Verification

Uh oh!

inureyes commented May 28, 2026

Security & Performance Review — verdict: APPROVE

Watchdog-safe verification

Findings by severity

Items explicitly checked and cleared

Recommendation

Uh oh!

inureyes commented May 28, 2026

PR Finalization Complete

Summary

Commit pushed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Side-effect: `src/server/chat_template.rs` fallback now fires