fix: chat_template.jinja downloader allow-list and base-model UX warning #134
HuggingFace ships chat templates as chat_template.jinja, not inlined in tokenizer_config.json. The extension allow-list covered safetensors, json, tiktoken, model, and txt but not jinja, so the file was silently dropped during snapshot download. The fallback in src/server/chat_template.rs that reads chat_template.jinja was dead code for any model using this convention.
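To make the dead-code point concrete, here is a minimal sketch of that lookup order. It assumes the inline `chat_template` string has already been parsed out of `tokenizer_config.json` into an `Option<String>`. The function name `resolve_chat_template` is hypothetical and this is not the code in `src/server/chat_template.rs`. It only shows that the `.jinja` branch can never succeed if the downloader never wrote the file.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical sketch of the template lookup order: an inline
/// `chat_template` (already parsed from `tokenizer_config.json`) wins;
/// otherwise try a standalone `chat_template.jinja` in the model directory.
pub fn resolve_chat_template(inline_template: Option<String>, model_dir: &Path) -> Option<String> {
    // If the downloader filtered chat_template.jinja out, this read always
    // fails, so the fallback branch is effectively dead code.
    inline_template.or_else(|| fs::read_to_string(model_dir.join("chat_template.jinja")).ok())
}
```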
Adds has_extension(base, "jinja") after the tiktoken clause and adds assert!(is_wanted_file("chat_template.jinja")) immediately after the existing chat_template.json assertion in the test suite.
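The allow-list change can be approximated as follows. This is a sketch under stated assumptions, not the code in `filters.rs`: the bodies of `is_safe_relative_path` and `has_extension` are hypothetical stand-ins. Only the extension list, the position of the new `jinja` clause after `tiktoken`, and the guard-first ordering come from the description above.

```rust
use std::path::{Component, Path};

/// Hypothetical stand-in for the real path guard: reject empty names,
/// absolute paths, and any `..`, root, or drive-prefix components.
fn is_safe_relative_path(name: &str) -> bool {
    !name.is_empty()
        && Path::new(name)
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Hypothetical helper: exact match on the file's extension.
fn has_extension(base: &str, ext: &str) -> bool {
    Path::new(base).extension().and_then(|e| e.to_str()) == Some(ext)
}

pub fn is_wanted_file(name: &str) -> bool {
    // The guard runs first, so extending the allow-list cannot widen the
    // set of accepted *paths*, only the set of accepted safe file names.
    if !is_safe_relative_path(name) {
        return false;
    }
    // Match on the final path component only.
    let base = Path::new(name)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("");
    has_extension(base, "safetensors")
        || has_extension(base, "json")
        || has_extension(base, "tiktoken")
        || has_extension(base, "jinja") // new: chat_template.jinja
        || has_extension(base, "model")
        || has_extension(base, "txt")
}
```

Because the guard short-circuits before any extension check, a name such as `../chat_template.jinja` is still rejected even though its extension is now allowed.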
The previous single-line Note was not actionable. Users running a base (non-instruction-tuned) model got incoherent, repetitive output with no explanation of why or what to do next. The new warning block explains that the model likely has no chat template because it is a base model, states that responses will be incoherent, names the -it suffix convention for instruction-tuned variants on the Hub, and tells the user how to proceed without the warning (--no-chat-template for silent raw-text mode, or mlxcel generate -p for one-shot completion). The --no-chat-template path remains completely silent.
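A minimal sketch of the warning logic follows, assuming a hypothetical `write_base_model_warning` helper. The wording is illustrative and condensed to six lines; the real 8-line message in `src/commands/chat.rs` may read differently. Writing to a generic `Write` rather than calling `eprintln!` directly is a choice made here to keep the sketch testable; a caller would pass `std::io::stderr()`.

```rust
use std::io::{self, Write};

/// Hypothetical sketch of the multi-line base-model warning.
/// The message text is illustrative, not the shipped wording.
pub fn write_base_model_warning<W: Write>(out: &mut W, no_chat_template: bool) -> io::Result<()> {
    // Explicit opt-in to raw-text mode: stay completely silent.
    if no_chat_template {
        return Ok(());
    }
    const LINES: [&str; 6] = [
        "Warning: this model has no chat template.",
        "  It is most likely a base (non-instruction-tuned) model,",
        "  so chat responses will probably be incoherent or repetitive.",
        "  Instruction-tuned variants on the Hub usually carry an -it suffix.",
        "  To continue without this warning, pass --no-chat-template (raw-text mode),",
        "  or use mlxcel generate -p for a one-shot completion.",
    ];
    for line in LINES {
        writeln!(out, "{}", line)?;
    }
    Ok(())
}
```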
Security & Performance Review: verdict APPROVE

Targeted review of the Watchdog-safe verification
Findings by severity

- CRITICAL: none.
- HIGH: none.
- MEDIUM: none.
- LOW (informational; not auto-fixed per policy):
Items explicitly checked and cleared
Recommendation

APPROVE for merge. Carry the LOW items as future-work suggestions; none rise to a release blocker on their own and none are introduced by this PR.
Two entries under the new `### Fixed` subsection in `[Unreleased]`:

- `chat_template.jinja` is now included in the downloader allow-list (`*.jinja` extension added to `is_wanted_file` in `filters.rs`), with detail on the guard ordering that keeps the attack surface unchanged (#132, PR #134).
- `mlxcel run` base-model warning upgraded from a one-liner to an actionable multi-line block that names the likely cause, suggests the `-it` Hub variant, and documents the `--no-chat-template` / `mlxcel generate -p` escape hatches (#132, PR #134).
PR Finalization Complete

Summary
Commit pushed
All checks passing. Ready for merge.
…136)

Models without a `chat_template` field in `tokenizer_config.json` and without a `chat_template.jinja` file (typically base / non-instruction-tuned models) previously fell back to bare content-only concatenation in `concat_plaintext`. With no role markers at all, the model saw a flat blob of prior turns and tended to collapse into an echo loop: base models are completion models, and the most natural continuation of an unstructured prompt is to parrot the user's last line indefinitely.

PR #134 (issue #132) addressed this from the avoidance side by warning users that the model is likely a base variant and pointing at the `-it` counterpart; it deliberately left the fallback path itself untouched. This change complements it by improving the fallback path for users who still proceed (intentionally, or with a model that simply has no `-it` variant). The implicit "no template found" path now uses a generic `User: ... Assistant: ...` pseudo-template with a trailing `Assistant:` cue (no newline) that nudges the model to produce an assistant turn next instead of continuing its own prompt with another `User:` line.

Three render paths, dispatched in `render_prompt`:

- `--no-chat-template` (explicit user opt-in) keeps the existing raw concatenation. This parallels the offline `generate --no-chat-template` for completion-style usage and must not change.
- Chat template present: unchanged. Template render failure now falls back to the structured form rather than raw concat, since by then we already know the user is in chat mode.
- No template and not opted out: new `concat_userassistant_fallback`. It labels `user` / `assistant` / `system` explicitly; unknown roles (e.g. `tool`) are preserved verbatim with the same `Role: ` pattern instead of being silently merged into the prior turn.
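The three-way dispatch can be sketched as below. This is a minimal approximation, not the code in `src/commands/chat.rs`: the `Message` struct, the `TemplateFn` alias, and the exact label spellings (`User`, `Assistant`, `System`, and the raw role name for anything else) are assumptions. Only the function names, the dispatch order, the render-failure fallback, and the trailing `Assistant:` cue come from the commit message.

```rust
/// Hypothetical minimal chat message; the real type may differ.
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Hypothetical template renderer: rendered prompt, or an error message.
pub type TemplateFn<'a> = &'a dyn Fn(&[Message]) -> Result<String, String>;

/// `--no-chat-template` path: bare content-only concatenation.
pub fn concat_plaintext(messages: &[Message]) -> String {
    messages.iter().map(|m| m.content.as_str()).collect::<Vec<_>>().join("\n")
}

/// Implicit no-template path: generic `User: ... Assistant: ...` pseudo-template.
pub fn concat_userassistant_fallback(messages: &[Message]) -> String {
    let mut out = String::new();
    for m in messages {
        let label = match m.role.as_str() {
            "user" => "User",
            "assistant" => "Assistant",
            "system" => "System",
            // Unknown roles (e.g. `tool`) keep their own name in the same
            // `Role: ` pattern instead of merging into the previous turn.
            other => other,
        };
        out.push_str(label);
        out.push_str(": ");
        out.push_str(&m.content);
        out.push('\n');
    }
    // Trailing cue, no newline: the next tokens should be the assistant turn.
    out.push_str("Assistant:");
    out
}

pub fn render_prompt(messages: &[Message], template: Option<TemplateFn<'_>>, no_chat_template: bool) -> String {
    // 1. Explicit opt-in always wins: raw concatenation, unchanged.
    if no_chat_template {
        return concat_plaintext(messages);
    }
    match template {
        // 2. Template present; on render failure we are still in chat mode,
        //    so use the structured fallback rather than raw concat.
        Some(render) => render(messages).unwrap_or_else(|_| concat_userassistant_fallback(messages)),
        // 3. No template and not opted out.
        None => concat_userassistant_fallback(messages),
    }
}
```

For a `system`, `user`, `assistant`, `tool`, `user` history this sketch emits one labeled line per turn followed by the bare `Assistant:` cue, while the same history with `no_chat_template = true` yields only the contents joined by newlines.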
The upstream `processor.is_none()` warning still fires and still names base-model behavior as the underlying cause; only the closing two lines change, to describe what the fallback actually does now (`--no-chat-template` becomes the documented escape hatch for raw concat).

Tests in `src/commands/chat_tests.rs`:

- `concat_plaintext_joins_turns_with_newlines` clarified as the `--no-chat-template` path
- `user_assistant_fallback_labels_all_turns_and_cues_assistant` covers the multi-turn render and pins the trailing `Assistant:` cue (no newline)
- `user_assistant_fallback_marks_unknown_roles_instead_of_dropping_them` covers the `tool`-style fallback
- `render_prompt_without_template_uses_user_assistant_fallback` covers the implicit-fallback dispatch
- `render_prompt_no_chat_template_flag_uses_raw_concatenation` pins that the explicit opt-in still gets raw concat

Closes #133
Summary
Two stacked root causes together produced garbage output when running `mlxcel run gemma-4-e4b-4bit`: the downloader silently dropped `chat_template.jinja` (so instruction-tuned models had no working template), and the UX warning for base models gave no actionable guidance.

What changed
- `src/downloader/filters.rs`: Added `has_extension(base, "jinja")` to the extension allow-list, immediately after the `tiktoken` clause. HuggingFace ships chat templates as `chat_template.jinja`; the previous allow-list covered every other relevant extension but missed this one, so the file was silently filtered out during snapshot download. The `is_safe_relative_path` guard that runs first is unchanged; no new attack surface is opened.
- `src/downloader/tests.rs`: Added `assert!(is_wanted_file("chat_template.jinja"))` immediately after the existing `chat_template.json` assertion in `allow_list_includes_configs_and_tokenizer_files`.
- `src/commands/chat.rs`: Replaced the single-line `Note:` `eprintln` with an 8-line block that explains the base-model cause, warns that responses will be incoherent, names the `-it` suffix convention for instruction-tuned Hub variants, and tells the user to pass `--no-chat-template` for silent raw-text mode or `mlxcel generate -p` for one-shot completion. The `--no-chat-template` path remains completely silent.

Side-effect: `src/server/chat_template.rs` fallback now fires

The loader at `src/server/chat_template.rs:80-89` already had a fallback that reads `chat_template.jinja` from the model directory. That fallback was dead code because the file was never downloaded. Once Sub-task A lands, the fallback starts working automatically for any model that ships its template this way; no changes to that file were needed or made.

Verification
End-to-end download verification (Sub-task A.3: actually re-downloading `mlx-community/gemma-4-e4b-it-4bit`, confirming `chat_template.jinja` lands in the model store, and confirming `mlxcel run` produces coherent multi-turn chat) is deferred to the reviewer / orchestrator. It requires network access and a full model download, and is out of scope for this automated implementation step. Please run this verification before merging.

Closes #132