Problem / Background
Running mlxcel run gemma-4-e4b-4bit (resolves to mlx-community/gemma-4-e4b-4bit) produces incoherent, repetitive output every turn. Example session:
mlxcel run gemma-4-e4b-4bit
Note: no chat template found for this model; sending raw text per turn.
>>> 하이
, 햇살뱅뱅
준비한
준비한
준비한
... (20+ repetitions, runs to max_tokens)
>>> 왜 저렇게 답변이 나올까?
왜 그럴까요?
왜 그까지만요?
... (repeats until max_tokens)
Investigation shows two distinct problems stacked on top of each other. Both should be fixed together so the user-visible failure goes away in the next release.
Root cause 1 — User ran a base (non-instruction-tuned) model with no UX guard
mlx-community/gemma-4-e4b-4bit's cardData.base_model on HuggingFace is google/gemma-4-e4b (NOT google/gemma-4-e4b-it). Verified via the HF API: its tokenizer_config.json has no chat_template field, and the repo siblings list does not include chat_template.jinja. It genuinely ships without a chat template because base models are not instruction-tuned. Feeding raw multi-turn chat into a base model produces exactly the kind of repetition/drift the user saw — this is expected base-model behavior, not a generation bug.
mlxcel run currently still loads the base model, falls through to the raw-text path in src/commands/chat.rs:181-190, prints a one-line Note: warning, and drops the user into an interactive prompt with no further guidance. The instruction-tuned counterparts (mlx-community/gemma-4-e4b-it-4bit, -6bit, -8bit, -mxfp4, etc.) are clearly named on the Hub and ship the correct chat template.
Root cause 2 — chat_template.jinja is filtered out by the downloader allow-list (latent bug)
Even if the user had asked for mlx-community/gemma-4-e4b-it-4bit instead, chat would still be broken. The downloader filter is_wanted_file in src/downloader/filters.rs:45-95 has an exact-name allow-list ("vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config" | "tokenizer" | "generation_config" | "preprocessor_config" | "processor_config" | "chat_template") for files without a recognizable extension. But the actual HuggingFace convention is chat_template.jinja (with a .jinja extension). The extension allow-list above it covers safetensors, json, tiktoken, model, and a constrained txt, but not jinja. So chat_template.jinja matches neither branch and is rejected by the downloader.
Verified concretely:
- mlx-community/gemma-4-e4b-it-4bit's siblings on the HF API include chat_template.jinja, and the file is non-empty.
- The chat-template loader at src/server/chat_template.rs:80-89 explicitly falls back to reading chat_template.jinja from the model directory if tokenizer_config.json's chat_template field is empty or missing. That fallback path is currently dead because the file is never downloaded.
- The existing test in src/downloader/tests.rs:172 only asserts that chat_template.json is accepted; it does not cover chat_template.jinja.
This is a latent correctness bug that silently degrades chat for any model in this family (anything that ships its chat template as chat_template.jinja instead of inlining it into tokenizer_config.json).
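To make the dead fallback concrete, here is a minimal sketch of the resolution order described above. It is not the code in src/server/chat_template.rs: the function name resolve_chat_template and its signature are hypothetical, and the inline_template argument stands in for the chat_template field that the real loader would parse out of tokenizer_config.json.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper (not the real loader): pick a chat template for a model.
/// `inline_template` models the `chat_template` field of tokenizer_config.json,
/// with `None` meaning the field is missing.
pub fn resolve_chat_template(inline_template: Option<&str>, model_dir: &Path) -> Option<String> {
    // Prefer the inline template when it is present and non-empty.
    if let Some(t) = inline_template {
        if !t.trim().is_empty() {
            return Some(t.to_string());
        }
    }
    // Fallback: a standalone chat_template.jinja in the model directory.
    // If the downloader never fetched the file, the read fails and we return None,
    // which is the situation root cause 2 describes.
    match fs::read_to_string(model_dir.join("chat_template.jinja")) {
        Ok(s) if !s.trim().is_empty() => Some(s),
        _ => None,
    }
}
```

With an empty model directory and no inline template the sketch returns None, which corresponds to the "no chat template found" path; once chat_template.jinja is present on disk the same call returns its contents.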
Proposed Solution
Ship both fixes as a single PR so the user-visible failure mode (mlxcel run gemma-4-e4b-*) is fully addressed at once.
Sub-task A — Downloader allow-list fix (root cause 2)
In src/downloader/filters.rs::is_wanted_file, accept chat_template.jinja. The cleanest fix is to add has_extension(base, "jinja") to the extension allow-list (the only .jinja files HF ships are chat templates, so this is a safe class to whitelist). Add a regression test in src/downloader/tests.rs next to the existing chat_template.json assertion.
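The following is a simplified stand-in for the two-branch filter, written only to show why chat_template.jinja falls through today and how the extra extension check changes that. The real is_wanted_file and has_extension signatures are not shown in this issue, so the ones below are assumptions; the allow_jinja flag is hypothetical and exists only to compare before and after, and the constrained txt rule is omitted.

```rust
/// Assumed shape of the helper: case-insensitive extension check on a base name.
fn has_extension(base: &str, ext: &str) -> bool {
    match base.rsplit_once('.') {
        Some((stem, e)) => !stem.is_empty() && e.eq_ignore_ascii_case(ext),
        None => false,
    }
}

/// Simplified model of the downloader filter. `allow_jinja` toggles the
/// proposed fix; the real function presumably takes only the path.
pub fn is_wanted_file(path: &str, allow_jinja: bool) -> bool {
    // Only the final path component matters for the allow-lists.
    let base = path.rsplit('/').next().unwrap_or(path);

    // Branch 1: extension allow-list (the constrained txt rule is left out here).
    if ["safetensors", "json", "tiktoken", "model"]
        .iter()
        .any(|e| has_extension(base, e))
    {
        return true;
    }
    // Proposed addition: treat .jinja as a wanted extension.
    if allow_jinja && has_extension(base, "jinja") {
        return true;
    }

    // Branch 2: exact-name allow-list for files without a recognized extension.
    matches!(
        base,
        "vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config"
            | "tokenizer" | "generation_config" | "preprocessor_config"
            | "processor_config" | "chat_template"
    )
}
```

In this model, chat_template.jinja is rejected with allow_jinja set to false because "chat_template.jinja" is neither an allowed extension nor an exact name, and accepted once the flag is on. The regression test in src/downloader/tests.rs would assert the accepted case against the real function.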
Sub-task B — Base-model UX guard (root cause 1)
In src/commands/chat.rs around line 188, when processor.is_none() && !opts.no_chat_template, replace the single-line Note: warning with a more informative message that tells the user why chat will be incoherent and what to do about it:
- State that the model appears to be a base / non-instruction-tuned model (or otherwise ships without a chat template) and that chat responses will likely be incoherent.
- Suggest looking for an -it (instruction-tuned) variant of the same model family on the Hub.
- Explain how to proceed anyway: pass --no-chat-template to suppress this notice, or use mlxcel generate -p ... for raw-text completion.
- The explicit --no-chat-template path must remain silent and unchanged (no regression).
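One way to keep the silent path testable is to build the notice in a pure function and print it from the caller. The sketch below assumes that shape; chat_template_notice is a hypothetical name, the two booleans stand in for processor.is_some() and opts.no_chat_template, and the message wording is illustrative rather than final.

```rust
/// Hypothetical helper: returns the notice to print before the chat loop,
/// or None when nothing should be printed.
pub fn chat_template_notice(has_processor: bool, no_chat_template: bool) -> Option<String> {
    // A template was found, or the user explicitly asked for raw text: stay silent.
    if has_processor || no_chat_template {
        return None;
    }
    // Illustrative wording only; the final text is up to the PR.
    Some(
        [
            "Warning: no chat template found for this model.",
            "  It appears to be a base (non-instruction-tuned) model, or it ships without a",
            "  chat template, so chat responses will likely be incoherent.",
            "  Look for an -it (instruction-tuned) variant of the same model family on the Hub.",
            "  To proceed anyway, pass --no-chat-template to suppress this notice,",
            "  or use `mlxcel generate -p ...` for raw-text completion.",
        ]
        .join("\n"),
    )
}
```

The caller would print the returned string to stderr when it is Some. Because the function is pure, both the multi-line message and the silent --no-chat-template branch can be covered by unit tests without starting an interactive session.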
Stretch (optional, do not block this issue): when the resolved repo id matches a base-model naming pattern (<name> without an -it suffix) and <name>-it exists on the Hub or in the local store, name the suggestion explicitly. If too large, split into a follow-up issue.
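For the stretch item, a rough sketch of how a candidate -it name could be derived from a repo id. Everything here is an assumption for illustration: suggest_it_variant and looks_like_quant_tag are hypothetical names, and the heuristic assumes that a trailing segment ending in "bit" or starting with "mxfp" is a quantization tag that the -it marker should precede, as in the gemma-4-e4b-it-4bit naming. The caller would still need to confirm that the candidate exists on the Hub or in the local store before suggesting it.

```rust
/// Assumed heuristic: trailing segments such as "4bit", "8bit" or "mxfp4"
/// are quantization tags.
fn looks_like_quant_tag(seg: &str) -> bool {
    let s = seg.to_ascii_lowercase();
    s.ends_with("bit") || s.starts_with("mxfp")
}

/// Hypothetical helper: guess the instruction-tuned sibling of a repo id.
/// Returns None when the name already contains an "it" segment.
pub fn suggest_it_variant(repo_id: &str) -> Option<String> {
    // Split "owner/name" if an owner prefix is present.
    let (owner, name) = match repo_id.rsplit_once('/') {
        Some((o, n)) => (Some(o), n),
        None => (None, repo_id),
    };
    // Already instruction-tuned: nothing to suggest.
    if name.split('-').any(|seg| seg.eq_ignore_ascii_case("it")) {
        return None;
    }
    // Insert "-it" before a trailing quantization tag, otherwise append it.
    let candidate = match name.rsplit_once('-') {
        Some((stem, quant)) if looks_like_quant_tag(quant) => format!("{}-it-{}", stem, quant),
        _ => format!("{}-it", name),
    };
    // The result is only a guess; existence must be checked by the caller.
    Some(match owner {
        Some(o) => format!("{}/{}", o, candidate),
        None => candidate,
    })
}
```

Under these assumptions, mlx-community/gemma-4-e4b-4bit maps to mlx-community/gemma-4-e4b-it-4bit, and an id that already carries an -it segment yields no suggestion.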
Technical Considerations
Files and anchors
src/downloader/filters.rs:45-95 — is_wanted_file allow-list. Both the extension branch and the exact-name branch currently miss chat_template.jinja.
src/downloader/tests.rs:172 — existing test covers chat_template.json; add a .jinja case alongside it.
src/server/chat_template.rs:80-89 — loader's chat_template.jinja fallback, dead code today because of root cause 2. Should start firing once Sub-task A lands.
src/commands/chat.rs:181-190 — warning location for Sub-task B. The explicit --no-chat-template branch at L181 must stay silent.
Repro
mlxcel run gemma-4-e4b-4bit
>>> 하이
[incoherent, repetitive output]
After the fix:
mlxcel run gemma-4-e4b-4bit still loads, but the warning now explains the base-model situation and suggests mlxcel run gemma-4-e4b-it-4bit.
mlxcel run gemma-4-e4b-it-4bit downloads chat_template.jinja, applies the template, and produces coherent multi-turn chat replies.
Out of scope (call out as future work, do not block this issue)
- Automatic -it suggestion via Hub lookup (Sub-task B stretch above).
- Generalizing the .jinja allow-list into a broader template-file policy beyond chat_template.jinja.
Acceptance Criteria
- src/downloader/filters.rs::is_wanted_file accepts chat_template.jinja (e.g. by adding has_extension(base, "jinja") to the extension allow-list).
- src/downloader/tests.rs has a new regression assertion assert!(is_wanted_file("chat_template.jinja")) next to the existing chat_template.json assertion at line 172.
- Download mlx-community/gemma-4-e4b-it-4bit (or another model in the family) with mlxcel, confirm chat_template.jinja lands in the model store, confirm mlxcel run no longer prints "no chat template found" for it, and confirm it produces coherent multi-turn chat output. Document the verification (commands + observed output) in the PR description.
- The src/commands/chat.rs warning at ~L188 is replaced with a multi-line message that explains the likely base-model cause, suggests the -it variant, and tells the user how to proceed (--no-chat-template or mlxcel generate -p).
- The explicit --no-chat-template path remains silent (no regression of the explicit raw-text mode).
- mlxcel run gemma-4-e4b-4bit still loads (it's a base model, that's the user's call), but the warning text now makes it obvious why chat will be bad and points at the -it variant.