
fix(chat): garbage output for base/non-instruct models and missing chat_template.jinja in downloader #132

@inureyes


Problem / Background

Running mlxcel run gemma-4-e4b-4bit (resolves to mlx-community/gemma-4-e4b-4bit) produces incoherent, repetitive output every turn. Example session:

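mlxcel run gemma-4-e4b-4bit
Note: no chat template found for this model; sending raw text per turn.
>>> 하이                          [English: "hi"]
, 햇살뱅뱅                         [nonsense, roughly "sunshine bang-bang"]
준비한                             [English: "prepared"]
준비한
준비한
... (20+ repetitions, runs to max_tokens)
>>> 왜 저렇게 답변이 나올까?          [English: "Why does it answer like that?"]
왜 그럴까요?                       [English: "Why would that be?"]
왜 그까지만요?                     [ungrammatical, roughly "Why only that far?"]
... (repeats until max_tokens)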

Investigation shows that these are two distinct problems stacked on top of each other. Both should be fixed together so that the user-visible failure goes away in the next release.

Root cause 1 — User ran a base (non-instruction-tuned) model with no UX guard

mlx-community/gemma-4-e4b-4bit's cardData.base_model on HuggingFace is google/gemma-4-e4b (NOT google/gemma-4-e4b-it). Verified via the HF API: its tokenizer_config.json has no chat_template field, and the repo siblings list does not include chat_template.jinja. It genuinely ships without a chat template because base models are not instruction-tuned. Feeding raw multi-turn chat into a base model produces exactly the kind of repetition/drift the user saw — this is expected base-model behavior, not a generation bug.

mlxcel run currently still loads the base model, falls through to the raw-text path in src/commands/chat.rs:181-190, prints a one-line Note: warning, and drops the user into an interactive prompt with no further guidance. The instruction-tuned counterparts (mlx-community/gemma-4-e4b-it-4bit, -6bit, -8bit, -mxfp4, etc.) are clearly named on the Hub and ship the correct chat template.

Root cause 2 — chat_template.jinja is filtered out by the downloader allow-list (latent bug)

Even if the user had asked for mlx-community/gemma-4-e4b-it-4bit instead, chat would still be broken. The downloader filter is_wanted_file in src/downloader/filters.rs:45-95 has an exact-name allow-list ("vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config" | "tokenizer" | "generation_config" | "preprocessor_config" | "processor_config" | "chat_template") for files without a recognizable extension. But the actual HuggingFace convention is chat_template.jinja (with a .jinja extension). The extension allow-list above it covers safetensors, json, tiktoken, model, and a constrained txt, but not jinja. So chat_template.jinja matches neither branch and is rejected by the downloader.
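To make the failure concrete, here is a simplified model of the two-branch filter as described above. It is a hypothetical reconstruction, not the code in filters.rs: has_extension stands in for the real helper, the name is_wanted_file_current is invented for illustration, and the constrained .txt rule is omitted because its exact condition is not given here.

```rust
use std::path::Path;

/// Hypothetical stand-in for the real helper: true if `name` ends in `.ext`.
fn has_extension(name: &str, ext: &str) -> bool {
    Path::new(name).extension().and_then(|e| e.to_str()) == Some(ext)
}

/// Simplified model of the current filter (sketch, not the real code).
/// The constrained `.txt` rule is intentionally left out.
fn is_wanted_file_current(name: &str) -> bool {
    // Only the final path component matters.
    let base = Path::new(name)
        .file_name()
        .and_then(|s| s.to_str())
        .unwrap_or(name);

    // Branch 1: extension allow-list. Note that "jinja" is missing.
    if ["safetensors", "json", "tiktoken", "model"]
        .iter()
        .any(|ext| has_extension(base, ext))
    {
        return true;
    }

    // Branch 2: exact-name allow-list, compared against the whole file name.
    matches!(
        base,
        "vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config"
            | "tokenizer" | "generation_config" | "preprocessor_config"
            | "processor_config" | "chat_template"
    )
}
```

In this sketch, chat_template.jinja fails the extension branch because its extension is "jinja", and fails the exact-name branch because "chat_template.jinja" is not equal to "chat_template", so it is rejected while chat_template.json passes.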

Verified concretely:

  • mlx-community/gemma-4-e4b-it-4bit's siblings on the HF API include chat_template.jinja, and the file is non-empty.
  • The chat-template loader at src/server/chat_template.rs:80-89 explicitly falls back to reading chat_template.jinja from the model directory if tokenizer_config.json's chat_template field is empty or missing. That fallback path is currently dead because the file is never downloaded.
  • The existing test in src/downloader/tests.rs:172 only asserts chat_template.json is accepted — it does not cover chat_template.jinja.
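The fallback order described for the loader can be sketched as follows. This is a hypothetical helper, not the code at src/server/chat_template.rs:80-89: resolve_chat_template is an invented name, and it takes the chat_template field as an already-extracted Option<&str> so the sketch stays std-only (no JSON parsing). It shows why the fallback never fires today: when the downloader skips chat_template.jinja, the file read fails and the result is None.

```rust
use std::fs;
use std::path::Path;

/// Sketch of the loader's fallback order (hypothetical helper).
/// `inline_template` is the `chat_template` field from tokenizer_config.json,
/// or None if the field is absent.
fn resolve_chat_template(inline_template: Option<&str>, model_dir: &Path) -> Option<String> {
    // Prefer a non-empty inline template from tokenizer_config.json.
    if let Some(t) = inline_template.filter(|t| !t.is_empty()) {
        return Some(t.to_string());
    }
    // Otherwise fall back to chat_template.jinja in the model directory.
    // If the downloader never fetched the file, the read fails and we return None.
    fs::read_to_string(model_dir.join("chat_template.jinja")).ok()
}
```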

This is a latent correctness bug that silently degrades chat for any model in this family (anything that ships its chat template as chat_template.jinja instead of inlining it into tokenizer_config.json).

Proposed Solution

Ship both fixes as a single PR so the user-visible failure mode (mlxcel run gemma-4-e4b-*) is fully addressed at once.

Sub-task A — Downloader allow-list fix (root cause 2)

In src/downloader/filters.rs::is_wanted_file, accept chat_template.jinja. The cleanest fix is to add has_extension(base, "jinja") to the extension allow-list (the only .jinja files HF ships are chat templates, so this is a safe class to whitelist). Add a regression test in src/downloader/tests.rs next to the existing chat_template.json assertion.
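A minimal sketch of the Sub-task A change, assuming the filter is shaped as described in Root cause 2 (an extension allow-list followed by an exact-name allow-list). has_extension and WANTED_EXTENSIONS are hypothetical stand-ins, the .txt rule is omitted, and the test name is illustrative.

```rust
use std::path::Path;

/// Hypothetical stand-in for the real helper: true if `name` ends in `.ext`.
fn has_extension(name: &str, ext: &str) -> bool {
    Path::new(name).extension().and_then(|e| e.to_str()) == Some(ext)
}

/// Extension allow-list with "jinja" added (Sub-task A).
const WANTED_EXTENSIONS: &[&str] = &["safetensors", "json", "tiktoken", "model", "jinja"];

fn is_wanted_file(name: &str) -> bool {
    let base = Path::new(name)
        .file_name()
        .and_then(|s| s.to_str())
        .unwrap_or(name);

    if WANTED_EXTENSIONS.iter().any(|ext| has_extension(base, ext)) {
        return true;
    }

    // Exact-name branch is unchanged.
    matches!(
        base,
        "vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config"
            | "tokenizer" | "generation_config" | "preprocessor_config"
            | "processor_config" | "chat_template"
    )
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_chat_template_files() {
        assert!(is_wanted_file("chat_template.json"));
        // Regression check for the .jinja template (Sub-task A.2).
        assert!(is_wanted_file("chat_template.jinja"));
    }
}
```

Whitelisting the extension rather than the exact name chat_template.jinja leaves the exact-name branch untouched. The trade-off is that any other .jinja file in a repo would also be downloaded, which the issue treats as acceptable because chat templates are the only .jinja files HF ships.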

Sub-task B — Base-model UX guard (root cause 1)

In src/commands/chat.rs around line 188, when processor.is_none() && !opts.no_chat_template, replace the single-line Note: warning with a more informative message that tells the user why chat will be incoherent and what to do about it:

  • State that the model appears to be a base / non-instruction-tuned model (or otherwise ships without a chat template) and that chat responses will likely be incoherent.
  • Suggest looking for an -it (instruction-tuned) variant of the same model family on the Hub.
  • Explain how to proceed anyway: pass --no-chat-template to suppress this notice, or use mlxcel generate -p ... for raw-text completion.
  • The explicit --no-chat-template path must remain silent and unchanged (no regression).
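One way to structure the Sub-task B message is a small pure function that the chat command calls before entering the prompt loop, which also makes the silent --no-chat-template path easy to unit-test. The name chat_template_notice, its two boolean parameters (standing in for processor.is_some() and opts.no_chat_template), and the exact wording are assumptions, not the final text.

```rust
/// Hypothetical helper: returns the notice to print before the chat loop,
/// or None when nothing should be printed.
/// `has_template` stands in for `processor.is_some()`,
/// `no_chat_template` for `opts.no_chat_template`.
fn chat_template_notice(has_template: bool, no_chat_template: bool) -> Option<String> {
    // Models with a template and the explicit raw-text mode stay silent.
    if has_template || no_chat_template {
        return None;
    }
    Some(
        [
            "Warning: no chat template found for this model.",
            "It appears to be a base (non-instruction-tuned) model, or it otherwise ships",
            "without a chat template, so chat responses will likely be incoherent.",
            "  - Look for an instruction-tuned (-it) variant of this model family on the Hub.",
            "  - To proceed anyway, pass --no-chat-template to suppress this notice,",
            "    or use mlxcel generate -p ... for raw-text completion.",
        ]
        .join("\n"),
    )
}
```

At the call site this would be used roughly as if let Some(msg) = chat_template_notice(processor.is_some(), opts.no_chat_template) { eprintln!("{msg}"); }, with processor and opts being whatever chat.rs already has in scope.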

Stretch (optional, do not block this issue): when the resolved repo id matches a base-model naming pattern (<name> without an -it suffix) and <name>-it exists on the Hub or in the local store, name the suggestion explicitly. If too large, split into a follow-up issue.
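For the stretch goal, a purely string-based heuristic could propose the candidate name before any Hub or local-store lookup. The sketch assumes the mlx-community naming pattern seen above, where -it sits before a quantization suffix such as -4bit or -mxfp4. suggest_it_variant is hypothetical, and it does not check that the candidate actually exists, which the stretch goal still requires.

```rust
/// Hypothetical heuristic: derive a candidate instruction-tuned repo id from a
/// base-model repo id. Does NOT verify that the candidate exists.
fn suggest_it_variant(repo_id: &str) -> Option<String> {
    // Split "org/name" into an optional "org/" prefix and the model name.
    let (prefix, name) = match repo_id.rsplit_once('/') {
        Some((org, name)) => (format!("{org}/"), name),
        None => (String::new(), repo_id),
    };
    let mut parts: Vec<&str> = name.split('-').collect();
    // Already instruction-tuned: nothing to suggest.
    if parts.contains(&"it") {
        return None;
    }
    // Treat a trailing "<n>bit" or "mxfp<n>" segment as a quantization suffix.
    let last = parts.last().copied().unwrap_or("");
    let has_quant_suffix = parts.len() > 1 && (last.ends_with("bit") || last.starts_with("mxfp"));
    if has_quant_suffix {
        // Insert "it" just before the quantization suffix.
        let idx = parts.len() - 1;
        parts.insert(idx, "it");
    } else {
        // No recognizable suffix: append "-it".
        parts.push("it");
    }
    Some(format!("{prefix}{}", parts.join("-")))
}
```

Under these assumptions, gemma-4-e4b-4bit maps to gemma-4-e4b-it-4bit, matching the suggestion described in the Repro section below.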

Acceptance Criteria

  • Sub-task A.1: src/downloader/filters.rs::is_wanted_file accepts chat_template.jinja (e.g. by adding has_extension(base, "jinja") to the extension allow-list).
  • Sub-task A.2: src/downloader/tests.rs has a new regression assertion assert!(is_wanted_file("chat_template.jinja")) next to the existing chat_template.json assertion at line 172.
  • Sub-task A.3: End-to-end verification. Re-download mlx-community/gemma-4-e4b-it-4bit (or another model in the family) with mlxcel, confirm chat_template.jinja lands in the model store, confirm mlxcel run no longer prints "no chat template found" for it, and confirm it produces coherent multi-turn chat output. Document the verification (commands and observed output) in the PR description.
  • Sub-task B.1: the warning in src/commands/chat.rs at ~L188 is replaced with a multi-line message that explains the likely base-model cause, suggests the -it variant, and tells the user how to proceed (--no-chat-template or mlxcel generate -p).
  • Sub-task B.2: the --no-chat-template path remains silent (no regression of the explicit raw-text mode).
  • Sub-task B.3: Manual reproduction. mlxcel run gemma-4-e4b-4bit still loads (it's a base model; that's the user's call), but the warning text now makes it obvious why chat will be bad and points at the -it variant.

Technical Considerations

Files and anchors

  • src/downloader/filters.rs:45-95: the is_wanted_file allow-list. Both the extension branch and the exact-name branch currently miss chat_template.jinja.
  • src/downloader/tests.rs:172: the existing test covers chat_template.json; add a .jinja case alongside it.
  • src/server/chat_template.rs:80-89: the loader's chat_template.jinja fallback. It is effectively dead code today because of root cause 2 and should start firing once Sub-task A lands.
  • src/commands/chat.rs:181-190: the warning location for Sub-task B. The explicit --no-chat-template branch at L181 must stay silent.

Repro

mlxcel run gemma-4-e4b-4bit
>>> 하이                          [English: "hi"]
[incoherent, repetitive output]

After the fix:

  • mlxcel run gemma-4-e4b-4bit still loads, but the warning now explains the base-model situation and suggests mlxcel run gemma-4-e4b-it-4bit.
  • mlxcel run gemma-4-e4b-it-4bit downloads chat_template.jinja, applies the template, and produces coherent multi-turn chat replies.

Out of scope (call out as future work, do not block this issue)

  • Automatic -it suggestion via Hub lookup (Sub-task B stretch above).
  • Generalizing the .jinja allow-list into a broader template-file policy beyond chat_template.jinja.

Metadata


Labels

  • area:cli: Command-line interface / CLI flags
  • area:inference: Generation, sampling, decoding (incl. speculative, DRY)
  • area:models: Model architectures, weights, loading, metadata
  • priority:high: High priority
  • status:done: Completed
  • type:bug: Bug fixes, error corrections, or issue resolutions
