fix(chat): garbage output for base/non-instruct models and missing chat_template.jinja in downloader

## Problem / Background

Running `mlxcel run gemma-4-e4b-4bit` (resolves to `mlx-community/gemma-4-e4b-4bit`) produces incoherent, repetitive output every turn. Example session:

```
mlxcel run gemma-4-e4b-4bit
Note: no chat template found for this model; sending raw text per turn.
>>> 하이
, 햇살뱅뱅
준비한
준비한
준비한
... (20+ repetitions, runs to max_tokens)
>>> 왜 저렇게 답변이 나올까?
왜 그럴까요?
왜 그까지만요?
... (repeats until max_tokens)
```

Investigation shows this is **two distinct problems stacked on top of each other**, both of which should be fixed together so the user-visible failure goes away on the next release.

### Root cause 1 — User ran a base (non-instruction-tuned) model with no UX guard

`mlx-community/gemma-4-e4b-4bit`'s `cardData.base_model` on HuggingFace is `google/gemma-4-e4b` (NOT `google/gemma-4-e4b-it`). Verified via the HF API: its `tokenizer_config.json` has no `chat_template` field, and the repo siblings list does not include `chat_template.jinja`. It genuinely ships without a chat template because base models are not instruction-tuned. Feeding raw multi-turn chat into a base model produces exactly the kind of repetition/drift the user saw — this is expected base-model behavior, not a generation bug.

`mlxcel run` currently still loads the base model, falls through to the raw-text path in `src/commands/chat.rs:181-190`, prints a single one-line `Note:` warning, and drops the user into an interactive prompt with no further guidance. The instruct-tuned counterparts (`mlx-community/gemma-4-e4b-it-4bit`, `-6bit`, `-8bit`, `-mxfp4`, etc.) are clearly named on the Hub and ship the correct chat template.

### Root cause 2 — `chat_template.jinja` is filtered out by the downloader allow-list (latent bug)

Even if the user had asked for `mlx-community/gemma-4-e4b-it-4bit` instead, chat would still be broken. The downloader filter `is_wanted_file` in `src/downloader/filters.rs:45-95` has an exact-name allow-list (`"vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config" | "tokenizer" | "generation_config" | "preprocessor_config" | "processor_config" | "chat_template"`) for files without a recognizable extension. But the actual HuggingFace convention is `chat_template.jinja` (with a `.jinja` extension). The extension allow-list above it covers `safetensors`, `json`, `tiktoken`, `model`, and a constrained `txt`, but **not `jinja`**. So `chat_template.jinja` matches neither branch and is rejected by the downloader.

Verified concretely:

- `mlx-community/gemma-4-e4b-it-4bit`'s siblings on the HF API include `chat_template.jinja`, and the file is non-empty.
- The chat-template loader at `src/server/chat_template.rs:80-89` explicitly falls back to reading `chat_template.jinja` from the model directory if `tokenizer_config.json`'s `chat_template` field is empty or missing. That fallback path is currently dead because the file is never downloaded.
- The existing test in `src/downloader/tests.rs:172` only asserts `chat_template.json` is accepted — it does not cover `chat_template.jinja`.

This is a latent correctness bug that silently degrades chat for any model that ships its chat template as a standalone `chat_template.jinja` file instead of inlining it into `tokenizer_config.json`.
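To make the fallback order concrete, here is a minimal sketch of the lookup described above. It is not the code in `src/server/chat_template.rs`: the function name `resolve_chat_template` and its signature are hypothetical, and `inline_template` stands in for the `chat_template` field already parsed out of `tokenizer_config.json`.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical model of the loader's lookup order.
/// 1. Use the inline template from tokenizer_config.json when it is non-empty.
/// 2. Otherwise fall back to `chat_template.jinja` in the model directory.
fn resolve_chat_template(inline_template: Option<&str>, model_dir: &Path) -> Option<String> {
    if let Some(template) = inline_template {
        if !template.trim().is_empty() {
            return Some(template.to_string());
        }
    }
    // This branch can only succeed if the downloader actually fetched the file.
    fs::read_to_string(model_dir.join("chat_template.jinja"))
        .ok()
        .filter(|template| !template.trim().is_empty())
}
```

With the current downloader filter, the second step never finds a file, so a model whose template lives only in `chat_template.jinja` ends up on the raw-text path.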

## Proposed Solution

Ship both fixes as a single PR so the user-visible failure mode (`mlxcel run gemma-4-e4b-*`) is fully addressed at once.

### Sub-task A — Downloader allow-list fix (root cause 2)

In `src/downloader/filters.rs::is_wanted_file`, accept `chat_template.jinja`. The cleanest fix is to add `has_extension(base, "jinja")` to the extension allow-list (the only `.jinja` files HF ships are chat templates, so this is a safe class to whitelist). Add a regression test in `src/downloader/tests.rs` next to the existing `chat_template.json` assertion.
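A simplified sketch of the two-branch filter with the proposed change applied is shown here. It is a model of the logic described in this issue, not the real function: the body of `has_extension` is an assumption, and the constrained `txt` rule and any other special cases are left out.

```rust
use std::path::Path;

/// Case-insensitive extension check (assumed behaviour of the real helper).
fn has_extension(base: &str, ext: &str) -> bool {
    Path::new(base)
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| e.eq_ignore_ascii_case(ext))
}

/// Simplified allow-list; the constrained `txt` case is omitted.
fn is_wanted_file(path: &str) -> bool {
    // Only the final path component is matched.
    let base = path.rsplit('/').next().unwrap_or(path);

    // Branch 1: extension allow-list. `jinja` is the proposed addition;
    // without it, `chat_template.jinja` falls through to branch 2.
    if ["safetensors", "json", "tiktoken", "model", "jinja"]
        .iter()
        .any(|ext| has_extension(base, ext))
    {
        return true;
    }

    // Branch 2: exact names for files without a recognizable extension.
    // `chat_template.jinja` is not equal to `chat_template`, so it never matched here.
    matches!(
        base,
        "vocab" | "merges" | "added_tokens" | "special_tokens_map" | "tokenizer_config"
            | "tokenizer" | "generation_config" | "preprocessor_config"
            | "processor_config" | "chat_template"
    )
}
```

Allowing the whole `jinja` extension rather than the single file name keeps the change to one line, at the cost of also accepting any other `.jinja` file a repo might ship.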

### Sub-task B — Base-model UX guard (root cause 1)

In `src/commands/chat.rs` around line 188, when `processor.is_none() && !opts.no_chat_template`, replace the single-line `Note:` warning with a more informative message that tells the user **why** chat will be incoherent and **what to do about it**:

- State that the model appears to be a base / non-instruction-tuned model (or otherwise ships without a chat template) and that chat responses will likely be incoherent.
- Suggest looking for an `-it` (instruction-tuned) variant of the same model family on the Hub.
- Explain how to proceed anyway: pass `--no-chat-template` to suppress this notice, or use `mlxcel generate -p ...` for raw-text completion.
- The explicit `--no-chat-template` path must remain silent and unchanged (no regression).
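The guard described in the bullets above could look roughly like the following. This is a sketch under assumptions: `missing_template_notice` is a hypothetical helper, the two booleans mirror `processor.is_none()` and `opts.no_chat_template`, and the message wording is illustrative only.

```rust
/// Returns the notice to print before the prompt, or `None` when the
/// session should start silently.
fn missing_template_notice(has_processor: bool, no_chat_template: bool) -> Option<String> {
    // A template was found, or the user explicitly asked for raw text:
    // print nothing in either case.
    if has_processor || no_chat_template {
        return None;
    }
    let lines = [
        "Warning: no chat template found for this model.",
        "It appears to be a base (non-instruction-tuned) model, so chat replies",
        "will likely be incoherent or repetitive.",
        "  - Look for an `-it` (instruction-tuned) variant of this model on the Hub.",
        "  - To continue anyway without this notice, pass --no-chat-template.",
        "  - For raw-text completion, use `mlxcel generate -p ...`.",
    ];
    Some(lines.join("\n"))
}
```

Returning the text instead of printing it keeps the silent `--no-chat-template` path easy to cover with a unit test.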

Stretch (optional, do not block this issue): when the resolved repo id matches a base-model naming pattern (`<name>` without an `-it` suffix) and `<name>-it` exists on the Hub or in the local store, name the suggestion explicitly. If too large, split into a follow-up issue.
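For the stretch goal, one possible way to derive a candidate name is sketched below. Everything here is an assumption for illustration: the helper `it_variant_candidate`, the rule of inserting `-it` before a trailing quantization suffix, and the suffix list (taken from the variants named earlier in this issue). The result is only a guess and would still need to be checked against the Hub or the local store before being shown to the user.

```rust
/// Quantization suffixes seen on the instruct variants named in this issue.
const QUANT_SUFFIXES: [&str; 4] = ["-4bit", "-6bit", "-8bit", "-mxfp4"];

/// Guess the instruct-tuned sibling of a repo id, or `None` if the name
/// already contains an `it` segment.
fn it_variant_candidate(repo_id: &str) -> Option<String> {
    // Split an optional `org/` prefix from the model name.
    let (org, name) = match repo_id.rsplit_once('/') {
        Some((org, name)) => (Some(org), name),
        None => (None, repo_id),
    };
    if name.split('-').any(|segment| segment.eq_ignore_ascii_case("it")) {
        return None;
    }
    // Insert `-it` before a known quantization suffix, else append it.
    let candidate = QUANT_SUFFIXES
        .iter()
        .find_map(|suffix| {
            name.strip_suffix(*suffix)
                .map(|stem| format!("{stem}-it{suffix}"))
        })
        .unwrap_or_else(|| format!("{name}-it"));
    Some(match org {
        Some(org) => format!("{org}/{candidate}"),
        None => candidate,
    })
}
```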

## Acceptance Criteria

- [ ] **Sub-task A.1** — `src/downloader/filters.rs::is_wanted_file` accepts `chat_template.jinja` (e.g. by adding `has_extension(base, "jinja")` to the extension allow-list).
- [ ] **Sub-task A.2** — `src/downloader/tests.rs` has a new regression assertion `assert!(is_wanted_file("chat_template.jinja"))` next to the existing `chat_template.json` assertion at line 172.
- [ ] **Sub-task A.3** — End-to-end verification: re-download `mlx-community/gemma-4-e4b-it-4bit` (or another model in the family) with mlxcel, confirm `chat_template.jinja` lands in the model store, confirm `mlxcel run` no longer prints `no chat template found` for it, and confirm it produces coherent multi-turn chat output. Document the verification (commands + observed output) in the PR description.
- [ ] **Sub-task B.1** — `src/commands/chat.rs` warning at ~L188 is replaced with a multi-line message that explains the likely base-model cause, suggests the `-it` variant, and tells the user how to proceed (`--no-chat-template` or `mlxcel generate -p`).
- [ ] **Sub-task B.2** — `--no-chat-template` path remains silent (no regression of the explicit raw-text mode).
- [ ] **Sub-task B.3** — Manual reproduction: `mlxcel run gemma-4-e4b-4bit` still loads (it's a base model, that's the user's call) but the warning text now makes it obvious why chat will be bad and points at the `-it` variant.

## Technical Considerations

### Files and anchors

- `src/downloader/filters.rs:45-95` — `is_wanted_file` allow-list. Both the extension branch and the exact-name branch currently miss `chat_template.jinja`.
- `src/downloader/tests.rs:172` — existing test covers `chat_template.json`; add a `.jinja` case alongside it.
- `src/server/chat_template.rs:80-89` — loader's `chat_template.jinja` fallback, dead code today because of root cause 2. Should start firing once Sub-task A lands.
- `src/commands/chat.rs:181-190` — warning location for Sub-task B. The explicit `--no-chat-template` branch at L181 must stay silent.

### Repro

```
mlxcel run gemma-4-e4b-4bit
>>> 하이
[incoherent, repetitive output]
```

After the fix:

- `mlxcel run gemma-4-e4b-4bit` still loads, but the warning now explains the base-model situation and suggests `mlxcel run gemma-4-e4b-it-4bit`.
- `mlxcel run gemma-4-e4b-it-4bit` downloads `chat_template.jinja`, applies the template, and produces coherent multi-turn chat replies.

### Out of scope (call out as future work, do not block this issue)

- Automatic `-it` suggestion via Hub lookup (Sub-task B stretch above).
- Generalizing the `.jinja` allow-list into a broader template-file policy beyond `chat_template.jinja`.
