v0.7.27 — bundled AI hotfix (chatml + latency + banner copy)

Three bugs from the first real use of the bundled-Qwen image in
v0.7.26:

1. Summarize / explain / ask AI ran forever

chat_format="qwen" isn't stable across llama-cpp-python versions —
on 0.2.90+ it's either unregistered or aliased to a template that
doesn't match Qwen 2.5's actual ChatML format. Result: the model
never emits the EOS token, so generation runs all the way to
max_tokens at ~5-15 tok/sec on a shared CPU — 30-60 s per call,
felt like a hang.

Fixed by:

chat_format="chatml" — Qwen 2.5's actual conversation template,
stable in llama-cpp-python.
Explicit stop tokens <|im_end|> + <|endoftext|> as belt-and-
suspenders against a GGUF metadata quirk suppressing the EOS.
Greedy decoding (temperature=0.0) — deterministic + slightly
faster sampler.
n_ctx halved (4096 → 2048) — prompt eval scales O(ctx²) and our
prompts comfortably fit 2 k.
Per-method max_tokens caps tightened across the board.

Method	Before	After
summarize_result	320	180
explain_sql	300	200
optimize_sql	420	260
generate_sql	400	240
infographic_brief	480	300

Worst-case wall clock on a 5 tok/sec shared CPU: now ~35-60 s per
call instead of ~60-100 s plus runaway.

2. "Failed to get API key" banner for bundled

The admin AI page banner literally said "check that the API key is
valid" when any non-mock provider fell back to mock. Wrong copy for
bundled, which has no API key concept.

Fixed by a new fallbackHint(provider) helper that returns provider-
specific guidance:

Provider	Hint
`bundled`	verify the bundled GGUF model exists at `/app/models/` and that llama-cpp-python is installed
`ollama`	verify the Ollama server is reachable at `OLLAMA_BASE_URL` and the configured model is pulled
`openai` / `anthropic`	verify the API key is valid and the configured model is correct

The registry's WARNING log line was also updated to drop the
misleading "Check API key + bundled SDK install" tail and list the
actual first-line failure mode per provider.

3. Pydantic schema-shadow warnings

UserWarning: Field name "schema" in "ResultContext" / "InfographicContext" shadows an attribute in parent "BaseModel" fired on every server start.
Silenced at the model level with ConfigDict(protected_namespaces=()).

Upgrade

docker pull ghcr.io/sanniheruwala/rednotebook-ai:v0.7.27

HF Space: bump the FROM tag in the Space's Dockerfile from
:v0.7.26 to :v0.7.27.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.7.27 — bundled AI hotfix

Choose a tag to compare

Sorry, something went wrong.