feat(default): promote Phi-3.5-mini to recommended default model #66
Merged
Phi-3 architecture support landed in #65 and was validated end-to-end as the best speed/quality combo we ship (a 32K vocab + 3.8B params makes the lm_head matmul the fastest of any registered model). Promote it to the default everywhere.

## Code

- `_MODEL_REGISTRY` reordered with Phi-3.5-mini first; a comment block marks it as the default and explains the reasoning
- `cmd_chat_default` (no-subcommand chat) now picks Phi-3.5-mini
- Module docstring and the `Model.from_pretrained` example use Phi-3.5-mini
- CLI `--help` epilog: examples lead with `phi-3.5-mini`, and the backwards-compat block mentions `smollm2` / `llama3.2:1b` as alternatives instead

## Docs

- README.md: Quick Start now names Phi-3.5-mini as the recommended default; CLI examples and the Python `from_pretrained` example updated. Benchmark/perf sections still reference SmolLM2/Llama models because those are historical measurement data.
- README.ko.md: the same changes mirrored in Korean.
- bindings/python/README.md (PyPI README): replaced "Basic question answering" with "Quick start (auto-download)" using `from_pretrained`. Added a multi-turn chat example using `m.chat()` with KV-cache reuse, and API reference entries for `Model.chat()` and `Model.from_pretrained()`.

## Verified

- `ctest --test-dir build` → 35/35 passed
- Full build clean (no new warnings)
- Phi-3.5-mini end-to-end inference still produces coherent multi-paragraph output ("Name three planets..." → Earth, Mars, Jupiter with descriptions)
- `available_models()` returns Phi-3.5-mini in the list
- `MODEL_ALIASES['phi-3.5-mini']` and friends resolve correctly
- `cmd_chat_default` source confirms `args.model = "Phi-3.5-mini"`
- `quantcpp --help` epilog reflects the new defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
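The registry reorder and alias resolution described above can be sketched roughly as follows. The names `_MODEL_REGISTRY`, `MODEL_ALIASES`, and `available_models()` come from this PR, but the data shapes, vocab/param entries for the non-default models, and function bodies here are assumptions for illustration, not the actual quantcpp source:

```python
# Hypothetical sketch: a default-first model registry with an alias table.
# Dict insertion order is preserved in Python 3.7+, so listing Phi-3.5-mini
# first makes it the top entry returned by available_models().
_MODEL_REGISTRY = {
    # Recommended default: 32K vocab + 3.8B params -> fastest lm_head
    # matmul of any registered model (see PR description).
    "Phi-3.5-mini": {"params": "3.8B", "vocab": 32_000},
    # Previous defaults, kept for backwards compatibility (entries assumed):
    "SmolLM2-1.7B": {"params": "1.7B", "vocab": 49_152},
    "Llama-3.2-1B": {"params": "1B", "vocab": 128_256},
}

# CLI-friendly aliases resolving to canonical registry keys.
MODEL_ALIASES = {
    "phi-3.5-mini": "Phi-3.5-mini",
    "smollm2": "SmolLM2-1.7B",
    "llama3.2:1b": "Llama-3.2-1B",
}

def available_models():
    """Registry keys in priority order; the default model comes first."""
    return list(_MODEL_REGISTRY)

def resolve(name):
    """Resolve an alias (or an exact registry key) to a canonical name."""
    canonical = MODEL_ALIASES.get(name, name)
    if canonical not in _MODEL_REGISTRY:
        raise KeyError(f"unknown model: {name!r}")
    return canonical

print(available_models()[0])   # Phi-3.5-mini
print(resolve("llama3.2:1b"))  # Llama-3.2-1B
```

With this shape, the "Verified" checks above fall out directly: `available_models()` lists Phi-3.5-mini first, and both the alias and the canonical key resolve through `resolve()`.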
## Default model progression

1. Llama-3.2-1B (original)
2. SmolLM2-1.7B (switched in #59, "feat(feedback): Quick Wins from 2026-04-12 external user report", after vocab-size benchmark feedback)
3. Phi-3.5-mini (this PR)
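The no-subcommand default mentioned above can be sketched as a small argparse fallback. `cmd_chat_default` and the `args.model = "Phi-3.5-mini"` assignment are named in this PR; the surrounding parser setup and function body are assumptions for illustration:

```python
import argparse

DEFAULT_MODEL = "Phi-3.5-mini"  # promoted in this PR

def cmd_chat_default(args):
    """Fallback when quantcpp is invoked with no subcommand:
    chat with the recommended default model unless -m was given."""
    if args.model is None:
        args.model = DEFAULT_MODEL
    return args.model

# Hypothetical parser wiring; the real CLI's epilog text differs.
parser = argparse.ArgumentParser(
    prog="quantcpp",
    epilog="example: quantcpp chat -m phi-3.5-mini "
           "(alternatives: smollm2, llama3.2:1b)",
)
parser.add_argument("-m", "--model", default=None)

args = parser.parse_args([])   # no flags: the default model kicks in
print(cmd_chat_default(args))  # Phi-3.5-mini
```

Keeping the default in one constant means the registry comment block, the CLI epilog, and the chat fallback all stay in sync when the recommended model changes again.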
🤖 Generated with Claude Code