feat(default): promote Phi-3.5-mini to recommended default model #66
Merged
Phi-3 architecture support landed in #65 and was validated end-to-end as the best speed/quality combo we ship (a 32K vocab + 3.8B params makes the lm_head matmul the fastest of any registered model). Promote it to the default everywhere.

## Code

- `_MODEL_REGISTRY` reordered with Phi-3.5-mini first; a comment block marks it as the default and explains the reasoning
- `cmd_chat_default` (no-subcommand chat) now picks Phi-3.5-mini
- Module docstring and the `Model.from_pretrained` example use Phi-3.5-mini
- CLI `--help` epilog: examples lead with `phi-3.5-mini`, and the backwards-compat block mentions `smollm2` / `llama3.2:1b` as alternatives instead

## Docs

- README.md: Quick Start now names Phi-3.5-mini as the recommended default; CLI examples and the Python `from_pretrained` example updated. Benchmark/perf sections still reference SmolLM2/Llama models because those are historical measurement data.
- README.ko.md: the same changes mirrored in Korean.
- bindings/python/README.md (PyPI README): replaced "Basic question answering" with "Quick start (auto-download)" using `from_pretrained`. Added a multi-turn chat example using `m.chat()` with KV-cache reuse, and API reference entries for `Model.chat()` and `Model.from_pretrained()`.

## Verified

- `ctest --test-dir build` → 35/35 passed
- Full build clean (no new warnings)
- Phi-3.5-mini end-to-end inference still produces coherent multi-paragraph output ("Name three planets..." → Earth, Mars, Jupiter with descriptions)
- `available_models()` returns Phi-3.5-mini in the list
- `MODEL_ALIASES['phi-3.5-mini']` and friends resolve correctly
- `cmd_chat_default` source confirms `args.model = "Phi-3.5-mini"`
- `quantcpp --help` epilog reflects the new defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
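The registry reorder and alias resolution described above can be sketched roughly as follows. The names `_MODEL_REGISTRY`, `MODEL_ALIASES`, and `available_models()` come from this PR, but the data shapes, vocab/param entries for the non-default models, and function bodies here are assumptions for illustration, not the actual quantcpp source:

```python
# Hypothetical sketch: a default-first model registry with an alias table.
# Dict insertion order is preserved in Python 3.7+, so listing Phi-3.5-mini
# first makes it the top entry returned by available_models().
_MODEL_REGISTRY = {
    # Recommended default: 32K vocab + 3.8B params -> fastest lm_head
    # matmul of any registered model (see PR description).
    "Phi-3.5-mini": {"params": "3.8B", "vocab": 32_000},
    # Previous defaults, kept for backwards compatibility (entries assumed):
    "SmolLM2-1.7B": {"params": "1.7B", "vocab": 49_152},
    "Llama-3.2-1B": {"params": "1B", "vocab": 128_256},
}

# CLI-friendly aliases resolving to canonical registry keys.
MODEL_ALIASES = {
    "phi-3.5-mini": "Phi-3.5-mini",
    "smollm2": "SmolLM2-1.7B",
    "llama3.2:1b": "Llama-3.2-1B",
}

def available_models():
    """Registry keys in priority order; the default model comes first."""
    return list(_MODEL_REGISTRY)

def resolve(name):
    """Resolve an alias (or an exact registry key) to a canonical name."""
    canonical = MODEL_ALIASES.get(name, name)
    if canonical not in _MODEL_REGISTRY:
        raise KeyError(f"unknown model: {name!r}")
    return canonical

print(available_models()[0])   # Phi-3.5-mini
print(resolve("llama3.2:1b"))  # Llama-3.2-1B
```

With this shape, the "Verified" checks above fall out directly: `available_models()` lists Phi-3.5-mini first, and both the alias and the canonical key resolve through `resolve()`.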
## Default model progression

1. Llama-3.2-1B (original)
2. SmolLM2-1.7B (switched in #59, "feat(feedback): Quick Wins from 2026-04-12 external user report", after vocab-size benchmark feedback)
3. Phi-3.5-mini (this PR)
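The no-subcommand default mentioned above can be sketched as a small argparse fallback. `cmd_chat_default` and the `args.model = "Phi-3.5-mini"` assignment are named in this PR; the surrounding parser setup and function body are assumptions for illustration:

```python
import argparse

DEFAULT_MODEL = "Phi-3.5-mini"  # promoted in this PR

def cmd_chat_default(args):
    """Fallback when quantcpp is invoked with no subcommand:
    chat with the recommended default model unless -m was given."""
    if args.model is None:
        args.model = DEFAULT_MODEL
    return args.model

# Hypothetical parser wiring; the real CLI's epilog text differs.
parser = argparse.ArgumentParser(
    prog="quantcpp",
    epilog="example: quantcpp chat -m phi-3.5-mini "
           "(alternatives: smollm2, llama3.2:1b)",
)
parser.add_argument("-m", "--model", default=None)

args = parser.parse_args([])   # no flags: the default model kicks in
print(cmd_chat_default(args))  # Phi-3.5-mini
```

Keeping the default in one constant means the registry comment block, the CLI epilog, and the chat fallback all stay in sync when the recommended model changes again.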
🤖 Generated with Claude Code