
feat(default): promote Phi-3.5-mini to recommended default model#66

Merged
unamedkr merged 1 commit into main from feat/phi3-default
Apr 12, 2026

Conversation

@unamedkr
Collaborator

Summary

Phi-3 architecture support landed in #65 and was validated end-to-end as the best speed/quality combination we ship: the 32K vocab on 3.8B params makes the lm_head matmul the fastest of any registered model. This PR promotes Phi-3.5-mini to the default everywhere.

Default model progression:

  1. Llama-3.2-1B (original)
  2. SmolLM2-1.7B (switched in #59, "feat(feedback): Quick Wins from 2026-04-12 external user report", after vocab-size benchmark feedback)
  3. Phi-3.5-mini (this PR)

Changes

Code

  • `_MODEL_REGISTRY` reordered so Phi-3.5-mini comes first, with a comment block marking it as the default and explaining the reasoning
  • cmd_chat_default (no-subcommand chat) now picks Phi-3.5-mini
  • Module docstring + Model.from_pretrained example use Phi-3.5-mini
  • CLI `--help` epilog: examples lead with `phi-3.5-mini`; the backwards-compatibility block lists `smollm2` / `llama3.2:1b` as alternatives
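
The registry reordering and alias resolution described above can be sketched as follows. This is illustrative, not the actual quantcpp source: the dict contents, repo paths, and the `resolve_model` helper are assumptions; only the names `_MODEL_REGISTRY` and `MODEL_ALIASES` come from this PR.

```python
# Sketch of the registry layout this PR describes (hypothetical contents).
# Python dicts preserve insertion order, so placing Phi-3.5-mini first
# lets "first registry entry" double as "default model".
_MODEL_REGISTRY = {
    # Default: 32K vocab + 3.8B params -> fastest lm_head matmul of any
    # registered model (see #65).
    "Phi-3.5-mini": {"repo": "microsoft/Phi-3.5-mini-instruct"},
    "SmolLM2-1.7B": {"repo": "HuggingFaceTB/SmolLM2-1.7B-Instruct"},
    "Llama-3.2-1B": {"repo": "meta-llama/Llama-3.2-1B-Instruct"},
}

# Lowercase CLI aliases -> canonical registry keys.
MODEL_ALIASES = {
    "phi-3.5-mini": "Phi-3.5-mini",
    "smollm2": "SmolLM2-1.7B",
    "llama3.2:1b": "Llama-3.2-1B",
}

def resolve_model(name: str) -> str:
    """Map a CLI alias (or an already-canonical name) to a registry key."""
    return MODEL_ALIASES.get(name, name)

DEFAULT_MODEL = next(iter(_MODEL_REGISTRY))  # "Phi-3.5-mini"
```

With this layout, `cmd_chat_default` only needs `DEFAULT_MODEL`; no separate default constant has to be kept in sync with the registry ordering.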

Docs

  • README.md Quick Start: Phi-3.5-mini is the recommended default. CLI examples and Python `from_pretrained` example updated. Benchmark/perf sections still reference SmolLM2/Llama because those are historical measurement data.
  • README.ko.md: same changes mirrored in Korean.
  • bindings/python/README.md (PyPI README): replaced "Basic question answering" with "Quick start (auto-download)" using `from_pretrained`. Added a multi-turn chat example using `m.chat()` + KV cache reuse, and API reference entries for `Model.chat()` and `Model.from_pretrained()`.
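
The shape of the new PyPI README examples looks roughly like the sketch below. The API surface (`Model.from_pretrained`, `Model.chat`) is named in this PR, but the behavior here is a stand-in: a stub class replaces the real compiled bindings so the multi-turn call pattern can be shown self-contained.

```python
# Minimal stub mirroring the quickstart call pattern (not the real bindings).
class Model:
    @classmethod
    def from_pretrained(cls, name: str) -> "Model":
        # The real bindings auto-download weights here; the stub just
        # records the model name and starts an empty conversation state.
        m = cls()
        m.name = name
        m.history = []  # stands in for the KV cache carried across turns
        return m

    def chat(self, prompt: str) -> str:
        # The real bindings reuse the KV cache between turns, so a second
        # call only prefills the new tokens instead of the whole history.
        self.history.append(prompt)
        return f"[{self.name} reply to turn {len(self.history)}]"

m = Model.from_pretrained("Phi-3.5-mini")
print(m.chat("Name three planets in the solar system."))
print(m.chat("Which of those is largest?"))  # conversation state carried over
```

The point of the README change is the second call: because `m.chat()` keeps state on the model object, follow-up questions work without re-sending the transcript.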

Verified

  • `ctest --test-dir build` → 35/35 passed
  • Full build clean (no new warnings)
  • Phi-3.5-mini end-to-end inference test still produces coherent multi-paragraph output:

    "Name three planets in the solar system."
    → "Sure, here are names of three planets from our Solar System: 1. Earth — Our home planet... 2. Mars — Known as the Red Planet... 3. Jupiter — The largest planet in our solar..."

  • `available_models()` returns `['Llama-3.2-1B', 'Phi-3.5-mini', 'Qwen3.5-0.8B', 'SmolLM2-1.7B', 'SmolLM2-135M']`
  • `MODEL_ALIASES['phi-3.5-mini']` and friends resolve correctly
  • `cmd_chat_default` source confirms `args.model = "Phi-3.5-mini"`
  • `quantcpp --help` epilog reflects the new defaults

Test plan

  • Unit tests pass
  • Phi-3.5-mini end-to-end inference (regression)
  • CLI `--help` shows new examples
  • `available_models()` includes Phi-3.5-mini
  • Manual: `pip install -e .` from this branch + `quantcpp` (no args) downloads Phi-3.5-mini and starts chat

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unamedkr merged commit eb4f7d1 into main on Apr 12, 2026
2 of 3 checks passed
unamedkr deleted the feat/phi3-default branch on April 12, 2026 at 03:07