Summary
Add a quantcpp recommend command that suggests the optimal model based on hardware specs and user priorities (speed vs quality).
Motivation
Our testing revealed a counter-intuitive finding: vocabulary size, not parameter count, dominates generation speed.
SmolLM2-1.7B (vocab 49K): 23 tok/s ← bigger model, FASTER
Llama-3.2-1B (vocab 128K): 2.3 tok/s ← smaller model, SLOWER
Phi-3.5-mini (vocab 32K): 6.5 tok/s ← best speed/quality ratio
Most users don't know this and pick models by parameter count alone, then are disappointed by the speed.
Proposed UX
quantcpp recommend
# Hardware: Apple M3, 16GB RAM
# Priority: balanced
#
# Recommended: Phi-3.5-mini (Q8_0)
# Speed: ~6.5 tok/s
# Quality: MMLU 65.5, GSM8K 76.9
# Size: 4.1 GB
# Vocab: 32K (fastest in 3-4B class)
#
# Alternatives:
# SmolLM2-1.7B (Q8) — 23 tok/s, lower quality
# Qwen3-4B (Q4) — best quality, ~2 tok/s
quantcpp recommend --priority speed
# → SmolLM2-1.7B
quantcpp recommend --priority quality
# → Qwen3-4B (Q4_K_M)
Implementation
Speed prediction formula (empirically derived):
estimated_tok_s = base_tok_s * (base_vocab / model_vocab) * (base_params / model_params)^0.5
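The formula above can be sketched as a small predictor. This is a minimal illustration, not quantcpp code: the function name is hypothetical, and the baseline constants are taken (approximately) from the Llama-3.2-1B measurement in the Motivation section.

```python
# Sketch of the speed predictor. Baseline constants are illustrative,
# calibrated to the Llama-3.2-1B numbers quoted above; the function
# name is hypothetical, not an existing quantcpp API.

BASE_TOK_S = 2.3       # measured tok/s for the baseline model
BASE_VOCAB = 128_000   # baseline vocab size
BASE_PARAMS = 1.2e9    # baseline parameter count (approximate)

def estimated_tok_s(model_vocab: int, model_params: float) -> float:
    """estimated_tok_s = base_tok_s * (base_vocab / model_vocab)
                                    * (base_params / model_params) ** 0.5
    """
    return (BASE_TOK_S
            * (BASE_VOCAB / model_vocab)
            * (BASE_PARAMS / model_params) ** 0.5)

# Sanity check: the baseline model predicts its own measured speed.
print(round(estimated_tok_s(BASE_VOCAB, BASE_PARAMS), 1))  # → 2.3
# A much smaller vocab lifts the estimate even for a 3x larger model:
print(round(estimated_tok_s(32_000, 3.8e9), 1))            # → 5.2
```

The vocab term is linear while the parameter term is only a square root, which encodes the observation driving this proposal: halving the vocab helps speed more than halving the parameter count.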
Priority: P2