Derived from llmfit — find which LLMs actually fit on your hardware. Try it at llmsizer.com.
- Detects your GPU via WebGL
- Estimates memory for each model across quantization levels (Q2_K through F16)
- Predicts speed (tokens/sec) based on your GPU's memory bandwidth
- Scores and ranks 5,000+ models by quality, speed, fit, and context length
- Shows what fits — perfect, good, marginal, or won't run
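The memory and speed estimates above reduce to simple arithmetic; a rough sketch of the idea (hypothetical function names and example numbers, not the shipped engine):

```python
def estimate_weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory: parameter count × bits per weight, in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def estimate_tokens_per_sec(weight_gb: float, bandwidth_gbs: float) -> float:
    """Decoding is memory-bandwidth-bound: each generated token streams
    the full weights once, so throughput ≈ bandwidth / model size."""
    return bandwidth_gbs / weight_gb

# Example: 7B model at ~4.5 bits/weight (Q4_K_M-class) on a ~1000 GB/s GPU
w = estimate_weight_gb(7.0, 4.5)          # ≈ 3.94 GB of weights
tps = estimate_tokens_per_sec(w, 1000.0)  # ≈ 254 tokens/sec upper bound
```

Real numbers come in lower once KV cache, activations, and kernel overhead are accounted for, but the bandwidth ratio sets the ceiling.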
Static React SPA — everything runs in your browser. No backend required.
Built with TypeScript, Vite, and a model database auto-updated weekly from HuggingFace.
```sh
npm install
npm run dev
npm test
```

`public/models.json` is generated by a Python scraper with a pre-quantized-repo sizing fix. Stdlib only — no pip deps.
```sh
# curated list only
python3 scripts/scrape_hf_models.py > public/models.json

# curated + top-N trending models (what's currently shipped)
python3 scripts/scrape_hf_models.py --discover -n 800 > public/models.json
```

For AWQ/GPTQ/MLX/BNB repos, the scraper sums the real `.safetensors` file sizes from the HF tree API and writes `weight_gb`, since HuggingFace's `safetensors.total` reports packed-tensor element counts (~8× too small for 4-bit quantized weights). The UI engine uses `weight_gb` directly when present instead of applying a generic Q4_K_M formula.
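A minimal sketch of that sizing approach, using only the stdlib (the endpoint is HuggingFace's public tree API; the function names are illustrative, not the scraper's actual code):

```python
import json
import urllib.request

def sum_safetensors_bytes(tree: list) -> int:
    """Sum on-disk sizes of .safetensors files from a tree-API listing.
    File sizes reflect 4-bit packing correctly; element counts do not."""
    return sum(f.get("size", 0) for f in tree
               if f.get("path", "").endswith(".safetensors"))

def fetch_weight_gb(repo_id: str, revision: str = "main") -> float:
    """Fetch a repo's file listing from the HF tree API and return the
    summed .safetensors size in GB (illustrative helper)."""
    url = f"https://huggingface.co/api/models/{repo_id}/tree/{revision}"
    with urllib.request.urlopen(url) as resp:
        return sum_safetensors_bytes(json.load(resp)) / 1e9
```

Splitting the pure summation from the network fetch keeps the size logic testable without hitting the API.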
To patch pre-quantized entries in an already-scraped `models.json` without re-scraping, run:

```sh
python3 scripts/fix_quantized_entries.py
```

MIT — see LICENSE.
The scrapers in scripts/ are derived from llmfit (MIT, © 2026 Alex Jones); the upstream notice is reproduced in NOTICE.
