Releases: signerless/llm-checker
v3.7.4
Published to npm as llm-checker@3.7.4. Folds in everything since 3.7.0 (3.7.1–3.7.4). Full suite 48/48.
Highlights since 3.7.0:
- Correct MoE memory sizing on ALL paths: weights are sized by the TOTAL parameter count and a real observed artifact size always wins, so a large MoE (e.g. a 236B / 397B-A17B model) can no longer falsely "fit" small hardware. Active params drive speed only.
- A size-unknown Ollama variant (e.g. `:latest`) no longer inherits `model_sizes[0]`: `qwen3:latest` is sized ~9B (not 30B) and stops poisoning the real `qwen3:30b` size map (a 19GB model that was falsely fitting 16GB).
- Multi-GPU VRAM is no longer double-counted (a 2x24=48GB box stays 48GB).
- Recommendation diversity (3.7.1): the registry surfaces Hugging Face / GPT4All models, not just Ollama. Quant/shard variants of the same model collapse to one distinct pick, and a source that scores close to the top is guaranteed a slot. Use `--runtime vllm|mlx|llama.cpp|transformers` or `--source` to target explicitly.
- Registry CLI validation (3.7.3): `registry-search`/`registry-recommend` reject invalid `--source`/`--format`/`--runtime`/`--optimize` with a clear error, and never silently fall back to the built-in catalog when no artifacts match.
- Registry ingestor data quality (3.7.4): LoRA adapters and optimizer/training files are no longer ingested as models; F16/FP16/BF16 are precisions (not quantizations); GPT4All sizes/canonical ids fixed; dead index dropped. Regenerated seed: 3 sources, 3,259 repos, 32,779 artifacts.
- `filterByCategory` and other guards hardened against malformed input; MCP `cli_exec` now exposes the registry commands.
Full notes: docs/reference/changelog.md
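The MoE sizing rule in the highlights above can be illustrated with a minimal sketch. The function names, the bytes-per-parameter table, and the fixed overhead are assumptions made for illustration; they are not taken from the llm-checker source.

```javascript
// Hypothetical bytes-per-parameter table; real quantization formats vary.
const BYTES_PER_PARAM = { q4: 0.5, q8: 1, f16: 2 };

// Weights are sized from the TOTAL parameter count (in billions),
// unless a real observed artifact size is known, which always wins.
function estimateWeightsGB({ totalParamsB, quant, observedArtifactGB }) {
  if (Number.isFinite(observedArtifactGB) && observedArtifactGB > 0) {
    return observedArtifactGB;
  }
  return totalParamsB * BYTES_PER_PARAM[quant];
}

// overheadGB is an assumed flat allowance for KV cache and runtime buffers.
function fitsInMemory(model, availableGB, overheadGB = 2) {
  return estimateWeightsGB(model) + overheadGB <= availableGB;
}

// activeParamsB is carried along but deliberately ignored for sizing;
// per the notes above it would only inform speed estimates.
const moe = { totalParamsB: 397, activeParamsB: 17, quant: "q4" };
console.log(estimateWeightsGB(moe)); // 198.5
console.log(fitsInMemory(moe, 24)); // false
```

Sizing the same model from its 17B active parameters would give about 8.5 GB at 4-bit, which is the false "fit" the release describes.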
v3.7.0
Published to npm as llm-checker@3.7.0. Adds a packaged multi-source model registry and wires it into the recommendation flow. Full suite green at 44/44.
Highlights:
- Multi-source registry: a packaged snapshot of ~3,259 repos / ~33,736 exact installable/downloadable artifacts from Hugging Face, Ollama, and GPT4All, with per-source install commands (`hf download ...`, `ollama pull ...`). New `registry-sync`, `registry-search`, and `registry-recommend` commands.
- `recommend` (and the `check` recommendation card) now source candidates from the registry through the canonical deterministic scoring core, with `--runtime auto` plus Ollama / vLLM / MLX / llama.cpp / Transformers targeting; falls back to the Ollama catalog when the registry is empty or unavailable.
- Mixture-of-Experts memory sizing fixed: MoE models (e.g. `Mixtral-8x7B`, `Qwen3-397B-A17B`) are sized by their TOTAL parameter count (all experts are resident under Ollama / Metal / vLLM), re-derived from the model name so a stale/under-reported DB value can never make a huge model falsely "fit" small hardware. The packaged seed DB was regenerated so stored MoE totals are correct (Mixtral-8x7B 7B→56B; Qwen3.5-397B-A17B 17B→397B total / 17B active).
- Packaged `src/data/seed/models.db` is ~45 MB unpacked (tarball ~6.5 MB).
Carries everything from 3.6.1 (issue #88 scoring unification, #95 hardware VRAM, #97 MCP hardening, #86/#98 Windows UI). Full notes: docs/reference/changelog.md
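As a rough illustration of re-deriving MoE totals from a model name, the sketch below handles the two naming patterns mentioned above. The function name `parseMoeParams` and both regular expressions are hypothetical; the actual parser in llm-checker may differ.

```javascript
// Returns { totalB, activeB } in billions of parameters, or null when the
// name carries no recognizable MoE pattern. Illustrative only.
function parseMoeParams(modelName) {
  const name = String(modelName);

  // "<total>B-A<active>B", e.g. Qwen3-397B-A17B
  const totalActive = name.match(/(\d+(?:\.\d+)?)b-a(\d+(?:\.\d+)?)b/i);
  if (totalActive) {
    return { totalB: Number(totalActive[1]), activeB: Number(totalActive[2]) };
  }

  // "<experts>x<size>B", e.g. Mixtral-8x7B, sized as experts * size
  const expertsBySize = name.match(/(\d+)x(\d+(?:\.\d+)?)b/i);
  if (expertsBySize) {
    return {
      totalB: Number(expertsBySize[1]) * Number(expertsBySize[2]),
      activeB: null, // not derivable from this naming pattern
    };
  }

  return null;
}

console.log(parseMoeParams("Mixtral-8x7B")); // { totalB: 56, activeB: null }
console.log(parseMoeParams("Qwen3-397B-A17B")); // { totalB: 397, activeB: 17 }
```

The `8x7B` case yields 56B, matching the 7B→56B correction noted for the regenerated seed DB.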
v3.6.1
Published to npm as llm-checker@3.6.1. First npm release since 3.5.15 — also carries the previously-unpublished 3.6.0 batch. Every fix below ships with an integration test; the full suite is green at 39/39.
Highlights:
- Fixes #88 (root cause): `check`, `recommend`, and `smart-recommend` now rank through one canonical scoring core (`src/models/scoring-core.js`), so they agree on the best model and the high-capacity right-sizing floor applies everywhere: tiny 2B–8B models no longer out-rank large models on high-end hardware. (#96)
- Corrects GPU-VRAM detection for high-end / multi-GPU machines: workstation/datacenter cards (RTX PRO 6000, A100, H100, L40, …) no longer collapse to a generic 8 GB, fixes the GB normalization "dead zone", and guards a `willModelFit` divide-by-zero. A dual RTX PRO 6000 box now reports ~192 GB instead of ~16 GB. (#95)
- Hardens the Claude MCP server: reads hardware facts from `hw-detect --json` instead of regex-scraping CLI text, fixes bogus tokens/sec, runs `compare_models` sequentially, syncs the advertised version from `package.json`, and makes the module importable for testing. (#97)
- Fixes the Windows interactive-panel flicker (#86): resolves full-panel height overflow on 46–49 row terminals, adds debounced terminal resize handling, and stops the banner pulse from clearing the whole screen 8×/second. (#98)
Full notes: docs/reference/changelog.md
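The multi-GPU figure above is simple arithmetic: each card's VRAM counted once. A minimal sketch follows, assuming a hypothetical GPU record shape (`id`, `vramGB`); deduplicating by `id` is an assumed safeguard for a card reported by more than one detection path, not a description of the actual fix in #95.

```javascript
// Sums per-GPU VRAM, counting each device id once and skipping
// entries with missing or non-positive VRAM. Illustrative only.
function totalVramGB(gpus) {
  const seen = new Map();
  for (const gpu of gpus) {
    const vram = Number(gpu.vramGB);
    if (!Number.isFinite(vram) || vram <= 0) continue;
    if (!seen.has(gpu.id)) seen.set(gpu.id, vram);
  }
  let total = 0;
  for (const vram of seen.values()) total += vram;
  return total;
}

const dualPro6000 = [
  { id: 0, vramGB: 96 },
  { id: 1, vramGB: 96 },
];
console.log(totalVramGB(dualPro6000)); // 192
```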
v3.5.13
Published to npm as llm-checker@3.5.13.
Highlights:
- Ships a prebuilt Ollama SQLite catalog with 229 models and 7176 variants.
- Adds weekly model DB update workflow and seed DB refresh tooling.
- Cleans the interactive panel with animated status verbs instead of verbose progress bars.
- AI Run now shows tokens/sec beside model responses.
- Recommendation scoring avoids stale aliases, all-zero pulls, and cloud-only local picks.
v3.5.11
3.5.11 — Windows Ollama Host Normalization Follow-up
- Fixed the remaining Windows Ollama client path where `OLLAMA_HOST` could be inherited as a wildcard bind address such as `0.0.0.0` or `[::]`
- Wildcard bind hosts now normalize back to `localhost` for client requests
- Missing Ollama ports now default to `11434`
- Kept the native-`fetch` retry fallback in the release path for retryable network failures such as `fetch failed`
- Updated CLI/docs guidance so custom client endpoints use `OLLAMA_BASE_URL`
- Added regression coverage for wildcard-host normalization
npm package: llm-checker@3.5.11
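The normalization rules listed above could look roughly like the following. This is a minimal sketch using the standard WHATWG `URL` class; the function name `normalizeOllamaHost` and the choice to default to `http://` when no scheme is given are assumptions, not the project's code.

```javascript
const DEFAULT_PORT = "11434";
// Bind-all addresses that are valid for a server but not for a client.
const WILDCARD_HOSTS = new Set(["0.0.0.0", "[::]"]);

// Turns an OLLAMA_HOST-style value into a base URL a client can call.
function normalizeOllamaHost(raw) {
  const value = (raw || "").trim();
  if (!value) return `http://localhost:${DEFAULT_PORT}`;

  // Values like "0.0.0.0:11434" have no scheme, so add one before parsing.
  const withScheme = value.includes("://") ? value : `http://${value}`;
  const url = new URL(withScheme);

  if (WILDCARD_HOSTS.has(url.hostname)) url.hostname = "localhost";
  // An absent port (or the scheme's default port) falls back to 11434.
  if (!url.port) url.port = DEFAULT_PORT;
  return url.origin;
}

console.log(normalizeOllamaHost("0.0.0.0")); // http://localhost:11434
console.log(normalizeOllamaHost("http://[::]:11434")); // http://localhost:11434
```

Non-wildcard hosts pass through unchanged apart from the port default, so `192.168.1.5` becomes `http://192.168.1.5:11434`.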
v3.5.9
- Fixed the remaining Ollama localhost bypasses in selector flows.
- Deterministic speed probes now use the shared Ollama client instead of a hardcoded `http://localhost:11434` endpoint.
- AI evaluator chat requests now use the same resolved Ollama base URL path as the rest of the CLI.
- Added selector-specific regression coverage for Windows-style `localhost` failure with successful `127.0.0.1` fallback.
- The separate Windows backend wording question (`Best backend: cpu` with `Runtime assist: Vulkan`) remains tracked in #71.
npm: `llm-checker@3.5.9`
v3.5.8
Windows Ollama localhost fallback + Vulkan assist visibility
- retry Ollama availability across localhost loopback candidates and persist the working base URL
- filter fake Windows remote display adapters from fallback GPU inventory
- surface Vulkan runtime assist metadata for integrated Windows GPU paths
- improve hw-detect output for integrated/shared-memory acceleration paths
- add regression tests for loopback fallback and Windows GPU reporting
Published npm package: llm-checker@3.5.8
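The loopback fallback described above can be sketched as a resolver that tries each candidate in order and remembers the first one that answers. The factory name `createOllamaResolver`, the candidate list, and the injected `probe` callback are all hypothetical; the real client may be structured differently.

```javascript
// Assumed candidate order; the notes above only name localhost and 127.0.0.1.
const LOOPBACK_CANDIDATES = ["http://localhost:11434", "http://127.0.0.1:11434"];

// probe(baseUrl) should resolve to true when Ollama answers at baseUrl.
// It is injected so the fallback logic can be exercised without a network.
function createOllamaResolver(probe, candidates = LOOPBACK_CANDIDATES) {
  let workingBaseUrl = null; // persisted after the first success

  return async function resolveBaseUrl() {
    if (workingBaseUrl) return workingBaseUrl;
    for (const baseUrl of candidates) {
      try {
        if (await probe(baseUrl)) {
          workingBaseUrl = baseUrl;
          return baseUrl;
        }
      } catch (_err) {
        // A failure such as "fetch failed" just means: try the next candidate.
      }
    }
    return null; // nothing reachable
  };
}
```

In practice the probe might be something like `(u) => fetch(u + "/api/tags").then((r) => r.ok)`, so that a failing `localhost` lookup falls through to `127.0.0.1`.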
v3.5.7
Highlights
- fixed Windows CPU detection noise on modern Windows builds where `wmic` is retired
- fixed oversized local Ollama recommendation edge cases on CPU-backed systems
- bumped package and MCP server metadata to `3.5.7`
Included fixes
- `#66` WMIC retired
- `#67` Recommended model with memory requirement more than I have
Validation
- `npm test` passed (26/26)
- `npm pack --dry-run` passed
v3.5.6
Integrated GPU Inventory & Hybrid Visibility
- Added first-class integrated GPU inventory handling in unified hardware summaries.
- Hybrid systems now keep both dedicated and integrated GPU models visible.
- Integrated-only systems keep GPU visibility even when the runtime backend remains CPU.
- Recommendation, tiering, and token-speed estimation now use canonical integrated-GPU signals more consistently.
- CLI output now shows dedicated vs integrated GPU inventory explicitly.
- Added regression coverage for hybrid and integrated-only detection paths.
v3.5.4
Fixed
- Linux hybrid GPU detection fallback now includes `lspci` parsing and improved dedicated-GPU enrichment (#58).
- AMD ROCm VRAM unit parsing fixed to prevent massively overreported memory (#59).
Added
- Fine-tuning suitability labels in `check`, `recommend`, and `ai-check` outputs (Full FT / LoRA / QLoRA support bands) (#60).
Tests
- Added regression tests for ROCm VRAM parsing, hybrid GPU fallback detection, and fine-tuning classification.
Commit: ec1df33
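To show why unit handling matters for the ROCm VRAM fix above, here is a minimal sketch of unit-aware conversion. The function name `vramToGB` and the unit labels are assumptions for illustration; the details of the actual parsing bug are in #59 and are not reproduced here.

```javascript
// Binary multipliers for the unit labels this sketch understands.
const BYTES_PER_UNIT = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

// Converts a reported VRAM amount to GB using its stated unit.
// Returns null for unknown units or invalid amounts instead of guessing.
function vramToGB(value, unit) {
  const factor = BYTES_PER_UNIT[String(unit).toUpperCase()];
  const amount = Number(value);
  if (!factor || !Number.isFinite(amount) || amount < 0) return null;
  return (amount * factor) / 1024 ** 3;
}

console.log(vramToGB(17179869184, "B")); // 16
console.log(vramToGB(16384, "MB")); // 16
```

Reading the first value as megabytes instead of bytes would report about 16 million GB, which is the kind of overreporting a unit mix-up produces.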