Releases: signerless/llm-checker

v3.7.4

@signerless signerless released this 20 Jun 12:43
6264c76

Published to npm as llm-checker@3.7.4. Folds in everything since 3.7.0 (3.7.1–3.7.4). The full test suite passes at 48/48.

Highlights since 3.7.0:

  • Correct MoE memory sizing on ALL paths: weights are sized by the TOTAL parameter count and a real observed artifact size always wins, so a large MoE (e.g. a 236B / 397B-A17B model) can no longer falsely "fit" small hardware. Active params drive speed only.
  • A size-unknown Ollama variant (e.g. :latest) no longer inherits model_sizes[0]: qwen3:latest is now sized at ~9B (not 30B) and no longer poisons the size map for the real qwen3:30b, a 19GB model that was falsely fitting 16GB.
  • Multi-GPU VRAM is no longer double-counted (a 2x24=48GB box stays 48GB).
  • Recommendation diversity (3.7.1): the registry surfaces Hugging Face / GPT4All models, not just Ollama — quant/shard variants of the same model collapse to one distinct pick, and a source that scores close to the top is guaranteed a slot. Use --runtime vllm|mlx|llama.cpp|transformers or --source to target explicitly.
  • Registry CLI validation (3.7.3): registry-search/registry-recommend reject invalid --source/--format/--runtime/--optimize with a clear error, and never silently fall back to the built-in catalog when no artifacts match.
  • Registry ingestor data quality (3.7.4): LoRA adapters and optimizer/training files are no longer ingested as models; F16/FP16/BF16 are precisions (not quantizations); GPT4All sizes/canonical ids fixed; dead index dropped. Regenerated seed: 3 sources, 3,259 repos, 32,779 artifacts.
  • filterByCategory and other guards hardened against malformed input; MCP cli_exec now exposes the registry commands.
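
The sizing rules in the first and third bullets can be sketched as follows. This is a minimal illustration, not llm-checker's actual code: the function names, the input shape, and the rough bytes-per-parameter table are all assumptions, and KV cache and runtime overhead are ignored.

```javascript
// Hypothetical helpers; names and the bytes-per-parameter table are
// illustrative assumptions, not llm-checker's internal API.
const BYTES_PER_PARAM = { Q4: 0.5, Q8: 1, F16: 2 }; // rough approximations

function estimateWeightsGB({ totalParamsB, quant = 'Q4', observedSizeGB }) {
  // A real observed artifact size always wins over any estimate.
  if (Number.isFinite(observedSizeGB) && observedSizeGB > 0) {
    return observedSizeGB;
  }
  // Size by TOTAL parameters: every expert of an MoE model is resident.
  // Active parameters are deliberately not an input; they affect speed only.
  const bytesPerParam = BYTES_PER_PARAM[quant] ?? BYTES_PER_PARAM.Q4;
  return totalParamsB * bytesPerParam; // billions of params * bytes/param = GB
}

function totalVramGB(gpus) {
  // Count each card exactly once, so 2 x 24 GB stays 48 GB.
  return gpus.reduce((sum, gpu) => sum + gpu.vramGB, 0);
}

function fits(model, gpus) {
  return estimateWeightsGB(model) <= totalVramGB(gpus);
}
```

Under these assumptions, a 397B-total MoE at roughly 4-bit precision needs far more than a 2x24GB box provides, regardless of how few parameters are active per token.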

Full notes: docs/reference/changelog.md

v3.7.0

@signerless signerless released this 20 Jun 09:20
4977b92

Published to npm as llm-checker@3.7.0. Adds a packaged multi-source model registry and wires it into the recommendation flow. Full suite green at 44/44.

Highlights:

  • Multi-source registry: a packaged snapshot of ~3,259 repos / ~33,736 exact installable/downloadable artifacts from Hugging Face, Ollama, and GPT4All, with per-source install commands (hf download ..., ollama pull ...). New registry-sync, registry-search, and registry-recommend commands.
  • recommend (and the check recommendation card) now source candidates from the registry through the canonical deterministic scoring core, with --runtime auto plus Ollama / vLLM / MLX / llama.cpp / Transformers targeting; both fall back to the Ollama catalog when the registry is empty or unavailable.
  • Mixture-of-Experts memory sizing fixed: MoE models (e.g. Mixtral-8x7B, Qwen3-397B-A17B) are sized by their TOTAL parameter count (all experts are resident under Ollama / Metal / vLLM), re-derived from the model name so a stale/under-reported DB value can never make a huge model falsely "fit" small hardware. The packaged seed DB was regenerated so stored MoE totals are correct (Mixtral-8x7B 7B→56B; Qwen3.5-397B-A17B 17B→397B total / 17B active).
  • Packaged src/data/seed/models.db is ~45 MB unpacked (tarball ~6.5 MB).
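
Re-deriving the MoE total from the model name might look like the sketch below. The function names and regular expressions are hypothetical and cover only the two naming patterns mentioned in these notes ("8x7B" and "397B-A17B"); the real parser may differ.

```javascript
// Hypothetical parser, for illustration only.
function parseMoEParams(name) {
  // Pattern 1: "<experts>x<size>B", e.g. Mixtral-8x7B -> 8 * 7 = 56B total.
  const expertForm = name.match(/(\d+)x(\d+(?:\.\d+)?)B/i);
  if (expertForm) {
    const experts = Number(expertForm[1]);
    const perExpertB = Number(expertForm[2]);
    // The active count is not encoded in this naming style.
    return { totalB: experts * perExpertB, activeB: null };
  }
  // Pattern 2: "<total>B-A<active>B", e.g. 397B-A17B.
  const activeForm = name.match(/(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B/i);
  if (activeForm) {
    return { totalB: Number(activeForm[1]), activeB: Number(activeForm[2]) };
  }
  return null; // not recognizably MoE
}

function resolveTotalParamsB(name, storedParamsB) {
  const parsed = parseMoEParams(name);
  // A name-derived total overrides a stale or under-reported stored value.
  return parsed ? parsed.totalB : storedParamsB;
}
```

With this sketch, a stored value of 7B for Mixtral-8x7B resolves to 56B, and a stored 17B for a 397B-A17B model resolves to 397B.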

Carries everything from 3.6.1 (issue #88 scoring unification, #95 hardware VRAM, #97 MCP hardening, #86/#98 Windows UI). Full notes: docs/reference/changelog.md

v3.6.1

@signerless signerless released this 19 Jun 19:57

Published to npm as llm-checker@3.6.1. First npm release since 3.5.15 — also carries the previously-unpublished 3.6.0 batch. Every fix below ships with an integration test; the full suite is green at 39/39.

Highlights:

  • Fixes #88 (root cause): check, recommend, and smart-recommend now rank through one canonical scoring core (src/models/scoring-core.js), so they agree on the best model and the high-capacity right-sizing floor applies everywhere — tiny 2B–8B models no longer out-rank large models on high-end hardware. (#96)
  • Corrects GPU-VRAM detection for high-end / multi-GPU machines: workstation/datacenter cards (RTX PRO 6000, A100, H100, L40, …) no longer collapse to a generic 8 GB, the GB normalization "dead zone" is fixed, and a willModelFit divide-by-zero is guarded. A dual RTX PRO 6000 box now reports ~192 GB instead of ~16 GB. (#95)
  • Hardens the Claude MCP server: reads hardware facts from hw-detect --json instead of regex-scraping CLI text, fixes bogus tokens/sec, runs compare_models sequentially, syncs the advertised version from package.json, and makes the module importable for testing. (#97)
  • Fixes the Windows interactive-panel flicker (#86): resolves full-panel height overflow on 46–49 row terminals, adds debounced terminal resize handling, and stops the banner pulse from clearing the whole screen 8×/second. (#98)

Full notes: docs/reference/changelog.md

v3.5.13

@signerless signerless released this 03 May 13:20

Published to npm as llm-checker@3.5.13.

Highlights:

  • Ships a prebuilt Ollama SQLite catalog with 229 models and 7,176 variants.
  • Adds weekly model DB update workflow and seed DB refresh tooling.
  • Cleans the interactive panel with animated status verbs instead of verbose progress bars.
  • AI Run now shows tokens/sec beside model responses.
  • Recommendation scoring avoids stale aliases, all-zero pulls, and cloud-only local picks.

v3.5.11

@signerless signerless released this 27 Mar 20:10

3.5.11 — Windows Ollama Host Normalization Follow-up

  • Fixed the remaining Windows Ollama client path where OLLAMA_HOST could be inherited as a wildcard bind address such as 0.0.0.0 or [::]
  • Wildcard bind hosts now normalize back to localhost for client requests
  • Missing Ollama ports now default to 11434
  • Kept the native-fetch retry fallback in the release path for retryable network failures such as fetch failed
  • Updated CLI/docs guidance so custom client endpoints use OLLAMA_BASE_URL
  • Added regression coverage for wildcard-host normalization
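
The normalization rules above can be sketched roughly as follows. The function name and constants are hypothetical, and the sketch assumes the input is a host, host:port, or full URL.

```javascript
// Hypothetical helper illustrating the rules listed above.
const DEFAULT_OLLAMA_PORT = 11434;
const WILDCARD_HOSTS = new Set(['0.0.0.0', '[::]']);

function normalizeOllamaHost(rawHost) {
  const raw = (rawHost || '').trim();
  if (!raw) return `http://localhost:${DEFAULT_OLLAMA_PORT}`;
  // OLLAMA_HOST may omit the scheme (e.g. "0.0.0.0:11434").
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw);
  const url = new URL(hasScheme ? raw : `http://${raw}`);
  // A wildcard bind address is valid for a server but not for a client.
  const hostname = WILDCARD_HOSTS.has(url.hostname) ? 'localhost' : url.hostname;
  // url.port is '' when no port was given, so fall back to Ollama's default.
  const port = url.port || DEFAULT_OLLAMA_PORT;
  return `${url.protocol}//${hostname}:${port}`;
}
```

One known limitation of this sketch: the URL parser also reports an empty port when the scheme's default port is written explicitly (for example http://host:80), so such input would be rewritten to 11434.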

npm package: llm-checker@3.5.11

v3.5.9

@signerless signerless released this 26 Mar 09:27
bf5eda1

  • Fixed the remaining Ollama localhost bypasses in selector flows.
  • Deterministic speed probes now use the shared Ollama client instead of a hardcoded http://localhost:11434 endpoint.
  • AI evaluator chat requests now use the same resolved Ollama base URL path as the rest of the CLI.
  • Added selector-specific regression coverage for Windows-style localhost failure with successful 127.0.0.1 fallback.
  • The separate Windows backend wording question (Best backend: cpu with Runtime assist: Vulkan) remains tracked in #71.

npm:

  • llm-checker@3.5.9

v3.5.8

@signerless signerless released this 25 Mar 22:02
435b4f0

Windows Ollama localhost fallback + Vulkan assist visibility

  • retry Ollama availability across localhost loopback candidates and persist the working base URL
  • filter fake Windows remote display adapters from fallback GPU inventory
  • surface Vulkan runtime assist metadata for integrated Windows GPU paths
  • improve hw-detect output for integrated/shared-memory acceleration paths
  • add regression tests for loopback fallback and Windows GPU reporting
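
The loopback retry in the first bullet could work along these lines. This is a hedged sketch: the candidate list, the injected probe function, and the in-memory cache are assumptions made for illustration, and the real implementation may persist the working URL differently.

```javascript
// Hypothetical sketch; `probe` is injected so the logic can run offline.
const LOOPBACK_CANDIDATES = ['http://localhost:11434', 'http://127.0.0.1:11434'];

let workingBaseUrl = null; // remembered for later requests in this process

async function resolveOllamaBaseUrl(probe, candidates = LOOPBACK_CANDIDATES) {
  if (workingBaseUrl) return workingBaseUrl;
  for (const baseUrl of candidates) {
    try {
      if (await probe(baseUrl)) {
        workingBaseUrl = baseUrl; // keep the first candidate that answers
        return baseUrl;
      }
    } catch {
      // A thrown network error (e.g. "fetch failed") means: try the next one.
    }
  }
  return null; // no candidate responded
}
```

A caller would pass a probe that performs a cheap availability request against the given base URL and resolves to true on success.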

Published npm package: llm-checker@3.5.8

v3.5.7

@signerless signerless released this 25 Mar 20:37

Highlights

  • fixed Windows CPU detection noise on modern Windows builds where wmic is retired
  • fixed oversized local Ollama recommendation edge cases on CPU-backed systems
  • bumped package and MCP server metadata to 3.5.7

Included fixes

  • #66 WMIC retired
  • #67 Recommended model with memory requirement more than I have

Validation

  • npm test passed (26/26)
  • npm pack --dry-run passed

v3.5.6

@signerless signerless released this 13 Mar 15:21

Integrated GPU Inventory & Hybrid Visibility

  • Added first-class integrated GPU inventory handling in unified hardware summaries.
  • Hybrid systems now keep both dedicated and integrated GPU models visible.
  • Integrated-only systems keep GPU visibility even when the runtime backend remains CPU.
  • Recommendation, tiering, and token-speed estimation now use canonical integrated-GPU signals more consistently.
  • CLI output now shows dedicated vs integrated GPU inventory explicitly.
  • Added regression coverage for hybrid and integrated-only detection paths.

v3.5.4

@signerless signerless released this 05 Mar 09:25

Fixed

  • Linux hybrid GPU detection fallback now includes lspci parsing and improved dedicated-GPU enrichment (#58).
  • AMD ROCm VRAM unit parsing fixed to prevent massively overreported memory (#59).

Added

  • Fine-tuning suitability labels in check, recommend, and ai-check outputs (Full FT / LoRA / QLoRA support bands) (#60).

Tests

  • Added regression tests for ROCm VRAM parsing, hybrid GPU fallback detection, and fine-tuning classification.

Commit: ec1df33