Release v0.8.0 · mohitsoni48/TurboLLM

TurboLLM v0.8.0 — Research v2, chat portability, engine lifecycle, and an auto-tune overhaul.

Added

Research v2 — pluggable web-search providers (Tavily / Kagi / SearXNG); a deterministic retrieval service with a confidence loop and a sources panel; and a heuristic referee that flags reply claims not supported by their cited sources.
Chat portability — share a chat via a LAN link or a debug snapshot, and export/import chats as .turbollm-chat.json (imported chats are fully continuable).
Agentic tool security — SSRF/RFC-1918 block on fetch_url and a confirmation gate on run_code.
vLLM load controls — max model length, GPU memory utilization, max concurrent sequences, dtype, KV-cache dtype, enforce-eager, trust-remote-code.
Engine lifecycle — 3-state engine rows (Install / Update / Disable / Enable / Delete) for both the catalog engines (vLLM / MLX / TurboQuant) and the llama.cpp backends.
"All" models view — list models unfiltered by the active engine, with compatibility badges.
Auto-tune — live prefill-% progress and a Save / Cancel results dialog.
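
For readers unfamiliar with the SSRF/RFC-1918 block mentioned under agentic tool security, the idea can be sketched roughly as follows. This is a minimal illustration, not TurboLLM's actual code: the helper names `is_blocked_address` and `check_url` are hypothetical, and the exact set of blocked ranges in fetch_url is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(ip_text: str) -> bool:
    """Return True for addresses a fetch tool should refuse to contact."""
    # Drop any IPv6 zone id (e.g. "fe80::1%eth0") before parsing.
    ip = ipaddress.ip_address(ip_text.split("%")[0])
    # Unwrap IPv4-mapped IPv6 (::ffff:10.0.0.1) so it cannot bypass the check.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private        # includes RFC 1918: 10/8, 172.16/12, 192.168/16
        or ip.is_loopback    # 127/8 and ::1
        or ip.is_link_local  # 169.254/16, where cloud metadata services live
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise if the URL is not http(s) or resolves to a blocked address.

    `resolve` is injectable so the check can be exercised without DNS.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Check every resolved address, not just the first one.
    for info in resolve(parts.hostname, port, type=socket.SOCK_STREAM):
        address = info[4][0]
        if is_blocked_address(address):
            raise PermissionError(f"{parts.hostname} resolves to blocked address {address}")
```

A check like this only helps if the later request connects to the same address that was validated; otherwise a DNS answer that changes between the check and the fetch can slip past it.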

Changed

Auto-tune rewritten — binary search over GPU offload, a realistic bench prompt (min(50k, 0.75 × ctx)), a 3-minute-per-test cap, GPU settle between candidates, and a spill-aware peak confirmation (a config that spills VRAM to system memory is PCIe-bottlenecked, so throughput peaks at the no-spill edge).
Stop / restart / load now act as kill switches — they cancel a running auto-tune and abort in-flight chat generations.
The model load dialog now follows the active engine kind (vLLM shows its real controls, not MLX copy), uses a slim custom scrollbar, and shows the real GPU-layer count instead of "99".
turbollm launch claude raises the request timeout so slow local models don't trigger retries.
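
The auto-tune rewrite described above combines a bench prompt sized as min(50k, 0.75 × ctx) with a binary search over GPU offload. The sketch below shows the shape of that search under several assumptions: "50k" is read as 50,000 tokens, offload is measured in whole layers, and fitting in VRAM is monotonic in the layer count. The function names and the `fits` callback are hypothetical, and the real auto-tune also benchmarks throughput, lets the GPU settle between candidates, and confirms the peak, none of which is modeled here.

```python
from typing import Callable


def bench_prompt_tokens(ctx: int) -> int:
    """Bench prompt length: min(50k, 0.75 * ctx), assuming token units."""
    return min(50_000, int(0.75 * ctx))


def max_no_spill_layers(total_layers: int, fits: Callable[[int], bool]) -> int:
    """Largest layer count in [0, total_layers] for which fits(n) is True.

    Assumes fits is monotonic: if n layers fit in VRAM, so does any
    smaller count. Returns 0 if no positive count fits.
    """
    lo, hi = 0, total_layers
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid      # mid fits; try offloading more layers
            lo = mid + 1
        else:
            hi = mid - 1    # mid spills; try offloading fewer layers
    return best


# Example: a 48-layer model where at most 31 layers fit without spilling.
probes = []
edge = max_no_spill_layers(48, lambda n: probes.append(n) or n <= 31)
print(edge, len(probes))                 # 31 6
print(bench_prompt_tokens(8192))         # 6144
print(bench_prompt_tokens(131072))       # 50000
```

Binary search keeps the number of candidate loads logarithmic in the layer count, which matters when each probe can run for up to the 3-minute cap.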

Install / upgrade: npx turbollm@latest