mtop 1.0.0
First release.
mtop is one terminal window for your local AI: loaded models and the VRAM they hold, GPU state, every request with its tok/s, and a throughput sparkline.
What's in 1.0:
- Models, GPU, requests and throughput panes. Zero config: run `mtop` and it finds Ollama on localhost.
- Per-request tok/s via a pass-through proxy on `127.0.0.1:4321`. Ollama has no metrics endpoint; the response stream is the only place those numbers exist, so mtop sits in the middle and reads them as they pass. Point `OLLAMA_HOST` at it.
- Model unload: arrows to select, `u` to evict. Models that blow past their expiry get marked overdue. `-idle-unload 15m` evicts anything that hasn't served a request in 15 minutes, for the times Ollama forgets to.
- Binaries for Windows, Linux and macOS below. GPU stats are NVIDIA-only for now; AMD and Apple Silicon are next on the roadmap, along with llama.cpp and LM Studio.
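
For the curious, here is a rough sketch of how a tok/s figure can be read out of a response stream. It is an illustration, not mtop's actual code. It assumes Ollama's streaming format: newline-delimited JSON where the final chunk has `done` set to true and carries `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The function name `tokens_per_second` and the sample chunks are made up for the example.

```python
import json


def tokens_per_second(ndjson_lines):
    """Return generation tok/s from an Ollama-style NDJSON stream, or None.

    Hypothetical helper for illustration; not part of mtop.
    """
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        # Only the final chunk (done == true) carries the timing counters.
        if chunk.get("done") and chunk.get("eval_duration"):
            # eval_duration is reported in nanoseconds.
            return chunk.get("eval_count", 0) * 1e9 / chunk["eval_duration"]
    return None


# Made-up sample: 120 tokens generated in 4 seconds.
sample = [
    '{"model":"llama3","response":"Hi","done":false}',
    '{"model":"llama3","response":"","done":true,'
    '"eval_count":120,"eval_duration":4000000000}',
]
print(tokens_per_second(sample))
```

A proxy in the middle of the stream can forward each line untouched and run this kind of check on the side, which is why no change to the client or to Ollama is needed.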
The gif in the README is a real run against a live model, not a mockup.