mtop 1.0.0

@eladser eladser released this 09 Jun 21:08
· 18 commits to main since this release

First release.

mtop is one terminal window for your local AI: loaded models and the VRAM they hold, GPU state, every request with its tok/s, and a throughput sparkline.

What's in 1.0:

  • Models, GPU, requests and throughput panes. Zero config — run mtop, it finds Ollama on localhost.
  • Per-request tok/s via a pass-through proxy on 127.0.0.1:4321. Ollama has no metrics endpoint; the response stream is the only place those numbers exist, so mtop sits in the middle and reads them as they pass. To use it, point OLLAMA_HOST at the proxy address (127.0.0.1:4321).
  • Model unload: arrows to select, u to evict. Models that blow past their expiry get marked overdue.
  • -idle-unload 15m evicts anything that hasn't served a request in 15 minutes, for the times Ollama forgets to.
  • Binaries for Windows, Linux and macOS below. GPU stats are NVIDIA-only for now; AMD and Apple Silicon are next on the roadmap, along with llama.cpp and LM Studio.
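
For the curious, here is a rough sketch of the arithmetic behind the per-request tok/s figure and the unload action. It is not mtop's code. It assumes the Ollama streaming format, where responses arrive as newline-delimited JSON and the final chunk (`"done": true`) carries `eval_count` and `eval_duration` (in nanoseconds), and it assumes eviction works by sending a request with `keep_alive` set to 0. The function names and the `llama3` model name are made up for illustration.

```python
import json


def tokens_per_second(chunk_line: str):
    """Return generation tok/s from one NDJSON line of a stream, or None.

    Assumption: only the final chunk ("done": true) carries the counters,
    with eval_duration expressed in nanoseconds.
    """
    try:
        chunk = json.loads(chunk_line)
    except json.JSONDecodeError:
        return None  # not a JSON line; a proxy would just pass it through
    if not isinstance(chunk, dict) or not chunk.get("done"):
        return None  # intermediate chunk, no timing data yet
    count = chunk.get("eval_count")
    duration_ns = chunk.get("eval_duration")
    if not count or not duration_ns:
        return None  # counters missing or zero; avoid dividing by zero
    return count / (duration_ns / 1e9)


def unload_request_body(model: str) -> str:
    """Build a JSON body asking the server to drop a model immediately.

    Assumption: posting this to the generate endpoint with keep_alive 0
    unloads the model; whether mtop does exactly this is not stated.
    """
    return json.dumps({"model": model, "keep_alive": 0})


if __name__ == "__main__":
    # Hypothetical final chunk: 120 tokens generated in 2 seconds.
    final = '{"model":"llama3","done":true,"eval_count":120,"eval_duration":2000000000}'
    print(tokens_per_second(final))  # 60.0
    print(unload_request_body("llama3"))
```

Because the counters only appear in the final chunk, a tool that wants them per request has to see the whole stream, which is why a pass-through proxy is a natural fit.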

The gif in the README is a real run against a live model, not a mockup.