mtop 1.1.0

eladser released this 10 Jun 01:15

mtop now watches more than ollama.

  • llama.cpp, LM Studio and vLLM show up in the models pane next to ollama, detected on their usual ports (8080, 1234 and 8000 respectively; all configurable, and an empty value for a port flag skips that backend). llama.cpp must be launched with --metrics for the KV-cache numbers to appear.
  • The proxy counts OpenAI-style requests too (/v1/chat/completions, /v1/completions), so clients of llama.cpp and LM Studio get tok/s in the requests pane. Their responses carry no timings, so that number is output tokens divided by wall-clock time. Wall time also covers prompt processing, so the figure is close to decode speed but not identical. Ollama requests still use ollama's own timings.
  • AMD GPUs are read through rocm-smi. Apple Silicon shows unified-memory use; actual GPU utilization needs root for powermetrics, so that is still open.
  • Press c for a per-model table: requests, average tok/s, p50/p95, tokens out.
  • The proxy port serves /metrics in Prometheus format.
  • Status line turns red when GPU memory passes 93% or temperature passes 87°C.
  • ~/.mtop.conf holds MTOP_* settings for hosts you don't want to retype.
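The two tok/s calculations from the proxy note can be sketched as follows. This sketch assumes two response shapes. For ollama, it uses the non-streaming response fields eval_count (tokens generated) and eval_duration (nanoseconds). For OpenAI-style responses, it uses a body with usage.completion_tokens. The function names are hypothetical, and this is not mtop's actual code.

```python
def ollama_tok_s(resp: dict) -> float:
    """Decode speed from ollama's own timings (eval_duration is in nanoseconds)."""
    duration_s = resp.get("eval_duration", 0) / 1e9
    return resp.get("eval_count", 0) / duration_s if duration_s > 0 else 0.0


def openai_tok_s(resp: dict, wall_seconds: float) -> float:
    """Approximate speed for OpenAI-style responses, which carry no timings.

    Completion tokens are divided by the request's wall-clock time. Wall time
    also includes prompt processing, so this tends to read below true decode speed.
    """
    tokens = resp.get("usage", {}).get("completion_tokens", 0)
    return tokens / wall_seconds if wall_seconds > 0 else 0.0
```

In a proxy, wall_seconds would be measured with a monotonic clock (for example time.monotonic()) from forwarding the request to receiving the last byte of the response.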
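The per-model table (the c key) is essentially an aggregation over finished requests. The release note does not say which quantity p50/p95 describe, so this sketch assumes they are percentiles of per-request tok/s. It also uses the nearest-rank percentile method. Both choices are assumptions, and all names here are hypothetical.

```python
from dataclasses import dataclass, field


def percentile(values: list, p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        return 0.0
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, clamped so rank is at least 1.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


@dataclass
class ModelStats:
    tok_s: list = field(default_factory=list)  # one tok/s sample per request
    tokens_out: int = 0


def model_table(requests) -> list:
    """requests: iterable of (model, tok_s, tokens_out) tuples, one per finished request."""
    stats = {}
    for model, rate, tokens in requests:
        s = stats.setdefault(model, ModelStats())
        s.tok_s.append(rate)
        s.tokens_out += tokens
    rows = []
    for model, s in sorted(stats.items()):
        rows.append({
            "model": model,
            "requests": len(s.tok_s),
            "avg_tok_s": sum(s.tok_s) / len(s.tok_s),
            "p50": percentile(s.tok_s, 50),
            "p95": percentile(s.tok_s, 95),
            "tokens_out": s.tokens_out,
        })
    return rows
```

Nearest-rank always returns a value that was actually observed. That keeps p95 honest for models with only a handful of requests, at the cost of coarse steps.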
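The /metrics endpoint on the proxy port can be scraped by Prometheus. For orientation, this sketch shows what a gauge looks like in the Prometheus text exposition format when generated by hand. The metric name mtop_model_tokens_per_second and its model label are hypothetical; the release note does not list mtop's actual metric names.

```python
def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    # Backslash must be escaped first so later escapes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(tok_s_by_model: dict) -> str:
    # Hypothetical metric name; mtop's real /metrics output may differ.
    name = "mtop_model_tokens_per_second"
    lines = [
        f"# HELP {name} Average generation speed per model.",
        f"# TYPE {name} gauge",
    ]
    for model, rate in sorted(tok_s_by_model.items()):
        lines.append(f'{name}{{model="{escape_label(model)}"}} {rate}')
    # The text format expects the last line to end with a newline.
    return "\n".join(lines) + "\n"
```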
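The red status line reduces to a two-threshold check. This sketch reads "passes" as strictly greater than and assumes memory is reported as used/total bytes for a single GPU. The names are hypothetical.

```python
# Thresholds from the release note; "passes" is read as strictly greater than.
MEM_RED_PCT = 93.0
TEMP_RED_C = 87.0


def status_is_red(mem_used_bytes: int, mem_total_bytes: int, temp_c: float) -> bool:
    """True when GPU memory use is above 93% or the temperature is above 87 °C."""
    mem_pct = 100.0 * mem_used_bytes / mem_total_bytes if mem_total_bytes > 0 else 0.0
    return mem_pct > MEM_RED_PCT or temp_c > TEMP_RED_C
```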
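The release note does not describe the syntax of ~/.mtop.conf. This sketch assumes shell-style KEY=VALUE lines with # comments. It also assumes that MTOP_* variables already exported in the environment take precedence over the file. Both are assumptions, and the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def parse_mtop_conf(text: str, environ: dict) -> dict:
    """Collect MTOP_* settings from KEY=VALUE lines, then let environment variables override them."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("MTOP_"):
            # Drop surrounding whitespace and quote characters.
            settings[key] = value.strip().strip("\"'")
    # Assumption: a variable already exported in the shell wins over the file.
    for key, value in environ.items():
        if key.startswith("MTOP_"):
            settings[key] = value
    return settings


def load_mtop_conf(path: Optional[Path] = None) -> dict:
    """Read ~/.mtop.conf if it exists and merge it with the current environment."""
    path = path or Path.home() / ".mtop.conf"
    text = path.read_text() if path.is_file() else ""
    return parse_mtop_conf(text, dict(os.environ))
```

Keeping the parser separate from file access makes the precedence rule easy to test without touching the home directory.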

The gif in the README is from a live run, as always.