mtop 1.1.0
mtop now watches more than ollama.
- llama.cpp, LM Studio and vLLM show up in the models pane next to ollama, detected on their usual ports (8080, 1234, 8000; all configurable, empty flag to skip). llama.cpp wants `--metrics` on launch for the kv-cache numbers.
- The proxy counts OpenAI-style requests too (`/v1/chat/completions`, `/v1/completions`), so clients of llama.cpp and LM Studio get tok/s in the requests pane. Their responses carry no timings, so that number is tokens over wall time: close to decode speed, not identical. Ollama requests still use ollama's own timings.
- AMD GPUs through rocm-smi. Apple Silicon shows unified-memory use; real GPU utilization needs root for powermetrics, so that's still open.
- Press `c` for a per-model table: requests, average tok/s, p50/p95, tokens out.
- The proxy port serves `/metrics` in Prometheus format.
- Status line turns red when GPU memory passes 93% or temperature passes 87°C.
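
For reference, the red status-line rule comes down to two comparisons. This is a minimal sketch, not mtop's code: the function name, the argument names and the reading of "passes" as strictly greater are assumptions. Only the 93% and 87°C limits come from the notes.

```python
# Limits quoted in the release notes; all names below are illustrative.
GPU_MEM_LIMIT_PCT = 93.0
GPU_TEMP_LIMIT_C = 87.0


def status_is_red(mem_used: int, mem_total: int, temp_c: float) -> bool:
    """Return True when either limit is exceeded.

    Assumption: "passes" means strictly greater than the limit.
    """
    # Guard against a GPU that reports no memory total.
    mem_pct = 100.0 * mem_used / mem_total if mem_total else 0.0
    return mem_pct > GPU_MEM_LIMIT_PCT or temp_c > GPU_TEMP_LIMIT_C


at_limit = status_is_red(mem_used=93, mem_total=100, temp_c=50.0)
over_memory = status_is_red(mem_used=94, mem_total=100, temp_c=50.0)
over_temp = status_is_red(mem_used=10, mem_total=100, temp_c=88.0)
```

Either condition alone is enough to turn the line red; sitting exactly on a limit is not, under the strict reading assumed here.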
`~/.mtop.conf` holds MTOP_* settings for hosts you don't want to retype.
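
On the tok/s figure for OpenAI-style requests: "tokens over wall time" can be pictured roughly as below. This is a sketch under assumptions, not the proxy's implementation. It assumes the token count is read from the `usage.completion_tokens` field of an OpenAI-style response, and that the proxy timestamps the request when it is sent and when the response completes. The function name is hypothetical.

```python
def wall_clock_tok_per_s(response: dict, started: float, finished: float) -> float:
    """Tokens generated divided by elapsed wall time, in tokens per second.

    Assumption: the response carries an OpenAI-style "usage" object.
    """
    tokens = response.get("usage", {}).get("completion_tokens", 0)
    elapsed = finished - started
    # Avoid dividing by zero on a degenerate or clock-skewed measurement.
    return tokens / elapsed if elapsed > 0 else 0.0


# A made-up response body and timestamps, for illustration only.
sample = {"usage": {"prompt_tokens": 12, "completion_tokens": 128, "total_tokens": 140}}
rate = wall_clock_tok_per_s(sample, started=10.0, finished=14.0)  # 128 / 4.0
```

Wall time covers the whole request, prompt processing and transfer included, which is presumably why the figure lands near decode speed rather than on it.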
The gif in the README is from a live run, as always.