Release mtop 1.1.0 · eladser/mtop

mtop now watches more than ollama.

llama.cpp, LM Studio and vLLM show up in the models pane next to ollama. mtop looks for them on their usual ports (8080, 1234 and 8000). Each port is configurable, and setting a backend's flag to an empty value skips that backend. llama.cpp has to be launched with --metrics for the kv-cache numbers to appear.
The proxy now also counts OpenAI-style requests (/v1/chat/completions, /v1/completions), so clients of llama.cpp and LM Studio get tok/s in the requests pane. These responses carry no timings, so their tok/s is tokens divided by wall time. That is close to decode speed but not identical. Ollama requests still use ollama's own timings.
AMD GPUs are read through rocm-smi. On Apple Silicon, mtop shows unified-memory use. Real GPU utilization needs root for powermetrics, so that part is still open.
Press c for a per-model table: requests, average tok/s, p50/p95, tokens out.
The proxy port also serves /metrics in Prometheus format.
The status line turns red when GPU memory use passes 93% or temperature passes 87°C.
~/.mtop.conf holds MTOP_* settings for hosts you don't want to retype.
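The arithmetic behind the requests pane, the c table and the red status line can be sketched roughly as below. This is not mtop's code, and several details are assumptions. The `Request` record and every function name are hypothetical. The notes don't say what p50/p95 measure, so the sketch applies them to per-request tok/s and uses the nearest-rank method. "Passes" is read as strictly greater than. The ollama fields `eval_count` and `eval_duration` (nanoseconds) are the timings ollama's API returns.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical per-request record; not mtop's actual type."""
    model: str
    tokens_out: int
    wall_seconds: float
    eval_count: Optional[int] = None        # ollama: tokens generated
    eval_duration_ns: Optional[int] = None  # ollama: generation time in nanoseconds


def tok_per_sec(r: Request) -> float:
    # Ollama reports its own generation timings, so prefer those.
    if r.eval_count is not None and r.eval_duration_ns:
        return r.eval_count / (r.eval_duration_ns / 1e9)
    # OpenAI-style responses carry no timings: fall back to tokens over wall time.
    return r.tokens_out / r.wall_seconds if r.wall_seconds > 0 else 0.0


def percentile(values: list[float], p: float) -> float:
    # Nearest-rank percentile (an assumed method) over a non-empty list.
    s = sorted(values)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]


def per_model_table(reqs: list[Request]) -> dict[str, dict[str, float]]:
    # Group requests by model, then compute the columns of the c table.
    groups: dict[str, list[Request]] = {}
    for r in reqs:
        groups.setdefault(r.model, []).append(r)
    table = {}
    for model, rs in groups.items():
        rates = [tok_per_sec(r) for r in rs]
        table[model] = {
            "requests": len(rs),
            "avg_tps": sum(rates) / len(rates),
            "p50": percentile(rates, 50),
            "p95": percentile(rates, 95),
            "tokens_out": sum(r.tokens_out for r in rs),
        }
    return table


def status_is_red(mem_used: int, mem_total: int, temp_c: float) -> bool:
    # Integer comparison avoids float rounding right at the 93% boundary.
    return mem_used * 100 > 93 * mem_total or temp_c > 87
```

For example, two OpenAI-style requests of 100 tokens in 4 s and 50 tokens in 1 s give 25 and 50 tok/s, so the table row shows an average of 37.5. An ollama request reporting 40 tokens over 2 s of `eval_duration` counts as 20 tok/s, whatever its wall time.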

The gif in the README is from a live run, as always.
