Watch more than one box, see more per request, and a couple of new numbers.
- Multi-host: give `-ollama` a comma list and mtop stacks the models and GPUs from each machine, tagged by host. Handy if you run models on a couple of boxes.
- GPU util and memory now draw as sparklines over time, next to the live numbers.
- Request inspector: run with `-inspect`, press `i`, and you get the last request's prompt, completion, and a load/prompt/decode timing split. Off by default; the text it captures is stripped of control bytes so a model can't smuggle escape sequences into your terminal.
- Session energy on the TOK/S line: watt-hours used and tokens per watt-hour. It's whole-GPU power, so read it as a rough efficiency number.
- `compare -openai <url>` runs the comparison against llama.cpp, LM Studio or vLLM, not just ollama.
- `-mem-alert` and `-temp-alert` set the alert thresholds instead of the built-in 93% and 87C.

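
For the curious, the control-byte stripping the inspector does can be pictured roughly like this. It's a minimal sketch, not mtop's actual code: `stripControl` is a hypothetical name, and keeping newlines and tabs is an assumption about what a readable prompt view would want.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripControl is a hypothetical helper, not mtop's real filter.
// It drops control characters (ESC, BEL, DEL, C1 controls, ...) so that
// captured text cannot start a terminal escape sequence.
// Assumption: newline and tab are kept so the text stays readable.
func stripControl(s string) string {
	return strings.Map(func(r rune) rune {
		if r == '\n' || r == '\t' {
			return r
		}
		if unicode.IsControl(r) {
			return -1 // strings.Map drops runes mapped to a negative value
		}
		return r
	}, s)
}

func main() {
	// A completion that tries to switch the terminal to red and ring the bell.
	raw := "hello\x1b[31m red\x07\n"
	fmt.Printf("%q\n", stripControl(raw))
}
```

With the ESC byte gone, the leftover `[31m` is plain text and the terminal no longer treats it as a color command.
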
brew and scoop pick this up as usual; winget follows once Microsoft merges the bump.