mtop 1.3.0

@eladser eladser released this 18 Jun 08:52

Watch more than one box, see more per request, and get a couple of new numbers.

  • Multi-host: give -ollama a comma-separated list of hosts and mtop stacks the models and GPUs from each machine, tagged by host. Handy if you run models on a couple of boxes.
  • GPU util and memory now draw as sparklines over time, next to the live numbers.
  • Request inspector: run with -inspect, press i, and you get the last request's prompt, completion, and a load/prompt/decode timing split. Off by default; the text it captures is stripped of control bytes so a model can't smuggle escape sequences into your terminal.
  • Session energy on the TOK/S line: watt-hours used and tokens per watt-hour. It's whole-GPU power, so read it as a rough efficiency number.
  • compare -openai <url> runs the comparison against llama.cpp, LM Studio or vLLM, not just ollama.
  • -mem-alert and -temp-alert set the alert thresholds, replacing the built-in 93% and 87C.
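
To illustrate why the request inspector strips control bytes, here is a minimal Python sketch of that kind of filter. It is not mtop's code: the function name strip_control and the exact set of characters removed are assumptions made for the example. It drops C0 controls (keeping newline and tab), DEL, and C1 controls, so an ESC or BEL in model output never reaches the terminal.

```python
import re

# Hypothetical filter, not mtop's implementation.
# Matches C0 controls except tab (\x09) and newline (\x0a),
# plus DEL (\x7f) and the C1 control range (\x80-\x9f).
_CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]")


def strip_control(text: str) -> str:
    """Return text with terminal control characters removed."""
    return _CONTROL.sub("", text)


# A completion that tries to clear the screen and set the window title.
sample = "ok\x1b[2J\x1b]0;title\x07 done"
print(strip_control(sample))  # the ESC and BEL bytes are gone
```

With the ESC and BEL bytes removed, what is left of each sequence ("[2J", "]0;title") prints as ordinary characters instead of being interpreted by the terminal.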

brew and scoop pick this up as usual; winget follows once Microsoft merges the bump.
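
For reading the session energy numbers on the TOK/S line, the arithmetic can be sketched as below. This is only an illustration under assumptions: the function name session_energy, the fixed sampling interval, and the simple sum over power samples are invented for the example, and mtop may sample and integrate differently.

```python
def session_energy(power_samples_w, interval_s, tokens):
    """Estimate watt-hours and tokens per watt-hour for a session.

    power_samples_w: whole-GPU power readings in watts (assumed evenly spaced)
    interval_s:      seconds between readings (assumed constant)
    tokens:          tokens generated during the session
    """
    # Watts times seconds gives joules; 3600 joules make one watt-hour.
    watt_hours = sum(power_samples_w) * interval_s / 3600.0
    tokens_per_wh = tokens / watt_hours if watt_hours > 0 else 0.0
    return watt_hours, tokens_per_wh


# Two minutes at a steady 300 W, 5000 tokens generated.
wh, tpw = session_energy([300.0] * 120, 1.0, 5000)
print(f"{wh:.2f} Wh, {tpw:.0f} tok/Wh")  # 10.00 Wh, 500 tok/Wh
```

Because the power reading covers the whole GPU, anything else running on the card inflates the watt-hours and lowers tokens per watt-hour, which is why the figure is best treated as a rough efficiency number.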