sabudaye/scheduler_utilization

scheduler_utilization

Non-blocking Erlang VM scheduler utilization calculator.

Because Erlang CPU usage as reported by top isn't the most reliable value (schedulers spin while idle to avoid going to sleep and hurting latency), a metric exists that is based on scheduler wall time: erlang:statistics(scheduler_wall_time). For any time interval, scheduler wall time can be used as a measure of how "busy" a scheduler is. A scheduler is busy when:

  • Executing process code
  • Executing linked-in driver or NIF code
  • Executing BIFs, or any other runtime handling
  • Garbage collecting
  • Handling any other memory management

But scheduler wall time isn't easy to read. Each entry is a {SchedulerId, ActiveTime, TotalTime} tuple of cumulative counters, which doesn't look like CPU usage in top:

iex(2)> :erlang.statistics(:scheduler_wall_time)
[
  {5, 87867, 27330418785},
  {7, 66353, 27330355600},
  {8, 44196, 27330389465},
  {2, 52264, 27330478740},
  {4, 61475, 27330400810},
  {1, 519905, 27330489650},
  {6, 63038, 27330349455},
  {3, 31529886, 27330400703},
  {9, 33784, 35213659141},
  {10, 23211, 35213659600},
  {11, 27003, 35213659867},
  {12, 24868, 35213660144},
  {13, 21020, 35213660403},
  {14, 23677, 35213660684},
  {15, 15166, 35213660949},
  {16, 20541, 35213661214}
]

So there is another metric, scheduler:utilization/1, based on the difference between two samples of scheduler wall time:

iex(3)> :scheduler.utilization(1)
[
  {:total, 2.6368630920183277e-5, '0.0%'},
  {:weighted, 5.2737261840366555e-5, '0.0%'},
  {:normal, 1, 2.355525364207035e-5, '0.0%'},
  {:normal, 2, 2.323553121047311e-5, '0.0%'},
  {:normal, 3, 2.5641358543427174e-4, '0.0%'},
  {:normal, 4, 2.4300431925188596e-5, '0.0%'},
  {:normal, 5, 2.5207908203652448e-5, '0.0%'},
  {:normal, 6, 2.141886904716157e-5, '0.0%'},
  {:normal, 7, 2.502426597604747e-5, '0.0%'},
  {:normal, 8, 2.274278445407965e-5, '0.0%'},
  {:cpu, 9, 0.0, '0.0%'},
  {:cpu, 10, 0.0, '0.0%'},
  {:cpu, 11, 0.0, '0.0%'},
  {:cpu, 12, 0.0, '0.0%'},
  {:cpu, 13, 0.0, '0.0%'},
  {:cpu, 14, 0.0, '0.0%'},
  {:cpu, 15, 0.0, '0.0%'},
  {:cpu, 16, 0.0, '0.0%'}
]

This is closer to what top would report for each core. The higher the ratio, the higher the workload.
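The ratio can also be derived by hand from two raw samples, which makes clear what the numbers mean. The following is a minimal sketch that follows the approach shown in the Erlang/OTP documentation for erlang:statistics(scheduler_wall_time). The module name wall_time_sketch and its function names are hypothetical and are not part of this application.

```erlang
-module(wall_time_sketch).  % hypothetical module, for illustration only
-export([sample/0, per_scheduler/2, total/2]).

%% Take a raw sample, sorted by scheduler id. Accounting has to be
%% enabled first, otherwise erlang:statistics/1 returns undefined.
sample() ->
    _ = erlang:system_flag(scheduler_wall_time, true),
    lists:sort(erlang:statistics(scheduler_wall_time)).

%% Busy ratio of each scheduler between two sorted samples:
%% (change in active time) / (change in total time).
per_scheduler(Ts0, Ts1) ->
    [{Id, (A1 - A0) / (T1 - T0)}
     || {{Id, A0, T0}, {Id, A1, T1}} <- lists:zip(Ts0, Ts1)].

%% Busy ratio over all schedulers taken together.
total(Ts0, Ts1) ->
    {Active, Total} =
        lists:foldl(
          fun({{_, A0, T0}, {_, A1, T1}}, {Ai, Ti}) ->
                  {Ai + (A1 - A0), Ti + (T1 - T0)}
          end, {0, 0}, lists:zip(Ts0, Ts1)),
    Active / Total.
```

For instance, wall_time_sketch:per_scheduler([{1, 100, 1000}], [{1, 350, 2000}]) yields [{1, 0.25}]: scheduler 1 was busy for 250 of the 1000 time units that passed between the two samples.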

However, this metric is implemented using sleep, which blocks the calling process, so all other messages in the process mailbox wait until the sleep ends. That's not very useful for telemetry. Unfortunately, Recon, a popular library for Erlang VM diagnostics, does the same.
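The blocking behaviour is easy to observe. In the sketch below (the module name blocking_sketch is hypothetical), a message is placed in the caller's own mailbox before scheduler:utilization(1) is called, and the time until that message can finally be handled is measured.

```erlang
-module(blocking_sketch).  % hypothetical module, for illustration only
-export([demo/0]).

%% Returns how many milliseconds the 'ping' message had to wait in the
%% mailbox while the caller was stuck inside scheduler:utilization/1.
demo() ->
    self() ! ping,
    T0 = erlang:monotonic_time(millisecond),
    _ = scheduler:utilization(1),   % blocks the caller for one second
    receive
        ping -> erlang:monotonic_time(millisecond) - T0
    end.
```

The returned delay should be at least the one-second sampling window, even though the message was already waiting when the call started.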

How, then, can scheduler utilization be calculated without blocking the calling process? We need another process that calculates the scheduler utilization value, stores it in ETS, and uses timer:send_after/2 to update that value periodically, so the calling process just reads the value from ETS. That's exactly what this application does, in a nutshell.
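The idea can be sketched roughly as follows. This is not the application's actual source, only an illustration under stated assumptions: the module name sched_util_sketch, the table name, and the one-second refresh interval are all hypothetical. It relies on scheduler:sample/0 and scheduler:utilization/2 from runtime_tools.

```erlang
-module(sched_util_sketch).  % hypothetical module, for illustration only
-behaviour(gen_server).

-export([start_link/0, total/0, weighted/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(TABLE, sched_util_sketch).  % assumed ETS table name
-define(INTERVAL, 1000).            % assumed refresh period in ms

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Readers only touch ETS, so they never wait on the sampler process.
total() -> read(total).
weighted() -> read(weighted).

read(Key) ->
    case ets:lookup(?TABLE, Key) of
        [{Key, Value}] -> Value;
        [] -> undefined             % no refresh has happened yet
    end.

init([]) ->
    ?TABLE = ets:new(?TABLE, [named_table, protected,
                              {read_concurrency, true}]),
    %% scheduler:sample/0 turns on scheduler_wall_time accounting
    %% if it is not enabled yet.
    Sample = scheduler:sample(),
    {ok, _TRef} = timer:send_after(?INTERVAL, refresh),
    {ok, Sample}.

%% On every tick, compare the new sample with the previous one,
%% publish the result, and schedule the next tick. Nothing sleeps.
handle_info(refresh, Prev) ->
    Next = scheduler:sample(),
    Util = scheduler:utilization(Prev, Next),
    {total, Total, _} = lists:keyfind(total, 1, Util),
    {weighted, Weighted, _} = lists:keyfind(weighted, 1, Util),
    true = ets:insert(?TABLE, [{total, Total}, {weighted, Weighted}]),
    {ok, _TRef} = timer:send_after(?INTERVAL, refresh),
    {noreply, Next};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
```

In this sketch the readers return undefined until the first refresh has run, and afterwards they return the utilization measured over the most recent interval. The table is protected, so only the sampler process writes to it while any process can read.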

Build

$ rebar3 compile

Usage

scheduler_utilization:total().
scheduler_utilization:weighted().

For the difference between total and weighted, read the Erlang/OTP reference manual.
