`scheduler:sample/0` reports inconsistent values from child processes #5425
When `:scheduler.sample/0` is called from a process for the first time, _if and only if none of its parent processes have themselves called `:scheduler.sample/0`_, the values obtained from the call will refer to the time the child process has been alive. As such, calculating the scheduler utilization from two calls obtained from different child processes returns meaningless values.

This has been [reported as a bug in the OTP repository][bug].

[bug]: erlang/otp#5425

As a workaround, this commit calls `:scheduler.sample/0` from `Appsignal`, right before the Erlang probe is initialised, ensuring that the Task child processes receive coherent values.
I think this is a matter of lacking documentation. In particular the documentation of [...] And when the node global `scheduler_wall_time` flag is reset to false, the time counters are reset to zero. So to get correct values the process that did the first call to [...]

@sverker I agree that, if this is the intended behaviour, it should be documented as such. There's a "leaky abstraction" of sorts in having [...]

I agree.

@unflxw Thanks for the report and feedback.
Describe the bug

When calling `scheduler:sample/0` from a spawned process, time values for the normal schedulers will only be reported starting from the moment the child process was started. This makes values reported from different processes inconsistent with each other. When passing samples from different child processes to `scheduler:utilization`, the resulting utilization values will be meaningless.

However, this behaviour only happens if the parent process has not itself called `scheduler:sample/0` before. After the parent process calls `scheduler:sample/0`, the values reported from the child processes will be consistent with each other.

[Note: I'm describing this in terms of `scheduler:sample/0`, but the same issue can be observed by calling `erlang:statistics(scheduler_wall_time)` directly]

To Reproduce

To simplify the resulting output, we'll call `element(4, hd(element(2, scheduler:sample())))`, obtaining the total time value of the first normal scheduler. Note that the behaviour can be observed for the total and active time values of all schedulers.

From an `erl` shell, without any previous calls to `scheduler:sample()`:

Expected behavior

The time values returned by these two functions should be at least `1000000` microseconds apart, so that calculating the utilization from them returns meaningful values.

Alternatively, if the value is meant to have a per-process scope, then it should be documented as such, and then the values of parent processes should not leak to child processes. Note that the following does correctly return values from the spawned processes that are at least `1000000` microseconds apart:

That is, the moment you call `scheduler:sample/0` on the parent process, values obtained by calling it from its child processes have time values that are consistent with the parent process. My hunch is that calling `scheduler:sample/0` from the parent process initialises some scheduler time counters that get passed to the child process.

EDIT: it seems that calling `erlang:system_flag(scheduler_wall_time, true)` from the parent process has the same effect. Calling `scheduler:sample()` implicitly sets that flag, but when it is called in a child process, the flag is not propagated to the parent.

Affected versions
Tested on OTP 24. Older versions may be affected as well.
Additional context
I've found this issue when fetching samples from independent `Elixir.Task` processes and calculating the utilization.
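
For readers who want to try the scenario, here is a minimal Erlang sketch of the measurement described in this report. It is not the code from the original report: the module name `sched_sample_sketch` and its three function names are hypothetical, and the one-second sleep is chosen only to match the `1000000` figure mentioned above. It uses only `scheduler:sample/0`, `erlang:system_flag/2`, `timer:sleep/1` and plain message passing.

```erlang
%% Hypothetical module; a sketch under the assumptions stated above,
%% not the snippets from the original report.
-module(sched_sample_sketch).
-export([total_from_child/0,
         delta_between_children/0,
         delta_with_flag_held/0]).

%% Spawn a short-lived process, take a sample inside it, and return the
%% total-time counter (4th tuple element) of the first scheduler entry.
total_from_child() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() ->
        {_Tag, [First | _]} = scheduler:sample(),
        Parent ! {Ref, element(4, First)}
    end),
    receive
        {Ref, Total} -> Total
    after 5000 ->
        erlang:error(timeout)
    end.

%% Sample from two different child processes, roughly one second apart,
%% and return the difference between the two total-time values.
delta_between_children() ->
    T1 = total_from_child(),
    timer:sleep(1000),
    T2 = total_from_child(),
    T2 - T1.

%% Same measurement, but the calling (parent) process enables the
%% scheduler_wall_time flag before any child is spawned.
delta_with_flag_held() ->
    _Previous = erlang:system_flag(scheduler_wall_time, true),
    delta_between_children().
```

If the behaviour described in this report holds, `sched_sample_sketch:delta_between_children()` called from a fresh `erl` shell would return a difference far smaller than the one-second sleep, whereas `sched_sample_sketch:delta_with_flag_held()` would return a difference of at least one second in the counter's time unit. Start a new node between the two calls, since enabling the flag in the shell process affects later measurements.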