-
Notifications
You must be signed in to change notification settings - Fork 680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[system/pressure] Add plugin to monitor pressure stall information (psi) #1302
Conversation
Adds a plugin to monitor the pressure stall information (psi) as reported by the Linux kernel. - groups averages per resource - rate/derive totals for ease of reading - resources, intervals and scopes configurable See: https://www.kernel.org/doc/html/latest/accounting/psi.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for contributing this plugin!
Please take a look at my comments below.
In addition I have the following suggestions:
- please take a look at the issues reported by
shellcheck
- I would recommend to use a less generic name for this plugin (not just "pressure"). How about
pressure_stall
,linux_pressure
orlinux_psi
?
Thank you!
plugins/system/pressure
Outdated
if [ -d "${pressure_dir}" ]; then | ||
printf "yes\n" | ||
else | ||
printf "no (%s not found)\n" ${pressure_dir} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add quotes. See the output of shellcheck
for such issues.
plugins/system/pressure
Outdated
} | ||
|
||
get_pressure_value() { | ||
resource=$1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use local
for variables inside of functions in order to keep the namespace clean.
plugins/system/pressure
Outdated
resource=$1 | ||
interval=$2 | ||
scope=${3:-some} | ||
grep "$scope" ${pressure_dir}//${resource} | grep -o -E "${interval}=[0-9]{1,}(\.[0-9]{1,}){0,1}" | cut -d '=' -f 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one slash is sufficient for the grep
path, oder?
Please quote the path.
Regarding {1,}
in the regular expression: wouldn't be +
an easier readable way of expressing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that depends on your definition of "easy". The way it currently is set, the regex works the same way throughout. It's all the same pattern. Whereas your solution (using +
) brings in a different notation, where one reading this will need to think about it again.
Off course one could argue that this is still common regex that most everybody knows by heart, if they know regex at all, and that regular changes can help to keep you awake. But still its opinion either way around.
The double slash is a typo by the way. I'm going to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, you thought about the possible ways of expressing this. So I am happy with your choice.
plugins/system/pressure
Outdated
;; | ||
esac | ||
|
||
printf "%s\n" $printable_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, the linebreak is not relevant here?
(trailing linebreaks are automatically removed in a Command Substitution)
plugins/system/pressure
Outdated
done | ||
} | ||
|
||
output_usage() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe simply redirect output_usage
to stderr when it is called?
(in order to remove all these redirects here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about this. But to be honest I prefer to be able to call the function without needing to remember to redirect the output. It's not like adding a parameter and somehow feels wrong to me from a programming standpoint.
plugins/system/pressure
Outdated
} | ||
|
||
output_usage() { | ||
printf >&2 "%s - munin plugin to graph pressure stall information for CPU, Memory and IO as reported by the Linux kernel.\n" ${0##*/} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe suggest to run munin-doc PLUGIN_NAME
instead of repeating the content of the documentation header here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Researching munin-doc PLUGIN_NAME
, I decided against using it. The reason being, that this will only work when the script is called in a munin-context. In such a context, people are typically aware of the possibility to just look at the header of the script. The output_usage()
function is mainly there for situation where the context is not clear or where the script is called without munin.
plugins/system/pressure
Outdated
config) | ||
iterate_config | ||
;; | ||
*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetch
is also an acceptable alternative for "no argument".
plugins/system/pressure
Outdated
|
||
1) | ||
case $1 in | ||
auto|autoconf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto
is not specified. Please treat it as an error.
Hello @sumpfralle, thanks for the feedback. Sorry it took so long to answer, I've been busy. I'm going to go through the issues and resolve them if needed. On some of them I've got a different opinion though. I'll add them right next to your code-quotes so they are easier to relate. |
…ovements Address several issues regarding the psi-plugin (pressure stall information). Fixes: - Use local for variables in functions. - Add fetch as a valid parameter and remove auto. - Remove double slash in path for get_pressure_value() and quote it. - Remove line break in return value of get_printable_name(). - Quote variables to avoid splitting/globbing. - Rename pressure plugin to linux_psi. References: - munin-monitoring#1302
Hello, I've fixed several of the issues in 2ee1161. The once not changed I elaborated above. Also I renamed the plugin to Have a nice day HaseHarald |
Your changes look good to me. Thank you! I merged your plugin now. btw.: just today another contributor proposed another munin plugin for this topic in munin#1462. Maybe you want to take a look at it. |
Adds a plugin to monitor the pressure stall information (psi) as reported by the Linux kernel.
See: https://www.kernel.org/doc/html/latest/accounting/psi.html
I've written this plugin because I read about the metric and wanted to get a feel for what's "normal" on these values. Therefore I did not include any warnings and am not certain that all metrics are actually useful. For example I'm not sure if it helps to monitor the 10 and 60 second averages long term. So I made this stuff configurable. Still this might help someone else, or someone has an idea on an improvement. So here it is for everyone to use.