Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[system/pressure] Add plugin to monitor pressure stall information (psi) #1302

Merged
merged 2 commits into from
Apr 1, 2022

Conversation

HaseHarald
Copy link
Contributor

Adds a plugin to monitor the pressure stall information (psi) as reported by the Linux kernel.

  • groups averages per resource
  • rate/derive totals for ease of reading
  • resources, intervals and scopes configurable

See: https://www.kernel.org/doc/html/latest/accounting/psi.html

I've written this plugin because I read about the metric and wanted to get a feel for what's "normal" on these values. Therefore I did not include any warnings and am not certain that all metrics are actually useful. For example I'm not sure if it helps to monitor the 10 and 60 second averages long term. So I made this stuff configurable. Still this might help someone else, or someone has an idea on an improvement. So here it is for everyone to use.

Adds a plugin to monitor the pressure stall information (psi) as reported by the Linux kernel.

- groups averages per resource
- rate/derive totals for ease of reading
- resources, intervals and scopes configurable

See: https://www.kernel.org/doc/html/latest/accounting/psi.html
Copy link
Collaborator

@sumpfralle sumpfralle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing this plugin!

Please take a look at my comments below.

In addition I have the following suggestions:

  • please take a look at the issues reported by shellcheck
  • I would recommend to use a less generic name for this plugin (not just "pressure"). How about pressure_stall, linux_pressure or linux_psi?

Thank you!

if [ -d "${pressure_dir}" ]; then
printf "yes\n"
else
printf "no (%s not found)\n" ${pressure_dir}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add quotes. See the output of shellcheck for such issues.

}

get_pressure_value() {
resource=$1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use local for variables inside of functions in order to keep the namespace clean.

resource=$1
interval=$2
scope=${3:-some}
grep "$scope" ${pressure_dir}//${resource} | grep -o -E "${interval}=[0-9]{1,}(\.[0-9]{1,}){0,1}" | cut -d '=' -f 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one slash is sufficient for the grep path, oder?

Please quote the path.

Regarding {1,} in the regular expression: wouldn't be + an easier readable way of expressing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that depends on your definition of "easy". The way it currently is set, the regex works the same way throughout. It's all the same pattern. Whereas your solution (using +) brings in a different notation, where one reading this will need to think about it again.

Off course one could argue that this is still common regex that most everybody knows by heart, if they know regex at all, and that regular changes can help to keep you awake. But still its opinion either way around.

The double slash is a typo by the way. I'm going to fix that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you thought about the possible ways of expressing this. So I am happy with your choice.

;;
esac

printf "%s\n" $printable_name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, the linebreak is not relevant here?
(trailing linebreaks are automatically removed in a Command Substitution)

done
}

output_usage() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe simply redirect output_usage to stderr when it is called?
(in order to remove all these redirects here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this. But to be honest I prefer to be able to call the function without needing to remember to redirect the output. It's not like adding a parameter and somehow feels wrong to me from a programming standpoint.

}

output_usage() {
printf >&2 "%s - munin plugin to graph pressure stall information for CPU, Memory and IO as reported by the Linux kernel.\n" ${0##*/}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe suggest to run munin-doc PLUGIN_NAME instead of repeating the content of the documentation header here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Researching munin-doc PLUGIN_NAME, I decided against using it. The reason being, that this will only work when the script is called in a munin-context. In such a context, people are typically aware of the possibility to just look at the header of the script. The output_usage() function is mainly there for situation where the context is not clear or where the script is called without munin.

config)
iterate_config
;;
*)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch is also an acceptable alternative for "no argument".


1)
case $1 in
auto|autoconf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto is not specified. Please treat it as an error.

@HaseHarald
Copy link
Contributor Author

Hello @sumpfralle,

thanks for the feedback. Sorry it took so long to answer, I've been busy. I'm going to go through the issues and resolve them if needed. On some of them I've got a different opinion though. I'll add them right next to your code-quotes so they are easier to relate.

…ovements

Address several issues regarding the psi-plugin (pressure stall information).

Fixes:
- Use local for variables in functions.
- Add fetch as a valid parameter and remove auto.
- Remove double slash in path for get_pressure_value() and quote it.
- Remove line break in return value of get_printable_name().
- Quote variables to avoid splitting/globbing.
- Rename pressure plugin to linux_psi.

References:
- munin-monitoring#1302
@HaseHarald
Copy link
Contributor Author

Hello,

I've fixed several of the issues in 2ee1161. The once not changed I elaborated above.

Also I renamed the plugin to linux_psi which really is the better name.

Have a nice day

HaseHarald

@sumpfralle sumpfralle merged commit bdec9c4 into munin-monitoring:master Apr 1, 2022
@sumpfralle
Copy link
Collaborator

Your changes look good to me. Thank you!

I merged your plugin now.

btw.: just today another contributor proposed another munin plugin for this topic in munin#1462. Maybe you want to take a look at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants