Bug: [fs] plugin needs to reflect user disk space usage #1658

Closed
HLFH opened this issue May 12, 2020 · 5 comments

Comments

@HLFH
Contributor

HLFH commented May 12, 2020

Bug description

python-pystache is required for the mustache templating.
I start Glances with a 2-minute refresh time and want to run an action that alerts me when my disk usage rises above 90%.
If I use a single "%" character in the action, everything breaks with the error AttributeError: 'NoneType' object has no attribute 'get_stats_display', so I have to escape it as "%%".
The second issue is that I expect one alert email but receive two identical alert emails about disk usage at the same time.
In actions.py, I see this comment:

Goal: avoid to execute the same command twice

Technically, the action is being executed twice, since I am getting two identical alert emails at the same time.
When I execute the echo command directly from the command line outside of glances and its conf, I get one email.
So there is definitely an issue within glances actions.
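
On the "%%" escaping mentioned above, here is a minimal sketch of one likely cause, assuming Glances reads glances.conf through Python's standard `configparser` with its default interpolation (an assumption, not confirmed in this thread). With that interpolation, a lone "%" is rejected when the value is read, while "%%" yields a literal "%". The config fragments and the `read_action` helper are hypothetical, and the sketch does not show how Glances turns this into the AttributeError above.

```python
import configparser

# Hypothetical config fragments, not copied from a real glances.conf.
ESCAPED = "[fs]\ncritical_action=echo 'usage at 90%%'\n"
UNESCAPED = "[fs]\ncritical_action=echo 'usage at 90%'\n"


def read_action(text):
    """Parse a config fragment and return the interpolated action string."""
    parser = configparser.ConfigParser()  # default BasicInterpolation
    parser.read_string(text)
    return parser.get("fs", "critical_action")


# "%%" is unescaped to a single "%" when the value is read.
escaped_value = read_action(ESCAPED)
print(escaped_value)

# A lone "%" is rejected at read time by the default interpolation.
try:
    read_action(UNESCAPED)
    unescaped_error = None
except configparser.InterpolationSyntaxError as exc:
    unescaped_error = exc
    print("single % rejected:", type(exc).__name__)
```

If this assumption holds, it would also explain why moving the text into an external script removes the need for "%%": the script's contents never pass through the config parser.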

Versions

  • Glances & psutil (glances -V): Glances v3.1.4.1 with PsUtil v5.7.0
  • Operating System (lsb_release -a): LSB Version: 1.4, Arch Linux rolling

Packages: glances & python-pystache

Conf

[fs]
disable=False
# Define the list of hidden file system (comma-separated regexp)
hide=/boot.*,/snap.*
# Define filesystem space thresholds in %
# Default values if not defined: 50/70/90
# It is also possible to define per mount point value
# Example: /_careful=40
careful=50
warning=70
critical=90
critical_action_repeat=echo -e "Used filesystem disk space for {{device_name}} is at {{percent}}%%.\nPlease cleanup the filesystem to clear the alert.\nScaleway server: $(uname -rn)" | mail -s "CRITICAL: disk usage above 90%%" -r postmaster@example.com hlfh@example.com
# Allow additional file system types (comma-separated FS type)
#allow=zfs

systemd /etc/systemd/system/glances.service

[Unit]
Description=Glances Server

[Service]
ExecStart=/usr/bin/glances --quiet -t 120

[Install]
WantedBy=multi-user.target

Logs

What I am getting in debug mode:

2020-05-12 15:58:35,928 -- INFO -- Start Glances 3.1.4.1
2020-05-12 15:58:35,928 -- INFO -- CPython 3.8.2 and psutil 5.7.0 detected
@HLFH
Contributor Author

HLFH commented Jun 17, 2020

I have a third issue.
My disk usage for the / mount point was at 91.9%, but the alert email action did not fire even though critical was set to 90. To trigger it, I had to lower the threshold to 85. I don't understand why it did not fire at the default setting of 90.

@HLFH
Contributor Author

HLFH commented Jun 18, 2020

I see that mustache substitution is restricted to the action's command line, where each mustache tag is an attribute of the current plugin section. If an action executes an external file that uses mustache syntax for attributes of the current plugin section, those tags will not be rendered. That sounds logical.

@HLFH
Contributor Author

HLFH commented Jun 18, 2020

I updated my conf and now use an external Python script. I am still getting the main issue.

Here is my critical_action_repeat line in glances.conf for the fs plugin.

critical_action_repeat=echo {{percent}} > /tmp/fs.alert && python /etc/glances/actions.d/fs-critical.py

The mustache attributes are saved to /tmp/fs.alert here so that the external Python script can read them.

Here is my /etc/glances/actions.d/fs-critical.py file.

import subprocess

# Kernel release and node name, used to identify the server in the alert
system = subprocess.check_output(['uname', '-rn']).decode('utf-8')
# Read the percentage that the Glances action wrote to /tmp/fs.alert
with open('/tmp/fs.alert', 'r') as f:
    percent = f.readline().rstrip()
body = ('Used filesystem disk space for /dev/sda3 is at ' + percent + '%.\n'
        'Please cleanup the filesystem to clear the alert.\n'
        'Scaleway server: ' + system)
# Send the message body to mail on stdin instead of piping it through echo
subprocess.run(['mail', '-s', 'CRITICAL: disk usage above 90%', '-r', 'postmaster@example.com', 'hlfh@example.com'], input=body.encode('utf-8'))

When an external Python file is used, the AttributeError: 'NoneType' object has no attribute 'get_stats_display' no longer occurs, and the "%" character no longer needs to be doubled as "%%" to escape it.

Remaining issues:

  • duplicate emails with _action_repeat (_action without the _repeat suffix does not trigger the action twice at the same time);
  • the {{percent}} attribute of the fs plugin does not line up with the space threshold. Example: if the {{percent}} attribute of the fs plugin is 91.9% and critical is set to 90, critical_action is not triggered.

What I see for the second issue with df -h:

/dev/sda3          909G    793G   70G  92% /

793G is 87.24% of 909G

Real disk usage is not 87.24% but around 92.3%, because the filesystem space that is actually available is 70G.
70G = 7.70% of 909G.
While the {{percent}} attribute is roughly correct, the space thresholds for the [fs] plugin should take the available disk space into account rather than only the used disk space, so that they reflect the correct disk usage.
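
The arithmetic above can be checked with a short sketch. It uses only the rounded figures from the df -h line, and it assumes (this is not confirmed here) that the alert compares used space against the full size, while the displayed percentage is computed as used / (used + available). The variable names and the `CRITICAL` constant are illustrative.

```python
# Rounded figures from the df -h line above, in GiB.
size, used, avail = 909, 793, 70

CRITICAL = 90  # critical threshold from the [fs] section, in percent

# Ratio of used space to the full size, including space reserved for root.
used_over_size = 100 * used / size
# Ratio of used space to the space a regular user can actually reach.
used_over_user_total = 100 * used / (used + avail)

print(round(used_over_size, 2))        # 87.24
print(round(used_over_user_total, 2))  # 91.89

# Only the second ratio crosses the critical threshold.
print(used_over_size >= CRITICAL, used_over_user_total >= CRITICAL)  # False True
```

The second ratio is close to the 91.9% reported by {{percent}} and to the 92% shown by df, which is consistent with the alert not firing at 90 but firing once the threshold was lowered to 85.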

This second issue might be related to this file.

for i in self.stats:
    self.views[i[self.get_key()]]['used']['decoration'] = self.get_alert(
        i['used'], maximum=i['size'], header=i['mnt_point'])

EDIT: At the moment, I no longer get duplicates with *_action_repeat, and I don't know why.
I might have had this issue earlier because glances.conf was not reprocessed correctly when the configuration changed and the lines for *_action and *_action_repeat were added or removed. I really don't know.

HLFH added a commit to Halgo-io/glances that referenced this issue Jun 18, 2020
`psutil` [says the following](https://github.com/giampaolo/psutil/blob/master/psutil/_psposix.py):

> Note: UNIX usually reserves 5% disk space which is not accessible
> by user. In this function "total" and "used" values reflect the
> total and used disk space whereas "free" and "percent" represent
> the "free" and "used percent" user disk space.

This fixes the alert value reported in issue nicolargo#1658, where the alert had not been fixed.
Issue nicolargo#644 fixed it one way; this PR completes the fix for the alert feature of the `fs` plugin.
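
As an illustration of the reserved-space note quoted above, here is a minimal sketch that derives both percentages from the standard library's `os.statvfs` on a Unix system. It follows the distinction the note describes, but it is not psutil's or Glances' actual code, and the `user_disk_usage` helper and its dictionary keys are hypothetical.

```python
import os


def user_disk_usage(path="/"):
    """Return usage figures for `path`, separating root-only reserved space."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize          # full filesystem size in bytes
    avail_to_root = st.f_bfree * st.f_frsize   # free bytes, reserved blocks included
    avail_to_user = st.f_bavail * st.f_frsize  # free bytes for unprivileged users
    used = total - avail_to_root
    total_user = used + avail_to_user          # space a regular user can reach
    return {
        "total": total,
        "used": used,
        "free": avail_to_user,
        # Used space relative to the whole filesystem.
        "percent_of_total": 100 * used / total if total else 0.0,
        # Used space relative to what a regular user can reach.
        "percent_user": 100 * used / total_user if total_user else 0.0,
    }


usage = user_disk_usage("/")
print(usage)
```

On a filesystem with reserved blocks, `percent_user` comes out higher than `percent_of_total`, which is the gap between the threshold check and the displayed percentage discussed in this issue.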
@HLFH
Contributor Author

HLFH commented Jun 18, 2020

@nicolargo You may want to check PR #1680, in which I fixed the second issue described in this thread.

@HLFH HLFH changed the title from "Glances duplicated action for alert notice email" to "Bug: [fs] plugin needs to reflect user disk space usage" Jun 22, 2020
@HLFH
Contributor Author

HLFH commented Jun 22, 2020

The initial duplicate alert issue had a workaround and #1432 should fix it in the long term.

Regarding the other issues listed here, everything has now been fixed in this Pull Request: #1680

@HLFH HLFH closed this as completed Jun 22, 2020
@nicolargo nicolargo added this to the Glances 3.1.5 milestone Jul 20, 2020