Fix probing after "systemctl daemon-reexec" #7

Merged Oct 17, 2018 (1 commit)


luisfdez

Without this patch, running systemctl daemon-reexec (for example, after systemd dependencies are updated) breaks the plugin until collectd is restarted, because the dbus connection to the manager is broken. The log shows something like this:

# After running systemctl daemon-reexec
[2018-10-15 15:58:03] systemd plugin [verbose]: Read callback called
[2018-10-15 15:58:03] Unhandled python exception in read callback: DBusException: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.10601 was not provided by any .service files
[2018-10-15 15:58:03] Traceback (most recent call last):
[2018-10-15 15:58:03]   File "/usr/lib/python2.7/site-packages/collectd_systemd.py", line 68, in read_callback
    state = self.get_service_state(full_name)
[2018-10-15 15:58:03]   File "/usr/lib/python2.7/site-packages/collectd_systemd.py", line 42, in get_service_state
    return unit.Get('org.freedesktop.systemd1.Unit', 'SubState')
[2018-10-15 15:58:03]   File "/usr/lib64/python2.7/site-packages/dbus/proxies.py", line 145, in __call__
    **keywords)
[2018-10-15 15:58:03]   File "/usr/lib64/python2.7/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
[2018-10-15 15:58:03] DBusException: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.10601 was not provided by any .service files
[2018-10-15 15:58:03] read-function of plugin `python.collectd_systemd' failed. Will suspend it for 20.000 seconds.

This patch tries to mitigate that case by:

  • Catching the exception.
  • Clearing the unit cache.
  • Reinitializing the connection (the manager is the culprit in this case).
  • Retrying only once.
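
The four steps above could be sketched roughly as follows. This is not the plugin's actual code: Monitor, FakeManager, FakeUnit and FakeDBusException are hypothetical stand-ins (the real plugin goes through the python dbus bindings and would presumably catch dbus.exceptions.DBusException), so the snippet runs without dbus installed. Returning "broken" after the second failure is also an assumption.

```python
class FakeDBusException(Exception):
    """Stand-in for dbus.exceptions.DBusException (hypothetical)."""


class FakeUnit(object):
    """Stand-in for a unit proxy; it fails once its manager is gone."""

    def __init__(self, manager):
        self._manager = manager

    def sub_state(self):
        if not self._manager.alive:
            raise FakeDBusException("ServiceUnknown")
        return "running"


class FakeManager(object):
    """Stand-in for the systemd manager reached over dbus."""

    def __init__(self):
        self.alive = True

    def load_unit(self, name):
        if not self.alive:
            raise FakeDBusException("ServiceUnknown")
        return FakeUnit(self)


class Monitor(object):
    def __init__(self, connect):
        self._connect = connect      # callable returning a fresh manager
        self.manager = connect()
        self.units = {}              # unit cache: name -> unit proxy

    def get_service_state(self, name):
        if name not in self.units:
            self.units[name] = self.manager.load_unit(name)
        return self.units[name].sub_state()

    def reinit(self):
        self.units.clear()               # clear the unit cache
        self.manager = self._connect()   # reinitialize the connection

    def read_state(self, name):
        try:
            return self.get_service_state(name)
        except FakeDBusException:        # catch the exception
            self.reinit()
        try:
            return self.get_service_state(name)   # retry only once
        except FakeDBusException:
            return "broken"


connections = []


def connect():
    manager = FakeManager()
    connections.append(manager)
    return manager


monitor = Monitor(connect)
before = monitor.read_state("sshd.service")
connections[-1].alive = False            # simulate systemctl daemon-reexec
after = monitor.read_state("sshd.service")
print(before, after, len(connections))
```

Retrying a single time keeps the read callback from looping when dbus is really unavailable; in that case the second failure is reported and the next scheduled read tries again.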

Under the same situation, the logs would look like this:

# Normal probing
[2018-10-15 16:59:51] systemd plugin [verbose]: Read callback called
[2018-10-15 16:59:51] systemd plugin [verbose]: Sending value: systemd.sshd=1.0 (state=running)

# After systemctl daemon-reexec
[2018-10-15 17:00:01] systemd plugin [verbose]: Read callback called
[2018-10-15 17:00:01] systemd plugin [verbose]: systemd plugin: failed to monitor unit sshd.service: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.10647 was not provided by any .service files
[2018-10-15 17:00:01] systemd plugin [verbose]: Unit sshd.service reported as broken. Reinitializing the connection to dbus & retrying.
[2018-10-15 17:00:01] systemd plugin [verbose]: Sending value: systemd.sshd=1.0 (state=running)

# Next calls...
[2018-10-15 17:00:11] systemd plugin [verbose]: Read callback called
[2018-10-15 17:00:11] systemd plugin [verbose]: Sending value: systemd.sshd=1.0 (state=running)

There may be a better way to detect the broken manager, but I didn't find one while exploring the Python interface to dbus.

Cheers,
Luis

@coveralls

coveralls commented Oct 16, 2018


Coverage remained the same at 100.0% when pulling 9c09af2 on luisfdez:master into a35287b on mbachry:master.

@luisfdez
Author

I've just updated the tests; code coverage should be back to 100%.

@mbachry
Owner

mbachry commented Oct 17, 2018

Looks great! Thanks a lot.

mbachry merged commit 212cb79 into mbachry:master on Oct 17, 2018