Add checks for openvswitch and helper function to enable the checks #601

Open
wants to merge 3 commits into
base: master

Conversation

afreiberger
Contributor

This commit provides an NRPE script that checks for errors in ovs-vsctl show output. It is meant to be shared across several OpenStack networking charms, or any other charm that wants to add Open vSwitch monitoring.

There is also a helper function, add_openvswitch_checks, in contrib.charmsupport.nrpe, which sets up the necessary sudoers rights for the nagios user to introspect the running openvswitch process.

The check_openvswitch.py script relies on the nagios_plugins3 module, which is delivered with charm-nrpe. The add_openvswitch_checks method should not be used outside the context of an nrpe-relation hook.
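
As a rough illustration of the sudoers setup described above, a sudoers drop-in could be written along these lines. This is only a sketch, not the code from this PR: the function name write_sudoers_dropin and the rule text in SUDOERS_RULE are hypothetical. The file name 99-check_openvswitch comes from the diff, and the mode 0o440 is the permission part of the 0o100440 value in the diff (0o100000 is the regular-file type bit that os.stat reports).

```python
import os
import stat
import tempfile

SUDOERS_MODE = 0o440  # read-only for owner and group

# Hypothetical rule text; the actual rule used by the PR is not shown here.
SUDOERS_RULE = "nagios ALL=(root) NOPASSWD: /usr/bin/ovs-vsctl\n"


def write_sudoers_dropin(sudoers_dir, filename, content):
    """Write a sudoers drop-in file and restrict its permissions."""
    path = os.path.join(sudoers_dir, filename)
    with open(path, "w") as f:
        # Tighten permissions before any content is written.
        os.fchmod(f.fileno(), SUDOERS_MODE)
        f.write(content)
    return path


# Demonstrate against a temporary directory rather than /etc/sudoers.d.
with tempfile.TemporaryDirectory() as tmp_dir:
    dropin = write_sudoers_dropin(tmp_dir, "99-check_openvswitch", SUDOERS_RULE)
    mode = os.stat(dropin).st_mode
    # st_mode carries the file-type bit as well as the permission bits.
    print(mode == (stat.S_IFREG | SUDOERS_MODE))
```

On a real system the file would live in /etc/sudoers.d and should be validated (for example with visudo -c) before relying on it.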

Contributor

@ajkavanagh left a comment

Thanks for the patch; just a couple of comments around testing and extracting data from the ovs-vsctl command.

Comment on lines 26 to 30
ovs_error_re = re.compile(r"^.*error: (?P<message>.+)$", re.I)
for line in ovs_output.decode(errors="ignore").splitlines():
    m = ovs_error_re.match(line)
    if m:
        ovs_vsctl_show_errors.append(m.group("message"))
Contributor

Rather than using a regex: the ovs-vsctl command does support outputting JSON (--format=json). Is there a reason for not doing that (perhaps that the errors would appear in different nodes in different versions)? Just wondering how to make it less magic.

Contributor Author

@afreiberger, Apr 22, 2021

ovs-vsctl show does not support the --format option (the args are parsed, but the output is not differentiated).

root@ruling-manta:/home/ubuntu# ovs-vsctl -f json show
fcaf57e2-8667-4972-b687-169789d1d15d
    Bridge "br0"
        Port "dpdk-p1"
            Interface "dpdk-p1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.1"}
                error: "could not open network device dpdk-p1 (Address family not supported by protocol)"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk-p0"
            Interface "dpdk-p0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
                error: "could not open network device dpdk-p0 (Address family not supported by protocol)"
    ovs_version: "2.9.8"
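
For reference, the regex from the reviewed lines can be exercised against an abridged copy of this output. This is a self-contained sketch: the wrapper function find_show_errors and the constant names are mine, not the PR's. Note that the captured message keeps the surrounding double quotes that ovs-vsctl prints.

```python
import re

# Abridged copy of the `ovs-vsctl -f json show` transcript above.
SAMPLE_SHOW_OUTPUT = b'''fcaf57e2-8667-4972-b687-169789d1d15d
    Bridge "br0"
        Port "dpdk-p1"
            Interface "dpdk-p1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.1"}
                error: "could not open network device dpdk-p1 (Address family not supported by protocol)"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.9.8"
'''

OVS_ERROR_RE = re.compile(r"^.*error: (?P<message>.+)$", re.I)


def find_show_errors(ovs_output):
    """Return the text following 'error: ' on every matching line."""
    errors = []
    for line in ovs_output.decode(errors="ignore").splitlines():
        m = OVS_ERROR_RE.match(line)
        if m:
            errors.append(m.group("message"))
    return errors


errors = find_show_errors(SAMPLE_SHOW_OUTPUT)
for message in errors:
    print(message)
```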

I've also investigated using the json output for ovs-vsctl list Interfaces:

{"data":[[["uuid","9bc2d04d-0069-4a76-918a-b9ac61d4c5ce"],["set",[]],["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],"could not open network device dpdk-p1 (Address family not supported by protocol)",["map",[]],["set",[]],0,0,["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],"dpdk-p1",-1,["set",[]],["map",[["dpdk-devargs","0000:01:00.1"]]],["map",[]],["map",[]],["map",[]],"dpdk"],[["uuid","2d8758ca-510f-4399-ab84-8c1f3a33061e"],"down",["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],4,0,0,["set",[]],0,["set",[]],"down",["map",[]],["set",[]],"c2:bd:86:7f:fa:4d",1500,["set",[]],"br0",65534,["set",[]],["map",[]],["map",[]],["map",[["collisions",0],["rx_bytes",0],["rx_crc_err",0],["rx_dropped",2],["rx_errors",0],["rx_frame_err",0],["rx_over_err",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_errors",0],["tx_packets",0]]],["map",[["driver_name","openvswitch"]]],"internal"],[["uuid","d67191aa-bdae-4ed5-95c8-86778276052e"],["set",[]],["map",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],["set",[]],"could not open network device dpdk-p0 (Address family not supported by protocol)",["map",[]],["set",[]],0,0,["set",[]],["set",[]],["set",[]],["set",[]],["map",[]],["set",[]],["set",[]],["set",[]],["set",[]],"dpdk-p0",-1,["set",[]],["map",[["dpdk-devargs","0000:01:00.0"]]],["map",[]],["map",[]],["map",[]],"dpdk"]],"headings":["_uuid","admin_state","bfd","bfd_status","cfm_fault","cfm_fault_status","cfm_flap_count","cfm_health","cfm_mpid","cfm_remote_mpids","cfm_remote_opstate","duplex","error","external_ids","ifindex","ingress_policing_burst","ingress_policing_rate","lacp_current","link_resets","link_speed","link_state","lldp","mac","mac_in_use","mtu","mtu_request","name","ofport","ofport_request","options","other_config","statistics","status","type"]}

It is very odd that the project uses "headings" to index the data instead of emitting key-value output, as would be expected of JSON. Using this instead of parsing with regex requires some additional processing for empty values: every Interface has an error column, and if it is blank, the value at that list index is a two-element list containing the data type, "set", and an empty list, [].

>>> error_index = data["headings"].index("error")
>>> for interface in data["data"]:
...     print(interface[error_index])
... 
could not open network device dpdk-p1 (Address family not supported by protocol)
['set', []]
could not open network device dpdk-p0 (Address family not supported by protocol)

This could certainly be used instead of regex with an if interface[error_index] != list(['set', []]): but I chose the simpler to read regex, and am also hoping to catch errors from 'ovs-vsctl show' that may not be Interface related (though for the current requirement, limiting to checking for Interface errors would suffice).

Do you have advice regarding readability of code vs using something other than regex in a situation like this? The regex, to me, seemed more elegant and readable vs the additional handling of missing "error" index as well as the handling of the ['set', []].
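
One way to keep the JSON route readable is to convert the headings/data layout into one dict per row and unwrap the ["set", ...], ["map", ...] and ["uuid", ...] wrappers up front, so that an empty error simply becomes an empty list. The following is a minimal sketch; the helper names decode_ovsdb_value and rows_as_dicts are hypothetical, and the sample table is a trimmed version of the output above with only four columns.

```python
import json


def decode_ovsdb_value(value):
    """Unwrap OVSDB JSON wrappers into plain Python values."""
    if isinstance(value, list) and len(value) == 2:
        tag, payload = value
        if tag == "set":
            return [decode_ovsdb_value(v) for v in payload]
        if tag == "map":
            return {decode_ovsdb_value(k): decode_ovsdb_value(v)
                    for k, v in payload}
        if tag == "uuid":
            return payload
    return value


def rows_as_dicts(table):
    """Pair each row's values with the column names in 'headings'."""
    headings = table["headings"]
    return [
        {heading: decode_ovsdb_value(value)
         for heading, value in zip(headings, row)}
        for row in table["data"]
    ]


# Trimmed sample based on the Interface table output shown earlier.
SAMPLE_TABLE = json.loads('''
{"headings": ["_uuid", "error", "name", "type"],
 "data": [
  [["uuid", "9bc2d04d-0069-4a76-918a-b9ac61d4c5ce"],
   "could not open network device dpdk-p1 (Address family not supported by protocol)",
   "dpdk-p1", "dpdk"],
  [["uuid", "2d8758ca-510f-4399-ab84-8c1f3a33061e"],
   ["set", []], "br0", "internal"]
 ]}
''')

for row in rows_as_dicts(SAMPLE_TABLE):
    # An empty set decodes to [], which is falsy.
    if row["error"]:
        print("{}: {}".format(row["name"], row["error"]))
```

With this shape, the check no longer needs to compare against ['set', []] explicitly, and a missing column surfaces as a KeyError instead of a silent mismatch.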

Contributor Author

After investigating all of the other tables, Interfaces is the only table that has the error column, so I'll write this more deterministically and be able to include potentially vital interface information in the notification.
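
A sketch of what such a deterministic check might return, with the interface name included in the message. The function name interface_error_status, the message wording, and the two-column sample table are assumptions for illustration; this does not use the nagios_plugins3 API and is not the code from the final commit. The exit codes 0 (OK) and 2 (CRITICAL) are the standard Nagios plugin values.

```python
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def interface_error_status(table):
    """Map an Interface table (headings/data layout) to a Nagios result."""
    name_idx = table["headings"].index("name")
    error_idx = table["headings"].index("error")
    problems = []
    for row in table["data"]:
        error = row[error_idx]
        # A blank error is reported as ["set", []], so only strings count.
        if isinstance(error, str) and error:
            problems.append("{}: {}".format(row[name_idx], error))
    if problems:
        return NAGIOS_CRITICAL, "CRITICAL: " + "; ".join(problems)
    return NAGIOS_OK, "OK: no errors found in the Interface table"


# Hypothetical two-column table, shaped like the JSON output shown earlier.
SAMPLE_TABLE = {
    "headings": ["name", "error"],
    "data": [
        ["dpdk-p1", "could not open network device dpdk-p1 "
                    "(Address family not supported by protocol)"],
        ["br0", ["set", []]],
    ],
}

status, message = interface_error_status(SAMPLE_TABLE)
print(status, message)
```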

Contributor

I didn't realize it wasn't a complete implementation (the --format option being missing). I always worry about using regexes on human-readable/consumable output, as it is prone to change (on a whim sometimes!), which can make the code brittle.

I think from your explanations, it's fine to go with regex as a pragmatic solution as long as all error conditions are handled. I'll go back and look again.

Contributor Author

I've updated the PR; the latest commit uses the Interface table JSON.

@afreiberger force-pushed the add_ovs_checks branch 3 times, most recently from e688bdc to 76db2be, April 22, 2021 21:56
Contributor

@jtroup left a comment

A bunch of trivial feedback; feel free to ignore it all if you like. Just happy to see this check be added.

charmhelpers/contrib/charmsupport/nrpe.py (outdated)
    nrpe.add_check(
        shortname='openvswitch',
        description='Check Open vSwitch {%s}' % unit_name,
        check_cmd='check_openvswitch.py')
Contributor

How about making the name more explicit, e.g. check_ovs_interfaces.py or check_ovs_ifaces.py?

Contributor Author

I was definitely feeling this would be the start of something that could be expanded to more checks as we identify them. The interface errors are just MVP for the current need.

charmhelpers/contrib/charmsupport/nrpe.py (outdated)
def enable_sudo_for_openvswitch_checks():
    sudoers_dir = "/etc/sudoers.d"
    sudoers_mode = 0o100440
    ovs_sudoers_file = "99-check_openvswitch"
Contributor

As above, suggest check_ovs_interfaces; unless the plan is to expand in the future?

charmhelpers/contrib/openstack/files/check_openvswitch.py (outdated)
def parse_args(argv=None):
    """Process CLI arguments."""
    parser = argparse.ArgumentParser(
        prog="check_openvswitch",
Contributor

As previous


3 participants