SolidFire (NetApp HCI Storage, Element Software) Plugin for Nagios

scaleoutsean/nagfire

 
 

SolidFire Plugin for Nagios

Summary

This is a refreshed, Python 3-based version of an old SolidFire plugin for Nagios (originally written in Python 2).

There's a lot of room for improvement because SolidFire has made a lot of progress since this plugin was written, but I have no plans to make improvements at this time. The main purpose of this refresh is to provide a starting point for Nagios users who have NetApp HCI or SolidFire, and for the NetApp partners who serve them.

If you're interested in other monitoring integrations for NetApp SolidFire or HCI, please check out the monitoring section of awesome-solidfire.

Instructions

  • Run it with python3: ./checkSolidFire.py (MVIP|MIP) PORT USERNAME PASSWORD (MVIP|NODE)
  • Positional arguments:
    • MVIP or MIP: the Management Virtual IP (MVIP) when checking a cluster, or the node's Management IP (MIP) when checking a node
    • PORT: 443 for SolidFire cluster, 442 for node
    • USERNAME: cluster admin with the reporting (or better) role. A dedicated read-only reporting account is recommended for this plugin.
    • PASSWORD: password for that account
    • MVIP or NODE: "mvip" for SolidFire cluster, "node" for individual node
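
The positional arguments above can be assembled programmatically, for example when several clusters and nodes are checked from one wrapper. The following is a minimal sketch, not part of the plugin: `build_check_command` and `DEFAULT_PORTS` are hypothetical names, and the ports are simply the ones listed above.

```python
# Hypothetical helper (not part of nagfire): builds the argv list for one check.
# Ports follow the Instructions: 443 for the cluster MVIP, 442 for a node MIP.
DEFAULT_PORTS = {"mvip": 443, "node": 442}


def build_check_command(address, username, password, mode,
                        script="./checkSolidFire.py"):
    """Return the command line for a single plugin run as a list of strings."""
    if mode not in DEFAULT_PORTS:
        raise ValueError(f"mode must be 'mvip' or 'node', got {mode!r}")
    return [
        "python3", script,
        address,                    # MVIP for a cluster, MIP for a node
        str(DEFAULT_PORTS[mode]),   # PORT
        username,                   # USERNAME
        password,                   # PASSWORD
        mode,                       # "mvip" or "node"
    ]


if __name__ == "__main__":
    print(" ".join(build_check_command("192.168.1.34", "admin", "admin", "mvip")))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting problems with passwords that contain special characters.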

Requirements

  • Element OS v11+ (NetApp HCI, SolidFire, eSDS)
  • Python 3.6+

Known Issues and Workarounds

  • Usability
    • Maximum iSCSI Session Count used to be calculated as NumberOfEnsembleNodes * 700 * 90%. Nagfire v2.1 changed that to (NumberOfActiveNodesWithStorageRole - 1) * 700 * 90%. Nagfire v2.2 raised the per-node maximum from 700 to 1,000 iSCSI connections (SolidFire 11.8 and later), as that is the maximum reported by the API (GetLimits).
    • Starting with SolidFire 11.8, NetApp HCI supports SolidFire clusters with two storage nodes. They still have 1-3 Ensemble Nodes, but with only two storage nodes, just one may be left to serve iSCSI connections if the other fails. A storage node failure triggers plenty of other alerts and warnings anyway, so maxing out the number of iSCSI connections probably won't be the main concern.
    • One node is deducted from the active node count so that if a node fails, your cluster doesn't end up maxed out on iSCSI sessions, which is probably preferable to the old behavior.
    • If you prefer the old behavior, or if v2.1 gives you any other problems, please continue to use v2.0 or edit the script.
  • Security: HTTPS certificate validation is disabled. Edit the script to re-enable it if you need validation.
  • Security: since SolidFire v12 you can create a read-only cluster admin role. It is strongly recommended to create a dedicated 'nagios' account (with only the Read role) on the SolidFire cluster for this plugin.
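
To make the iSCSI session arithmetic from the Usability notes concrete, here is a minimal sketch of both formulas. It is an illustration under stated assumptions rather than the plugin's actual code: the function and parameter names are hypothetical, integer arithmetic is used to avoid floating-point rounding, and the caller is assumed to pass a node count that already excludes nodes in Maintenance Mode.

```python
# Hypothetical illustration of the formulas described above (not the plugin's code).

def max_iscsi_sessions(active_storage_nodes, per_node_limit=1000, headroom_pct=90):
    """v2.1+ formula: (active storage nodes - 1) * per-node limit * 90%.

    per_node_limit is 1000 for SolidFire 11.8 and later (700 before that).
    """
    usable_nodes = max(active_storage_nodes - 1, 0)  # keep one node in reserve
    return usable_nodes * per_node_limit * headroom_pct // 100


def legacy_max_iscsi_sessions(ensemble_nodes, per_node_limit=700, headroom_pct=90):
    """Pre-v2.1 formula: ensemble nodes * 700 * 90%."""
    return ensemble_nodes * per_node_limit * headroom_pct // 100


if __name__ == "__main__":
    print(max_iscsi_sessions(4))          # 2700
    print(max_iscsi_sessions(2))          # 900 (two-node cluster)
    print(legacy_max_iscsi_sessions(3))   # 1890
```

With two storage nodes the newer formula leaves the capacity of a single node, which matches the two-node case described above.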

Sample CLI Output

Executed on Ubuntu 18.04 with Python 3.6.9+ (against separate Element Demo VMs, which are single-node SolidFire clusters)

SolidFire version 11.7 (cluster, node)

$ python3 checkSolidFire.py 192.168.1.34 443 admin admin mvip
+---------------------------------------------------------------+
| SolidFire Monitoring Plugin v2.2 2021/05/10                   |
+---------------------------------------------------------------+
| Cluster                        | 192.168.1.34                 |
| Version                        | 11.7.0.76                    |
| Disk Activity                  | No*                          |
| Read Bytes                     | 8770649600                   |
| Write Bytes                    | 13760053248                  |
| Utilization %                  | 0.0                          |
| iSCSI Sessions                 | 0                            |
| Cluster Faults                 | None                         |
| Cluster Name                   | DR                           |
| Ensemble Members               | [192.168.103.33]             |
| Execution Time                 | Mon May 10 12:14:17 2021     |
| Exit State                     | *Warning                     |
+---------------------------------------------------------------+

$ python3 checkSolidFire.py 192.168.1.33 442 admin admin node
+---------------------------------------------------------------+
| SolidFire Monitoring Plugin v2.2 2021/05/10                   |
+---------------------------------------------------------------+
| Node Status                    | Active                       |
| Cluster Name                   | DR                           |
| MVIP                           | 192.168.1.34                 |
| Execution Time                 | Mon May 10 12:14:23 2021     |
| Exit State                     | OK                           |
+---------------------------------------------------------------+

SolidFire version 12.2 (cluster, node)

$ python3 checkSolidFire.py 192.168.1.30 443 admin admin mvip
+---------------------------------------------------------------+
| SolidFire Monitoring Plugin v2.2 2021/05/10                   |
+---------------------------------------------------------------+
| Cluster                        | 192.168.1.30                 |
| Version                        | 12.2.0.777                   |
| Disk Activity                  | No*                          |
| Read Bytes                     | 288337262592                 |
| Write Bytes                    | 315545249280                 |
| Utilization %                  | 0.0                          |
| iSCSI Sessions                 | 3*                           |
| Cluster Faults                 | 2021-04-06T03:44:49 The sum  |
|                                | of all minimum QoS IOPS      |
|                                | (6600) is greater than the   |
|                                | expected IOPS (3000) of the  |
|                                | cluster. The minimum QoS can |
|                                | not be maintained for all    |
|                                | volumes simultaneously in    |
|                                | this condition. Adjust QoS   |
|                                | settings on one or more      |
|                                | volumes to not exceed        |
|                                | available cluster IOPS.*     |
| Cluster Name                   | PROD                         |
| Ensemble Members               | [192.168.103.29]             |
| Execution Time                 | Mon May 10 12:14:29 2021     |
| Exit State                     | *Critical                    |
+---------------------------------------------------------------+

$ python3 checkSolidFire.py 192.168.1.29 442 admin admin node
+---------------------------------------------------------------+
| SolidFire Monitoring Plugin v2.2 2021/05/10                   |
+---------------------------------------------------------------+
| Node Status                    | Active                       |
| Cluster Name                   | PROD                         |
| MVIP                           | 192.168.1.30                 |
| Execution Time                 | Mon May 10 12:14:33 2021     |
| Exit State                     | OK                           |
+---------------------------------------------------------------+
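
If the table output needs to be consumed by another script, the Exit State row can be mapped to the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is an assumption-laden illustration: `exit_code_from_output` is a hypothetical helper, it assumes the two-column layout shown in the samples above, and it treats the asterisk as a marker to strip. It does not describe how the plugin itself sets its exit status.

```python
# Hypothetical parser for the sample table layout shown above.
NAGIOS_EXIT_CODES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}


def exit_code_from_output(text):
    """Return the Nagios code for the 'Exit State' row, or 3 if it is missing."""
    for line in text.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == "Exit State":
            state = cells[1].strip("*").lower()
            return NAGIOS_EXIT_CODES.get(state, 3)
    return 3  # no Exit State row found: treat as UNKNOWN


if __name__ == "__main__":
    sample = (
        "| Cluster Name                   | PROD                         |\n"
        "| Exit State                     | *Critical                    |\n"
    )
    print(exit_code_from_output(sample))  # 2
```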

Change Log

  • v2.2 (2021/05/10)

    • Do not count v12.0+ nodes with Maintenance Mode Enabled towards iSCSI connection maximum
  • v2.1 (2020/09/26)

    • Change formula for max iSCSI sessions to increase HA and reliability for 2-10 node clusters
  • v2.0 (2020/01/31)

    • Fix node checks
    • Silence shell warnings from unverified HTTPS connections
    • Tiny formatting change for console output
  • v2.0b (2020/01/31)

    • Port v1.17 to Python 3
    • Replace urllib with requests
    • Use Element API endpoint v11
    • Lower maximum session count (needs validation, may need to be increased for larger clusters)

License and Trademarks

See the LICENSE file.

NetApp, SolidFire, and the marks listed at www.netapp.com/TM are trademarks of NetApp, Inc.