Device State Monitoring #2694
Good thoughts, and we need to discuss more.
We probably should just merge this into #1236.
Possible returned states:
Dell iDrac Powersupply
Cisco 4500X
Generic LibreNMS State:
The mapping could look like:
Dell iDrac Powersupply
Cisco 4500X
Or is this totally off?
I like the OK, WARNING, CRITICAL, UNKNOWN set, as it matches exactly what Nagios checks use, which is a very commonly understood paradigm. In fact, making them use the same values as Nagios (0-3, if I recall correctly) would make a lot of sense to me.
@paulgear Proposed Table Layout:
I think trying to squeeze all states into just three types will be limiting. Can you not look that info up from the MIB to convert the state into a human-readable value?
I think that if we're trying to provide a framework which allows intuitive colouring of components and alerting, we have to keep it generic and simple, and the Nagios 0-3 (OK to UNKNOWN) levels are something that pretty much anyone who has worked with monitoring systems will understand. https://nagios-plugins.org/doc/guidelines.html#AEN78
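The following is a minimal sketch of the Nagios-style levels, assuming the standard plugin return codes from the linked guidelines (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). `GenericState`, the colour names and the `should_alert` rule are hypothetical illustrations of the "colouring and alerting" idea, not existing LibreNMS code.

```python
from enum import IntEnum


class GenericState(IntEnum):
    """Hypothetical generic device states, numbered like Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Hypothetical UI colour for each generic state.
STATE_COLOURS = {
    GenericState.OK: "green",
    GenericState.WARNING: "orange",
    GenericState.CRITICAL: "red",
    GenericState.UNKNOWN: "grey",
}


def from_code(code: int) -> GenericState:
    """Convert a raw 0-3 code. This sketch treats any other value as UNKNOWN."""
    try:
        return GenericState(code)
    except ValueError:
        return GenericState.UNKNOWN


def should_alert(state: GenericState) -> bool:
    """Hypothetical alert rule: alert on anything other than OK."""
    return state is not GenericState.OK
```

For example, `from_code(2)` yields `GenericState.CRITICAL`, which colours red and triggers an alert. Because the numbers match Nagios, anyone used to plugin exit codes can read them directly.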
Think I discussed this with @laf on IRC afterwards.
I like the idea of mapping to Nagios-style statuses in the format "{librenms_state}: {extended_state}", e.g. "CRITICAL: Cache module critical failure". I'd like to discuss the relationship between the sensors table and this new table. I don't think we can use sensor_index or sensor_oid, so we may need a junction table. Nothing wrong with that, I guess, but it means we need to make the connection at discovery time, so we'll need to update everything in includes/discovery/states/*.inc.php. That looks manageable at this stage.
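To make the junction-table idea concrete, here is a rough sketch using Python's built-in `sqlite3`. All table and column names are hypothetical. The junction row is what discovery would write. The query then produces the "{librenms_state}: {extended_state}" string from the sensor's raw value. The example rows are made up.

```python
import sqlite3

# Hypothetical schema; names are illustrative only.
SCHEMA = """
CREATE TABLE sensors (
    sensor_id      INTEGER PRIMARY KEY,
    device_id      INTEGER NOT NULL,
    sensor_descr   TEXT NOT NULL,
    sensor_current INTEGER             -- raw value polled over SNMP
);
CREATE TABLE state_indexes (
    state_index_id INTEGER PRIMARY KEY,
    state_name     TEXT UNIQUE NOT NULL -- e.g. the MIB enum name
);
CREATE TABLE state_translations (
    state_index_id INTEGER NOT NULL REFERENCES state_indexes(state_index_id),
    state_value    INTEGER NOT NULL,    -- raw value returned by the device
    state_descr    TEXT NOT NULL,       -- vendor's extended state text
    generic_state  TEXT NOT NULL,       -- OK / WARNING / CRITICAL / UNKNOWN
    PRIMARY KEY (state_index_id, state_value)
);
-- Junction table, filled in at discovery time (one state index per sensor).
CREATE TABLE sensors_to_state_indexes (
    sensor_id      INTEGER PRIMARY KEY REFERENCES sensors(sensor_id),
    state_index_id INTEGER NOT NULL REFERENCES state_indexes(state_index_id)
);
"""


def display_states(conn):
    """Return (sensor_descr, '{librenms_state}: {extended_state}') for linked sensors."""
    rows = conn.execute(
        """
        SELECT s.sensor_descr, t.generic_state, t.state_descr
        FROM sensors s
        JOIN sensors_to_state_indexes j ON j.sensor_id = s.sensor_id
        JOIN state_translations t
          ON t.state_index_id = j.state_index_id
         AND t.state_value = s.sensor_current
        ORDER BY s.sensor_id
        """
    )
    return [(descr, f"{generic}: {extended}") for descr, generic, extended in rows]


def demo():
    """Build an in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO state_indexes VALUES (1, 'exampleControllerState')")
    conn.executemany(
        "INSERT INTO state_translations VALUES (1, ?, ?, ?)",
        [(3, "Controller ok", "OK"), (5, "Cache module critical failure", "CRITICAL")],
    )
    conn.execute("INSERT INTO sensors VALUES (10, 1, 'RAID controller', 5)")
    conn.execute("INSERT INTO sensors_to_state_indexes VALUES (10, 1)")
    return display_states(conn)
```

Here `demo()` returns `[("RAID controller", "CRITICAL: Cache module critical failure")]`. Keeping translations in their own table means several sensors can share one state index without reusing sensor_index or sensor_oid.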
Can we add some more status checks to the Dell iDrac RAID controller (OMSA_Storage_Disk_2, OMSA_Storage_Disk_3)? It should respond with the same level of:
This thread has been automatically locked since there has not been any recent activity after it was closed.
This is an issue to discuss the implementation of a proper device state system.
Here are some of my notes so far:
The idea is to improve and possibly overhaul the current (almost non-existent) state monitoring.
Quite a few people are missing this feature.
We should have a generic way to store and fetch this information.
We should have the ability to make custom value translations, since data presented through SNMP does not always make sense. For example, @SaaldjorMike & I have some Dell iDrac devices whose RAID state is reported as "1,2,3,4,5,6", which actually translates to "1=Other, 2=Unknown, 3=OK, 4=Non-critical, 5=Critical, 6=Non-recoverable". Devices could also just report back the actual state as a string.
I can't decide whether we should define static severity levels ("Information", "Warning", "Disaster", etc.) or just preserve and pass on the custom ones from the value-translation step.
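Here is a small sketch of the value-translation step for the iDrac RAID example. It keeps both the vendor text and a generic severity, so either answer to the severity-level question stays open. The raw-value table comes straight from the notes above. The mapping to generic OK/WARNING/CRITICAL/UNKNOWN is a hypothetical choice, and so are the names `IDRAC_RAID_STATES`, `IDRAC_TO_GENERIC` and `translate`.

```python
# Raw iDrac RAID state values and their meaning, as listed in the notes.
IDRAC_RAID_STATES = {
    1: "Other",
    2: "Unknown",
    3: "OK",
    4: "Non-critical",
    5: "Critical",
    6: "Non-recoverable",
}

# Hypothetical mapping from the vendor text to a generic severity.
IDRAC_TO_GENERIC = {
    "Other": "UNKNOWN",
    "Unknown": "UNKNOWN",
    "OK": "OK",
    "Non-critical": "WARNING",
    "Critical": "CRITICAL",
    "Non-recoverable": "CRITICAL",
}


def translate(raw):
    """Translate a raw SNMP value (int, numeric string, or state string).

    Returns (generic_state, vendor_text). Anything not recognised maps to UNKNOWN.
    """
    text = str(raw).strip()
    if text.isdigit():
        # Numeric value: look up the vendor's meaning for it.
        text = IDRAC_RAID_STATES.get(int(text), f"Unrecognised value {text}")
    # Device may already have reported the state as a string; use it as-is.
    return IDRAC_TO_GENERIC.get(text, "UNKNOWN"), text
```

For example, `translate(5)` gives `("CRITICAL", "Critical")` and `translate("4")` gives `("WARNING", "Non-critical")`. A device that reports the string "OK" directly is handled by the same function.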
Tagging #1365 #1236