New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing HP Proliant state sensors #8315

Merged
merged 3 commits into from Mar 19, 2018

Conversation

Projects
None yet
7 participants
@mikenowak
Contributor

mikenowak commented Mar 1, 2018

The support to HP Proliant disk discovery was originally introduced at #6124.

However that code contains a few errors that prevent it from doing what it was intended to do.

  1. The MIB that contains cpqDaPhyDrvStatus and cpqDaPhyDrvSmartStatus is CPQIDA-MIB, not
    CPQSINFO-MIB.

  2. The OID in the code that iterates the array in order to find the disk number should drop the last digit (.1) as that indicates the disk in position (1) and we end up missing the Disk # as follows:

Sensor Added: state cpqDaPhyDrvSmartStatus 10.0 Disk #
Sensor Added: state cpqDaPhyDrvStatus 00.0 Disk #

Having made the changes described herein, the following is seen in the recent events

State sensor Disk # has changed from (0) to (2)
State sensor Disk # has changed from (0) to (2)
Sensor Updated: state cpqDaPhyDrvSmartStatus 10.0 Disk #1
Sensor Updated: state cpqDaPhyDrvStatus 00.0 Disk #1

The only other issue that I can name is that the states are not expanded properly in the UI, i.e. 2 should equal to OK.

DO NOT DELETE THIS TEXT

Please note

Please read this information carefully. You can run ./scripts/pre-commit.php to check your code before submitting.

Testers

If you would like to test this pull request then please run: ./scripts/github-apply <pr_id>, i.e ./scripts/github-apply 5926

@CLAassistant

This comment has been minimized.

CLAassistant commented Mar 1, 2018

CLA assistant check
All committers have signed the CLA.

@laf

This comment has been minimized.

Member

laf commented Mar 1, 2018

Thanks for the fix :)

I think if you change:

create_sensor_to_state_index($device, $state_name, $index);

to:

create_sensor_to_state_index($device, $state_name, $x . $index);

It should fix the display issue, give it a try and update this PR.

@laf laf added the Bug 🐞 label Mar 1, 2018

@mikenowak

This comment has been minimized.

Contributor

mikenowak commented Mar 2, 2018

@laf,

thanks for that - I've added the change suggested by you and now the states are represented by nice bootstrap style buttons :)

I've also took this opportunity to extend this code and implement the following:

  1. Added support for querying:
  • raid ctrl accelerator status
  • power supply status
  • system fan status
  1. Set the labels for the above to CTRL # Status, PSU # Status, FAN # Status, and I've renamed Disk # to HDD #. Shall there be a naming convention in place that I am not aware of please let me know.

  2. Added checks for PSU and FAN to ensure that they are truly installed $temp[$index][$tablevalue[5]] == 'present', so that we wont alert if the component is not present due to such and such system configuration rather than failure.

  3. Set state_draw_graph to 0 in all instances as I understand this would prevent the graphs from being created for these states, as these would be meaningless in my opinion. Please correct me if I am missing something here.

  4. Last but not least, removed the SMART check as I am not getting anything out of that OID on my test system and based on my experience the SMART errors would result in cpqDaPhyDrvStatus becoming predictiveFailure.

Please let me know if you need anything else.

@f0o

This comment has been minimized.

Member

f0o commented Mar 2, 2018

I think travis died on this one

@laf

This comment has been minimized.

Member

laf commented Mar 2, 2018

Set state_draw_graph to 0 in all instances as I understand this would prevent the graphs from being created for these states, as these would be meaningless in my opinion. Please correct me if I am missing something here.

Well I'd say graphs are still useful. They show the historic state of the sensors over time which can be useful

Last but not least, removed the SMART check as I am not getting anything out of that OID on my test system and based on my experience the SMART errors would result in cpqDaPhyDrvStatus becoming predictiveFailure.

I think it might be better to leave them in.

With this amount of changes, can you submit test data as well so we can ensure this code works going forward. https://docs.librenms.org/#Developing/os/Test-Units/#example-workflow

@Rosiak

This comment has been minimized.

Contributor

Rosiak commented Mar 2, 2018

Just want to add in that state_draw_graph to 0 doesn't make any difference right now. Code to enforce this was never implemented, but hopefully will be with time.

@mikenowak

This comment has been minimized.

Contributor

mikenowak commented Mar 3, 2018

I think travis died on this one

Wrong number of spaces, should be good now

Well I'd say graphs are still useful. They show the historic state of the sensors over time which can be useful

Just want to add in that state_draw_graph to 0 doesn't make any difference right now. Code to enforce this was never implemented, but hopefully will be with time.

OK, thanks for confirming - this is now reverted to original value of 1.

I think it might be better to leave them in.

OK, I'll try to do something smart here, since both of these (disk status and SMART status) come from the same snmp table.

With this amount of changes, can you submit test data as well so we can ensure this code works going forward. https://docs.librenms.org/#Developing/os/Test-Units/#example-workflow

OK, will allocate time to this next week.

@mikenowak

This comment has been minimized.

Contributor

mikenowak commented Mar 3, 2018

Slight correction regarding the SMART status.

I just checked ILO 3 and 4 on Proliants G7, Gen8 and Gen9 and I dont see this exposed anywhere.

So my current thinking is that it isn't really needed unless somebody can prove otherwise please.

@laf

This comment has been minimized.

Member

laf commented Mar 9, 2018

Here's some test data someone else has previously submitted which has the OID for the smart status:

.1.3.6.1.4.1.232.3.2.5.1.1.57.1.0 = INTEGER: 2
.1.3.6.1.4.1.232.3.2.5.1.1.57.1.1 = INTEGER: 2
@murrant

This comment has been minimized.

Member

murrant commented Mar 17, 2018

Unless you update to the state code to the new format, you can't change existing state translations.

https://docs.librenms.org/Developing/Sensor-State-Support/

Numeric state data (-e)
Update to new state style
This currently cannot be tested. We don't discover sensors until the second discovery, because we poll 'hardware' instead of discover it.
@murrant

This comment has been minimized.

Member

murrant commented Mar 18, 2018

This currently cannot be tested. We don't discover sensors until the second discovery, because we poll 'hardware' instead of discover it.

@scrutinizer-notifier

This comment has been minimized.

scrutinizer-notifier commented Mar 19, 2018

The inspection completed: No new issues

@murrant murrant merged commit a83ebdc into librenms:master Mar 19, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
@mikenowak

This comment has been minimized.

Contributor

mikenowak commented Mar 19, 2018

@murrant, ok thanks - so I am guessing (haven't tested yet) that, this basically fixes the original problem whereby disk checks were non functioning, but at the same time removes the FAN, PSU and Raid CTRL checks that I added as part of this PR.

How do you want me to progress with adding these changes?

@murrant

This comment has been minimized.

Member

murrant commented Mar 19, 2018

@mikenowak If they weren't in this, then you will need to open a new pull request.

inetAnt added a commit to criteo-forks/librenms that referenced this pull request Mar 19, 2018

Fixing HP Proliant state sensors (librenms#8315)
* Fixing HP Proliant state sensors

* Numeric state data (-e)

Update to new state style
This currently cannot be tested. We don't discover sensors until the second discovery, because we poll 'hardware' instead of discover it.

* Fix whitespace
@mikenowak

This comment has been minimized.

Contributor

mikenowak commented Mar 20, 2018

@murrant, ok fair enough.

one problem with this code thought - the $drive_bay isn't properly expanded so I am seeing them as follows on a system with 4 disks

Drive S.M.A.R.T. |   | ok
Drive S.M.A.R.T. |   | ok
Drive S.M.A.R.T. |   | ok
Drive S.M.A.R.T. |   | ok
Drive Status |   | ok
Drive Status |   | ok
Drive Status |   | ok
Drive Status |   | ok

While I was expecting Drive 1,2,3,4,what not instead.

@murrant

This comment has been minimized.

Member

murrant commented Mar 21, 2018

That's odd it works on my system. Can you please stop by discord or community sometime to troubleshoot.

@mikenowak mikenowak deleted the mikenowak:mikenowak-proliant-state-mib branch Mar 30, 2018

@mikenowak mikenowak restored the mikenowak:mikenowak-proliant-state-mib branch Mar 30, 2018

@lock lock bot locked as resolved and limited conversation to collaborators May 29, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.