New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPE ILO power fix #8822

Merged
merged 1 commit into from Jun 20, 2018

Conversation

Projects
None yet
3 participants
@TomEvin
Contributor

TomEvin commented Jun 15, 2018

cpqHeTemperatureThreshold is '0', which cause alerts. Fixed to '90'. Tested on Proliant Gen8-9.

HPE ILO power fix
cpqHeTemperatureThreshold is '0', which cause alerts. Fixed to '90'. Tested on Proliant Gen8-9.
@laf

This comment has been minimized.

Member

laf commented Jun 15, 2018

Is this not just a bug with the firmware you are running? According to the mib -99 is the unreported value so 0 is genuine or a bug.

@laf laf added the Needs-Info label Jun 15, 2018

@TomEvin

This comment has been minimized.

Contributor

TomEvin commented Jun 15, 2018

Well, we have different Proliant servers, from different time periods, all of them Gen8 and Gen9, with 2 different SPP pack installed, and all of them using the latest or almost the latest firmwares included into SPP pack. If this is a firmware bug (it is possible) it is well spreaded.
For Gen9 servers I can try to install the latest 2018.03 SPP pack, if you want and if we can hold this pull request until next week, but I do not expect any miracle.

@laf

This comment has been minimized.

Member

laf commented Jun 15, 2018

I'm assuming the OID in https://github.com/librenms/librenms/pull/6996/files is what we are already using here. That was your PR - was that going to be wrong as well out of interest?

@TomEvin

This comment has been minimized.

Contributor

TomEvin commented Jun 15, 2018

Since that PR we switched to yaml, and yes, we are using the same OID-s.
In the meantime, I have checked there are updated HPE mib files too. Give me a couple of days, I'll check them too with the latest firmwares, but based on a quick look this cpqHeTemperatureThreshold OID not changed, but maybe there are new OIDs.

@laf

This comment has been minimized.

Member

laf commented Jun 15, 2018

Since that PR we switched to yaml

Indeed but the yaml was taken from that PR code so I expect the OID translates to the one you're proposing to remove.

Have static high / low values isn't ideal as they should be provided by the device. If that doesn't work then static high/low can be ok but we should be sure that it is broke before merging.

@TomEvin

This comment has been minimized.

Contributor

TomEvin commented Jun 15, 2018

Believe me, I totally understand this attitude. I do not like either this solution, but still better than the fake "Sensor over limit" everywhere. Also almost all of the max Temp limit is 100C for the internal components, hopefully this fixed 90 (if really there is no other option) is more than enough.

I'll inform you as soon as I have tested the latest firmware with the latest MIBs, and checked the changes in MIB update.

@TomEvin

This comment has been minimized.

Contributor

TomEvin commented Jun 18, 2018

I udpeted every software and firmware to the current versions, and I have checked the the SNMP with the latest MIBs, powersupply "cpqHeTemperatureThreshold" is still "0".
image
I do not have really any other idea then the fix threshold limit for this. :(

And because this not affected by MIB files, I think I'll update HP MIBs in a different pull request.

@laf

This comment has been minimized.

Member

laf commented Jun 19, 2018

Ok let's go with that then. Can you add additional test data for this device please, we currently don't have any test data for it which means bad changes can break stuff: https://docs.librenms.org/#Developing/os/Test-Units/#example-workflow

@TomEvin

This comment has been minimized.

Contributor

TomEvin commented Jun 19, 2018

./scripts/pre-commit.php --db --snmpsim
"Checked 2681 files in 48.7 seconds
No syntax error found
Tests ok, submit away :)"

Is that enough, or should I copy/paste the snmpres file here?

@murrant

This comment has been minimized.

Member

murrant commented Jun 19, 2018

@TomEvin You need to include the .snmprec and if possible the .json files in this PR

@TomEvin

This comment has been minimized.

Contributor

TomEvin commented Jun 20, 2018

Here is the hpe-ilo.snmprec: https://pastebin.com/mJQmfPXD
.json not generated because of "sh: 1: snmpsimd.py: not found"

When I'll pull a HP MIB update, should I provide a new .snmpres file too?

@laf laf added this to the 1.41 milestone Jun 20, 2018

@laf

This comment has been minimized.

Member

laf commented Jun 20, 2018

I'll put the test data in a separate PR as this was submitted from OPs master branch we can't push to it.

@laf

laf approved these changes Jun 20, 2018

LGTM

@laf laf removed the Needs Tests 🦄 label Jun 20, 2018

@laf laf merged commit 026eda5 into librenms:master Jun 20, 2018

3 checks passed

WIP ready for review
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla All CLA requirements met.

@laf laf referenced this pull request Jun 20, 2018

Merged

Added tests for latest hpe ilo changes #8835

1 of 1 task complete

mattie47 added a commit to mattie47/librenms that referenced this pull request Jul 2, 2018

HPE ILO temperature threshold fix (librenms#8822)
cpqHeTemperatureThreshold is '0', which cause alerts. Fixed to '90'. Tested on Proliant Gen8-9.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 19, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.