[bug] Recent major Proxmox update has messed up voltage + fan sensor info #6044

Closed
BloodyIron opened this Issue Mar 1, 2017 · 19 comments

BloodyIron commented Mar 1, 2017

  • [ YES (as of yesterday) ] Is your install up to date? Updating your install
    Please do not submit an issue if your install is not up to date within the last 24 hours or on a stable monthly release.
  • [ X ] Please include all of the information between the ==================================== section of ./validate.php which you can run from the cli.
Component   Version
LibreNMS    db58fea
DB Schema   171
PHP         5.5.9-1ubuntu4.21
MySQL       5.5.54-0ubuntu0.14.04.1
RRDTool     1.4.7
SNMP        NET-SNMP 5.7.2

I updated my Proxmox cluster about a week ago, and now I'm getting really inconsistent SNMP stats from the nodes for CPU fan RPM and voltages.

For example, the vcore voltage reports the 3.3v reading, and the 3.3v reports the 5v reading. None of the voltage stats (3.3/5/12/vcore) show the correct info.

The CPU fan speed reports 0.

Running a poller debug shows the same thing I'm seeing: the voltages are "misaligned" and there are no values for RPMs.
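
The pattern described above (each label carrying the next rail's reading) can be checked mechanically. The following is only a rough sketch, not anything LibreNMS does: the nominal values (especially the vcore figure), the function names, and the sample readings are assumptions made up for illustration.

```python
from typing import Dict

# Assumed nominal values in volts. The vcore figure is a placeholder: real
# core voltage depends on the CPU and its power state.
NOMINAL_VOLTS: Dict[str, float] = {"vcore": 1.2, "3.3v": 3.3, "5v": 5.0, "12v": 12.0}


def closest_rail(reading: float) -> str:
    """Return the rail whose nominal value is closest to the reading (relative error)."""
    return min(
        NOMINAL_VOLTS,
        key=lambda rail: abs(reading - NOMINAL_VOLTS[rail]) / NOMINAL_VOLTS[rail],
    )


def find_mislabeled(readings: Dict[str, float]) -> Dict[str, str]:
    """Map each label to the rail its value actually resembles, when they differ."""
    return {
        label: closest_rail(value)
        for label, value in readings.items()
        if closest_rail(value) != label
    }


# Made-up readings reproducing the symptom: each label carries the next rail's value.
shifted = {"vcore": 3.31, "3.3v": 5.02, "5v": 12.1}
print(find_mislabeled(shifted))
```

A consistent one-step shift like this would point at labels and values being paired by the wrong index rather than at faulty sensors, though that is a guess and not something confirmed here.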

These are the package versions proxmox reports having on its end, perhaps this will help:

proxmox-ve: 4.4-82 (running kernel: 4.4.40-1-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-109
pve-firmware: 1.1-10
libpve-common-perl: 4.0-92
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-94
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-3
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1pve80
zfsutils: 0.6.5.9-pve15~bpo80

I'm unsure what I can do about this.

BloodyIron Mar 1, 2017

Okay, now I am really confused: one node is giving accurate values, the other isn't. On the one that looks accurate, I get CPU fan speed, and the voltages appear to be in the right categories.

The two nodes are literally identical hardware, and the software SHOULD be set up identically. I am exceptionally confused now...

sorano Mar 1, 2017

Contributor

Wouldn't this be an issue with that specific Proxmox installation rather than LibreNMS? In any case your issue is missing the information from step 3.

BloodyIron Mar 1, 2017

You're probably right, I was just wondering if there was something that might be relevant on the LibreNMS end. I have since opened a ticket with Proxmox VE peeps.

laf Mar 1, 2017

Member

Please update and test again.

BloodyIron Mar 2, 2017

I'm getting the correct values on what were the problematic readings. However now there are duplications of the following sensor elements:

  • Chassis Fan Speed
  • temp1
  • +12V Voltage

Odd, since only some of the elements are duplicated and others aren't. This is on build 87a5046, after which I rediscovered the device.

@laf laf reopened this Mar 2, 2017

laf Mar 2, 2017

Member

We need updated discovery and poller debug output.

BloodyIron Mar 3, 2017

Discovery paste: http://pastebin.com/4jPnTvRg
Poller paste: http://pastebin.com/nyYuq5kp

Naturally both sanitised, with 2 week expiry ;)

laf Mar 3, 2017

Member

@BloodyIron PR submitted to fix this, would you mind testing it?

BloodyIron Mar 3, 2017

Ran daily.sh and was given build 41824dd, rediscovered the device, the duplicates are still there. Should I do something differently?

BloodyIron Mar 4, 2017

From /opt/librenms, I ran "./scripts/github-apply 6079".

It gave me the output:
"
Checking patch includes/discovery/functions.inc.php...
Applied patch includes/discovery/functions.inc.php cleanly.
"

I then restarted the web host and rediscovered the device, but the duplicate, blank sensors are still there. Did I not do it correctly?

Also, I took a backup before applying the patch, and I'm now reverting to that backup with an eye to keeping the code consistent.

laf Mar 4, 2017

Member

Please provide the output of:

SELECT * FROM sensors WHERE device_id=X;

BloodyIron Mar 6, 2017

Yuck, not sure if this is going to parse well for you : http://pastebin.com/9iqPd0A1

Also, 2w expiry on these.

Also, kinda neat that SNMP can pull sensor data from "lmsensors" without lm-sensors being installed on the target ;D

laf Mar 6, 2017

Member

Those are all distinct OIDs so the patch won't have removed them because they aren't duplicates.

However, can you provide the output of ./discovery.php -h HOSTNAME -d -m sensors?
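
To illustrate the distinction being drawn here, a minimal sketch: rows that share a description but have different OIDs are not duplicates in the sense of sharing an OID. The row layout, the helper name `repeated_by`, and the OID values are hypothetical and are not taken from the pastes in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row shape: (description, OID).
Row = Tuple[str, str]


def repeated_by(rows: List[Row], key_index: int) -> Dict[str, List[Row]]:
    """Group rows by one field and keep only groups with more than one row."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Made-up rows: two sensors share a description, but every OID is distinct.
rows: List[Row] = [
    ("temp1", ".1.3.6.1.4.1.2021.13.16.2.1.3.1"),
    ("temp1", ".1.3.6.1.4.1.2021.13.16.2.1.3.5"),
    ("+12V Voltage", ".1.3.6.1.4.1.2021.13.16.4.1.3.3"),
]

same_description = repeated_by(rows, 0)  # "temp1" appears twice
same_oid = repeated_by(rows, 1)          # empty: no OID is repeated

print(sorted(same_description))
print(same_oid)
```

A cleanup keyed on OID would leave both `temp1` rows in place, which matches the behaviour described in this comment.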

@laf laf added the Needs-Info label Mar 6, 2017

@laf laf closed this in #6079 Mar 6, 2017

laf added a commit that referenced this issue Mar 6, 2017

@laf laf reopened this Mar 6, 2017

BloodyIron Mar 7, 2017

These are the results from the discovery command you outlined above : http://pastebin.com/J5Cx5SPK

Considering they are identical systems, it's weird that this system is showing duplicate sensors, and only recently. Hmmm.

laf Mar 7, 2017

Member

I can't tell you why it's doing that. I can't see any issues, so I'd suggest just deleting the sensors with a value of 0.
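
If you go the route of deleting the zero-valued sensors, it may help to list the candidates first. A hedged sketch using Python's sqlite3 as a stand-in for the real MySQL database; the column names `sensor_id`, `sensor_descr`, and `sensor_current`, the helper `zero_value_sensors`, and the sample rows are assumptions for illustration, so check them against your actual schema before deleting anything.

```python
import sqlite3
from typing import List, Tuple


def zero_value_sensors(conn: sqlite3.Connection, device_id: int) -> List[Tuple[int, str]]:
    """List (sensor_id, description) for one device's sensors that currently read 0."""
    cursor = conn.execute(
        "SELECT sensor_id, sensor_descr FROM sensors "
        "WHERE device_id = ? AND sensor_current = 0 "
        "ORDER BY sensor_id",
        (device_id,),
    )
    return cursor.fetchall()


# In-memory table with made-up rows; device 7 has two zero-valued entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensors ("
    "sensor_id INTEGER PRIMARY KEY, device_id INTEGER, "
    "sensor_descr TEXT, sensor_current REAL)"
)
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?, ?)",
    [
        (1, 7, "temp1", 41.0),
        (2, 7, "temp1", 0.0),
        (3, 7, "+12V Voltage", 12.1),
        (4, 7, "+12V Voltage", 0.0),
        (5, 8, "temp1", 0.0),
    ],
)

candidates = zero_value_sensors(conn, 7)
print(candidates)
```

Reviewing the list before removing anything guards against deleting a sensor that legitimately reads 0, such as a fan header with nothing attached.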

laf Mar 12, 2017

Member

Please update and try again; another pull request that may fix this was merged.

BloodyIron Mar 13, 2017

Went to update daily to 54deb1d, then hit rediscover on the device, then went to check the health edit page and found that the dupes were already gone!

It took me maybe 10 seconds or less to go from rediscover to the health edit page, so I suspect the dupes were already removed before the update and the rediscover, but I'm not sure when.

As it looks right now, the values look to be correctly aligned on both nodes. Thanks! :)

murrant Mar 13, 2017

Member

This was likely fixed by #6169

Thanks :)

@murrant murrant closed this Mar 13, 2017

lock bot May 17, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed.

@librenms librenms locked as resolved and limited conversation to collaborators May 17, 2018
