[bug] Recent major Proxmox update has messed up voltage + fan sensor info #6044

Closed
BloodyIron opened this issue Mar 1, 2017 · 19 comments · Fixed by #6079

Comments

@BloodyIron

BloodyIron commented Mar 1, 2017

  • [ YES (as of yesterday) ] Is your install up to date? Updating your install
    Please do not submit an issue if your install is not up to date within the last 24 hours or on a stable monthly release.
  • [ X ] Please include all of the information between the ==================================== section of ./validate.php which you can run from the cli.
Component    Version
LibreNMS     db58fea
DB Schema    171
PHP          5.5.9-1ubuntu4.21
MySQL        5.5.54-0ubuntu0.14.04.1
RRDTool      1.4.7
SNMP         NET-SNMP 5.7.2

I updated my Proxmox cluster about a week ago, and now I'm getting inconsistent SNMP readings from the nodes for CPU fan RPM and for voltages.

For example, the vcore voltage sensor reports the 3.3V reading, and the 3.3V sensor reports the 5V reading. None of the voltage stats (3.3V/5V/12V/vcore) show the correct values.

The CPU Fan speed reports 0.

Running a poller debug shows the same values I'm seeing in the UI: the voltages are "misaligned" and there are no values for fan RPM.

These are the package versions Proxmox reports on its end; perhaps this will help:

proxmox-ve: 4.4-82 (running kernel: 4.4.40-1-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-109
pve-firmware: 1.1-10
libpve-common-perl: 4.0-92
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-94
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-3
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80

I'm unsure what I can do about this.

@BloodyIron
Author

Okay, now I am really confused: one node is giving accurate values and the other isn't. On the node that looks accurate, I get the CPU fan speed, and the voltages appear to be in the right categories.

The two nodes are literally identical hardware and the software SHOULD be set up identically. I am exceptionally confused now...

@sorano
Contributor

sorano commented Mar 1, 2017

Wouldn't this be an issue with that specific Proxmox installation rather than LibreNMS? In any case your issue is missing the information from step 3.

@BloodyIron
Author

You're probably right, I was just wondering if there was something that might be relevant on the LibreNMS end. I have since opened a ticket with Proxmox VE peeps.

@laf
Member

laf commented Mar 1, 2017

Please update and test again.

@BloodyIron
Author

I'm now getting the correct values for the readings that were problematic. However, the following sensor elements are now duplicated:

  • Chassis Fan Speed
  • temp1
  • +12V Voltage

Odd, since only some of the elements are duplicated while others aren't. This is on build 87a5046, after rediscovering the device.

@laf laf reopened this Mar 2, 2017
@laf
Member

laf commented Mar 2, 2017

We need updated discovery and poller debugs

@BloodyIron
Author

Discovery paste : http://pastebin.com/4jPnTvRg
Poller paste : http://pastebin.com/nyYuq5kp

Naturally both sanitised, with 2 week expiry ;)

@laf
Member

laf commented Mar 3, 2017

@BloodyIron PR submitted to fix this. Would you mind testing?

@BloodyIron
Author

I ran daily.sh, which gave me build 41824dd, and rediscovered the device, but the duplicates are still there. Should I do something differently?

@BloodyIron
Author

BloodyIron commented Mar 4, 2017

From within /opt/librenms, I ran "./scripts/github-apply 6079".

It gave me the output:
"
Checking patch includes/discovery/functions.inc.php...
Applied patch includes/discovery/functions.inc.php cleanly.
"

I then restarted the web host and rediscovered the device, but the duplicate, blank sensors are still there. Did I not do it correctly?

Also, I took a backup before applying the patch, and I'm now reverting to that backup with an eye to keeping the code consistent.

@laf
Member

laf commented Mar 4, 2017

Please provide the output of:

SELECT * FROM sensors where device_id=X;

@BloodyIron
Author

BloodyIron commented Mar 6, 2017

Yuck, not sure if this is going to parse well for you : http://pastebin.com/9iqPd0A1

Also, 2w expiry on these.

Also, kinda neat that SNMP can pull sensor data from "lmsensors" without lm-sensors being installed on the target ;D

@laf
Member

laf commented Mar 6, 2017

Those are all distinct OIDs so the patch won't have removed them because they aren't duplicates.

However, can you provide the output of ./discovery.php -h HOSTNAME -d -m sensors?
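
To illustrate the distinct-OID point above, here is a minimal sketch (not LibreNMS's actual discovery code). It assumes duplicates are detected by OID, so two rows that share a label but carry different OIDs both survive. The row layout and the OID placeholders are hypothetical.

```python
from collections import Counter

# Hypothetical sensor rows as (description, oid). "OID-A" etc. are placeholders,
# not real OIDs from the device in this thread.
sensors = [
    ("temp1", "OID-A"),
    ("temp1", "OID-B"),          # same label, different OID: not a duplicate by OID
    ("+12V Voltage", "OID-C"),
    ("+12V Voltage", "OID-C"),   # same OID: a true duplicate
]


def dedupe_by_oid(rows):
    """Keep the first row seen for each OID and drop later rows with the same OID."""
    seen = set()
    kept = []
    for descr, oid in rows:
        if oid not in seen:
            seen.add(oid)
            kept.append((descr, oid))
    return kept


def repeated_labels(rows):
    """Return the descriptions that still appear more than once, sorted."""
    counts = Counter(descr for descr, _ in rows)
    return sorted(descr for descr, n in counts.items() if n > 1)


kept = dedupe_by_oid(sensors)
still_repeated = repeated_labels(kept)
```

Under these assumptions the repeated "+12V Voltage" row is dropped, while "temp1" still shows up twice because its two rows have different OIDs.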

@laf laf added the Needs-Info label Mar 6, 2017
@laf laf closed this as completed in #6079 Mar 6, 2017
laf added a commit that referenced this issue Mar 6, 2017
@laf laf reopened this Mar 6, 2017
@BloodyIron
Author

These are the results from the discovery command you outlined above : http://pastebin.com/J5Cx5SPK

Considering they are identical systems, it's weird that this system is showing duplicate sensors, and only recently. Hmmm.

@laf
Member

laf commented Mar 7, 2017

I can't tell you why it's doing it, and I can't see any issues. I'd suggest just deleting the sensors with a value of 0.
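
A minimal sketch of how the zero-value entries could be picked out before deleting them by hand. The dictionaries stand in for rows returned by the sensors query earlier in the thread; the field names (`id`, `descr`, `current`) and all values are hypothetical, not the real schema or real readings.

```python
# Hypothetical rows standing in for the output of the sensors query.
rows = [
    {"id": 1, "descr": "Chassis Fan Speed", "current": 1200.0},
    {"id": 2, "descr": "Chassis Fan Speed", "current": 0.0},
    {"id": 3, "descr": "temp1", "current": 41.0},
    {"id": 4, "descr": "temp1", "current": 0.0},
    {"id": 5, "descr": "CPU Fan Speed", "current": 0.0},
]


def zero_value_duplicates(rows):
    """Return rows reading 0 whose description also appears on a non-zero row."""
    live = {r["descr"] for r in rows if r["current"] != 0}
    return [r for r in rows if r["current"] == 0 and r["descr"] in live]


candidates = zero_value_duplicates(rows)
candidate_ids = [r["id"] for r in candidates]
```

The sketch only flags a zero reading when a non-zero twin with the same label exists, so a sensor that legitimately reads 0 (such as a stopped fan with no duplicate) is left alone.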

@laf
Member

laf commented Mar 12, 2017

Please update and try again; another pull request that potentially fixes this was merged.

@BloodyIron
Author

I ran the daily update to 54deb1d, hit rediscover on the device, then went to check the health edit page and found that the dupes were already gone!

It took me maybe 10 seconds or less to go from rediscover to the health edit page, so I suspect the dupes had already been removed before the update and the rediscover, but I'm not sure when.

As it looks right now, the values look to be correctly aligned on both nodes. Thanks! :)

@murrant
Member

murrant commented Mar 13, 2017

This was likely fixed by #6169

Thanks :)

@murrant murrant closed this as completed Mar 13, 2017
@lock

lock bot commented May 17, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed.

@lock lock bot locked as resolved and limited conversation to collaborators May 17, 2018