rapl broken in newer kernels #1892

Closed
lheckemann opened this issue Nov 19, 2020 · 7 comments · Fixed by #2092

Comments

@lheckemann

Host operating system: output of uname -a

Linux localhost 5.4.77 #1-NixOS SMP Tue Nov 10 20:13:20 UTC 2020 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.1 (branch: , revision: v1.0.1)
  build user:
  build date:
  go version:       go1.15.5

node_exporter command line flags

/nix/store/wmjjwpn7h2y5vpms5x2zf9qg9fp60d8c-node_exporter-1.0.1/bin/node_exporter --collector.arp --collector.bcache --collector.conntrack --collector.filefd --collector.logind --collector.netclass --collector.netdev --collector.netstat --collector.rapl --collector.sockstat --collector.softnet --collector.stat --collector.systemd --collector.textfile --collector.textfile.directory /run/prometheus-node-exporter --collector.thermal_zone --collector.time --collector.udp_queues --collector.uname --collector.vmstat --collector.cpu --collector.cpufreq --collector.diskstats --collector.edac --collector.entropy --collector.filesystem --collector.hwmon --collector.interrupts --collector.ksmd --collector.loadavg --collector.meminfo --collector.pressure --collector.timex --collector.zfs --web.listen-address 0.0.0.0:9100 --collector.disable-defaults

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Upgraded the kernel to 5.4.77 and rebooted.

What did you expect to see?

No errors.

What did you see instead?

Nov 19 14:47:21 localhost node_exporter[3135]: level=error ts=2020-11-19T14:47:21.869Z caller=collector.go:161 msg="collector failed" name=rapl duration_seconds=0.000307877 err="open /sys/class/powercap/intel-rapl:0/energy_uj: permission denied"

This seems to be caused by https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.4.77&id=19f6d91bdad42200aac557a683c17b1f65ee6c94

@discordianfish
Member

Yeah, we should probably skip this collector silently if we can't read the file.

@uniemimu
Contributor

Making these energy counters readable by non-root users again would require kernel-side work, and probably much less accurate counters, so that security is not compromised. I have not heard of plans to implement such less-accurate versions, though someone might do it. At the current level of accuracy, these will remain root-only.

@kmille

kmille commented Dec 28, 2020

Same problem on an up-to-date Debian 10:

root@buster:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

root@buster:~# node_exporter --version
node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)
  build user:       root@1f76dbbcfa55
  build date:       20200616-12:44:12
  go version:       go1.14.4

root@buster:~# journalctl -u  node_exporter.service  | grep err | head -n 1
Dec 26 21:05:00 buster node_exporter[15560]: level=error ts=2020-12-26T20:05:00.257Z caller=collector.go:161 msg="collector failed" name=rapl duration_seconds=0.006197907 err="open /sys/class/powercap/intel-rapl:0/energy_uj: permission denied"

root@buster:~# uname -a
Linux buster 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux

Is there a way to disable the rapl collector?
EDIT: Yes, pass --no-collector.rapl on the command line.

@sammko

sammko commented Dec 28, 2020

Thank you for finding the workaround, @kmille. With the Arch Linux package, put NODE_EXPORTER_ARGS="--no-collector.rapl" into /etc/conf.d/prometheus-node-exporter.

@pboiseau

Is there any solution if we want to keep collecting RAPL power consumption metrics?

@dswarbrick
Contributor

You should be able to use sysfsutils to change the permissions of the rapl sysfs entries at boot. For example, add something like this to /etc/sysfs.conf:

mode class/powercap/intel-rapl:0/energy_uj = 0444

wagdav added a commit to wagdav/homelab that referenced this issue Jul 7, 2021
With newer kernels the Intel Running Average Power Limit (RAPL)
collector fails with the error message:

    open /sys/class/powercap/intel-rapl:0/energy_uj: permission denied

Disable this collector for now.

See prometheus/node_exporter#1892
@Jsalas424

I'm having the same issue: #2090

SuperQ added a commit that referenced this issue Jul 21, 2021
Capture permission denied error for "energy_uj" file.

Fixes: #1892

Signed-off-by: Ben Kochie <superq@gmail.com>
SuperQ added a commit that referenced this issue Jul 23, 2021
Capture permission denied error for "energy_uj" file.

Fixes: #1892

Signed-off-by: Ben Kochie <superq@gmail.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this issue Apr 9, 2024
Capture permission denied error for "energy_uj" file.

Fixes: prometheus#1892

Signed-off-by: Ben Kochie <superq@gmail.com>