
cpufreq collector panic on hosts with offline cpus #2746

Closed
chris-b2c2 opened this issue Jul 15, 2023 · 14 comments
@chris-b2c2

Host operating system: output of uname -a

Linux <hostname> 5.4.0-1080-aws #87~18.04.1-Ubuntu SMP Fri Jun 10 18:32:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.6.0 (branch: HEAD, revision: ff7f9d69b645cb691dd3e84dc3afc88f5c006962)
  build user:       root@f9c3ed0cfbd3
  build date:       20230527-12:03:54
  go version:       go1.20.4
  platform:         linux/amd64
  tags:             netgo osusergo static_build

node_exporter command line flags

  --collector.systemd \
  --collector.systemd.enable-restarts-metrics \
  --collector.systemd.enable-start-time-metrics \
  --collector.systemd.unit-include=.*.service \
  --collector.systemd.unit-exclude=(fwupd-refresh).service \
  --collector.textfile.directory=/tmp/textfile_collector

node_exporter log output

node_exporter[80979]: panic: runtime error: slice bounds out of range [73:72]
node_exporter[80979]: goroutine 85 [running]:
node_exporter[80979]: github.com/prometheus/procfs/sysfs.filterOfflineCPUs(0xc00031f800?, 0xc0001dfc08)
node_exporter[80979]:         /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:181 +0x23a
node_exporter[80979]: github.com/prometheus/procfs/sysfs.FS.SystemCpufreq({{0xbca467?, 0x4?}})
node_exporter[80979]:         /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:209 +0x28b
node_exporter[80979]: github.com/prometheus/node_exporter/collector.(*cpuFreqCollector).Update(0x0?, 0x0?)
node_exporter[80979]:         /app/collector/cpufreq_linux.go:51 +0x45
node_exporter[80979]: github.com/prometheus/node_exporter/collector.execute({0xbcd885, 0x7}, {0xce3840, 0xc0000d9800}, 0x0?, {0xce3360, 0xc0002ae5c0})
node_exporter[80979]:         /app/collector/collector.go:161 +0x9c
node_exporter[80979]: github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1({0xbcd885?, 0x0?}, {0xce3840?, 0xc0000d9800?})
node_exporter[80979]:         /app/collector/collector.go:152 +0x3d
node_exporter[80979]: created by github.com/prometheus/node_exporter/collector.NodeCollector.Collect
node_exporter[80979]:         /app/collector/collector.go:151 +0xd0
systemd[1]: node_exporter.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: node_exporter.service: Failed with result 'exit-code'.

Are you running node_exporter in Docker?

No. Systemd.

What did you do that produced an error?

This only occurs on hosts with offline CPUs; in this case, 96 vCPUs reduced to 48 with SMT disabled.
I am able to avoid the issue by adding --no-collector.cpufreq to the startup options.

What did you expect to see?

node_exporter should not exit.

What did you see instead?

node_exporter panicked and exited.

@discordianfish
Member

Ugh, I thought we fixed this in prometheus/procfs#497.

@discordianfish
Member

@taherkk any idea whether this is the same as or different from what you worked on?

@dswarbrick
Contributor

It looks like it was really fixed in prometheus/procfs#534, which was included in procfs v0.11.0. The stacktrace shows that node_exporter 1.6.0 was built with procfs v0.10.0.

@SuperQ
Member

SuperQ commented Jul 17, 2023

Ahh, boo, I should have pulled that into 1.6.1 today.

@zer0stars

Having the same problem.

panic: runtime error: slice bounds out of range [4:2]

goroutine 71 [running]:
github.com/prometheus/procfs/sysfs.filterOfflineCPUs(0xc0002c8400?, 0xc000252c08)
    /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:181 +0x23a
github.com/prometheus/procfs/sysfs.FS.SystemCpufreq({{0x7ffc1e0e3c23?, 0x9?}})
    /go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:209 +0x28b
github.com/prometheus/node_exporter/collector.(*cpuFreqCollector).Update(0x0?, 0x0?)
    /app/collector/cpufreq_linux.go:51 +0x45
github.com/prometheus/node_exporter/collector.execute({0xbcd883, 0x7}, {0xce39e0, 0xc000079760}, 0x0?, {0xce3500, 0xc000162580})
    /app/collector/collector.go:161 +0x9c
github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1({0xbcd883?, 0x0?}, {0xce39e0?, 0xc000079760?})
    /app/collector/collector.go:152 +0x3d
created by github.com/prometheus/node_exporter/collector.NodeCollector.Collect
    /app/collector/collector.go:151 +0xd0

@zer0stars

@SuperQ when can we expect a release with latest procfs?

@tongwoojun

tongwoojun commented Aug 10, 2023

Hi, I found a new bug.
version: 1.6.1
My server runs Ubuntu.

panic: runtime error: slice bounds out of range [4:3]

goroutine 74 [running]:
github.com/prometheus/procfs/sysfs.filterOfflineCPUs(0xc0001ad400?, 0xc00023dc08)
/go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:181 +0x23a
github.com/prometheus/procfs/sysfs.FS.SystemCpufreq({{0xbca465?, 0x4?}})
/go/pkg/mod/github.com/prometheus/procfs@v0.10.0/sysfs/system_cpu.go:209 +0x28b
github.com/prometheus/node_exporter/collector.(*cpuFreqCollector).Update(0x0?, 0x0?)
/app/collector/cpufreq_linux.go:51 +0x45
github.com/prometheus/node_exporter/collector.execute({0xbcd883, 0x7}, {0xce39e0, 0xc0000350a0}, 0x0?, {0xce3500, 0xc00015a580})
/app/collector/collector.go:161 +0x9c
github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1({0xbcd883?, 0x0?}, {0xce39e0?, 0xc0000350a0?})
/app/collector/collector.go:152 +0x3d
created by github.com/prometheus/node_exporter/collector.NodeCollector.Collect
/app/collector/collector.go:151 +0xd0

@discordianfish
Member

@SuperQ Guess this deserves a new release?

@dels78

dels78 commented Sep 2, 2023

Reverting to 1.5.0 avoids this issue! I hope we can get an updated release soon too!

@spammads

+1

@johnkord
Contributor

I would appreciate a new release as well!

@johnkord
Contributor

What are the criteria for a new release @discordianfish? I hope it's not too much trouble to get one sometime soon. Thank you very much for your hard work.

@SuperQ
Member

SuperQ commented Sep 22, 2023

We do not have a strict release schedule. But I am planning to do one soon.

@unreturned

Looks like the problem was fixed in https://github.com/prometheus/node_exporter/releases/tag/v1.7.0, so this issue can be closed.

@SuperQ SuperQ closed this as completed Nov 30, 2023