Prometheus exporter: glitches happen with systemd process power consumption #19

Closed
bpetit opened this issue Nov 27, 2020 · 12 comments
Labels
bug Something isn't working

Comments

@bpetit
Contributor

bpetit commented Nov 27, 2020

Bug description

Glitches of unrealistic power consumption happen sometimes. They seem to be related specifically to systemd.

To Reproduce

No specific procedure to reproduce. Run the Prometheus exporter for quite a long time and hope you are lucky.

Expected behavior

Screenshots

[Screenshot: 2020-11-27_11-00]

Environment

  • OS: GNU/Linux 5.4.0-54-generic Ubuntu 20.04.1

Additional context

@bpetit bpetit added the bug Something isn't working label Nov 27, 2020
@bpetit bpetit added this to Triage in General Jan 12, 2021
@Mathieu-Coupe

Same issue here, but much more frequent.

[Screenshot]

@bpetit
Contributor Author

bpetit commented Jan 17, 2021

Woot, super strange 😳
Thanks for reporting!

@bpetit
Contributor Author

bpetit commented Jan 17, 2021

What is the exact query you used for your graph?

@Mathieu-Coupe

Mathieu-Coupe commented Jan 18, 2021

> What is the exact query you used for your graph?

I cloned the Grafana dashboard and changed the query of one of the "X power consumption" graphs to:
scaph_process_power_consumption_microwatts{exe=~"systemd.*"} / 1000000

I then narrowed the filter to:
scaph_process_power_consumption_microwatts{cmdline=~"--user.*systemd.*"} / 1000000

This shows that the processes reporting these very high power values are the "/lib/systemd/systemd --user" instances, each of which is the root of a user session.
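Prometheus regex label matchers (`=~`) are fully anchored, so `--user.*systemd.*` only matches `cmdline` values that *start* with `--user`. The Python sketch below approximates that with `re.fullmatch` (Python's `re` is not RE2, the engine Prometheus uses, but it behaves the same for this simple pattern). It shows that the narrowed filter selects the reversed `cmdline` values reported in this thread but would not match the expected `/lib/systemd/systemd --user`. `promql_regex_match` is a hypothetical helper for illustration only.

```python
import re


def promql_regex_match(pattern: str, value: str) -> bool:
    """Approximate a PromQL `=~` label matcher: the regex must match the WHOLE value."""
    return re.fullmatch(pattern, value) is not None


PATTERN = "--user.*systemd.*"

# Reversed values as reported in this thread, with and without the space.
reported = ["--user /lib/systemd/systemd", "--user/lib/systemd/systemd"]
# The value one would expect for a systemd user instance.
expected = "/lib/systemd/systemd --user"

for value in reported + [expected]:
    print(f"{value!r}: {promql_regex_match(PATTERN, value)}")
```

A consequence of the anchoring is that a filter written against the expected order would need a leading `.*` (for example `.*systemd.*--user.*`) to match regardless of where the arguments appear.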

I've attached an export generated by the Grafana query inspector:
export.zip

Also, the cmdline content seems inverted: it reads "--user /lib/systemd/systemd" instead of "/lib/systemd/systemd --user" (and the space between the arguments is also missing).
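For reference, Linux exposes a process's arguments in `/proc/<pid>/cmdline` as NUL-separated bytes, in order, with the program path (`argv[0]`) first. The Python sketch below shows how that raw content maps to the expected `/lib/systemd/systemd --user` string; `format_cmdline` is a hypothetical helper, not the exporter's actual code.

```python
def format_cmdline(raw: bytes) -> str:
    """Turn the raw contents of /proc/<pid>/cmdline into a readable string.

    Arguments are separated (and usually terminated) by NUL bytes. Keep them
    in their original order and join them with single spaces. Empty pieces
    are dropped: this removes the trailing terminator, but note that it also
    drops genuinely empty arguments.
    """
    args = [part.decode("utf-8", errors="replace") for part in raw.split(b"\0") if part]
    return " ".join(args)


# Raw bytes as the kernel would present them for a systemd user instance.
raw = b"/lib/systemd/systemd\0--user\0"
print(format_cmdline(raw))  # /lib/systemd/systemd --user
```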

@bpetit
Contributor Author

bpetit commented Jan 18, 2021

Could you please paste your CPU model from /proc/cpuinfo, the output of uname -r, and your Linux distribution and its version?
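If it helps to collect this information in a script, here is a minimal Python sketch. It assumes the usual `key : value` layout of `/proc/cpuinfo` and relies on `platform.release()`, which returns the same string as `uname -r` on Linux. `cpu_model_names` is a hypothetical helper, and the sample text is abbreviated for the demo.

```python
import platform


def cpu_model_names(cpuinfo_text: str) -> list:
    """Return the distinct 'model name' values found in /proc/cpuinfo text."""
    models = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Real /proc/cpuinfo separates keys and values with tabs and ':'.
        if sep and key.strip() == "model name" and value.strip() not in models:
            models.append(value.strip())
    return models


# Abbreviated sample; on the machine itself, pass open("/proc/cpuinfo").read().
SAMPLE = """processor\t: 0
model name\t: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
processor\t: 1
model name\t: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
"""

print(cpu_model_names(SAMPLE))
print("Kernel release:", platform.release())  # same value as `uname -r` on Linux
```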

@Mathieu-Coupe

This is an Ubuntu 20.10 system running on a Core i3-7100. This processor has a TDP of 51 W.

$ uname -r
5.8.0-25-generic

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
stepping : 9
microcode : 0xd6
cpu MHz : 2099.251
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 7799.87
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

@bpetit bpetit moved this from Triage to To do in General Jan 20, 2021
@bpetit
Contributor Author

bpetit commented Feb 2, 2021

To investigate this bug further, I guess we will need data. I opened a feature request to be able to extract this data: #66

Stay tuned

@bpetit bpetit moved this from To do to In progress in General Oct 5, 2021
@bpetit
Contributor Author

bpetit commented Oct 30, 2021

Hi @Mathieu-Coupe !

Sorry for the long delay in responding!

I tried a fix that might have an effect on this issue: #132

Could you try it on the same machine/setup (if you still have it...) and tell me whether it seems to get any better?

@Mathieu-Coupe

Mathieu-Coupe commented Oct 31, 2021

Hi !

Yes, I still have this setup (but it was updated to a newer kernel in the meantime).
I reinstalled it from the provided compose file and the issue seems fixed 👍
I'll keep it running from now on.

[Screenshot]

There is still a minor issue: I see a spike in the socket consumption only, which goes away if I zoom in a little on the graph.
[Screenshot]

Same view, zoomed in on the spike:
[Screenshot]

The computer is now running Ubuntu 21.04; I will upgrade it to 21.10 in a few days.

Welcome to Ubuntu 21.04 (GNU/Linux 5.11.0-34-generic x86_64)

@bpetit
Contributor Author

bpetit commented Nov 9, 2021

Hi @Mathieu-Coupe, thanks for the feedback!

About the remaining issue: it seems to me (but I could be wrong) that it's a different issue, maybe related to the Grafana or Prometheus configuration. Why else would it be displayed with totally different values depending on the zoom level?

Do you have any news about the first one? Is it still showing better results?

Thanks!
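On the zoom-level question: a Prometheus range query evaluates a plain selector only at `start + k * step`, taking the latest sample at or before each evaluation time within a lookback window (5 minutes by default). Grafana typically derives the step from the panel width and time range, so zooming changes the step, and a spike that lasts a single scrape may or may not land on an evaluation point. The Python sketch below mimics that behaviour with made-up data (a 15 s scrape interval and one 5000 W sample); `evaluate_range` is a hypothetical helper, and exact window boundaries differ slightly between Prometheus versions.

```python
from bisect import bisect_right

LOOKBACK_S = 300  # Prometheus default lookback delta (5 minutes)


def evaluate_range(samples, start, end, step):
    """Roughly mimic a PromQL range query over a plain instant-vector selector.

    samples: list of (timestamp_s, value) pairs sorted by timestamp.
    For each evaluation time t, return the latest sample with timestamp <= t,
    as long as it is not older than the lookback window.
    """
    times = [ts for ts, _ in samples]
    out = []
    t = start
    while t <= end:
        i = bisect_right(times, t) - 1
        if i >= 0 and t - times[i] < LOOKBACK_S:
            out.append((t, samples[i][1]))
        t += step
    return out


# Made-up data: a 10 W reading every 15 s, with a single 5000 W outlier at t=135.
samples = [(t, 5000.0 if t == 135 else 10.0) for t in range(0, 600, 15)]

fine = evaluate_range(samples, 0, 585, 15)    # step equals the scrape interval
coarse = evaluate_range(samples, 0, 585, 60)  # wider step, as when zoomed out
print(max(v for _, v in fine))    # 5000.0: the spike is visible
print(max(v for _, v in coarse))  # 10.0: the spike falls between evaluation points
```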

@Mathieu-Coupe

Hi,

There is indeed something to configure at the Grafana level to better handle very high spikes in the data, but the underlying issue is that one data point got a very high value.

If you look at the raw data returned by Prometheus, you can see it:
[Screenshot]

For the rest, still good.
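One way to spot such a point in raw exported data is to compare each sample against a physical upper bound. The Python sketch below converts microwatt readings (the unit used by the exporter's `*_microwatts` metrics) to watts and flags those above a limit. The readings are made up, and setting the limit to twice the 51 W TDP of the Core i3-7100 mentioned above is an assumption for illustration, since package power can briefly exceed TDP; `implausible_samples` is a hypothetical helper.

```python
def implausible_samples(readings_uw, limit_w):
    """Return (index, watts) pairs for readings above limit_w watts.

    readings_uw: power readings in microwatts.
    """
    readings_w = [uw / 1_000_000 for uw in readings_uw]
    return [(i, w) for i, w in enumerate(readings_w) if w > limit_w]


TDP_W = 51.0
LIMIT_W = 2 * TDP_W  # hypothetical margin: package power can briefly exceed TDP

# Made-up socket power readings in microwatts (not data from this issue).
readings_uw = [8_200_000, 9_100_000, 8_700_000, 4_312_500_000, 8_900_000]
print(implausible_samples(readings_uw, LIMIT_W))  # [(3, 4312.5)]
```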

@bpetit
Contributor Author

bpetit commented Nov 10, 2021

Hi,

There is indeed something to fix. But this does not seem specifically related to systemd, right?

Would you mind opening a new issue for that specific behavior?

Thanks!

@bpetit bpetit closed this as completed Nov 10, 2021
General automation moved this from In progress to Done Nov 10, 2021
@bpetit bpetit moved this from Done to Previous releases in General Jan 30, 2023
Projects
General
Previous releases
Development

No branches or pull requests

2 participants