Power consumption #532
Comments
natoscott
Jul 27, 2018
Contributor
@jcpunk I'd like to see this too. If anyone is keen to hack on this and needs any help with the PCP side of things, just let me know.
christianhorn
Oct 4, 2018
Contributor
I looked around the topic a bit, just leaving thoughts here.
- Seems like the power used can only be computed with either
  - a) an external power measurement device (e.g. powertop can deal with one)
  - or b) the metering offered by laptops when they run on battery. For servers, neither seems to apply. Vendors might implement something similar to a) and make it available, though I fear they mostly do so via proprietary interfaces. Maybe they don't make it available to Linux userland via 3rd party daemons, but only in their system's BIOS/remote control boards.
- As for usable code to implement pmda power metrics:
  - Intel powertop ( https://github.com/fenrus75/powertop ) is big, but does not offer its functions via a library
  - PowerAPI looks like a stalled approach ( https://github.com/rouvoy/powerapi https://ercim-news.ercim.eu/en92/special/powerapi-a-software-library-to-monitor-the-energy-consumed-at-the-process-level http://abourdon.github.io/powerapi-akka/ )
  - powerstat ( https://github.com/ColinIanKing/powerstat ) is maybe the best code to start with; one could use it as a base, either directly in a pmda or abstracted into a library that a pmda then uses. Besides the library, a constantly running userspace daemon might also be required.
So while quite a few things would be required, the outcome could be nice. We could answer things like
- How much energy did this process use in the last week? This computation would assume that, e.g. for a measuring interval of 10 minutes, the process drew the measured value on average over those 10 minutes. With a higher measuring frequency, the value would get more accurate.
- "This nginx ran for a week. How long would I need to run this bicycle power generator to generate the same energy?"
- We might be able to compare how power-efficiently various CPU types perform the same calculation. E.g. if an ARM takes double the time of an x86, but uses just 30% of the power while computing, it would be more power efficient.
- In case we also get GPUs measured, we could see how power-efficiently GPUs and CPUs perform a given calculation.
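As a rough sketch of the arithmetic behind these questions (not part of any existing PMDA; the function name and all numbers are made up for illustration), energy can be estimated by assuming each power sample holds for its whole measuring interval:

```python
def energy_wh(samples_watts, interval_s):
    """Estimate energy in watt-hours from evenly spaced power samples.

    Each sample is assumed to hold for the whole interval, which is the
    approximation described in the comment above.
    """
    joules = sum(samples_watts) * interval_s
    return joules / 3600.0


# Hypothetical numbers: a process drawing 50 W on average, sampled every
# 10 minutes for one hour.
hourly = energy_wh([50.0] * 6, interval_s=600)

# The ARM vs. x86 comparison: double the runtime at 30% of the power
# works out to 60% of the energy for the same calculation.
x86 = energy_wh([100.0], interval_s=3600)
arm = energy_wh([30.0], interval_s=7200)
```

Shorter intervals give more samples per hour, so the sum tracks the real power curve more closely.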
jhansonhpe
Oct 4, 2018
Power API is not stalled. We just move really, really slowly. The standard is now public at https://github.com/pwrapi with reference implementation and a few plugins. That being said I think there are other approaches that might be more interesting.
Intel has a new project called GeoPM - https://geopm.github.io/ which is more at the level of facility than at a system level. Still, it is able to gather power from nodes. I believe there will be non-Intel CPUs that will be supported by their vendors.
There is a new approach in development that does not, as yet, have a public presence. I might be able to comment more in a few weeks after I go to a conference and a Power API spec meeting.
@natoscott is there a particular PMDA that aligns with this effort I could start looking at as a model?
minnus
Oct 4, 2018
Contributor
It just so happens that we've been looking at this over the past few weeks. I forgot about this GitHub issue. Our efforts are maybe a little bit more coarse-grained than what you are interested in, but here is what we have done so far:
https://github.com/ubccr/pcp/tree/ipmi-pmda/src/pmdas/ipmi
A pmda that uses the DCMI IPMI interface to get chassis power utilization. There's a ton of stuff you can get from IPMI, but this is as far as we will probably go for our needs. Once I get all the associated qa, man pages, etc. done, I'll push this up. A few caveats: we have a lot of old hardware where IPMI is completely broken. Calls will return errors, 0 values, or completely lock up the BMC in some cases. Newer hardware works pretty well.
We have validated these metrics against per-outlet PDU measurements using the JSON pmda querying SNMP. Probably not the most elegant solution, but if you have a smart PDU with per-outlet monitoring, you can do something like:
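#!/bin/bash
# Read the outlet load from the PDU via SNMP; the community string (foobar),
# pdu_ip, and the outlet index X are placeholders.
val=$(snmpget -Oq -Ov -v 1 -c foobar pdu_ip rPDUOutletStatusLoad.X)
# Scale the reading by 208/10 and print it as JSON for the JSON pmda.
echo "{\"power\": $((val * 208 / 10))}"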
Where the config for the metric is:
{
    "prefix": "snmp",
    "data-exec": "/var/lib/pcp/pmdas/json/trusted/run.sh",
    "metrics": [
        {
            "name": "power",
            "pointer": "/power",
            "type": "integer",
            "description": "Power in watts"
        }
    ]
}
Then you get a json.snmp.power metric.
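To illustrate how the "pointer" entry relates to the script's output, here is a minimal sketch of resolving a JSON Pointer such as /power against a JSON document. It is only an illustration of the idea, not the JSON pmda's own code; the helper name and the sample output value are hypothetical.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve a simple JSON Pointer such as "/power".

    Escape sequences (~0, ~1) and the empty pointer are not handled.
    """
    value = document
    for token in pointer.lstrip("/").split("/"):
        # Lists are indexed by position, objects by key.
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


# Hypothetical output of the script for one sample.
script_output = '{"power": 1248}'
watts = resolve_pointer(json.loads(script_output), "/power")
```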
Also, I just added power measurement to the nvidia pmda. If you have a Fermi (I believe) or newer card, this should work:
https://github.com/ubccr/pcp/tree/nvidiapmda_power/src/pmdas/nvidia
All these changes are based on 3.12.2. We haven't made the jump to pcp 4 yet, so YMMV if you try to port these. I hope to get there soon.
Finally, you should be able to use the RAPL counters in the perfevent pmda if your chip supports that. We've started validating those as well. You just need to be careful that you sample the metrics frequently enough, since those counters roll over rather quickly.
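A minimal sketch of the rollover handling, assuming the counter reports cumulative energy in microjoules and wraps at a known value. The unit, the wrap value, and the function names are assumptions for illustration, not something taken from the perfevent pmda:

```python
def counter_delta(previous, current, wrap):
    """Difference between two readings of a counter that wraps at `wrap`.

    Correct only if the counter wrapped at most once between the two
    samples, which is why the sampling interval matters.
    """
    return (current - previous) % wrap


def average_watts(previous_uj, current_uj, wrap_uj, interval_s):
    """Average power from two energy readings given in microjoules."""
    return counter_delta(previous_uj, current_uj, wrap_uj) / 1e6 / interval_s


# Hypothetical readings one second apart, with a wrap in between.
WRAP_UJ = 2**32
watts = average_watts(WRAP_UJ - 1_000_000, 4_000_000, WRAP_UJ, interval_s=1.0)
```

If two or more wraps happen between samples, the delta is silently too small, so the interval has to be shorter than the time the counter needs to roll over at peak power.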
I'd be happy to discuss any other ideas you have since this is my current focus.
fche
Oct 4, 2018
Contributor
FWIW, pmdaprometheus is probably a more future-proof approach to this problem than pmdajson.
minnus
Oct 4, 2018
Contributor
Interesting, OK. I didn't have time to follow the pmdaprometheus discussion and would not have thought to look there for this functionality. Thanks for the pointer
fche
Oct 4, 2018
Contributor
What we mean is that creating new data sources (like that snmp->power script) in the form of prometheus-exporter makes it usable by pcp (via pmdaprometheus) as well as other tools.
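For illustration, the snmp->power value could be rendered in the Prometheus text exposition format roughly like this. It is a sketch only: the metric name is made up, the SNMP query itself is left out, and the 208/10 factor is assumed to convert a reading in tenths of amps at 208 V.

```python
def outlet_watts(load_tenths_amp, volts=208):
    """Convert an outlet load reading to watts (assumed unit: tenths of amps)."""
    return load_tenths_amp * volts // 10


def exposition(name, value, help_text):
    """Render one gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


# Hypothetical reading of 60, i.e. 6.0 A under the assumption above.
text = exposition("pdu_outlet_power_watts", outlet_watts(60), "Outlet power in watts")
```

Any tool that reads this format could then consume the same output, which is the point made above.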
minnus
Oct 4, 2018
Contributor
Right, yeah I understood. That's exactly what I used pmdajson for. I initially tried pmdapipe and couldn't get that working reliably, and then had a vague recollection that pmdajson did a similar thing. I can easily migrate my pmdajson config to use pmdaprometheus.
What I meant was that I didn't think to look at pmdaprometheus to see if it had the ability to run arbitrary scripts like pmdajson.
#384 :-)
christianhorn
Oct 5, 2018
Contributor
What do others think, which scope could make sense for a power pmda? These 3 big areas come to mind.
- 1.) metrics for power consumed by the system (intake). These could come from querying the overall power consumption of the system from the BIOS/BMC board, from an external measurement tool, or from querying battery data on a laptop.
- 2.) metrics for the components where the system spends power. All of these summed up should equal the overall consumption from 1. It might be possible to measure the GPU, and depending on the system's capabilities we might get metrics on how much individual components like hard disks, wlan, cpu, and memory consumed. It might also be possible to get the data for 2) from a calibration procedure: even if the system does not report hard disk consumption separately, if it reports the overall consumption one could put the hard disk to sleep and measure the difference.
- 3.) metrics on how much of the above resources individual processes used. For this, it would be interesting to see per process what percentage of CPU, of I/O to a hard disk, and of GPU was used.
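The calibration idea in point 2 boils down to a subtraction. A minimal sketch with made-up readings and hypothetical function names:

```python
def component_watts(total_active_watts, total_asleep_watts):
    """Attribute the drop in overall power to the component put to sleep."""
    return total_active_watts - total_asleep_watts


def unattributed_watts(total_watts, per_component_watts):
    """Power left over after subtracting every measured component."""
    return total_watts - sum(per_component_watts.values())


# Hypothetical readings in watts.
disk = component_watts(total_active_watts=95.0, total_asleep_watts=89.0)
rest = unattributed_watts(95.0, {"disk": disk, "cpu": 40.0, "gpu": 30.0})
```

The leftover value shows how far the per-component metrics are from summing up to the overall intake from point 1.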
natoscott
Oct 5, 2018
Contributor
@christianhorn yep, all of those areas make sense to me for a power PMDA. I would recommend starting with the basic metrics you need, and evolving it from there. In terms of point 3, keep in mind that exporting values for individual processes is most easily done using the existing pmdaproc - it's a tricky instance domain to get right.
christianhorn
Oct 10, 2018
Contributor
I will not come up with a pull request for pmda-power myself. If only desktop hardware had proper power consumption measurement built in, that would make testing much nicer.
Implementing a metric for getting consumption via IPMI, as mentioned by @minnus , sounds like a great start for a PMDA.
Can't stop thinking of nice things one could do with this. With such a PMDA, it might be possible in the future to compare power consumption of multiple builds, e.g. "your recent commit is passing the QA tests, but we have seen that it increased power consumption while running the test suite by 20%".
jcpunk commented Jul 26, 2018
Utilities such as powertop are able to query the system's power consumption. It would be helpful if those same metrics were captured here so that I could chart power consumption along with other metrics.
Ideally I'd love to have really fine data (per disk, cpu, pci device, etc) but I'd be perfectly happy with a raw consumption number.