Power consumption #532

Open
jcpunk opened this issue Jul 26, 2018 · 13 comments · 7 participants

@jcpunk

jcpunk commented Jul 26, 2018

Utilities such as powertop are able to query the system's power consumption. It would be helpful if those same metrics were captured here so that I could chart power consumption along with other metrics.

Ideally I'd love to have really fine-grained data (per disk, CPU, PCI device, etc.), but I'd be perfectly happy with a raw consumption number.

@natoscott

natoscott Jul 27, 2018

Contributor

@jcpunk I'd like to see this too. If anyone is keen to hack on this and needs any help with the PCP side of things, just let me know.

@christianhorn

christianhorn Oct 4, 2018

Contributor

I looked into the topic a bit; just leaving some thoughts here.

While this looks like it would require quite a few pieces, the outcome could be nice. We could answer questions like:

  • How much energy did this process use in the last week? This computation would assume that, e.g. with a 10-minute sampling interval, the process drew the measured value on average over those 10 minutes. With a shorter sampling interval, the value would become more accurate.
  • "This nginx ran for a week. How long would I need to run this bicycle power generator to generate the same energy?"
  • We might be able to compare how power-efficiently various CPU types perform the same calculation. E.g. if an ARM takes twice as long as an x86 but draws just 30% of the power while computing, it uses 2 × 30% = 60% of the energy, so it would be more power efficient.
  • If we also get GPUs measured, we could see how power-efficiently GPUs and CPUs compute a given calculation.
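
The energy arithmetic behind these questions can be sketched as follows. This is a minimal illustration, not part of any PCP tool: it assumes power samples taken at a fixed interval, each treated as the average draw over its interval (the rectangle rule described in the first point). The function names and all sample numbers are hypothetical, and bash integer arithmetic is used, so watts and seconds are whole numbers.

```shell
#!/bin/bash
# Energy (J) from power samples (W) taken every $1 seconds, treating each
# sample as the average draw over its whole interval (rectangle rule).
energy_joules() {
  local interval=$1
  shift
  local total=0 w
  for w in "$@"; do
    total=$(( total + w * interval ))
  done
  echo "$total"
}

# Energy (J) for a job = average power (W) * runtime (s).
job_energy() {
  echo $(( $1 * $2 ))
}

# Hypothetical: three 10-minute (600 s) samples of 100 W, 120 W and 80 W.
energy_joules 600 100 120 80        # 180000 J (= 50 Wh)

# Hypothetical nginx averaging 50 W for a week (604800 s), matched by a
# 100 W bicycle generator.
week=$(job_energy 50 604800)        # 30240000 J
echo "$(( week / 100 / 3600 )) hours of pedaling"   # 84

# Hypothetical CPUs: the ARM takes twice as long but draws 30% of the power.
x86=$(job_energy 100 60)            # 100 W for 60 s  -> 6000 J
arm=$(job_energy 30 120)            # 30 W for 120 s  -> 3600 J
echo "ARM needs $(( arm * 100 / x86 ))% of the x86 energy"   # 60%
```

A shorter sampling interval simply means more, shorter rectangles in `energy_joules`, which is why accuracy improves with sampling frequency.
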
@jhansonhpe

jhansonhpe Oct 4, 2018

Power API is not stalled. We just move really, really slowly. The standard is now public at https://github.com/pwrapi with a reference implementation and a few plugins. That being said, I think there are other approaches that might be more interesting.

Intel has a new project called GeoPM (https://geopm.github.io/), which operates more at the facility level than at the system level. Still, it is able to gather power data from nodes. I believe there will be non-Intel CPUs that will be supported by their vendors.

There is a new approach in development that does not, as yet, have a public presence. I might be able to comment more in a few weeks after I go to a conference and a Power API spec meeting.

@natoscott is there a particular PMDA that aligns with this effort I could start looking at as a model?

@minnus

minnus Oct 4, 2018

Contributor

@jhansonhpe @christianhorn

It just so happens that we've been looking at this over the past few weeks; I had forgotten about this GitHub issue. Our efforts are maybe a bit more coarse-grained than what you are interested in, but here is what we have done so far:

https://github.com/ubccr/pcp/tree/ipmi-pmda/src/pmdas/ipmi

A PMDA that uses the DCMI IPMI interface to get chassis power utilization. There's a ton of stuff you can get from IPMI, but this is probably as far as we will go for our needs. Once I get all the associated QA, man pages, etc. done, I'll push this up. A few caveats: we have a lot of old hardware where IPMI is completely broken. Calls will return errors or 0 values, or in some cases completely lock up the BMC. Newer hardware works pretty well.
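
For a quick manual cross-check of what such a PMDA reports, the same DCMI reading is available from the `ipmitool dcmi power reading` command. The sketch below is not how the PMDA works; it assumes the output contains a line like `Instantaneous power reading:   220 Watts` (the exact layout may differ between ipmitool versions and BMCs), and the function names are hypothetical. Because old BMCs may return 0 W, a non-positive or non-numeric reading is treated as "no value".

```shell
#!/bin/bash
# Extract the instantaneous chassis power (W) from `ipmitool dcmi power reading`.
# ASSUMPTION: the output contains a line such as
#   "    Instantaneous power reading:                   220 Watts"
parse_dcmi_watts() {
  awk -F: '/Instantaneous power reading/ { split($2, f, " "); print f[1] }'
}

# Broken BMCs may return errors or 0 W; treat those as "no value".
read_watts_or_fail() {
  local w
  w=$(parse_dcmi_watts)
  if [[ "$w" =~ ^[0-9]+$ ]] && (( w > 0 )); then
    echo "$w"
  else
    return 1
  fi
}

# Real use (needs root and a BMC with DCMI support):
#   ipmitool dcmi power reading | read_watts_or_fail
# Self-contained demo with canned output:
sample='    Instantaneous power reading:                   220 Watts
    Minimum during sampling period:                 16 Watts'
echo "$sample" | read_watts_or_fail    # prints 220
```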

We have validated these metrics against per-outlet PDU measurements, using the JSON PMDA to query SNMP. Probably not the most elegant solution, but if you have a smart PDU with per-outlet monitoring, you can do something like:

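#!/bin/bash
# Load on outlet X via SNMP v1 (community "foobar", PDU at pdu_ip),
# reported by the PDU in tenths of an amp.
val=$(snmpget -Oq -Ov -v 1 -c foobar pdu_ip rPDUOutletStatusLoad.X)
# Approximate watts on a 208 V circuit: (val / 10) A * 208 V, ignoring power factor.
echo "{\"power\": $((val * 208 / 10))}"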

Where the config for the metric is:

{
    "prefix" : "snmp",
    "data-exec": "/var/lib/pcp/pmdas/json/trusted/run.sh",
    "metrics": [
      {
        "name": "power",
        "pointer": "/power",
        "type": "integer",
        "description": "Power in watts"
      }
    ]
}

Then you get a json.snmp.power metric.

Also, I just added power measurement to the NVIDIA PMDA. If you have a Fermi (I believe) or newer card, this should work:

https://github.com/ubccr/pcp/tree/nvidiapmda_power/src/pmdas/nvidia
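
To sanity-check the NVIDIA PMDA values on a given card, `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits` prints one power reading per GPU. The sketch below sums those readings; it uses nvidia-smi rather than the PMDA, the function name is hypothetical, and it assumes cards without power support print a non-numeric placeholder such as `[N/A]`, which is skipped.

```shell
#!/bin/bash
# Sum the power draw (W) of all GPUs from nvidia-smi CSV output.
# Real use:
#   nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits | sum_gpu_watts
# Non-numeric lines (e.g. "[N/A]" on cards without power support) are skipped.
sum_gpu_watts() {
  awk '$1 ~ /^[0-9.]+$/ { total += $1 } END { printf "%.2f\n", total }'
}

# Demo with canned output for two GPUs plus one unsupported card.
printf '45.23\n60.10\n[N/A]\n' | sum_gpu_watts   # prints 105.33
```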

All these changes are based on PCP 3.12.2. We haven't made the jump to PCP 4 yet, so YMMV if you try to port them. I hope to get there soon.

Finally, you should be able to use the RAPL counters in the perfevent PMDA if your chip supports them. We've started validating those as well. You just need to be careful to sample the metrics frequently enough, since those counters roll over rather quickly.
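
The rollover caveat can be made concrete with a small sketch. It assumes a counter that counts modulo a known range (values 0 to range-1) and wraps at most once between two samples; the function names and numbers are hypothetical. On Linux, the powercap interface's `max_energy_range_uj` describes the range of its `energy_uj` counter, which could serve as that range, though whether that matches what perfevent exposes is an assumption to verify.

```shell
#!/bin/bash
# Delta between two readings of a wrapping energy counter.
# ASSUMPTION: the counter counts modulo $3 (values 0 .. range-1) and wraps
# at most once between the two samples.
counter_delta() {
  local prev=$1 cur=$2 range=$3
  echo $(( (cur - prev + range) % range ))
}

# Longest safe sampling interval (s): the time needed to run through one
# full counter range (J) at the highest expected power draw (W).
max_interval_s() {
  local range_j=$1 max_watts=$2
  echo $(( range_j / max_watts ))
}

counter_delta 1000 4000 65536    # no wrap: 3000
counter_delta 65000 500 65536    # wrapped: 1036
# Hypothetical: a 32-bit counter in 2^-16 J units covers 65536 J,
# so at 100 W it wraps roughly every 655 s.
max_interval_s 65536 100         # 655
```

If two wraps can happen between samples, no delta formula can recover the true value, which is why the sampling interval has to stay below `max_interval_s`.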

I'd be happy to discuss any other ideas you have since this is my current focus.

@fche

fche Oct 4, 2018

Contributor

FWIW, pmdaprometheus is probably a more future-proof approach to this problem than pmdajson.

@myllynen

myllynen Oct 4, 2018

Contributor

Agreed, there are some long-standing issues with pmdajson with no sign of them being worked on; see:

#432
#484

@minnus

minnus Oct 4, 2018

Contributor

Interesting, OK. I didn't have time to follow the pmdaprometheus discussion and would not have thought to look there for this functionality. Thanks for the pointer.

@fche

fche Oct 4, 2018

Contributor

What we mean is that creating new data sources (like that SNMP-to-power script) in the form of a Prometheus exporter makes them usable by PCP (via pmdaprometheus) as well as by other tools.
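
As a sketch of that idea, the PDU reading from the SNMP script could be printed in the Prometheus text exposition format (`# HELP` / `# TYPE` lines followed by the sample) instead of JSON. The metric name and help text are hypothetical, and how the output reaches pmdaprometheus (served over HTTP by an exporter or otherwise) depends on your setup and is not shown.

```shell
#!/bin/bash
# Emit one gauge in the Prometheus text exposition format.
# The metric name and help text are hypothetical examples.
emit_power_gauge() {
  local watts=$1
  echo "# HELP pdu_outlet_power_watts Outlet power derived from PDU load over SNMP."
  echo "# TYPE pdu_outlet_power_watts gauge"
  echo "pdu_outlet_power_watts $watts"
}

# In a real exporter the value would come from the SNMP query, e.g.
#   val=$(snmpget -Oq -Ov -v 1 -c foobar pdu_ip rPDUOutletStatusLoad.X)
#   emit_power_gauge $(( val * 208 / 10 ))
emit_power_gauge 1248
```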

@minnus

minnus Oct 4, 2018

Contributor

Right, yeah, I understood. That's exactly what I used pmdajson for. I initially tried pmdapipe and couldn't get it working reliably, and then had a vague recollection that pmdajson did a similar thing. I can easily migrate my pmdajson config to pmdaprometheus.

What I meant was that I didn't think to look at pmdaprometheus to see whether it could run arbitrary scripts like pmdajson.

@fche

fche Oct 4, 2018

Contributor

#384 :-)

@christianhorn

christianhorn Oct 5, 2018

Contributor

What do others think: which scope would make sense for a power PMDA? These three big areas come to mind:

  • 1.) Metrics for power consumed by the system (intake). These could come from querying the system's overall power consumption from the BIOS/BMC board, from an external measurement tool, or from battery data on a laptop.
  • 2.) Metrics for the components where the system spends power. Summed up, these should equal the overall consumption from 1). It might be possible to measure the GPU, and depending on the system's capabilities we might get metrics on how much individual components like hard disks, WLAN, CPU, and memory consumed. It might also be possible to get the data for 2) from a calibration procedure: even if the system does not report hard disk consumption specifically, if it reports overall consumption one could put the hard disk to sleep and measure the difference.
  • 3.) Metrics on how much of the above resources individual processes used. For this, it would be interesting to see, per process, what percentage of CPU, of I/O to a hard disk, and of GPU was used.
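
A naive sketch of how points 2) and 3) could combine: calibrate a component's draw as the drop in total power when only that component sleeps, then charge each process a share of that draw equal to its share of the component's activity. The linear attribution model is an assumption (real hardware only approximates it), and the function names and numbers are hypothetical.

```shell
#!/bin/bash
# 2) Calibration: a component's draw (W) is the drop in total system power
#    when only that component is put to sleep.
component_watts() {
  local total_active=$1 total_sleeping=$2
  echo $(( total_active - total_sleeping ))
}

# 3) Attribution: charge a process a share of the component's power equal to
#    its share (%) of the component's activity, e.g. of all disk I/O.
#    ASSUMPTION: linear model; this is a simplification.
process_watts() {
  local component_w=$1 util_pct=$2
  echo $(( component_w * util_pct / 100 ))
}

disk=$(component_watts 150 142)   # 8 W for the hard disk
process_watts "$disk" 25          # 2 W for a process doing 25% of the disk I/O
```

The per-process percentages themselves would naturally come from pmdaproc, so a power PMDA would mostly need to supply the component totals.
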
@natoscott

natoscott Oct 5, 2018

Contributor

@christianhorn yep, all of those areas make sense to me for a power PMDA. I would recommend starting with the basic metrics you need and evolving it from there. In terms of point 3, keep in mind that exporting values for individual processes is most easily done using the existing pmdaproc; it's a tricky instance domain to get right.

@christianhorn

christianhorn Oct 10, 2018

Contributor

I will not come up with a pull request for pmda-power myself. If only desktop hardware had proper power consumption measurement built in; that would make testing much nicer.
Implementing a metric for getting consumption via IPMI, as mentioned by @minnus, sounds like a great start for a PMDA.

I can't stop thinking of nice things one could do with this. With such a PMDA, it might be possible in the future to compare the power consumption of multiple builds, e.g. "your recent commit passes the QA tests, but we have seen that it increased power consumption while running the test suite by 20%".
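
The build comparison could be as simple as the sketch below: compute the relative change in test-suite energy between a baseline and a candidate build, and flag it above a threshold. The energies would be summed from sampled power; the function names, the threshold, and the numbers are hypothetical.

```shell
#!/bin/bash
# Relative change (%) in energy (J) from a baseline build to a candidate build.
energy_change_pct() {
  local baseline=$1 candidate=$2
  echo $(( (candidate - baseline) * 100 / baseline ))
}

# Fail (return 1) if the candidate used more than threshold_pct extra energy.
check_build() {
  local baseline=$1 candidate=$2 threshold_pct=$3
  local change
  change=$(energy_change_pct "$baseline" "$candidate")
  if (( change > threshold_pct )); then
    echo "energy regression: +${change}%"
    return 1
  fi
  echo "ok: ${change}%"
}

# Hypothetical: 50 kJ baseline vs 60 kJ candidate with a 10% threshold.
check_build 50000 60000 10 || echo "would fail the CI job here"
```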
