
April 2018 Committee meeting notes

jhansonhpe edited this page Apr 12, 2018 · 1 revision

April 6, 2018 Power API Spec committee meeting

Agenda

  1. Specification – two tickets to discuss and vote on, see https://github.com/pwrapi/powerapi_spec/pulls/
  2. Reference implementation review - discussion on which implementation to use as a community baseline for development
  3. Discussion of Resource Managers and interoperability with Power API and potential API interfaces

Attendees

Natalie Bates, EEHPCWG
Jeff Hanson, HPE
Matt Kappel, Cray
Andrew Younge, SNL
Barry Rountree, LLNL
Ryan Grant, SNL
Sid Jana, Intel
Steve Martin, Cray
Ram Nagappan, Intel
Steve Leak, NERSC
Todd Rosendahl, IBM
Kevin Pedretti, SNL
Jim Laros, SNL

Specification tickets - which to use. Version 1 lists names in order, as best known to Ryan. Ryan feels it is weak because it lacks organizations. Version 2 includes organizational affiliation. Barry asked if the GitHub repo is mentioned. Ryan said he’d add it. Steve Martin likes both. Showing the organization represented is probably good. It was suggested that the chair be noted with a symbol rather than listed at the top. Todd thinks that makes sense but wonders whether the editor changes per version or over time. Steve Leak says it is worth calling out the editor for a version. Todd suggests the editor be at the top, with symbols for the chair/secretary. Ryan is more or less okay with this. Ryan wants to archive the contributors as we change major versions. Sid asked in chat about listing by company rather than alphabetically by person, to put the focus on contributing companies, and only in the contributors list. Ryan would rather list by name, so that it is easier to handle people changing companies. Matt suggests a tag next to each name to give contributors' organizational affiliation (aka author-refmark). Ram likes the author-refmark idea. Ryan proposes a vote on Version 1 or Version 2.

Natalie Bates, EEHPCWG - V2
Jeff Hanson, HPE - V2
Matt Kappel, Cray - V2
Andrew Younge, SNL - V2
Barry Rountree, LLNL - V2
Ryan Grant, SNL - V2
Sid Jana, Intel - V2
Steve Martin, Cray - V2
Ramkumar Nagappan, Intel - V2
Steve Leak, NERSC - V2
Todd Rosendahl, IBM - V2
Kevin Pedretti, SNL
Jim Laros, SNL - V2

With two-thirds of the organizations attending and two-thirds of those present voting for Version 2, Version 2 is approved. We will discuss the use of author-refmark in the ticket before a pull request is made.
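As a rough check of the per-attendee part of that threshold, the recorded votes can be tallied as below. This is only a sketch: it assumes the two-thirds rule is counted per person present, and that an attendee with no vote recorded counts as present but not voting for Version 2. The names `votes` and `share_for` are illustrative and not part of any committee tooling.

```python
from fractions import Fraction

# Votes as recorded in the list above; None means no vote was recorded.
votes = {
    "Natalie Bates": "V2",
    "Jeff Hanson": "V2",
    "Matt Kappel": "V2",
    "Andrew Younge": "V2",
    "Barry Rountree": "V2",
    "Ryan Grant": "V2",
    "Sid Jana": "V2",
    "Steve Martin": "V2",
    "Ramkumar Nagappan": "V2",
    "Steve Leak": "V2",
    "Todd Rosendahl": "V2",
    "Kevin Pedretti": None,
    "Jim Laros": "V2",
}


def share_for(votes, option):
    """Fraction of everyone present who voted for the given option."""
    in_favor = sum(1 for choice in votes.values() if choice == option)
    return Fraction(in_favor, len(votes))


share = share_for(votes, "V2")
# Assumed threshold: at least two-thirds of those present.
passes = share >= Fraction(2, 3)
print(f"V2 share of those present: {share}, passes: {passes}")
```

The organization-level part of the rule is not checked here, since the notes do not list the full set of member organizations.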

Jeff/Ryan to figure out a method for organizations to vote, as this time was a bit messy. Natalie offers up SurveyMonkey. Steve Martin suggests that companies pick a representative. Ryan thinks this is good. Jim reported that Sandia did actually talk ahead of the meeting. Barry asked who will mark the Version 1 ticket as rejected (or closed). For the accepted ticket, Barry or Ryan will do the merge request once Ryan refines it.

Reference implementation review - The community has two implementations (Cray and Sandia). Redfish goes into the Sandia version. Todd asked for a description of the differences. Sandia's is C++ with plugins. The plugins handle measurement/control down to the hardware. Plugins for Power Insight, RAPL, xtp, and Power Gadget are in the Sandia version. One limitation is that it is not as highly tuned for large scale. The system is discovered via an XML file. These files are a bit hard to develop. Some tools would need to be created to describe systems, hwloc for example.

Cray (Matt): geared towards Trinity (plugin-based as well, but only one plugin is done). It runs only on compute nodes, and scaling is done that way. It uses a daemon on the compute node to pass calls and check permissions. Demo source is in the docs directory. The Cray-specific part is counters that are read via sysfs; otherwise it uses RAPL. Todd asked whether Cray started from the SNL implementation or from something else. Steve Martin says they were encouraged to write one from scratch for Trinity. They limited the scope to just compute nodes, which is a known trade-off. This resulted in a more refined spec. The reference implementation was more of a higher-level feature view at the start. Jim reported that doing the work was very useful. Todd asked if the base reference implementation was then refined. The PowerAPI reference was kept up to date.

Command shipping is different (JSON in Cray, roll your own in SNL). Sid asked about scalable tuning. Steve Martin said it is a compute-node-LOCAL implementation. Cray has other code with CAPMC to allow workload managers to interact with compute nodes. Ryan says the SNL version can query non-compute nodes (like Power Insight devices) and does network calls (which is part of why it may not be as tuned). Cray’s backend might be more efficient. Ryan reports that 1000 nodes at 1 to 10 Hz is doable. Sid asked if a hierarchy is possible (reads and writes go to an agent; can the agents talk to each other?). Ryan said yes. We will have to discuss what is possible, as it is a bit open now.

Ram asked if the HPE version is something else. Ryan said it is a plugin. Jeff to ask if HPE rafpa has been tried with the cray_pwrapi reference. Steve Martin states the Cray version is all in-band, where rafpa might be different. Sid notes the Redfish slides from SC17: https://eehpcwg.llnl.gov/assets/sc17_bof_1715_3_redfish_overview.pdf (check out slide 8, which clearly shows the Sandia style). Jim asked what Cray is doing for Redfish. Steve Martin says they are looking at it but has no details that are shareable. Ram reminded us of the long discussion at the face-to-face: vendors would do the sensor work. Is this still the plan? Ryan notes there are already a lot of devices in the plugins source in the reference implementation. Ryan thinks multi-node is the issue to be solved.

Ryan says it is homework to decide which implementation is to be chosen. Jim asks if this group is going to work out the communication method for tuning. Ryan says this is the benefit of contributing to the community implementation. Steve Martin asked whether we are going to get more vendors, both system and CPU vendors. Ryan said yes, that is the plan. AMD is still interested. Ryan is doing some outreach to Europe and Japan.

The resource manager plugins discussion is delayed to next month (or will happen via email). We are very interested in getting this going. Please review your contacts.