Documentation: Interpreting results #116

Closed
wanecek opened this issue May 25, 2021 · 6 comments
Labels
enhancement New feature or request

@wanecek

wanecek commented May 25, 2021

Problem

It can be confusing to interpret the domains that RAPL readings output (Core, Uncore, CPU/Package).

Solution

Add a short section to the documentation on what these different domains entail. I am more than happy to help with this, but I would need some help to understand this myself first.

For example, I get the following results, which at least make it look like Core is a subset of CPU (they consistently have the same slope), but I'm not sure whether that's the case. To elaborate: if I wanted the entire host consumption, do I add CPU + CORE + DRAM, or is it rather CORE + DRAM?

Screenshot-2021-05-25T14:58:49

Alternatives

If there is another resource that already covers this well, we can of course link to it instead.

@wanecek wanecek added the enhancement New feature or request label May 25, 2021
@bpetit bpetit added this to Triage in General May 26, 2021
@demeringo
Contributor

My (limited) understanding is that we should sum CPU + DRAM to approximate the host consumption, leaving aside the display and so on.

What I understand from the docs or discussions is that CPU is a general term we use to designate a package (i.e. the physical chip that we plug into the motherboard CPU socket).
This package may contain multiple core(s) + cache + controllers. I think that package = core(s) + uncore but I am not sure of what is meant by uncore.

Also, according to the Intel doc, I would not have expected my laptop to show a DRAM domain (if I get it right, the docs seem to say that this is reserved for server products, not client products). However, DRAM is displayed on my laptop (Intel Core i5-6200U CPU).

Host:	5.96226 W	Core		Uncore		DRAM
Socket0	6.012866 W	0.728036 W	1.677757 W	2.841885 W	
Top 5 consumers:
Power	PID	Exe
3.8609004 W	342	""
0.7208643 W	309	""
0.0122180395 W	10	""
0.0122180395 W	274	""
0 W	1	""
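
Taking the Socket0 figures above at face value, the sum suggested in this comment can be sketched as follows. It assumes, as the comment does, that core and uncore are already counted in the package figure and that DRAM is not; neither point is confirmed in this thread, and the variable names are purely illustrative.

```python
# Readings copied from the Socket0 line above, in watts.
package_w = 6.012866
core_w = 0.728036
uncore_w = 1.677757
dram_w = 2.841885

# Assumption from this comment: core and uncore are subsets of the package,
# so they are NOT added again; DRAM is treated as a separate domain.
host_estimate_w = package_w + dram_w

# Whatever the package reports beyond core + uncore. These readings do not
# break that remainder down any further.
package_remainder_w = package_w - (core_w + uncore_w)

print(f"package + DRAM estimate: {host_estimate_w:.6f} W")
print(f"package minus core and uncore: {package_remainder_w:.6f} W")
```

Note that this estimate does not match the Host line above (5.96226 W), which is close to the Socket0 figure alone; how that Host value is derived is not explained in this thread.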

References:

@demeringo
Contributor

The following diagram gives a clear view of RAPL domains.
https://raw.githubusercontent.com/powerapi-ng/pyJoules/master/rapl_domains.png

It also indicates that the uncore domain corresponds to the integrated GPU.

Source: Powerapi-ng/pyJoules project documentation
https://github.com/powerapi-ng/pyJoules/blob/master/README.md#rapl-domain-description

@PierreRust
Collaborator

PierreRust commented Jun 4, 2021

For an "official" reference, you can look at Intel's System Programming Manual, specifically section 14.9.2, "RAPL Domains and Platform Specificity" (p. 500):

The specific RAPL domains available in a platform vary across product segments. Platforms targeting the client segment support the following RAPL domain hierarchy:

  • Package
  • Two power planes: PP0 and PP1 (PP1 may reflect to uncore devices)

Platforms targeting the server segment support the following RAPL domain hierarchy:

  • Package
  • Power plane: PP0
  • DRAM
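
As a rough sketch, the two lists quoted above can be written down as a lookup table to check whether a domain is documented for a given segment. The table layout, the function name `is_documented` and the alias mapping are illustrative assumptions: PP1 as "uncore" comes from this discussion, and PP0 as "core" is the usual reading rather than something the excerpt states.

```python
# Domains listed per segment in the quoted manual excerpt.
DOCUMENTED_DOMAINS = {
    "client": ("package", "pp0", "pp1"),
    "server": ("package", "pp0", "dram"),
}

# Aliases used in this thread (assumption: "core" = PP0, "uncore" = PP1,
# "cpu" = package).
ALIASES = {"core": "pp0", "uncore": "pp1", "cpu": "package"}


def is_documented(segment: str, domain: str) -> bool:
    """Return True if the excerpt lists `domain` for `segment`."""
    name = domain.lower()
    name = ALIASES.get(name, name)
    return name in DOCUMENTED_DOMAINS[segment]


print(is_documented("client", "DRAM"))
print(is_documented("server", "uncore"))
```

A `False` result only means the excerpt does not list the domain for that segment; real hardware may still expose it, as the laptop earlier in this thread does with DRAM.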

The description of PP1, aka 'uncore' (given in 14.9.4), is very vague:

The availability of PP1 RAPL domain interface is platform-specific. For a client platform, the PP1 domain refers to the power plane of a specific device in the uncore.

So it is never officially stated that uncore maps to the integrated GPU, although that seems to be the case. My guess is that Intel prefers to keep some freedom over what goes into this power plane.

Additionally, on recent client / SoC processors there is a new 'PSys' domain for which I could not find any description in the doc.
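
Since the set of domains differs from machine to machine, one way to see what a given host exposes is to list the RAPL zones published by the Linux powercap interface. This is a minimal sketch, not how Scaphandre itself reads the data: it assumes a Linux host where zones appear under `/sys/class/powercap` as `intel-rapl:*` entries, each with a `name` file, and the function name `list_rapl_domains` is made up for illustration.

```python
from pathlib import Path


def list_rapl_domains(root="/sys/class/powercap"):
    """Map each RAPL zone directory name to the domain name it reports."""
    domains = {}
    base = Path(root)
    if not base.is_dir():
        # No powercap interface here (non-Linux host, missing driver, ...).
        return domains
    for zone in sorted(base.glob("intel-rapl:*")):
        name_file = zone / "name"
        if name_file.is_file():
            domains[zone.name] = name_file.read_text().strip()
    return domains


# Prints nothing if the interface is absent.
for zone, name in list_rapl_domains().items():
    print(zone, name)
```

On machines that expose them, the reported names would be expected to include entries such as a package zone, `core`, `uncore`, `dram` or `psys`, but the exact set depends on the processor and kernel.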

For another interesting reference, you can have a look at the paper “RAPL in Action: Experiences in Using RAPL for Power Measurements” (K. N. Khan, M. Hirki, T. Niemi, J. K. Nurminen, and Z. Ou, ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 3, no. 2, pp. 1–26, Apr. 2018, doi: 10.1145/3177754), which includes the following diagram:

image

In Scaphandre's case, my opinion is that the focus should be on the package, core and DRAM domains, which are the domains supported by server CPUs (which are, IMHO, Scaphandre's target use case).

@bpetit bpetit moved this from Triage to To do in General Oct 5, 2021
@bpetit
Contributor

bpetit commented May 25, 2023

link to #318 and #316 as doc should be updated once those metrics are added

@bpetit
Contributor

bpetit commented Mar 6, 2024

#318 and #316 have been merged in 1.0.0, so you can check out the new PSYS and MMIO metrics if your machine provides them.

The documentation has been improved on this topic, see: https://hubblo-org.github.io/scaphandre-documentation/explanations/rapl-domains.html

What do you think about it, @wanecek?

@wanecek
Author

wanecek commented Mar 7, 2024

Wow, time flies! That looks like an excellent solution. Thanks for the further development on this tool and for getting back to me. Best wishes!

@wanecek wanecek closed this as completed Mar 7, 2024
General automation moved this from To do to Done Mar 7, 2024
4 participants