Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCS/ARCH: Read from cpu MHz for CPU speed on x86 #2793

Closed
wants to merge 1 commit into from
Closed

UCS/ARCH: Read from cpu MHz for CPU speed on x86 #2793

wants to merge 1 commit into from

Conversation

rzambre
Copy link
Contributor

@rzambre rzambre commented Aug 13, 2018

Fixes #2792.

When the user changes the frequency of the CPU, UCS
still reads the base frequency on x86 for its
calculations.

This patch ensures that the CPU frequency is read
from the `cpu MHz` field of `cpuinfo` and not from
`model name`. The function that reads from the
model name is removed because it is not used anywhere
else and, if kept, would throw a build error.

Signed-off-by: Rohit Zambre <rzambre@uci.edu>
@swx-jenkins1
Copy link

Can one of the admins verify this patch?

@hoopoepg
Copy link
Contributor

ok to test

@mellanox-github
Copy link
Contributor

Can one of the admins verify this patch?

@shamisp
Copy link
Contributor

shamisp commented Aug 13, 2018

It is okay for now but in longer term for x86 I would suggest to follow perftest approach
https://github.com/linux-rdma/perftest/blob/master/src/get_clock.c

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5040/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/7448/ for details (Mellanox internal link).

Copy link
Contributor

@yosefe yosefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fix is incorrect since the clock reported by "cpu MHz" in procinfo is not always accurate and may change

@yosefe
Copy link
Contributor

yosefe commented Aug 30, 2018

as discussed on conf call, suggested approach is to have "slow and accurate" and "fast and inaccurate" version of get_time

@shamisp
Copy link
Contributor

shamisp commented Aug 30, 2018

@rzambre what is the paper that you mentioned the other day ?

@rzambre
Copy link
Contributor Author

rzambre commented Aug 30, 2018

This is the paper that I was referring to: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf The paper talks how to measure for architectures that do and do not support rdtscp.

The paper's methodology also seems to be used in courses at universities: http://cseweb.ucsd.edu/classes/wi16/cse221-a/timing.html The infrastructure is basically the code snippet under the Examine the Assembly Code section.

The overhead of this infrastructure is 16ns (similar to the current infrastructure), according to my measurements.

Since UCS has different macros for start (SCOPE_START) and end (SCOPE_END) of timers, I think we can forward the ucs_profile_type_t to ucs_get_time to call the right sequence of assembly instructions. For SAMPLE, a rdtscp followed by cpuid would be the right thing (the end-timer portion of the infrastructure), according to the explanation in the white paper.

@yosefe
Copy link
Contributor

yosefe commented Dec 3, 2018

@rzambre closing this one, replaced by #3070

@yosefe yosefe closed this Dec 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants