RDCYCLE definition #140
>>>> On Tue, 27 Feb 2018 09:13:49 +0000 (UTC), Liviu Ionescu ***@***.***> said:
| The current definition in rv32.tex:1171 reads:
| The RDCYCLE pseudoinstruction reads the low XLEN bits of the {\tt
| cycle} CSR which holds a count of the number of clock cycles
| executed by the processor core on which the hart is running from
| an arbitrary start time in the past.
| Given that a core may support multiple harts (1.1 "... A RISC-V-compatible
| core might support multiple RISC-V-compatible hardware threads, or harts,
| through multithreading."), is the RDCYCLE definition as the "number of clock
| cycles executed by the processor core" correct?
Yes.
| Or should it read "number of clock cycles executed by the current hart", to
| account for each hart going to sleep differently and possibly not counting
| cycles at all in deep sleep?
No - it is intended to be the number of cycles executed by the
processor core, not the hart. Precisely defining what is a "core" is
difficult given some implementation choices (e.g., AMD Bulldozer).
Precisely defining what is a "clock cycle" is also difficult given the
range of implementations (including software emulations), but the
intent is that this is used for performance monitoring along with the
other performance counters. In particular, where there is one
hart/core, one would expect cycle-count/instructions-retired to
measure CPI for a hart.
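As a rough illustration of the CPI use case for the one hart/core situation, here is a minimal sketch in C. The `perf_sample` type and both function names are hypothetical, invented for this example. The inline-assembly reader is compiled only for RV64 targets and assumes the execution environment permits reading the counters at the current privilege level, which is not guaranteed everywhere.

```c
#include <assert.h>
#include <stdint.h>

/* One snapshot of the two counters. The type and function names are
   hypothetical, chosen only for this sketch. */
typedef struct {
    uint64_t cycles;   /* value read from the cycle counter */
    uint64_t instret;  /* value read from the instructions-retired counter */
} perf_sample;

#if defined(__riscv) && (__riscv_xlen == 64)
/* RV64 only: a single read returns the whole 64-bit counter. Assumes the
   execution environment allows these reads at the current privilege level. */
static inline perf_sample perf_sample_now(void)
{
    perf_sample s;
    __asm__ volatile ("rdcycle %0"   : "=r"(s.cycles));
    __asm__ volatile ("rdinstret %0" : "=r"(s.instret));
    return s;
}
#endif

/* CPI over the interval between two snapshots. Unsigned subtraction keeps
   the deltas correct even if a counter wrapped once in between. */
static inline double cycles_per_instruction(perf_sample before, perf_sample after)
{
    uint64_t dc = after.cycles - before.cycles;
    uint64_t di = after.instret - before.instret;
    assert(di != 0);  /* the ratio is undefined if nothing retired */
    return (double)dc / (double)di;
}
```

On a core with several harts the same arithmetic still runs, but the cycle delta is then per-core while the retired-instruction delta is per-hart, so the ratio no longer reads as a clean per-hart CPI.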
Cores don't have to be exposed to software at all, and an implementor
might choose to pretend multiple harts on one physical core are
running on separate cores with one hart/core, and provide separate
cycle counters for each hart. This might make sense in a simple
barrel processor (e.g., CDC6600 PCPs) where inter-hart timing
interactions are non-existent or minimal.
Where there is more than one hart/core and dynamic multithreading, it
is not generally possible to separate out cycles per hart (especially
with SMT). It might be possible to define a separate performance
counter that tried to capture the number of cycles a particular hart
was running, but this definition would have to be very fuzzy to cover
all the possible threading implementations (cycles any instruction was
issued to execution for this hart?, and/or cycles any instruction
retired? but, how to count cycles this hart was occupying machine
resources but couldn't execute due to stalls? Likely, "all of the
above" would be needed to have understandable performance stats).
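To make the ambiguity concrete, here is a small sketch in C that counts "cycles for this hart" under each of the candidate definitions above, given a per-cycle status trace for one hart. The trace format, the flag names, and the `hart_cycle_counts` type are all hypothetical, invented for illustration; no real hardware exposes exactly this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cycle status flags for one hart on a shared core. */
enum {
    HART_ISSUED  = 1u << 0,  /* at least one instruction issued to execution */
    HART_RETIRED = 1u << 1,  /* at least one instruction retired */
    HART_STALLED = 1u << 2   /* holding machine resources but unable to execute */
};

typedef struct {
    uint64_t issue_cycles;   /* cycles in which the hart issued */
    uint64_t retire_cycles;  /* cycles in which the hart retired */
    uint64_t stall_cycles;   /* cycles in which the hart was stalled */
    uint64_t any_cycles;     /* cycles with any of the above ("all of the above") */
} hart_cycle_counts;

/* Walk a trace with one flag byte per core clock cycle and tally each
   candidate definition of a per-hart cycle separately. */
static hart_cycle_counts count_hart_cycles(const uint8_t *trace, size_t n)
{
    hart_cycle_counts c = {0, 0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (trace[i] & HART_ISSUED)  c.issue_cycles++;
        if (trace[i] & HART_RETIRED) c.retire_cycles++;
        if (trace[i] & HART_STALLED) c.stall_cycles++;
        if (trace[i] != 0)           c.any_cycles++;
    }
    return c;
}
```

Each tally can disagree with the others and with the trace length (the per-core cycle count) on the same input, which is the fuzziness described above.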
The complexity of defining a per-hart cycle count, together with the
need in any case for a total per-core cycle count when tuning
multithreaded code, led to just standardizing the per-core cycle
counter, which also happens to work well for the common single
hart/core case.
Standardizing what happens during "sleep" is not practical given that
what "sleep" means is not standardized, but if the entire core is
paused (entirely clock-gated or powered-down in deep sleep), then it
is not executing clock cycles, and the cycle count shouldn't be
increasing per the spec. There are many details, e.g., whether clock
cycles required to reset a processor after waking up from a power-down
event should be counted, and these I'd consider
implementation-specific details.
As a side note, just because we can't find a precise definition that
works for all platforms, doesn't mean it isn't a useful facility for
most platforms, and a fuzzy, common, "usually correct" standard here
is better than no standard. The intent of RDCYCLE was primarily
performance monitoring/tuning, and the specification was written with
that goal in mind.
Of course, wall-clock time is measured separately with RDTIME. On
some simple platforms, cycle count might be usable as an alternative
to RDTIME, but in that case I'd recommend that the platform alias the
time counter to the cycle counter, which keeps code portable, rather
than changing code to use the cycle counter CSR for wall-clock time.
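A sketch of what portable wall-clock code might look like under that recommendation: it reads only the time counter and converts ticks with a platform-supplied tick frequency. The function names and the `timebase_hz` parameter are assumptions made for this example; how the frequency is discovered is platform-specific and not shown. The reader is compiled only for RV64 targets.

```c
#include <stdint.h>

#if defined(__riscv) && (__riscv_xlen == 64)
/* RV64 only: read the wall-clock time counter. Assumes the execution
   environment permits the read at the current privilege level. */
static inline uint64_t read_time_ticks(void)
{
    uint64_t t;
    __asm__ volatile ("rdtime %0" : "=r"(t));
    return t;
}
#endif

/* Convert a tick delta to nanoseconds. timebase_hz is the (hypothetical)
   platform-supplied tick frequency and must be nonzero. The whole-second and
   remainder parts are converted separately so the multiply does not overflow
   for realistic timebase frequencies. */
static inline uint64_t time_ticks_to_ns(uint64_t ticks, uint64_t timebase_hz)
{
    const uint64_t NS_PER_SEC = 1000000000u;
    uint64_t whole_sec = ticks / timebase_hz;
    uint64_t rem_ticks = ticks % timebase_hz;
    return whole_sec * NS_PER_SEC + (rem_ticks * NS_PER_SEC) / timebase_hz;
}
```

Code written this way does not need to know whether the platform backs the time counter with a dedicated timer or with an alias of the cycle counter.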
| In other words, is the cycle counter unique to the core, or specific
| to a hart?
It is per-core as stated in the definition. I'll add an edited
version of the above long discussion to standard commentary.
Krste
ok, thank you.
The current definition in rv32.tex:1171 reads:
Given that a core may support multiple harts (1.1 "... A RISC-V-compatible core might support multiple RISC-V-compatible hardware threads, or harts, through multithreading."), is the RDCYCLE definition as the "number of clock cycles executed by the processor core" correct?
Or should it read "number of clock cycles executed by the current hart", to account for each hart going to sleep differently and possibly not counting cycles at all in deep sleep?
In other words, is the cycle counter unique to the core, or specific to a hart?