arch,cpu,sim: Add mechanism to partially print vector regs #1234

Merged
merged 3 commits into from
Jun 17, 2024

Conversation

hnpl
Contributor

@hnpl hnpl commented Jun 13, 2024

Currently, gem5's inst tracer prints the whole vector register container by default. The size of vector register containers in gem5 is the maximum size allowed by the ISA. For vector-length agnostic (VLA) vector registers, this means ARM SVE vector container is 2048 bits long, and RISC-V vector container is 65535 bits long. Note that VLA implementation in gem5 allows the vector length to be varied within the limit specified by the ISAs.

However, in most use cases of gem5, the vector length is much less than 65535 bits. This causes two issues: (1) the vector container requires allocating and moving around a large amount of unused data while only a fraction of it is used, and (2) printing the execution trace of a vector register results in a wall of text with a small amount of useful data.

This change addresses problem (2) by providing a mechanism to limit the amount of data printed by the instruction tracer. This is done by adding a function that prints the first X bits of a vector register container, where X is the vector length determined at runtime, as opposed to the vector container size, which is determined at compilation time.

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7
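
To illustrate the idea of printing only a prefix of the container, here is a minimal sketch. It is not the code from this PR: the function name `formatPartialContainer`, its signature, and the output format (hex bytes grouped in fours) are assumptions made for the example. The only detail taken from this thread is that the byte count is passed as a `num_bytes` parameter of type `size_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not gem5's actual API): format only the first
// num_bytes bytes of a register container as hex, instead of the whole
// compile-time-sized container.
std::string
formatPartialContainer(const uint8_t *data, size_t container_bytes,
                       size_t num_bytes)
{
    // Never read past the end of the container.
    if (num_bytes > container_bytes)
        num_bytes = container_bytes;

    std::ostringstream os;
    os << '[';
    for (size_t i = 0; i < num_bytes; ++i) {
        // Separate every group of four bytes for readability.
        if (i != 0 && i % 4 == 0)
            os << '_';
        os << std::hex << std::setw(2) << std::setfill('0')
           << static_cast<unsigned>(data[i]);
    }
    os << ']';
    return os.str();
}
```

A tracer using something like this would pass the runtime vector length converted to bytes as `num_bytes`, so a 256-bit configuration prints 32 bytes regardless of how large the container is.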

@hnpl hnpl force-pushed the vreg-trace-fix branch 3 times, most recently from 3a07bf0 to ae5f9b7 Compare June 13, 2024 02:29
@giactra giactra self-requested a review June 13, 2024 09:06
@hnpl
Contributor Author

hnpl commented Jun 13, 2024

I tested this change set by producing the traces from a simple RVV binary with and without this change set.

The desired outcome is to only have differences in the amount of vector register content printed by the tracer. It looks like the output matches the expected outcome.

image

I'll test this change with SVE binaries and probably some x86 binaries with SSE* instructions.

I also tested this change set by compiling build/ALL/gem5.opt and it worked.

@hnpl hnpl marked this pull request as ready for review June 13, 2024 19:45
Contributor

@powerjg powerjg left a comment

Generally, it looks good to me. One small thing, but if you disagree, then it's good to go from my point of view.

src/cpu/reg_class.hh
@hnpl hnpl force-pushed the vreg-trace-fix branch 2 times, most recently from 563a49f to fe7533f Compare June 13, 2024 21:55
hnpl added 3 commits June 14, 2024 00:19
DataStatus is used by InstTracer to determine the data format of the content
of the destination register of an instruction. The types consist of integer
types (`DataInt*`), a floating point type (`DataDouble`), and a vector type
(`DataReg`).

Currently, the `setData(RegClass, *)` function assumes the register value
to be of type `DataReg`, which means the content of the register is an
array of values, for every `RegClass`. However, there are cases in the ISA
implementation where the `setData` function above is used for writing an
integer to the trace.

This change addresses this issue by setting DataStatus according to RegClass.

Change-Id: I79423a4942ab2a3fde5c9cf86de0d1fced648cf0
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
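
The selection described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the enums `RegClassKind` and `DataStatus` below are simplified stand-ins, `statusFor` is a hypothetical name, and treating integer, misc, and condition-code registers as 64-bit scalars is an assumption (one the author questions later in this thread).

```cpp
// Simplified, hypothetical stand-ins for gem5's register class types
// and the tracer's DataStatus values.
enum class RegClassKind { Int, Float, Vec, CC, Misc };
enum class DataStatus { DataInvalid, DataInt64, DataDouble, DataReg };

// Pick the trace data format from the register class instead of
// always assuming an array-like DataReg.
DataStatus
statusFor(RegClassKind kind)
{
    switch (kind) {
      case RegClassKind::Int:
      case RegClassKind::Misc:
      case RegClassKind::CC:
        // Assumption: these classes hold scalars of at most 64 bits.
        return DataStatus::DataInt64;
      default:
        // Everything else is traced as an array of values.
        return DataStatus::DataReg;
    }
}
```

Defaulting to `DataReg` keeps the previous behaviour for any class that is not explicitly listed as scalar.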
Vector-length Agnostic (VLA) is a style of vectorization in which the
vectorized code can run on any vector length implemented by the
microarchitecture. This vectorization style appears in SVE/SVE2 (an
extension of ARM ISA) and RVV (an extension of RISC-V ISA).

This change adds a function to gem5's ISA interface returning
the vector length of the VLA registers. For ARM ISA, it returns
SVE_VL. For RISC-V ISA, it returns VLEN. For other ISAs, it returns -1.

Change-Id: I8e622c9ab47a36770479c0ff68a15522602c82bf
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
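
One way to picture such an interface is sketched below. All class and method names here (`BaseISA`, `SveLikeISA`, `RvvLikeISA`, `FixedWidthISA`, `vectorLengthInBits`, `bytesToPrint`) are hypothetical, and the choice of bits as the unit is an assumption; the commit message only states that ARM returns SVE_VL, RISC-V returns VLEN, and other ISAs return -1.

```cpp
#include <cstddef>

// Hypothetical base interface: ISAs without VLA registers return -1.
class BaseISA
{
  public:
    virtual ~BaseISA() = default;
    virtual int vectorLengthInBits() const { return -1; }
};

// Stand-in for an ISA with a runtime-configured SVE-style length.
class SveLikeISA : public BaseISA
{
  public:
    explicit SveLikeISA(int vl_bits) : vlBits(vl_bits) {}
    int vectorLengthInBits() const override { return vlBits; }
  private:
    int vlBits;
};

// Stand-in for an ISA with a runtime-configured RVV-style VLEN.
class RvvLikeISA : public BaseISA
{
  public:
    explicit RvvLikeISA(int vlen_bits) : vlenBits(vlen_bits) {}
    int vectorLengthInBits() const override { return vlenBits; }
  private:
    int vlenBits;
};

// Stand-in for any ISA that keeps the default of -1.
class FixedWidthISA : public BaseISA {};

// How a tracer might use the result: print only the active bytes when a
// vector length is reported, otherwise print the whole container.
size_t
bytesToPrint(const BaseISA &isa, size_t container_bytes)
{
    const int vl_bits = isa.vectorLengthInBits();
    if (vl_bits < 0)
        return container_bytes;
    const size_t vl_bytes = static_cast<size_t>(vl_bits) / 8;
    return vl_bytes < container_bytes ? vl_bytes : container_bytes;
}
```

Returning -1 as a sentinel lets the caller fall back to the old behaviour for ISAs that have no VLA registers, without needing a separate capability query.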
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector registers,
this means ARM SVE vector container is 2048 bits long, and RISC-V vector container
is 65535 bits long. Note that VLA implementation in gem5 allows the vector length
to be varied within the limit specified by the ISAs.

However, in most use cases of gem5, the vector length is much less than 65535 bits.
This causes two issues: (1) the vector container requires allocating and moving
around a large amount of unused data while only a fraction of it is used, and (2) printing
the execution trace of a vector register results in a wall of text with a small amount of
useful data.

This change addresses problem (2) by providing a mechanism to limit the amount of
data printed by the instruction tracer. This is done by adding a function printing
the first X bits of a vector register container, where X is the vector length determined
at runtime, as opposed to the vector container size, which is determined at compilation time.

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
@hnpl
Contributor Author

hnpl commented Jun 14, 2024

It seems to work properly with SVE instructions.

image

dataStatus = DataReg;
switch (reg_class.type()) {
case IntRegClass:
case MiscRegClass:
Contributor Author

I'm not really confident about this part. I'm not entirely sure that MiscRegs and CCRegs are never larger than 64 bits.

@hnpl
Contributor Author

hnpl commented Jun 14, 2024

I just realized that the x86 instructions modifying xmm registers are broken down into micro-ops and do not need the vector container printing method. So the SSE* instructions will have the same trace as before.

@ivanaamit ivanaamit added sim General gem5 Simulation Components arch General gem5 architecture-specific components cpu General gem5 CPU code (e.g., `BaseCPU`) labels Jun 17, 2024
@BobbyRBruce BobbyRBruce merged commit 15e0236 into gem5:develop Jun 17, 2024
71 checks passed
@hnpl hnpl deleted the vreg-trace-fix branch June 18, 2024 08:07
BobbyRBruce added a commit to BobbyRBruce/gem5 that referenced this pull request Jun 20, 2024
Introduced in gem5#1234, this caused compilation to fail on Apple Silicon
systems. This bug is the same as gem5#582 where a more detailed explanation
is provided.

Change-Id: If186b43c41df6b1da009dc9409cb6facac79fa4f
BobbyRBruce added a commit to BobbyRBruce/gem5 that referenced this pull request Jun 20, 2024
Introduced in gem5#1234, this caused compilation to fail on Apple Silicon
systems. This bug is the same as gem5#582 where a more detailed explanation
is provided.

Fixed by changing `num_bytes` parameter to `size_t`.

Change-Id: If186b43c41df6b1da009dc9409cb6facac79fa4f
BobbyRBruce added a commit that referenced this pull request Jun 20, 2024
Introduced in #1234, this caused compilation to fail on Apple Silicon
systems. This bug is the same as #582 where a more detailed explanation
is provided.
Labels
arch General gem5 architecture-specific components cpu General gem5 CPU code (e.g., `BaseCPU`) sim General gem5 Simulation Components

4 participants