Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch-riscv: Relation chain on RVV support #83

Merged
merged 8 commits into from Aug 3, 2023
Merged

Conversation

aarmejach
Copy link
Contributor

Gerrit relation chain on RVV support found here:
https://gem5-review.googlesource.com/c/public/gem5/+/68418/15

@mattsinc mattsinc self-requested a review July 15, 2023 22:06
@aarmejach
Copy link
Contributor Author

The test that is failing is: test-gem5-library-riscv-hello-restore-checkpoint-ALL-x86_64-opt

This test is meant to fail given the changes on the vector register file done on these commits. Should we disable it?

@powerjg
Copy link
Contributor

powerjg commented Jul 17, 2023

We should probably fix it before we merge this change.

I think that we'll need to create a new checkpoint and update the test to use that new checkpoint. Give us a couple of days to look into this. @BobbyRBruce and @mkjost0

@BobbyRBruce
Copy link
Member

I'll work on a solution to this this afternoon. It shouldn't take too long.

These checkpoint tests are quite an overhead to maintain ...

@BobbyRBruce BobbyRBruce added cpu-minor gem5's Minor CPU cpu-o3 gem5's Out-Of-Order CPU arch-riscv The RISC-V ISA labels Jul 18, 2023
@powerjg powerjg self-assigned this Jul 20, 2023
@powerjg
Copy link
Contributor

powerjg commented Jul 20, 2023

@BobbyRBruce, don't forget to work on this checkpoint "fix"

@BobbyRBruce
Copy link
Member

All you need to do to get the tests passing is change this line

checkpoint=Resource("riscv-hello-example-checkpoint-v23"),
to checkpoint=Resource("riscv-hello-example-checkpoint").

Though, in doing this, I discovered the "util/cpt_upgrader.py" did not work in this case. I logged this in Issue #106. I'm not familiar with the upgrader, but we'll probably need to fix this issue as we want to give people an easy way to upgrade their checkpoints to work with this.

@powerjg
Copy link
Contributor

powerjg commented Jul 20, 2023

I think it would be good to implement the checkpoint upgrader #106 before merging this change. If you have questions on how to do this, let me know and I can try to help (I'm also not an expert 😄 ).

@aarmejach aarmejach force-pushed the develop branch 2 times, most recently from 4973c25 to 2bf0b25 Compare July 21, 2023 16:44
@aarmejach
Copy link
Contributor Author

aarmejach commented Jul 21, 2023

Thanks Bobby, I have made the modification to the config file to use the new checkpoint restore test. Let's see if tests pass now.

I can help on the cpt_upgrader if you want, I have an idea on what needs to be done there and can take a look at it on Monday.

P.S: I had way too many develop branches on diferent remotes and made a bit of a mess on my first push, think it is fixed now.

aarmejach pushed a commit to aarmejach/gem5 that referenced this pull request Jul 24, 2023
  * Solves issue gem5#106 by updating the cpts with the necessary vector
    registers.

Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585
@vspetrov
Copy link

vspetrov commented Jul 24, 2023

Hello, we've been looking into this patch and faced an issue with O3 CPU. The reproducer is quite large and depends on our internal library - we are working on making it smaller and plan to share it here. Meanwhile, i wanted to share our current view and check if the understanding of the problem is correct.

Basically, the reproducer looks like this:

vsetvl (type1, vl1) //First vset vl instruction that sets custom type/vl for decode
inst_squash // any instruction that will cause a pipeline squash
vse // Any vector instruction that relies on type1,vl1 durin decode
...
vsetvl (type2, vl2) // Second vset vl with different type/vl
vse // any vector instr that uses type2/vl2

What happens is: O3 CPU starts executing vsetvl №1 and saves type1/vl1 on decoder structure. Then It would create inst_squash - this can be any instruction that will cause a pipeline squash in future (it can be ld/st speculation or branch for example). Then vsetvl 2 is executed BEFORE inst_squash actually triggers pipeline squash. THis saves new type/vl on decoder. This is when pipeline is actually squashed and restarted from inst_squash. However, decoding of 1st vse instruction now uses wrong type/vl values cached on decoder from 2nd vsetvl execution.

Does that view of the problem makes sense or we missing smth?

It looks like for such scenarios we might need to maintain kind of stack data structure that would be rolled back in case of pipeline squashes (smth similar to bpu history state stack). However, it will require some mechanism to pass that info from CPU to decoder. Maybe seqNum.

@aarmejach
Copy link
Contributor Author

The problem you describe makes sense to me. And the solution of rolling back decoder state on pipeline squashes is pobably a good way to fix it, and is related to ongoing discussion on how to better implement vsetvl instructions.

AFAIK the vsetvl instructions now stall younger instructions via a fake dependency, and there has been discussion to change vsetvl to behave like a control instruction (basically branch-like) to enable speculation around them. That is, speculate that the vl/vtype is the same as the previous and, if not, a rollback is done. Therefore, the described problem and the proposed solution can probably be used to rethink the vsetvl implementation and solve two problems in one shot.

To be honest i'd like to see this current PR merged ASAP, and try to make new issues describing problems we are seeing to fix them (like this vsetvl issue). This will allow much faster progress as now this PR is blocking multiple improvements.

I'd like some thoughts on this. Of course we should go through code review, etc., but if we merge this PR it will enable faster progress.

BTW CI tests should pass now without problems.

@powerjg
Copy link
Contributor

powerjg commented Jul 25, 2023

My bad, @aarmejach . I'll make sure this is reviewed ASAP with a goal of getting it merged this week.

@vspetrov
Copy link

@aarmejach thanks. sounds good. Just out of curiosity, do you know of any actual public RTL implementation of this vector extension and how the vsetvl states are handled there?

@powerjg
Copy link
Contributor

powerjg commented Jul 25, 2023

@vspetrov IIRC, the RVV standard states that vsetvl should be treated as a control instruction. Although, I don't have the exact citation for you.

From my understanding, if you're going to execute/schedule the vector instructions on an out-of-order core, the only option is to treat vsetvl as a control instruction. The performance will significantly suffer or correctness will be impacted, if you stall younger instructions after the vsetvl, at least with how many compilers generate code.

However, if you have a fully decoupled vector unit, then you would probably have more freedom in how you implement the vsetvl instruction (e.g., you could execute that instruction in the front-end).

That said, the current implementation in gem5 is going to be using the core (out-of-order or otherwise) to execute the vector instructions. So, at least in the longer term, we will likely implement vsetvl as a control instruction.

In the short term, we should probably just fail (e.g., fatal) if trying to use the O3CPU with vector extensions enabled until we implement vsetvl in a more accurate way.

@hnpl
Copy link
Contributor

hnpl commented Jul 26, 2023

In case you're interested, the chapter 9 of the rvv spec has a note about control dependencies related to the vl register,

https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#9-vector-memory-consistency-model

Copy link
Contributor

@powerjg powerjg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an initial review. I have a few more files I'd like to go through, but I don't expect any big hurdles.

Overall, this looks great. Let's get it in and improve on it from there!

src/arch/riscv/isa.cc Show resolved Hide resolved
src/arch/riscv/decoder.cc Show resolved Hide resolved
src/arch/riscv/isa/templates/templates.isa Outdated Show resolved Hide resolved
src/cpu/minor/fetch2.cc Outdated Show resolved Hide resolved
src/arch/riscv/isa.cc Outdated Show resolved Hide resolved
inline constexpr RegClass vecElemClass =
RegClass(VecElemClass, VecElemClassName, NumVecRegs * NumVecElemPerVecReg,
debug::VecRegs).
ops(vecRegElemClassOps);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop the vecElemClass commit because it's just solve the hanging problem for vsetvl, vsetvli and vsetivli in minor CPU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, i removed this one and the other 2 commits specific to minor/o3.

The drawback is that now the restore-checkpoint test will fail again. This has 2 implications:

  1. @BobbyRBruce could you fix the test again, basically the regs.vector_element field in the cpt now should be empty.
  2. I will amend the PR util: fix cpt upgrader for rvv changes in PR #83 #115 on the cpt_upgrader now to take this change into account.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, i removed this one and the other 2 commits specific to minor/o3.

The drawback is that now the restore-checkpoint test will fail again. This has 2 implications:

  1. @BobbyRBruce could you fix the test again, basically the regs.vector_element field in the cpt now should be empty.
  2. I will amend the PR util: fix cpt upgrader for rvv changes in PR #83 #115 on the cpt_upgrader now to take this change into account.

I've updated the restore checkpoint. It should work with the PR as it currently stands.

@vmskv
Copy link

vmskv commented Jul 28, 2023

Hello, we've been looking into this patch and faced an issue with O3 CPU. The reproducer is quite large and depends on our internal library - we are working on making it smaller and plan to share it here. Meanwhile, i wanted to share our current view and check if the understanding of the problem is correct.

Basically, the reproducer looks like this:

vsetvl (type1, vl1) //First vset vl instruction that sets custom type/vl for decode
inst_squash // any instruction that will cause a pipeline squash
vse // Any vector instruction that relies on type1,vl1 durin decode
...
vsetvl (type2, vl2) // Second vset vl with different type/vl
vse // any vector instr that uses type2/vl2

What happens is: O3 CPU starts executing vsetvl №1 and saves type1/vl1 on decoder structure. Then It would create inst_squash - this can be any instruction that will cause a pipeline squash in future (it can be ld/st speculation or branch for example). Then vsetvl 2 is executed BEFORE inst_squash actually triggers pipeline squash. THis saves new type/vl on decoder. This is when pipeline is actually squashed and restarted from inst_squash. However, decoding of 1st vse instruction now uses wrong type/vl values cached on decoder from 2nd vsetvl execution.

Does that view of the problem makes sense or we missing smth?

It looks like for such scenarios we might need to maintain kind of stack data structure that would be rolled back in case of pipeline squashes (smth similar to bpu history state stack). However, it will require some mechanism to pass that info from CPU to decoder. Maybe seqNum.

The shareable reproducer has been prepared:
https://github.com/vmskv/gem5/tree/vmskv/rvv_bad_o3_vsetvl_execuion/riscv_o3_bad_vsetvl_execution_reproducer
In out-of-order(O3CPU) mode if vsetvl is in mispredicted branch it changes the 'vl' value in decoder and the vector instructions afterwards are decoded/executed bad after the mispredicted vsetvl squashed.

@aarmejach aarmejach changed the title arch-riscv,cpu-o3,cpu-minor: Relation chain on RVV support arch-riscv: Relation chain on RVV support Jul 28, 2023
@aarmejach
Copy link
Contributor Author

Thanks @vmskv. I think we should open an issue with the description of the problem and the reproducible, which will be the motivation to modify the current vset* implementation.

@aarmejach aarmejach requested a review from powerjg July 28, 2023 14:58
@powerjg
Copy link
Contributor

powerjg commented Jul 28, 2023

I went ahead and added a couple of commits to this chain. If it's OK with you, I can simply push them here. Otherwise, you can pull them from my repo: https://github.com/powerjg/gem5/commits/pr/aarmejach/83

I fixed some style problems and I added a fatal if RVV is used with o3 or minor.

@aarmejach
Copy link
Contributor Author

@powerjg feel free to push them here.

@BobbyRBruce a quick way to fix the restore-checkpoint test again is to run the cpt_upgrader of PR #115 over the checkpoint used in checkpoint=Resource("riscv-hello-example-checkpoint") that this PR is using.

Hopefully, this should get us close to merge this. I am going into vacation mode, but will monitor this to make sure we get it in asap.

src/arch/riscv/isa/decoder.isa Outdated Show resolved Hide resolved
src/arch/riscv/isa/decoder.isa Outdated Show resolved Hide resolved
src/arch/riscv/isa/decoder.isa Outdated Show resolved Hide resolved
@vmskv
Copy link

vmskv commented Jul 31, 2023

Thanks @vmskv. I think we should open an issue with the description of the problem and the reproducible, which will be the motivation to modify the current vset* implementation.

The issue is opened #144

@rogerchang23424
Copy link
Contributor

rogerchang23424 commented Jul 31, 2023

Hi @powerjg,

The additional commits are LGTM. However, I think it will be better to move the enable_rvv option from RiscvDecorder to RiscvISA as it is one of isa configuration. There are two benefits of this.

  1. Some of isa options riscv_type and check_alignment are put in RiscvISA. If the enable_rvv is in RiscvISA, it will be easier for user to configure.
  2. The initialization of mstatus and misa CSR are depending on the enable_rvv option.

(nit) Riscv decoder can get enable_rvv option by (dynamic_cast<ISA*>(isa))->enableRvv(); .

@powerjg
Copy link
Contributor

powerjg commented Aug 1, 2023

Thanks for the feedback @rogerchang23424, I took your suggestions into account.

@aarmejach, I had to force push. I hope that doesn't cause a big headache for you.

There's one more comment to address from Roger and we need to figure out the checkpoint. After that, I think we're good to go.

huxuan0307 and others added 8 commits August 2, 2023 14:46
This commit add regs and configs for vector extension

* Add 32 vector arch regs as spec defined and 8 internal regs for
  uop-based vector implementation.
* Add default vector configs(VLEN = 256, ELEN = 64). These cannot
  be changed yet, since the vector implementation has only be tested
  with such configs.
* Add disassamble register name v0~v31 and vtmp0~vtmp7.
* Add CSR registers defined in RISCV Vector Spec v1.0.
* Add vector bitfields.
* Add vector operand_types and operands.

Change-Id: I7bbab1ee9e0aa804d6f15ef7b77fac22d4f7212a
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>

arch-riscv: enable rvv flags only for RV64

Change-Id: I6586e322dfd562b598f63a18964d17326c14d4cf
Change-Id: I84363164ca327151101e8a1c3d8441a66338c909
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>

arch-riscv: Add a todo to fix vsetvl stall on decode

Change-Id: Iafb129648fba89009345f0c0ad3710f773379bf6
* TODOs:
  + Vector Segment Load/Store
  + Vector Fault-only-first Load

Change-Id: I2815c76404e62babab7e9466e4ea33ea87e66e75
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
TODOs:
  + vcompress.vm

Change-Id: I86eceae66e90380416fd3be2c10ad616512b5eba
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>

arch-riscv: Add LICENCE to template files

Change-Id: I825e72bffb84cce559d2e4c1fc2246c3b05a1243
Change-Id: I019fc6394a03196711ab52533ad8062b22c89daf
Since the O3 and Minor CPU models do not support RVV right now as the
implementation stalls the decode until vsetvl instructions are exectued,
this change calls `fatal` if RVV is not explicitly enabled.

It is possible to override this if you explicitly enable RVV in the
config file.

Change-Id: Ia801911141bb2fb2bedcff3e139bf41ba8936085
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Minor style fixes in vector code

Change-Id: If0de45a2dbfb5d5aaa65ed3b5d91d9bee9bcc960
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Current implementation of vset*vl* instructions serialize pipeline and
are non-speculative.

Change-Id: Ibf93b60133fb3340690b126db12827e36e2c202d
Copy link
Contributor

@rogerchang23424 rogerchang23424 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let Jason and Bobby do final review

@aarmejach
Copy link
Contributor Author

Sure, all comments are addressed and tests should pass.

Once a final review is done feel free to hit the button.

Copy link
Contributor

@powerjg powerjg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aarmejach for pushing this through!

@powerjg powerjg merged commit 5eda9fe into gem5:develop Aug 3, 2023
5 checks passed
@powerjg
Copy link
Contributor

powerjg commented Aug 3, 2023

Thanks for the hard work everyone!

aarmejach pushed a commit to aarmejach/gem5 that referenced this pull request Aug 4, 2023
  * Solves issue gem5#106 by updating the cpts with the necessary vector
    registers.

Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585

util: clear vector_element field

Change-Id: I6c9ec4e71f66722b26de030fa139cd626bdb24dc
powerjg added a commit that referenced this pull request Aug 4, 2023
Solves issue #106 by updating the cpts with the necessary vector
registers.
alvaromorenomv pushed a commit to alvaromorenomv/gem5 that referenced this pull request Aug 9, 2023
  * Solves issue gem5#106 by updating the cpts with the necessary vector
    registers.

Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585

util: clear vector_element field

Change-Id: I6c9ec4e71f66722b26de030fa139cd626bdb24dc
giactra added a commit to giactra/gem5 that referenced this pull request Feb 1, 2024
This is working around an existing SMT issue [1].

The BaseO3CPU uses two physical matrix registers [2]. This is
enough for a single threaded CPU which as of now uses
1 architectural matrix only.

The problem arises when SMT is enabled.  As 2 architectural matrices
need to be supported by a single CPU, the O3CPU won't have any available
register in the freeList for renaming.  This causes the SMT O3CPU to
indefinitely stall renaming [3]

If the archtectural number of registers is seto to 0, the regclass won't
be taken into consideration when evaluating if we can rename
instructions.

This issue has been implicitly fixed for RISCV by a preceding PR [4]

[1]: gem5#668
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170
[3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228
[4]: gem5#83

Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
giactra added a commit to giactra/gem5 that referenced this pull request Feb 2, 2024
This is working around an existing SMT issue [1].

The BaseO3CPU uses two physical matrix registers [2]. This is
enough for a single threaded CPU which as of now uses
1 architectural matrix only.

The problem arises when SMT is enabled.  As 2 architectural matrices
need to be supported by a single CPU, the O3CPU won't have any available
register in the freeList for renaming.  This causes the SMT O3CPU to
indefinitely stall renaming [3]

If the archtectural number of registers is seto to 0, the regclass won't
be taken into consideration when evaluating if we can rename
instructions.

This issue has been implicitly fixed for RISCV by a preceding PR [4]

[1]: gem5#668
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170
[3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228
[4]: gem5#83

Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
BobbyRBruce pushed a commit that referenced this pull request Mar 5, 2024
This is working around an existing SMT issue [1].

The BaseO3CPU uses two physical matrix registers [2]. This is
enough for a single threaded CPU which as of now uses
1 architectural matrix only.

The problem arises when SMT is enabled.  As 2 architectural matrices
need to be supported by a single CPU, the O3CPU won't have any available
register in the freeList for renaming.  This causes the SMT O3CPU to
indefinitely stall renaming [3]

If the archtectural number of registers is seto to 0, the regclass won't
be taken into consideration when evaluating if we can rename
instructions.

This issue has been implicitly fixed for RISCV by a preceding PR [4]

[1]: #668
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170
[3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228
[4]: #83

Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-riscv The RISC-V ISA cpu-minor gem5's Minor CPU cpu-o3 gem5's Out-Of-Order CPU
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants