New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arch-riscv: Relation chain on RVV support #83
Conversation
The test that is failing is: This test is meant to fail given the changes on the vector register file done on these commits. Should we disable it? |
We should probably fix it before we merge this change. I think that we'll need to create a new checkpoint and update the test to use that new checkpoint. Give us a couple of days to look into this. @BobbyRBruce and @mkjost0 |
I'll work on a solution to this this afternoon. It shouldn't take too long. These checkpoint tests are quite an overhead to maintain ... |
@BobbyRBruce, don't forget to work on this checkpoint "fix" |
All you need to do to get the tests passing is change this line
checkpoint=Resource("riscv-hello-example-checkpoint") .
Though, in doing this, I discovered the "util/cpt_upgrader.py" did not work in this case. I logged this in Issue #106. I'm not familiar with the upgrader, but we'll probably need to fix this issue as we want to give people an easy way to upgrade their checkpoints to work with this. |
I think it would be good to implement the checkpoint upgrader #106 before merging this change. If you have questions on how to do this, let me know and I can try to help (I'm also not an expert 😄 ). |
4973c25
to
2bf0b25
Compare
Thanks Bobby, I have made the modification to the config file to use the new checkpoint restore test. Let's see if tests pass now. I can help on the cpt_upgrader if you want, I have an idea on what needs to be done there and can take a look at it on Monday. P.S: I had way too many develop branches on diferent remotes and made a bit of a mess on my first push, think it is fixed now. |
* Solves issue gem5#106 by updating the cpts with the necessary vector registers. Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585
Hello, we've been looking into this patch and faced an issue with O3 CPU. The reproducer is quite large and depends on our internal library - we are working on making it smaller and plan to share it here. Meanwhile, i wanted to share our current view and check if the understanding of the problem is correct. Basically, the reproducer looks like this:
What happens is: O3 CPU starts executing vsetvl №1 and saves type1/vl1 on decoder structure. Then It would create inst_squash - this can be any instruction that will cause a pipeline squash in future (it can be ld/st speculation or branch for example). Then vsetvl 2 is executed BEFORE inst_squash actually triggers pipeline squash. THis saves new type/vl on decoder. This is when pipeline is actually squashed and restarted from inst_squash. However, decoding of 1st vse instruction now uses wrong type/vl values cached on decoder from 2nd vsetvl execution. Does that view of the problem makes sense or we missing smth? It looks like for such scenarios we might need to maintain kind of stack data structure that would be rolled back in case of pipeline squashes (smth similar to bpu history state stack). However, it will require some mechanism to pass that info from CPU to decoder. Maybe seqNum. |
The problem you describe makes sense to me. And the solution of rolling back decoder state on pipeline squashes is pobably a good way to fix it, and is related to ongoing discussion on how to better implement vsetvl instructions. AFAIK the vsetvl instructions now stall younger instructions via a fake dependency, and there has been discussion to change vsetvl to behave like a control instruction (basically branch-like) to enable speculation around them. That is, speculate that the vl/vtype is the same as the previous and, if not, a rollback is done. Therefore, the described problem and the proposed solution can probably be used to rethink the vsetvl implementation and solve two problems in one shot. To be honest i'd like to see this current PR merged ASAP, and try to make new issues describing problems we are seeing to fix them (like this vsetvl issue). This will allow much faster progress as now this PR is blocking multiple improvements. I'd like some thoughts on this. Of course we should go through code review, etc., but if we merge this PR it will enable faster progress. BTW CI tests should pass now without problems. |
My bad, @aarmejach . I'll make sure this is reviewed ASAP with a goal of getting it merged this week. |
@aarmejach thanks. sounds good. Just out of curiosity, do you know of any actual public RTL implementation of this vector extension and how the vsetvl states are handled there? |
@vspetrov IIRC, the RVV standard states that From my understanding, if you're going to execute/schedule the vector instructions on an out-of-order core, the only option is to treat However, if you have a fully decoupled vector unit, then you would probably have more freedom in how you implement the That said, the current implementation in gem5 is going to be using the core (out-of-order or otherwise) to execute the vector instructions. So, at least in the longer term, we will likely implement In the short term, we should probably just fail (e.g., |
In case you're interested, the chapter 9 of the rvv spec has a note about control dependencies related to the vl register, https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#9-vector-memory-consistency-model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an initial review. I have a few more files I'd like to go through, but I don't expect any big hurdles.
Overall, this looks great. Let's get it in and improve on it from there!
src/arch/riscv/regs/vector.hh
Outdated
inline constexpr RegClass vecElemClass = | ||
RegClass(VecElemClass, VecElemClassName, NumVecRegs * NumVecElemPerVecReg, | ||
debug::VecRegs). | ||
ops(vecRegElemClassOps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should drop the vecElemClass commit because it's just solve the hanging problem for vsetvl
, vsetvli
and vsetivli
in minor CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, i removed this one and the other 2 commits specific to minor/o3.
The drawback is that now the restore-checkpoint
test will fail again. This has 2 implications:
- @BobbyRBruce could you fix the test again, basically the
regs.vector_element
field in the cpt now should be empty. - I will amend the PR util: fix cpt upgrader for rvv changes in PR #83 #115 on the cpt_upgrader now to take this change into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, i removed this one and the other 2 commits specific to minor/o3.
The drawback is that now the
restore-checkpoint
test will fail again. This has 2 implications:
- @BobbyRBruce could you fix the test again, basically the
regs.vector_element
field in the cpt now should be empty.- I will amend the PR util: fix cpt upgrader for rvv changes in PR #83 #115 on the cpt_upgrader now to take this change into account.
I've updated the restore checkpoint. It should work with the PR as it currently stands.
The shareable reproducer has been prepared: |
Thanks @vmskv. I think we should open an issue with the description of the problem and the reproducible, which will be the motivation to modify the current |
I went ahead and added a couple of commits to this chain. If it's OK with you, I can simply push them here. Otherwise, you can pull them from my repo: https://github.com/powerjg/gem5/commits/pr/aarmejach/83 I fixed some style problems and I added a fatal if RVV is used with o3 or minor. |
@powerjg feel free to push them here. @BobbyRBruce a quick way to fix the restore-checkpoint test again is to run the cpt_upgrader of PR #115 over the checkpoint used in Hopefully, this should get us close to merge this. I am going into vacation mode, but will monitor this to make sure we get it in asap. |
Hi @powerjg, The additional commits are LGTM. However, I think it will be better to move the
(nit) Riscv decoder can get |
Thanks for the feedback @rogerchang23424, I took your suggestions into account. @aarmejach, I had to force push. I hope that doesn't cause a big headache for you. There's one more comment to address from Roger and we need to figure out the checkpoint. After that, I think we're good to go. |
This commit add regs and configs for vector extension * Add 32 vector arch regs as spec defined and 8 internal regs for uop-based vector implementation. * Add default vector configs(VLEN = 256, ELEN = 64). These cannot be changed yet, since the vector implementation has only be tested with such configs. * Add disassamble register name v0~v31 and vtmp0~vtmp7. * Add CSR registers defined in RISCV Vector Spec v1.0. * Add vector bitfields. * Add vector operand_types and operands. Change-Id: I7bbab1ee9e0aa804d6f15ef7b77fac22d4f7212a Co-authored-by: Yang Liu <numbksco@gmail.com> Co-authored-by: Fan Yang <1209202421@qq.com> Co-authored-by: Jerin Joy <joy@rivosinc.com> arch-riscv: enable rvv flags only for RV64 Change-Id: I6586e322dfd562b598f63a18964d17326c14d4cf
Change-Id: I84363164ca327151101e8a1c3d8441a66338c909 Co-authored-by: Yang Liu <numbksco@gmail.com> Co-authored-by: Fan Yang <1209202421@qq.com> arch-riscv: Add a todo to fix vsetvl stall on decode Change-Id: Iafb129648fba89009345f0c0ad3710f773379bf6
* TODOs: + Vector Segment Load/Store + Vector Fault-only-first Load Change-Id: I2815c76404e62babab7e9466e4ea33ea87e66e75 Co-authored-by: Yang Liu <numbksco@gmail.com> Co-authored-by: Fan Yang <1209202421@qq.com> Co-authored-by: Jerin Joy <joy@rivosinc.com>
TODOs: + vcompress.vm Change-Id: I86eceae66e90380416fd3be2c10ad616512b5eba Co-authored-by: Yang Liu <numbksco@gmail.com> Co-authored-by: Fan Yang <1209202421@qq.com> Co-authored-by: Jerin Joy <joy@rivosinc.com> arch-riscv: Add LICENCE to template files Change-Id: I825e72bffb84cce559d2e4c1fc2246c3b05a1243
Change-Id: I019fc6394a03196711ab52533ad8062b22c89daf
Since the O3 and Minor CPU models do not support RVV right now as the implementation stalls the decode until vsetvl instructions are exectued, this change calls `fatal` if RVV is not explicitly enabled. It is possible to override this if you explicitly enable RVV in the config file. Change-Id: Ia801911141bb2fb2bedcff3e139bf41ba8936085 Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Minor style fixes in vector code Change-Id: If0de45a2dbfb5d5aaa65ed3b5d91d9bee9bcc960 Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Current implementation of vset*vl* instructions serialize pipeline and are non-speculative. Change-Id: Ibf93b60133fb3340690b126db12827e36e2c202d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let Jason and Bobby do final review
Sure, all comments are addressed and tests should pass. Once a final review is done feel free to hit the button. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aarmejach for pushing this through!
Thanks for the hard work everyone! |
* Solves issue gem5#106 by updating the cpts with the necessary vector registers. Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585 util: clear vector_element field Change-Id: I6c9ec4e71f66722b26de030fa139cd626bdb24dc
* Solves issue gem5#106 by updating the cpts with the necessary vector registers. Change-Id: Ifeda90e96097f0b0a65338c6b22a8258c932c585 util: clear vector_element field Change-Id: I6c9ec4e71f66722b26de030fa139cd626bdb24dc
This is working around an existing SMT issue [1]. The BaseO3CPU uses two physical matrix registers [2]. This is enough for a single threaded CPU which as of now uses 1 architectural matrix only. The problem arises when SMT is enabled. As 2 architectural matrices need to be supported by a single CPU, the O3CPU won't have any available register in the freeList for renaming. This causes the SMT O3CPU to indefinitely stall renaming [3] If the archtectural number of registers is seto to 0, the regclass won't be taken into consideration when evaluating if we can rename instructions. This issue has been implicitly fixed for RISCV by a preceding PR [4] [1]: gem5#668 [2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170 [3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228 [4]: gem5#83 Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is working around an existing SMT issue [1]. The BaseO3CPU uses two physical matrix registers [2]. This is enough for a single threaded CPU which as of now uses 1 architectural matrix only. The problem arises when SMT is enabled. As 2 architectural matrices need to be supported by a single CPU, the O3CPU won't have any available register in the freeList for renaming. This causes the SMT O3CPU to indefinitely stall renaming [3] If the archtectural number of registers is seto to 0, the regclass won't be taken into consideration when evaluating if we can rename instructions. This issue has been implicitly fixed for RISCV by a preceding PR [4] [1]: gem5#668 [2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170 [3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228 [4]: gem5#83 Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is working around an existing SMT issue [1]. The BaseO3CPU uses two physical matrix registers [2]. This is enough for a single threaded CPU which as of now uses 1 architectural matrix only. The problem arises when SMT is enabled. As 2 architectural matrices need to be supported by a single CPU, the O3CPU won't have any available register in the freeList for renaming. This causes the SMT O3CPU to indefinitely stall renaming [3] If the archtectural number of registers is seto to 0, the regclass won't be taken into consideration when evaluating if we can rename instructions. This issue has been implicitly fixed for RISCV by a preceding PR [4] [1]: #668 [2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/BaseO3CPU.py#L170 [3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename.cc#L1228 [4]: #83 Change-Id: I99bfdefff11a246b1f191251dc67689e95b3f0db Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Gerrit relation chain on RVV support found here:
https://gem5-review.googlesource.com/c/public/gem5/+/68418/15