Cost of vsetvl Instructions #642
Comments
Implementations should assume vsetvl instructions will be frequent and design their microarchitectures accordingly. In particular, OoO implementations should not wait until commit and then flush the pipeline on vl and vtype CSR updates. The values can be renamed/bypassed before commit.
Thanks Krste. So these vsetvl instructions that update CSRs would be costly, and software should therefore avoid frequent updates inside loop iterations.
I'd state it as: software will want to update these CSRs a lot, so hardware should make updates fast.
This is a reason why average OoO cores can execute
@daweili1226 note that a future 64-bit encoding of vector instructions is likely to include the vtype in every instruction. You should probably view the current vsetvli instruction and its associated CSR as effectively providing a few more opcode bits than fit in the current 32-bit encoding, so the current vtype should be carried along with each instruction in the pipeline. That is assuming you want maximum performance, anyway. If your machine's vector instructions are executed over a few beats, then the overhead of a bubble after each vsetvli might not be too awful.
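To make the "vtype as extra opcode bits" idea concrete, here is a minimal Python sketch of a decode stage that keeps its own speculative copy of vtype and attaches a snapshot to every instruction it decodes. The names `VType`, `Uop`, and `decode`, and the tuple format of the instruction stream, are hypothetical and chosen only for illustration. The model also ignores vl and the recovery of the front-end copy after a misprediction or trap, which a real design would need.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VType:
    """Hypothetical, reduced view of vtype: element width and register grouping."""
    sew: int
    lmul: int


@dataclass
class Uop:
    """A decoded instruction carrying the vtype that was in effect at decode."""
    opcode: str
    vtype: Optional[VType]


def decode(stream):
    """Tag each instruction with the front-end's current vtype.

    The front-end copy is updated as soon as a vsetvli is decoded, so later
    vector instructions never wait for that vsetvli to commit.
    """
    current = None  # no vtype known until the first vsetvli is seen
    uops = []
    for inst in stream:
        if inst[0] == "vsetvli":
            _, sew, lmul = inst
            current = VType(sew, lmul)  # speculative update, before commit
        uops.append(Uop(inst[0], current))
    return uops


stream = [("vsetvli", 32, 1), ("vadd.vv",), ("vsetvli", 16, 2), ("vmul.vv",)]
uops = decode(stream)
```

In this sketch `vadd.vv` travels down the pipeline with SEW=32, LMUL=1 and `vmul.vv` with SEW=16, LMUL=2, without either reading the architectural CSR.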
I don't see anything actionable here, so closing the issue.
For the V extension, I see many vsetvl instructions in the code examples. They are executed on every loop iteration and may update global settings (vl and vtype), which could create pipeline bubbles in out-of-order CPU models while later instructions wait for the vsetvl to commit (correct me if I'm wrong). However, the V design seems to assume that vsetvl instructions are cheap in a CPU pipeline. I guess there is documentation or previous discussion on the cost of vsetvl instructions. Can somebody explain their cost?
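For readers unfamiliar with the pattern being asked about, the following Python sketch models a strip-mined loop and shows why a vsetvl appears in every iteration. It is a simplified software model, not RISC-V code: `vsetvl` here always grants `min(AVL, VLMAX)`, which is only one of the behaviours the specification permits, and the function names, the default `vlen_bits=128`, and the integer-only `lmul` are assumptions made for illustration.

```python
def vlmax(vlen_bits, sew_bits, lmul=1):
    """VLMAX = LMUL * VLEN / SEW (integer LMUL only in this sketch)."""
    return (vlen_bits * lmul) // sew_bits


def vsetvl(avl, vlen_bits, sew_bits, lmul=1):
    """Simplified model: grant min(AVL, VLMAX) elements for this pass."""
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


def vector_add(a, b, vlen_bits=128, sew_bits=32):
    """Strip-mined elementwise add.

    Returns the result and the number of vsetvl calls that were needed.
    """
    n = len(a)
    out = [0] * n
    i = 0
    vsetvl_count = 0
    while i < n:
        # Each pass asks how many of the remaining elements it may process.
        vl = vsetvl(n - i, vlen_bits, sew_bits)
        vsetvl_count += 1
        for j in range(i, i + vl):  # stands in for one vector instruction
            out[j] = a[j] + b[j]
        i += vl
    return out, vsetvl_count


result, count = vector_add(list(range(10)), [1] * 10)
```

With the assumed 128-bit vectors and 32-bit elements, VLMAX is 4, so 10 elements take three passes (4, 4, 2) and three vsetvl calls, the last of which handles the tail without a separate scalar loop.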