Introduce preliminary macro operation fusion #132
Please show some numbers to illustrate how we benefit from macro operation fusion.
@@ -1219,6 +1220,60 @@ RVOP(cswsp, {
})
#endif

/* auipc + addi */
Is it possible to fuse the sequence `lui` + `addi`?
See #81 (comment)
Disassembly of CoreMark:
10324: 000087b7 lui a5,0x8
10328: b0578793 addi a5,a5,-1275 # 0x7b05
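As a rough illustration of what fusing this pair would compute, the sketch below folds the two instruction words from the disassembly into the single constant that ends up in `a5`. The helper name `fused_lui_addi` is hypothetical and not part of this pull request; it only assumes the standard U-type and I-type immediate layouts.

```c
#include <stdint.h>

/* Hypothetical helper (not from this PR): the constant that the pair
 * `lui rd, imm20` ; `addi rd, rd, imm12` leaves in rd on RV32. */
static uint32_t fused_lui_addi(uint32_t lui_insn, uint32_t addi_insn)
{
    /* U-type: the immediate already sits in bits 31:12. */
    uint32_t upper = lui_insn & 0xfffff000u;

    /* I-type: 12-bit immediate in bits 31:20, sign-extended by hand. */
    uint32_t imm12 = addi_insn >> 20;
    int32_t lower = (int32_t) imm12 - ((imm12 & 0x800u) ? 4096 : 0);

    /* The addition wraps modulo 2^32, matching the hardware behaviour. */
    return upper + (uint32_t) lower;
}
```

For the two words above, `fused_lui_addi(0x000087b7, 0xb0578793)` yields `0x8000 - 1275 = 0x7b05`, the value shown in the disassembly comment.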
It is possible. However, there are some problems when running qrcode.elf if we fuse this pattern, so I skipped it in this pull request.
> It is possible. However, there are some problems when running qrcode.elf if we fuse this pattern, so I skipped it in this pull request.
Add a comment starting with "FIXME: lui + addi"
rv->PC += ir->insn_len * (ir->imm2 - 1);
})

/* multiple lw */
`lw` is the most frequent instruction (see #34), and we might dive into its use cases further.
Can you handle the following case? (disassembly from CoreMark)
10248: 03012603 lw a2,48(sp)
1024c: 01c11583 lh a1,28(sp)
10250: 03412503 lw a0,52(sp)
In addition, consider the following scenario:
10a84: 01c12083 lw ra,28(sp)
10a88: 07f47513 andi a0,s0,127
10a8c: 01812403 lw s0,24(sp)
10a90: 01412483 lw s1,20(sp)
10a94: 01012903 lw s2,16(sp)
10a98: 00c12983 lw s3,12(sp)
It can be regarded as 5 `lw` instructions. Roughly speaking, if peephole optimization can be applied, we shall benefit from further optimizations.
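To see what a fuser that only looks at strictly consecutive instructions would find in this sequence, here is a minimal sketch that scans raw instruction words for the longest run of `lw`. The function names are hypothetical and not taken from this pull request; the only assumption is the standard encoding of `lw` (LOAD opcode `0x03`, funct3 `0b010`).

```c
#include <stddef.h>
#include <stdint.h>

/* True if the 32-bit word encodes `lw`: LOAD opcode 0x03, funct3 0b010. */
static int is_lw(uint32_t insn)
{
    return (insn & 0x7f) == 0x03 && ((insn >> 12) & 0x7) == 0x2;
}

/* Returns the length of the longest run of consecutive `lw` words and
 * stores the index of its first instruction in *start. */
static size_t longest_lw_run(const uint32_t *insn, size_t n, size_t *start)
{
    size_t best = 0, run = 0;
    *start = 0;
    for (size_t i = 0; i < n; i++) {
        run = is_lw(insn[i]) ? run + 1 : 0; /* any other opcode resets the run */
        if (run > best) {
            best = run;
            *start = i + 1 - run;
        }
    }
    return best;
}
```

Fed the six words of the scenario above, the scan reports a run of four starting at the third instruction, because the `andi` at 10a88 separates the first `lw` from the rest.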
Another case: (disassembly from CoreMark)
10c08: 01162023 sw a7,0(a2)
10c0c: 00052783 lw a5,0(a0)
10c10: 00059883 lh a7,0(a1)
10c14: 00259603 lh a2,2(a1)
10c18: 00f82023 sw a5,0(a6)
10c1c: 01052023 sw a6,0(a0)
10c20: 00e82223 sw a4,4(a6)
10c24: 0006a783 lw a5,0(a3)
Mixture of `sw` and `lw`.
> Another case: (disassembly from CoreMark)
> 10c08: 01162023 sw a7,0(a2)
> 10c0c: 00052783 lw a5,0(a0)
> 10c10: 00059883 lh a7,0(a1)
> 10c14: 00259603 lh a2,2(a1)
> 10c18: 00f82023 sw a5,0(a6)
> 10c1c: 01052023 sw a6,0(a0)
> 10c20: 00e82223 sw a4,4(a6)
> 10c24: 0006a783 lw a5,0(a3)
> Mixture of `sw` and `lw`.
In this case, the memory addresses are not contiguous. All we can do is pack these instructions together; we cannot save any operation, such as the misalignment check.
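The contrast with the shared-base case can be sketched as follows, assuming the emulator tests word alignment on every access. When a group of `lw`/`sw` uses one base register and every offset is a multiple of 4, `base + offset` is word-aligned exactly when `base` is, so one test can stand in for the whole group. With unrelated base registers, as in the mixture above, no such shortcut exists. Both function names are hypothetical and not part of this pull request.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode-time question: are all offsets in the group multiples of 4?
 * If so, a single runtime test on the base covers every access. */
static int can_share_alignment_check(const int32_t *off, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (off[i] % 4 != 0)
            return 0;
    return 1;
}

/* Run-time test for the whole group, valid only when
 * can_share_alignment_check() returned 1 for its offsets. */
static int group_is_word_aligned(uint32_t base)
{
    return (base & 3u) == 0;
}
```

For example, the offsets 24, 20, 16 and 12 off `sp` qualify for the shared test, whereas a pair of offsets such as 0 and 2 does not.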
> In addition, consider the following scenario:
> 10a84: 01c12083 lw ra,28(sp)
> 10a88: 07f47513 andi a0,s0,127
> 10a8c: 01812403 lw s0,24(sp)
> 10a90: 01412483 lw s1,20(sp)
> 10a94: 01012903 lw s2,16(sp)
> 10a98: 00c12983 lw s3,12(sp)
> It can be regarded as 5 `lw` instructions. Roughly speaking, if peephole optimization can be applied, we shall benefit from further optimizations.
In this case, we can pack the last four `lw` instructions. If we want to handle this case by packing all five `lw`, we need to reorder the instructions, for example by swapping the first and the second instruction.
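Whether two adjacent instructions may trade places depends on their register dependencies. The sketch below is one conservative way to test that; `struct insn_regs`, `touches` and `can_swap` are hypothetical names, not part of this pull request. It looks at registers only, so memory ordering and the order in which a faulting load would trap are outside its scope.

```c
#include <stdint.h>

/* Hypothetical decoded form: register numbers, with 0 meaning "none".
 * x0 reads as zero and ignores writes, so it never carries a dependency. */
struct insn_regs {
    uint8_t rd, rs1, rs2;
};

/* True if the register `written` is read or written by `other`. */
static int touches(uint8_t written, const struct insn_regs *other)
{
    return written != 0 && (written == other->rs1 || written == other->rs2 ||
                            written == other->rd);
}

/* Two adjacent instructions may be swapped only if neither writes a
 * register the other reads or writes (no RAW, WAR, or WAW hazard). */
static int can_swap(const struct insn_regs *a, const struct insn_regs *b)
{
    return !touches(a->rd, b) && !touches(b->rd, a);
}
```

Applied to the scenario above, `lw ra,28(sp)` and `andi a0,s0,127` share no registers and can be swapped, which is the swap suggested in the reply. By contrast, `andi a0,s0,127` cannot move past `lw s0,24(sp)`, because that load overwrites the `s0` value the `andi` reads.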
> Can you handle the following case? (disassembly from CoreMark)
> 10248: 03012603 lw a2,48(sp)
> 1024c: 01c11583 lh a1,28(sp)
> 10250: 03412503 lw a0,52(sp)
Ditto, if we want to handle this case, we need some strategies to reorder the instructions.
In this pull request, let's concentrate on preliminary support of macro operation fusion. Please add some comments describing further efforts such as instruction reordering.
Add some FIXME/TODO comments that list further macro operation fusion patterns worth our attention.
Through our observations, we have identified certain patterns in instruction sequences. By converting these specific RISC-V instruction patterns into faster and equivalent code, we can significantly improve execution efficiency.
In our current analysis, we focus on a commonly used benchmark and have found the following frequently occurring instruction patterns: auipc + addi, auipc + add, multiple sw, and multiple lw.
| Metric    | commit fba5802            | macro fuse operation      | Speedup |
|-----------+---------------------------+---------------------------+---------|
| CoreMark  | 1351.065 (Iterations/Sec) | 1352.843 (Iterations/Sec) | +0.13%  |
| dhrystone | 1073 DMIPS                | 1146 DMIPS                | +6.8%   |
| nqueens   | 8295 msec                 | 7824 msec                 | +6.0%   |
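The Speedup column appears consistent with a plain ratio of the two measurements, taken as new/old for scores where higher is better and old/new for the nqueens time where lower is better. This is an inference from the numbers, not something the commit message states. A small sketch with hypothetical helper names:

```c
/* Percentage gain when a higher score is better (CoreMark, DMIPS). */
static double gain_higher_is_better(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}

/* Percentage gain when a lower time is better (nqueens, msec). */
static double gain_lower_is_better(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

With the table's values these give roughly 0.13, 6.8 and 6.0 percent. Measuring nqueens as a reduction in run time, (8295 - 7824) / 8295, would give about 5.7 percent instead.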
In debug mode, the … Therefore, we cannot perform operation fusion in debug mode.