bpf,x64: Use BMI2 for shifts #3726
Master branch: dbdea9b |
Master branch: 230bf13 |
1e466ae
to
bfd46df
Compare
Master branch: bec2171 |
bfd46df
to
2a41db2
Compare
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=680036 expired. Closing PR. |
Master branch: 87dbdc2 |
2a41db2
to
706c4f3
Compare
Master branch: aa55dfd |
706c4f3
to
32f4249
Compare
Master branch: 8526f0d |
32f4249
to
4c4077b
Compare
Master branch: 5ee35ab |
4c4077b
to
9932da6
Compare
Master branch: 5a8921b |
9932da6
to
f465f1e
Compare
Master branch: 2ade1cd |
f465f1e
to
7773386
Compare
Master branch: 2efcf69 |
486b85e
to
f38fab3
Compare
Master branch: 62c69e8 |
f38fab3
to
e2132a3
Compare
Master branch: 79d878f |
e2132a3
to
f160c8b
Compare
Master branch: 6c4e777 |
f160c8b
to
50d1081
Compare
Master branch: e2ac2a0 |
50d1081
to
00298ea
Compare
Master branch: a526a3c |
00298ea
to
bd579f9
Compare
Master branch: a526a3c |
bd579f9
to
8ea7cd2
Compare
Master branch: 05ee658 |
8ea7cd2
to
4b4e09d
Compare
Master branch: 7a698ed |
4b4e09d
to
8971f8d
Compare
Master branch: 01dea95 |
8971f8d
to
2936d43
Compare
x64 JIT produces redundant instructions when a shift operation's destination register is BPF_REG_4/ecx and this patch removes them.

Specifically, when dest reg is BPF_REG_4 but the src isn't, we needn't push and pop ecx around shift only to get it overwritten by r11 immediately afterwards.

In the rare case when both dest and src registers are BPF_REG_4, a single shift instruction is sufficient and we don't need the two MOV instructions around the shift.

To summarize using shift left as an example, without patch:
-------------------------------------------------
            |   dst == ecx     |   dst != ecx
=================================================
src == ecx  |   mov r11, ecx   |   shl dst, cl
            |   shl r11, ecx   |
            |   mov ecx, r11   |
-------------------------------------------------
src != ecx  |   mov r11, ecx   |   push ecx
            |   push ecx       |   mov ecx, src
            |   mov ecx, src   |   shl dst, cl
            |   shl r11, cl    |   pop ecx
            |   pop ecx        |
            |   mov ecx, r11   |
-------------------------------------------------

With patch:
-------------------------------------------------
            |   dst == ecx     |   dst != ecx
=================================================
src == ecx  |   shl ecx, cl    |   shl dst, cl
-------------------------------------------------
src != ecx  |   mov r11, ecx   |   push ecx
            |   mov ecx, src   |   mov ecx, src
            |   shl r11, cl    |   shl dst, cl
            |   mov ecx, r11   |   pop ecx
-------------------------------------------------

Signed-off-by: Jie Meng <jmeng@fb.com>
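The "with patch" table above can be read as a small decision procedure. The following is a minimal sketch of that selection logic only, not the JIT code from the patch: the function name shl_sequence, the MAX_INSNS constant and the use of mnemonic strings instead of emitted bytes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the post-patch selection for "dst <<= src".
 * It records mnemonics instead of emitting machine code. */
enum { MAX_INSNS = 4 };

/* Fills out[] with the sequence from the "with patch" table and
 * returns the number of instructions written. */
static size_t shl_sequence(bool dst_is_ecx, bool src_is_ecx,
                           const char *out[MAX_INSNS])
{
    size_t n = 0;

    assert(out != NULL);

    if (src_is_ecx) {
        /* The count is already in cl: one shift is enough. */
        out[n++] = dst_is_ecx ? "shl ecx, cl" : "shl dst, cl";
        return n;
    }

    if (dst_is_ecx) {
        /* dst lives in ecx: shift a copy in r11, then move it back.
         * No push/pop is needed because ecx is overwritten anyway. */
        out[n++] = "mov r11, ecx";
        out[n++] = "mov ecx, src";
        out[n++] = "shl r11, cl";
        out[n++] = "mov ecx, r11";
    } else {
        /* ecx is unrelated to the operation: save and restore it. */
        out[n++] = "push ecx";
        out[n++] = "mov ecx, src";
        out[n++] = "shl dst, cl";
        out[n++] = "pop ecx";
    }
    return n;
}
```

In this model the worst case drops from six instructions to four, and the src == ecx, dst == ecx case drops from three instructions to one, matching the two tables.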
BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX encoding but target general purpose registers [1]. They allow the shift count in any general purpose register and have the same performance as non-BMI2 shift instructions [2].

Instead of shr/sar/shl that implicitly use %cl (lowest 8 bits of %rcx), emit their more flexible alternatives provided in BMI2 when advantageous; keep using the non-BMI2 instructions when the shift count is already in BPF_REG_4/%rcx, as the non-BMI2 instructions are shorter.

To summarize, when BMI2 is available:
-------------------------------------------------
            |   arbitrary dst
=================================================
src == ecx  |   shl dst, cl
-------------------------------------------------
src != ecx  |   shlx dst, dst, src
-------------------------------------------------

And no additional register shuffling is needed.

A concrete example comparing non-BMI2 and BMI2 codegen. To shift %rsi by %rdi:

Without BMI2:

 ef3:   push   %rcx
        51
 ef4:   mov    %rdi,%rcx
        48 89 f9
 ef7:   shl    %cl,%rsi
        48 d3 e6
 efa:   pop    %rcx
        59

With BMI2:

 f0b:   shlx   %rdi,%rsi,%rsi
        c4 e2 c1 f7 f6

[1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
[2] https://www.agner.org/optimize/instruction_tables.pdf

Signed-off-by: Jie Meng <jmeng@fb.com>
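To make the five bytes in the BMI2 example easier to follow, here is a minimal sketch that builds the 3-byte-VEX encoding of a 64-bit register-to-register shlx/sarx/shrx. It is an illustration, not the emitter from the patch: the function name encode_shiftx64, the enum names and the convention of passing x86 hardware register numbers (rax=0, rcx=1, ..., rsi=6, rdi=7, r8..r15=8..15) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* VEX "pp" field values, which select the implied legacy prefix. */
enum shiftx_pp {
    PP_SHLX = 1, /* 66 prefix */
    PP_SARX = 2, /* F3 prefix */
    PP_SHRX = 3  /* F2 prefix */
};

/* Hypothetical helper: encodes "op dst, src, count" for 64-bit
 * operands, where src is the value being shifted (ModRM.rm), dst is
 * the result (ModRM.reg) and count is carried in VEX.vvvv. */
static void encode_shiftx64(uint8_t out[5], enum shiftx_pp pp,
                            unsigned dst, unsigned src, unsigned count)
{
    assert(dst < 16 && src < 16 && count < 16);

    out[0] = 0xC4; /* 3-byte VEX escape */
    /* Inverted R (bit 3 of dst), inverted X (no index register, so 1),
     * inverted B (bit 3 of src), opcode map 0F38 (mmmmm = 2). */
    out[1] = (uint8_t)((((dst >> 3) ^ 1u) << 7) | (1u << 6) |
                       (((src >> 3) ^ 1u) << 5) | 0x02u);
    /* W = 1 (64-bit), inverted vvvv = count register, L = 0, pp. */
    out[2] = (uint8_t)(0x80u | ((~count & 0xFu) << 3) | (unsigned)pp);
    out[3] = 0xF7; /* opcode shared by shlx/sarx/shrx */
    /* ModRM: mod = 11 (register direct), reg = dst, rm = src. */
    out[4] = (uint8_t)(0xC0u | ((dst & 7u) << 3) | (src & 7u));
}
```

Calling encode_shiftx64(buf, PP_SHLX, 6, 6, 7), that is dst = src = %rsi and count = %rdi, yields c4 e2 c1 f7 f6, the same bytes as in the listing above. That is 5 bytes against 8 bytes for the push/mov/shl/pop sequence.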
Current tests cover only shifts with an immediate as the source operand (the shift count); add a new test case to cover a register operand. Signed-off-by: Jie Meng <jmeng@fb.com>
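As a rough idea of what a register-count check needs to verify, the sketch below gives a user-space reference model for the three 64-bit shifts and a handful of checks against it. This is not the test added by the patch. The names ref_lsh64, ref_rsh64, ref_arsh64 and run_shift_reg_checks are hypothetical, and the model assumes the count is masked to its low 6 bits, which is what x86 shl/shr/sar and their BMI2 counterparts do for 64-bit operands.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics, assuming the count is masked to 0..63. */
static uint64_t ref_lsh64(uint64_t dst, uint64_t count)
{
    return dst << (count & 63);
}

static uint64_t ref_rsh64(uint64_t dst, uint64_t count)
{
    return dst >> (count & 63);
}

static uint64_t ref_arsh64(uint64_t dst, uint64_t count)
{
    unsigned n = (unsigned)(count & 63);
    uint64_t res = dst >> n;

    /* Replicate the sign bit into the vacated high bits by hand,
     * avoiding implementation-defined signed right shifts. */
    if ((dst >> 63) && n)
        res |= ~UINT64_C(0) << (64 - n);
    return res;
}

/* Runs a few register-count cases and returns how many were checked. */
static int run_shift_reg_checks(void)
{
    int checked = 0;

    assert(ref_lsh64(1, 4) == 16); checked++;
    assert(ref_rsh64(0x80, 4) == 8); checked++;
    assert(ref_arsh64(UINT64_C(0xF000000000000000), 4) ==
           UINT64_C(0xFF00000000000000)); checked++;
    assert(ref_arsh64(UINT64_C(0x8000000000000000), 63) ==
           ~UINT64_C(0)); checked++;
    /* A count of 64 wraps to 0 under the masking assumption. */
    assert(ref_lsh64(1, 64) == 1); checked++;
    return checked;
}
```

A JIT test would compare the program's result against such expected values for counts held in a register, ideally including the case where the count register is BPF_REG_4 and the case where it is not.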
Master branch: 81bfcc3 |
2936d43
to
3d80438
Compare
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=683811 irrelevant now. Closing PR. |
Pull request for series with
subject: bpf,x64: Use BMI2 for shifts
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=680036