[PowerPC] Check value uses in ValueBit tracking #66040

ecnelises · 2023-09-12T02:54:08Z

PowerPC instruction selection tracks bits for every series of bitwise
operations. But in some cases the method selects worse result for complex
PowerPC rotate instructions. Check use of visited operands to avoid suboptimal
analysis.

ecnelises · 2023-11-09T09:08:46Z

Gentle ping... any comments?

bzEq · 2024-01-29T09:23:34Z

Though I do see some codegen improvement, I don't think we should check uses inside getValueBits, since getValueBits is for analysis and is gathering as much information as it can. If you are relying usage information, one solution might be extending getValueBits to return more information. Can you post your motivation example?

ecnelises · 2024-01-29T09:37:42Z

The motivating case:

define i64 @test(i64 %a) {
entry:
  %x0 = shl i64 %a, 8
  %x1 = and i64 %a, 255
  %x2 = or i64 %x0, %x1
  %x3 = shl i64 %x2, 16
  %x4 = and i64 %x2, 65535
  %x5 = or i64 %x3, %x4
  %x6 = shl i64 %x5, 32
  %x7 = and i64 %x5, 4294967295
  %x8 = or i64 %x6, %x7
  ret i64 %x8
}

; before patch
rotldi 4, 3, 32
rlwimi 4, 3, 0, 24, 31
rlwimi 4, 3, 8, 16, 23
rlwimi 4, 3, 16, 8, 15
rlwimi 4, 3, 24, 0, 7
rldimi 4, 3, 40, 16
rldimi 4, 3, 48, 8
rldimi 4, 3, 56, 0
mr     3, 4

; after patch
rldimi 3, 3, 8, 0
rldimi 3, 3, 16, 0
rlwinm 3, 3, 0, 1, 0

Bit permutation is not good at selecting globally optimized result. This is the limitation until a better selector respecting rotate operations is introduced.

chenzheng1030

TBH, this is more like a workaround. Now you replace some known bits with Variable for operands that have more than one uses. This sounds not good. We should try to get analysis info as accurate as we can. The issue is how we use the analysis info and generate the instructions based on the analysis info. As you can see, now with this patch, the case llvm/test/CodeGen/PowerPC/loop-instr-form-prepare.ll becomes worse.

For the improvement in llvm/test/CodeGen/PowerPC/rldimi.ll, as this IR is generated from ppc builtin __rldimi, I think the right way to fix this maybe adding an ppc intrinsic for the builtin and map the __rldimi to the new intrinsic and then select the intrinsic in the PPC ISEL td files. Since developer explicitly uses __rldimi, what the developer expects should be the corresponding PPC instruction rldimi? What do you think?

ecnelises · 2024-02-23T07:56:06Z

Since developer explicitly uses __rldimi, what the developer expects should be the corresponding PPC instruction rldimi?

We can definitely create a new intrinsic for rldimi. But I'm concerned whether the rest 8 rl* instructions also need such an intrinsic? Current codegen tests won't show changes after adding a new intrinsic, so I may need to find whether some C cases will go worse.

Yes, this patch is indeed a workaround. The reason of this suboptimal codegen is local greedy BitPermutation selector. For longer consideration, we need modeling and recombination of rotate-clear/mask/insert operations in backend. If that shows good result, BitPermutation can be dropped.

chenzheng1030 · 2024-02-23T08:24:01Z

Since developer explicitly uses __rldimi, what the developer expects should be the corresponding PPC instruction rldimi?

We can definitely create a new intrinsic for rldimi. But I'm concerned whether the rest 8 rl* instructions also need such an intrinsic? Current codegen tests won't show changes after adding a new intrinsic, so I may need to find whether some C cases will go worse.

I think we just need to add intrinsic for builtin that are currently supported by clang. See https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=rf-rldimi-builtin-ppc-rldimi-rlwimi-builtin-ppc-rlwimi and https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-rlwnm-builtin-ppc-rlwnm . So we need 3 different intrinsics.

Yes, this patch is indeed a workaround. The reason of this suboptimal codegen is local greedy BitPermutation selector. For longer consideration, we need modeling and recombination of rotate-clear/mask/insert operations in backend. If that shows good result, BitPermutation can be dropped.

Agreed. From the other two cases diff, we indeed have some issue for the BitPermutation logic. We need a redesign here.

ecnelises · 2024-02-26T08:37:33Z

I opened #82968 to replace this one since both patches differ a lot.

ecnelises requested review from bzEq, chenzheng1030, nemanjai and stefanp-ibm September 12, 2023 02:54

bzEq added the backend:PowerPC label Sep 12, 2023

ecnelises force-pushed the bps_use_chk branch from a5d5588 to ebaafdd Compare September 28, 2023 08:15

[PowerPC] Check value uses in ValueBit tracking

48a3374

ecnelises force-pushed the bps_use_chk branch from f201509 to 48a3374 Compare February 1, 2024 05:31

chenzheng1030 reviewed Feb 23, 2024

View reviewed changes

ecnelises closed this Feb 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PowerPC] Check value uses in ValueBit tracking #66040

[PowerPC] Check value uses in ValueBit tracking #66040

ecnelises commented Sep 12, 2023

ecnelises commented Nov 9, 2023

bzEq commented Jan 29, 2024 •

edited

ecnelises commented Jan 29, 2024 •

edited

chenzheng1030 left a comment •

edited

ecnelises commented Feb 23, 2024

chenzheng1030 commented Feb 23, 2024

ecnelises commented Feb 26, 2024

[PowerPC] Check value uses in ValueBit tracking #66040

[PowerPC] Check value uses in ValueBit tracking #66040

Conversation

ecnelises commented Sep 12, 2023

ecnelises commented Nov 9, 2023

bzEq commented Jan 29, 2024 • edited

ecnelises commented Jan 29, 2024 • edited

chenzheng1030 left a comment • edited

Choose a reason for hiding this comment

ecnelises commented Feb 23, 2024

chenzheng1030 commented Feb 23, 2024

ecnelises commented Feb 26, 2024

bzEq commented Jan 29, 2024 •

edited

ecnelises commented Jan 29, 2024 •

edited

chenzheng1030 left a comment •

edited