Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PowerPC] Check value uses in ValueBit tracking #66040

Closed
wants to merge 1 commit into from

Conversation

ecnelises
Copy link
Member

PowerPC instruction selection tracks bits for every series of bitwise
operations. But in some cases the method selects worse result for complex
PowerPC rotate instructions. Check use of visited operands to avoid suboptimal
analysis.

@ecnelises
Copy link
Member Author

Gentle ping... any comments?

@bzEq
Copy link
Collaborator

bzEq commented Jan 29, 2024

Though I do see some codegen improvement, I don't think we should check uses inside getValueBits, since getValueBits is for analysis and is gathering as much information as it can. If you are relying usage information, one solution might be extending getValueBits to return more information. Can you post your motivation example?

@ecnelises
Copy link
Member Author

ecnelises commented Jan 29, 2024

The motivating case:

define i64 @test(i64 %a) {
entry:
  %x0 = shl i64 %a, 8
  %x1 = and i64 %a, 255
  %x2 = or i64 %x0, %x1
  %x3 = shl i64 %x2, 16
  %x4 = and i64 %x2, 65535
  %x5 = or i64 %x3, %x4
  %x6 = shl i64 %x5, 32
  %x7 = and i64 %x5, 4294967295
  %x8 = or i64 %x6, %x7
  ret i64 %x8
}
; before patch
rotldi 4, 3, 32
rlwimi 4, 3, 0, 24, 31
rlwimi 4, 3, 8, 16, 23
rlwimi 4, 3, 16, 8, 15
rlwimi 4, 3, 24, 0, 7
rldimi 4, 3, 40, 16
rldimi 4, 3, 48, 8
rldimi 4, 3, 56, 0
mr     3, 4

; after patch
rldimi 3, 3, 8, 0
rldimi 3, 3, 16, 0
rlwinm 3, 3, 0, 1, 0

Bit permutation is not good at selecting globally optimized result. This is the limitation until a better selector respecting rotate operations is introduced.

Copy link
Collaborator

@chenzheng1030 chenzheng1030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, this is more like a workaround. Now you replace some known bits with Variable for operands that have more than one uses. This sounds not good. We should try to get analysis info as accurate as we can. The issue is how we use the analysis info and generate the instructions based on the analysis info. As you can see, now with this patch, the case llvm/test/CodeGen/PowerPC/loop-instr-form-prepare.ll becomes worse.

For the improvement in llvm/test/CodeGen/PowerPC/rldimi.ll, as this IR is generated from ppc builtin __rldimi, I think the right way to fix this maybe adding an ppc intrinsic for the builtin and map the __rldimi to the new intrinsic and then select the intrinsic in the PPC ISEL td files. Since developer explicitly uses __rldimi, what the developer expects should be the corresponding PPC instruction rldimi? What do you think?

@ecnelises
Copy link
Member Author

Since developer explicitly uses __rldimi, what the developer expects should be the corresponding PPC instruction rldimi?

We can definitely create a new intrinsic for rldimi. But I'm concerned whether the rest 8 rl* instructions also need such an intrinsic? Current codegen tests won't show changes after adding a new intrinsic, so I may need to find whether some C cases will go worse.

Yes, this patch is indeed a workaround. The reason of this suboptimal codegen is local greedy BitPermutation selector. For longer consideration, we need modeling and recombination of rotate-clear/mask/insert operations in backend. If that shows good result, BitPermutation can be dropped.

@chenzheng1030
Copy link
Collaborator

Since developer explicitly uses __rldimi, what the developer expects should be the corresponding PPC instruction rldimi?

We can definitely create a new intrinsic for rldimi. But I'm concerned whether the rest 8 rl* instructions also need such an intrinsic? Current codegen tests won't show changes after adding a new intrinsic, so I may need to find whether some C cases will go worse.

I think we just need to add intrinsic for builtin that are currently supported by clang. See https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=rf-rldimi-builtin-ppc-rldimi-rlwimi-builtin-ppc-rlwimi and https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-rlwnm-builtin-ppc-rlwnm . So we need 3 different intrinsics.

Yes, this patch is indeed a workaround. The reason of this suboptimal codegen is local greedy BitPermutation selector. For longer consideration, we need modeling and recombination of rotate-clear/mask/insert operations in backend. If that shows good result, BitPermutation can be dropped.

Agreed. From the other two cases diff, we indeed have some issue for the BitPermutation logic. We need a redesign here.

@ecnelises
Copy link
Member Author

I opened #82968 to replace this one since both patches differ a lot.

@ecnelises ecnelises closed this Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants