Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimplifyDemandedBitsForTargetNode - Missing AArch64ISD::BIC & AArch64ISD::BICi handling #53881

Closed
RKSimon opened this issue Feb 16, 2022 · 12 comments · Fixed by #76644
Closed
Assignees
Labels
backend:AArch64 good first issue https://github.com/llvm/llvm-project/contribute

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 16, 2022

These get lowered quite early, meaning that there are missed opportunities to further simplify the DAG based on the masked bits.

Noticed while looking at Issue #53622

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 16, 2022

@llvm/issue-subscribers-backend-aarch64

@RKSimon RKSimon added the good first issue https://github.com/llvm/llvm-project/contribute label Sep 23, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 23, 2022

@llvm/issue-subscribers-good-first-issue

@sjoerdmeijer
Copy link
Collaborator

Hi @RKSimon, can you expand a little bit what the idea of this work is? Do we expect that it triggers a rewrite?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Aug 11, 2023

@sjoerdmeijer I'll try to find the work I was doing on #53622 and see if I can repro

@RKSimon
Copy link
Collaborator Author

RKSimon commented Aug 13, 2023

@davemgreen @sjoerdmeijer This is the kind of thing I had in mind: https://rust.godbolt.org/z/M6WbxTaYv

define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) {
  %x0 = zext <8 x i8> %a0 to <8 x i16>
  %x1 = zext <8 x i8> %a1 to <8 x i16>
  %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1)
  %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511,i16 511, i16 511, i16 511, i16 511>
  ret <8 x i16> %res
}
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)

->

Combining: t0: ch,glue = EntryToken
Optimized lowered selection DAG: %bb.0 'haddu_known:'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
          t2: v8i8,ch = CopyFromReg t0, Register:v8i8 %0
        t5: v8i16 = zero_extend t2
          t4: v8i8,ch = CopyFromReg t0, Register:v8i8 %1
        t6: v8i16 = zero_extend t4
      t8: v8i16 = llvm.aarch64.neon.uhadd TargetConstant:i64<635>, t5, t6
    t18: v8i16 = AArch64ISD::BICi t8, Constant:i32<254>, Constant:i32<8>
  t13: ch,glue = CopyToReg t0, Register:v8i16 $q0, t18
  t14: ch = AArch64ISD::RET_GLUE t13, Register:v8i16 $q0, t13:1

The AND should be removable as the uhadd (ISD::AVGFLOORU) node should never have the top most bits set, but the AND gets coverted to a AArch64ISD::BICi node very early, so we need:

  • SelectionDAG::computeKnownBits to handle ISD::AVGFLOORU nodes (this is a very basic extension of the ISD::AVGCEILU handling from D119629 for [DAG] Add knownbits/signbits handling for ISD::AVG* nodes #53622)
  • SimplifyDemandedBitsForTargetNode handling for AArch64ISD::BICi to correctly grab the known bits for the input operand and realise the AArch64ISD::BICi is superfluous.

@snikitav
Copy link
Contributor

May I take this one?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Dec 12, 2023

Sure, go for it

@RKSimon
Copy link
Collaborator Author

RKSimon commented Dec 12, 2023

#53622 needs addressing as well if you're interested :)

@snikitav
Copy link
Contributor

Sure

@snikitav
Copy link
Contributor

@RKSimon could you please assign it to me so no one will be confused? and #53622 as well

@snikitav
Copy link
Contributor

snikitav commented Dec 31, 2023

@RKSimon could you please provide an example similar to https://rust.godbolt.org/z/M6WbxTaYv but with BIC instead of BICi. Because I don't see how this optimization could be applied to BIC version without immediate, sorry.

#76644 PTAL

@RKSimon
Copy link
Collaborator Author

RKSimon commented Dec 31, 2023

It wasn't necessarily for HADD etc. But you should be able to at least use KnownBits getMinValue/getMaxValue bounds to work out the known upper zero bits etc.

davemgreen pushed a commit that referenced this issue Apr 6, 2024
…ndling (#76644)

Fold BICi if all destination bits are already known to be zeroes

```llvm
define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) {
  %x0 = zext <8 x i8> %a0 to <8 x i16>
  %x1 = zext <8 x i8> %a1 to <8 x i16>
  %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1)
  %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511,i16 511, i16 511, i16 511, i16 511>
  ret <8 x i16> %res
}
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
```

```
haddu_known:                            // @haddu_known
        ushll   v0.8h, v0.8b, #0
        ushll   v1.8h, v1.8b, #0
        uhadd   v0.8h, v0.8h, v1.8h
        bic     v0.8h, #254, lsl #8 <-- this one will be removed as we know high bits are zero extended
        ret
```

Fixes #53881
Fixes #53622
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 good first issue https://github.com/llvm/llvm-project/contribute
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants