Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] there is redundant neg instruction for mull with complex type #84152

Closed
vfdff opened this issue Mar 6, 2024 · 2 comments · Fixed by #84267
Closed

[AArch64] there is redundant neg instruction for mull with complex type #84152

vfdff opened this issue Mar 6, 2024 · 2 comments · Fixed by #84267

Comments

@vfdff
Copy link
Contributor

vfdff commented Mar 6, 2024

define ptr @_ZNSt7complexIiEmLIiEERS0_RKS_IT_E(ptr noundef nonnull align 4 dereferenceable(8) %a, 
                                               ptr noundef nonnull align 4 dereferenceable(8) %b) align 2 {
entry:
  %0 = load i32, ptr %a, align 4
  %1 = load i32, ptr %b, align 4
  %mul = mul nsw i32 %1, %0
  %_M_imag = getelementptr inbounds i8, ptr %a, i64 4
  %2 = load i32, ptr %_M_imag, align 4
  %_M_imag.i = getelementptr inbounds i8, ptr %b, i64 4
  %3 = load i32, ptr %_M_imag.i, align 4
  %mul3 = mul nsw i32 %3, %2
  %sub = sub nsw i32 %mul, %mul3
  %mul6 = mul nsw i32 %3, %0
  %mul9 = mul nsw i32 %2, %1
  %add = add nsw i32 %mul6, %mul9
  store i32 %add, ptr %_M_imag, align 4
  store i32 %sub, ptr %a, align 4
  ret ptr %a
}
  • llvm: transform the madd into msub, then the neg instruction is redundant
    • MADD Rd, Rn, Rm, Ra => Rd = Ra + Rn*Rm
    • MSUB Rd, Rn, Rm, Ra => Rd = Ra - Rn*Rm
std::complex<int>& std::complex<int>::operator*=<int>(std::complex<int> const&):     // @std::complex<int>& std::complex<int>::operator*=<int>(std::complex<int> const&)
        ldp     w11, w8, [x0]
        ldp     w10, w9, [x1]
        mul     w12, w9, w8
        mul     w8, w8, w10
        madd    w8, w9, w11, w8
        neg     w13, w12
        madd    w9, w10, w11, w13  ; w9 = w10 * w11 + w13 = w9 = w10 * w11 - w12 =  w10 * w11 - w9 * w8 = msub w10 * w11, w9, 28
        stp     w9, w8, [x0]
        ret
void cmul2_int (complex<int>  * __restrict a,
                complex<int>  * __restrict b,
                complex<int>  * __restrict c) {
  for (int i=0; i<1000; i++) {
     c[i] = a[i] * b[i];
  }
}
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 6, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Allen (vfdff)

* **test**: https://gcc.godbolt.org/z/enYjsjGfE ``` define ptr @_ZNSt7complexIiEmLIiEERS0_RKS_IT_E(ptr noundef nonnull align 4 dereferenceable(8) %a, ptr noundef nonnull align 4 dereferenceable(8) %b) align 2 { entry: %0 = load i32, ptr %a, align 4 %1 = load i32, ptr %b, align 4 %mul = mul nsw i32 %1, %0 %_M_imag = getelementptr inbounds i8, ptr %a, i64 4 %2 = load i32, ptr %_M_imag, align 4 %_M_imag.i = getelementptr inbounds i8, ptr %b, i64 4 %3 = load i32, ptr %_M_imag.i, align 4 %mul3 = mul nsw i32 %3, %2 %sub = sub nsw i32 %mul, %mul3 %mul6 = mul nsw i32 %3, %0 %mul9 = mul nsw i32 %2, %1 %add = add nsw i32 %mul6, %mul9 store i32 %add, ptr %_M_imag, align 4 store i32 %sub, ptr %a, align 4 ret ptr %a } ```
  • llvm: transform the madd into msub, then the neg instruction is redundant
    • MADD Rd, Rn, Rm, Ra => Rd = Ra + Rn*Rm
    • MSUB Rd, Rn, Rm, Ra => Rd = Ra - Rn*Rm
std::complex&lt;int&gt;&amp; std::complex&lt;int&gt;::operator*=&lt;int&gt;(std::complex&lt;int&gt; const&amp;):     // @<!-- -->std::complex&lt;int&gt;&amp; std::complex&lt;int&gt;::operator*=&lt;int&gt;(std::complex&lt;int&gt; const&amp;)
        ldp     w11, w8, [x0]
        ldp     w10, w9, [x1]
        mul     w12, w9, w8
        mul     w8, w8, w10
        madd    w8, w9, w11, w8
        neg     w13, w12
        madd    w9, w10, w11, w13  ; w9 = w10 * w11 + w13 = w9 = w10 * w11 - w12 =  w10 * w11 - w9 * w8 = msub w10 * w11, w9, 28
        stp     w9, w8, [x0]
        ret
void cmul2_int (complex&lt;int&gt;  * __restrict a,
                complex&lt;int&gt;  * __restrict b,
                complex&lt;int&gt;  * __restrict c) {
  for (int i=0; i&lt;1000; i++) {
     c[i] = a[i] * b[i];
  }
}

vfdff added a commit to vfdff/llvm-project that referenced this issue Mar 7, 2024
Pattern should be sorted in priority order since the pattern evalutor
stops checking as soon as it finds a faster sequence.
so for a * b - c * d, we prefer to match the 2nd operands of sub,
which can be use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix llvm#84152
@vfdff
Copy link
Contributor Author

vfdff commented Mar 8, 2024

the complex<long long> has similar issue , https://gcc.godbolt.org/z/79GnMPnY1

@vfdff vfdff closed this as completed in 3a62edc Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants