clang-15 outputs better code than clang-16 #62238

Closed
k-arrows opened this issue Apr 19, 2023 · 4 comments

k-arrows commented Apr 19, 2023

The test case is the following trivial function.

int foo(int a, int b)
{
    return a - b ? b - a : b - a ;
}

https://godbolt.org/z/MexYMfeK1
https://alive2.llvm.org/ce/z/9Ma_ob

Clang 15.0.0

foo(int, int):                               # @foo(int, int)
        mov     eax, esi
        sub     eax, edi
        ret

Clang 16.0.0

foo(int, int):                               # @foo(int, int)
        xor     eax, eax
        sub     esi, edi
        cmovne  eax, esi
        ret

If this change is expected, feel free to close this issue.
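For illustration only (not part of the original report): both arms of the conditional are the same expression, so the function should reduce to a single subtraction, which is what the Clang 15 output does. A minimal sketch of that equivalence follows; `foo_simplified` and `check_foo_matches_simplified` are hypothetical names introduced here, and the input range is kept small to stay clear of signed overflow.

```cpp
#include <cassert>

// The test case from the report, unchanged.
int foo(int a, int b) {
    return a - b ? b - a : b - a;
}

// Hypothetical hand-simplified form: both arms of the conditional are
// identical, so the condition cannot affect the result.
int foo_simplified(int a, int b) {
    return b - a;
}

// Compare the two on a small grid of inputs (small values only, so no
// signed overflow can occur in a - b or b - a).
void check_foo_matches_simplified() {
    for (int a = -8; a <= 8; ++a)
        for (int b = -8; b <= 8; ++b)
            assert(foo(a, b) == foo_simplified(a, b));
}
```

Calling `check_foo_matches_simplified()` from a `main` should complete without tripping an assertion.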

junaire (Member) commented May 11, 2023

I believe the InstCombine pass is to blame. In LLVM 15, the optimizer merges the identical code in the two branches and then eliminates the phi node. In LLVM 16, however, it cannot fully merge the code, so the branch is kept.

Original IR:

define dso_local noundef i32 @foo(int, int)(i32 noundef %a, i32 noundef %b) local_unnamed_addr {
entry:
  %sub = sub nsw i32 %a, %b
  %tobool = icmp ne i32 %sub, 0
  br i1 %tobool, label %cond.true, label %cond.false

cond.true:                                        ; preds = %entry
  %sub1 = sub nsw i32 %b, %a
  br label %cond.end

cond.false:                                       ; preds = %entry
  %sub2 = sub nsw i32 %b, %a
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %cond.true
  %cond = phi i32 [ %sub1, %cond.true ], [ %sub2, %cond.false ]
  ret i32 %cond
}

LLVM 15:

define dso_local noundef i32 @foo(int, int)(i32 noundef %a, i32 noundef %b) local_unnamed_addr {
entry:
  %tobool.not = icmp eq i32 %b, %a
  br i1 %tobool.not, label %cond.false, label %cond.true

cond.true:                                        ; preds = %entry
  br label %cond.end

cond.false:                                       ; preds = %entry
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %cond.true
  %cond = sub nsw i32 %b, %a
  ret i32 %cond
}

LLVM 16:

define dso_local noundef i32 @foo(int, int)(i32 noundef %a, i32 noundef %b) local_unnamed_addr {
entry:
  %tobool.not = icmp eq i32 %b, %a
  br i1 %tobool.not, label %cond.false, label %cond.true

cond.true:                                        ; preds = %entry
  %sub1 = sub nsw i32 %b, %a       ; <===== This code is kept so we can't eliminate the phi node.
  br label %cond.end

cond.false:                                       ; preds = %entry
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %cond.true
  %cond = phi i32 [ %sub1, %cond.true ], [ 0, %cond.false ]
  ret i32 %cond
}

junaire (Member) commented May 11, 2023

My guess is that in cond.false (where %a == %b), the optimizer constant-folds %sub2 = sub nsw i32 %b, %a to 0. After that, the two incoming values of the phi no longer match, so the further optimization opportunity is lost.
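To make the guess concrete, here is a rough C++ model of the two IR shapes. It is only a sketch of the reasoning, not of how InstCombine is implemented; `merged`, `folded`, and `check_folded_matches_merged` are hypothetical names. The point is that the folded arm is still correct (when `a == b`, `b - a` is 0), but it no longer looks the same as the other arm.

```cpp
#include <cassert>

// Model of the LLVM 15 result: the two identical subtractions were
// merged into cond.end, leaving a single `sub`.
int merged(int a, int b) {
    return b - a;
}

// Model of the LLVM 16 result: in cond.false we know a == b, so the
// subtraction there was folded to the constant 0. The phi now sees
// two different incoming values (%sub1 and 0).
int folded(int a, int b) {
    if (a == b)
        return 0;   // cond.false: b - a folded to 0 under a == b
    return b - a;   // cond.true: %sub1 is kept
}

// The two models agree on every input in a small grid (small values
// only, so b - a cannot overflow).
void check_folded_matches_merged() {
    for (int a = -8; a <= 8; ++a)
        for (int b = -8; b <= 8; ++b)
            assert(folded(a, b) == merged(a, b));
}
```

Both functions compute the same value, so the fold is valid by itself. The regression is only that the `0` hides the fact that the two arms were originally identical.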

DianQK (Member) commented May 11, 2023

Hi @junaire, if you are interested in this issue, you can check out https://reviews.llvm.org/D148979. If you're not interested, I'll probably come back to this after I've addressed the other issues.

junaire (Member) commented May 11, 2023

Candidate patch: https://reviews.llvm.org/D150378
