Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead Code Elimination Regression at -O3 (trunk vs. 14.0.4) #56048

Closed
thetheodor opened this issue Jun 15, 2022 · 4 comments
Closed

Dead Code Elimination Regression at -O3 (trunk vs. 14.0.4) #56048

thetheodor opened this issue Jun 15, 2022 · 4 comments

Comments

@thetheodor
Copy link
Contributor

static short c, d;

void foo();

char a();

static long e(short f) { return f & (f - 1) ? 0 : f - 1; }

int main() {
    for (int h = 1; h; h = e(h))
        for (; d < 1; d += 1) {
            c = h;
            if (c + 1 == h) {
                foo();
                h == 1 || a();
            }
        }
}

llvm-4c2bccfda3892ae13e97b6bfdbc99ec8cf5d095d (trunk) -O3 can not eliminate foo but llvm-llvmorg-14.0.4 -O3 can.

Target: x86_64-unknown-linux-gnu


llvm-4c2bccfda3892ae13e97b6bfdbc99ec8cf5d095d (trunk) -O3 [-emit-llvm] -S -o /dev/stdout case.c

Reduced assembly

main:                                   # @main
	.cfi_startproc
# %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	pushq	%rbx
	.cfi_def_cfa_offset 24
	pushq	%rax
	.cfi_def_cfa_offset 32
	.cfi_offset %rbx, -24
	.cfi_offset %rbp, -16
	movzwl	d(%rip), %eax
	testw	%ax, %ax
	jle	.LBB0_1
.LBB0_10:
	xorl	%eax, %eax
	addq	$8, %rsp
	.cfi_def_cfa_offset 24
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	retq
.LBB0_1:
	.cfi_def_cfa_offset 32
	movl	$1, %ecx
	xorl	%ebp, %ebp
	jmp	.LBB0_2
	.p2align	4, 0x90
.LBB0_9:                                #   in Loop: Header=BB0_2 Depth=1
	leal	-1(%rbx), %ecx
	testl	%ecx, %ebx
	cmovnel	%ebp, %ecx
	testl	%ecx, %ecx
	je	.LBB0_10
.LBB0_2:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_7 Depth 2
                                        #     Child Loop BB0_6 Depth 2
                                        #     Child Loop BB0_5 Depth 2
	movswl	%cx, %ebx
	testw	%ax, %ax
	jg	.LBB0_9
# %bb.3:                                #   in Loop: Header=BB0_2 Depth=1
	leal	1(%rbx), %edx
	cmpl	%ecx, %edx
	jne	.LBB0_7
# %bb.4:                                #   in Loop: Header=BB0_2 Depth=1
	cmpl	$1, %ecx
	jne	.LBB0_6
	.p2align	4, 0x90
.LBB0_5:                                #   Parent Loop BB0_2 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
	xorl	%eax, %eax
	callq	foo@PLT
	movzwl	d(%rip), %ecx
	leal	1(%rcx), %eax
	movw	%ax, d(%rip)
	cmpl	$32766, %ecx                    # imm = 0x7FFE
	ja	.LBB0_5
	jmp	.LBB0_9
	.p2align	4, 0x90
.LBB0_7:                                #   Parent Loop BB0_2 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
                                        # kill: def $ax killed $ax killed $eax def $rax
	movzwl	%ax, %ecx
	incl	%eax
	cmpl	$32766, %ecx                    # imm = 0x7FFE
	ja	.LBB0_7
# %bb.8:                                #   in Loop: Header=BB0_2 Depth=1
	movw	%ax, d(%rip)
	movw	$1, %ax
	jmp	.LBB0_9
	.p2align	4, 0x90
.LBB0_6:                                #   Parent Loop BB0_2 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
	xorl	%eax, %eax
	callq	foo@PLT
	xorl	%eax, %eax
	callq	a@PLT
	movzwl	d(%rip), %ecx
	leal	1(%rcx), %eax
	movw	%ax, d(%rip)
	cmpl	$32766, %ecx                    # imm = 0x7FFE
	ja	.LBB0_6
	jmp	.LBB0_9
.Lfunc_end0:
	.size	main, .Lfunc_end0-main


llvm-llvmorg-14.0.4 -O3 [-emit-llvm] -S -o /dev/stdout case.c

Reduced assembly

main:                                   # @main
	.cfi_startproc
# %bb.0:
	movzwl	d(%rip), %eax
	testw	%ax, %ax
	jle	.LBB0_1
# %bb.3:
	xorl	%eax, %eax
	retq
	.p2align	4, 0x90
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
                                        # kill: def $ax killed $ax killed $eax def $rax
	movzwl	%ax, %ecx
	addl	$1, %eax
	cmpl	$32766, %ecx                    # imm = 0x7FFE
	ja	.LBB0_1
# %bb.2:
	movw	%ax, d(%rip)
	xorl	%eax, %eax
	retq
.Lfunc_end0:
	.size	main, .Lfunc_end0-main


Bisection

Bisected to: 6001bfc
Committed by: @nikic

@nunoplopes
Copy link
Member

Why do we get the freeze in the first place?
Here's a slightly simplified example: https://gcc.godbolt.org/z/evn37ze7G
I see loop peeling going on, not sure anything else is happening? Which pass introduces the freeze?

@nunoplopes
Copy link
Member

Answering myself, it's loop unswitch. In this case we could forgo the freeze, because we have smth like:

  %h.0 = phi i32 [ 1, %entry ], [ %sub, %for.inc10 ]
  %cmp4 = icmp eq i32 %h.0, 1
  %cmp4.fr = freeze i1 %cmp4
...
  %sub = add nsw i32 %h.0, -1

The initial phi value is non-poison, and the subsequent value is also not poison (if we drop nsw), so loop unswitch could have avoided adding the freeze by dropping nsw (which in retrospect that's what your proposed patch does).

The reason I was looking into this is because dropping nsw from induction variables is not great. It can make SCEV unable to compute the trip count.
I fear we are getting into a pass ordering problem. I don't know what the right answer is here, though. We can try your patch and see if something regresses.

@nikic
Copy link
Contributor

nikic commented Jun 16, 2022

Mentioned patch for reference: https://reviews.llvm.org/D127960

The reason I was looking into this is because dropping nsw from induction variables is not great. It can make SCEV unable to compute the trip count. I fear we are getting into a pass ordering problem. I don't know what the right answer is here, though. We can try your patch and see if something regresses.

SCEV can't analyze freeze instructions, so it will not be able to even form an addrec in this case, let alone compute a trip count. I think that pushing the freeze to the start value (so we can form the addrec) should generally be preferable to keeping nowrap flags.

@nunoplopes
Copy link
Member

True, and freeze kills the nsw anyway, so getting rid of freeze is the way to go, yes.
Thanks for the clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants