Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LICM behaving differently with and without debug information present #55915

Closed
mikaelholmen opened this issue Jun 7, 2022 · 9 comments
Closed

Comments

@mikaelholmen
Copy link
Collaborator

mikaelholmen commented Jun 7, 2022

llvm commit: 19647e5
Reproduce with:

opt -opaque-pointers=0 -passes="default<O2>" bbi-70612_typed.ll -S -o -

Result:

[...]
%.promoted = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 3), align 1
%.promoted5 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 4), align 1
%.promoted8 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 2), align 1

If we comment out the call to llvm.dbg.declare we instead get these corresponding loads:

[...]
%.promoted = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 4), align 1
%.promoted5 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 2), align 1
%.promoted8 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 3), align 1

Note that we load in a different order.

This starts happening with commit 2cdc6f2

    Reland "[LICM] Hoist LOAD without sinking the STORE"
    
    When doing load/store promotion within LICM, if we
    cannot prove that it is safe to sink the store we won't
    hoist the load, even though we can prove the load could
    be dereferenced and moved outside the loop. This patch
    implements the load promotion by moving it in the loop
    preheader by inserting proper PHI in the loop. The store
    is kept as is in the loop. By doing this, we avoid doing
    the load from a memory location in each iteration.
    
    Please consider this small example:
    
    loop {
      var = *ptr;
      if (var) break;
      *ptr= var + 1;
    }
    After this patch, it will be:
    
    var0 = *ptr;
    loop {
      var1 = phi (var0, var2);
      if (var1) break;
      var2 = var1 + 1;
      *ptr = var2;
    }
    This addresses some problems from [0].
    
    [0] https://bugs.llvm.org/show_bug.cgi?id=51193
    
    Differential revision: https://reviews.llvm.org/D113289

bbi-70612_typed.ll.gz

@mikaelholmen
Copy link
Collaborator Author

Looking at -print-before-all -print-after-all -print-module-scope printouts, we starts seeing differences with/without the dbg-declare when LICM run:

27844c31580
< *** IR Dump Before LICMPass on Loop at depth 1 containing: %for.body<header>,%if.then,%land.rhs,%land.end,%if.then1,%for.inc11<latch><exiting>,%for.body4.preheader,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2 ***
---
> *** IR Dump Before LICMPass on Loop at depth 1 containing: %for.body<header>,%if.then,%for.body4.preheader,%land.rhs,%land.end,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2,%if.then1,%for.inc11<latch><exiting> ***
27931a31668,31670
> ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
> declare void @llvm.dbg.value(metadata, metadata, metadata) addrspace(40) #1
> 
27932a31672
> attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }
27946c31686
< *** IR Dump After LICMPass on Loop at depth 1 containing: %for.body<header>,%if.then,%land.rhs,%land.end,%if.then1,%for.inc11<latch><exiting>,%for.body4.preheader,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2 ***
---
> *** IR Dump After LICMPass on Loop at depth 1 containing: %for.body<header>,%if.then,%for.body4.preheader,%land.rhs,%land.end,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2,%if.then1,%for.inc11<latch><exiting> ***
27965,27967c31705,31707
<   %.promoted = load i32, ptr getelementptr inbounds ([5 x i32], ptr @c, i64 0, i64 4), align 1
<   %.promoted5 = load i32, ptr getelementptr inbounds ([5 x i32], ptr @c, i64 0, i64 2), align 1
<   %.promoted8 = load i32, ptr getelementptr inbounds ([5 x i32], ptr @c, i64 0, i64 3), align 1
---
>   %.promoted = load i32, ptr getelementptr inbounds ([5 x i32], ptr @c, i64 0, i64 3), align 1
>   %.promoted5 = load i32, ptr getelementptr inbounds ([5 x i32], ptr @c, i64 0, i64 4), align 1
>   %.promoted8 = load i32, ptr getelementptr inbounds ([5 x i32], ptr @c, i64 0, i64 2), align 1
27971,27973c31711,31713
<   %conv.110 = phi i32 [ %conv.19, %for.inc11 ], [ %.promoted8, %for.body.preheader ]
<   %conv.27 = phi i32 [ %conv.26, %for.inc11 ], [ %.promoted5, %for.body.preheader ]
<   %conv4 = phi i32 [ %conv3, %for.inc11 ], [ %.promoted, %for.body.preheader ]
---
>   %conv.210 = phi i32 [ %conv.29, %for.inc11 ], [ %.promoted8, %for.body.preheader ]
>   %conv7 = phi i32 [ %conv6, %for.inc11 ], [ %.promoted5, %for.body.preheader ]
>   %conv.14 = phi i32 [ %conv.13, %for.inc11 ], [ %.promoted, %for.body.preheader ]
27981c31721
<   %tobool5.not = icmp eq i32 %conv.110, 0
---
>   %tobool5.not = icmp eq i32 %conv.14, 0
27989c31729
<   %tobool7 = icmp ne i32 %conv4, 0
---
>   %tobool7 = icmp ne i32 %conv7, 0
27996c31736
<   %tobool5.not.1 = icmp eq i32 %conv.27, 0
---
>   %tobool5.not.1 = icmp eq i32 %conv.210, 0
28000c31740
<   %tobool7.1 = icmp ne i32 %conv.110, 0
---
>   %tobool7.1 = icmp ne i32 %conv.14, 0
28010c31750
<   %tobool7.2 = icmp ne i32 %conv.27, 0
---
>   %tobool7.2 = icmp ne i32 %conv.210, 0
28021,28023c31761,31763
<   %conv.19 = phi i32 [ %conv.1, %land.end.2 ], [ %conv.110, %for.body ], [ %conv.110, %if.then1 ]
<   %conv.26 = phi i32 [ %conv.2, %land.end.2 ], [ %conv.27, %for.body ], [ %conv.27, %if.then1 ]
<   %conv3 = phi i32 [ %conv, %land.end.2 ], [ %conv4, %for.body ], [ %conv4, %if.then1 ]
---
>   %conv.29 = phi i32 [ %conv.2, %land.end.2 ], [ %conv.210, %for.body ], [ %conv.210, %if.then1 ]
>   %conv6 = phi i32 [ %conv, %land.end.2 ], [ %conv7, %for.body ], [ %conv7, %if.then1 ]
>   %conv.13 = phi i32 [ %conv.1, %land.end.2 ], [ %conv.14, %for.body ], [ %conv.14, %if.then1 ]

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 8, 2022

@llvm/issue-subscribers-debuginfo

@fhahn
Copy link
Contributor

fhahn commented Jun 8, 2022

cc @djtodoro it would be great if you could take a look

@djtodoro
Copy link
Collaborator

djtodoro commented Jun 17, 2022

I am not sure how much is it related with this patch (this may just have revealed a hidden problem), and how much is it related with the debugging information. Let me think/debug out loud.

I've taken a look, and what is different in these 2 cases is following.

When entering the LoopInvariantCodeMotion::runOnLoop (5th time), more precisely:

static void foreachMemoryAccess(MemorySSA *MSSA, Loop *L,
                                function_ref<void(Instruction *)> Fn) {
  for (const BasicBlock *BB : L->blocks())
    if (const auto *Accesses = MSSA->getBlockAccesses(BB))
      for (const auto &Access : *Accesses)
        if (const auto *MUD = dyn_cast<MemoryUseOrDef>(&Access))
          Fn(MUD->getMemoryInst());
}

The loop L is not in the same shape for with/out the llvm.dbg.declare() instruction cases:
without:

(gdb) p L->dump()
Loop at depth 1 containing: %for.body<header>,%if.then,%land.rhs,%land.end,%if.then1,%for.inc11<latch><exiting>,%for.body4.preheader,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2

with debug-info:

(gdb) p L->dump()
Loop at depth 1 containing: %for.body<header>,%if.then,%for.body4.preheader,%land.rhs,%land.end,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2,%if.then1,%for.inc11<latch><exiting>

Even though the function is the same for both:

(gdb) p L->getExitBlock()->getParent()->dump()
; Function Attrs: nofree norecurse nosync nounwind
define dso_local i32 @d() local_unnamed_addr addrspace(40) #0 {
entry:
  %.pr = load i32, i32* @b, align 4
  %tobool.not2 = icmp eq i32 %.pr, 0
  br i1 %tobool.not2, label %for.end13, label %for.body.preheader

for.body.preheader:                               ; preds = %entry
  %0 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 1), align 1
  %tobool5.not.2 = icmp eq i32 %0, 0
  br label %for.body

for.body:                                         ; preds = %for.body.preheader, %for.inc11
  %1 = phi i32 [ %dec12, %for.inc11 ], [ %.pr, %for.body.preheader ]
  br i1 icmp ne (i32 (i32, i32) addrspace(40)* @foo, i32 (i32, i32) addrspace(40)* null), label %if.then, label %for.inc11

if.then:                                          ; preds = %for.body
  br i1 icmp ne (i16 (i16, i16) addrspace(40)* @bar, i16 (i16, i16) addrspace(40)* null), label %if.then1, label %for.body4.preheader

for.body4.preheader:                              ; preds = %if.then
  %2 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 3), align 1
  %tobool5.not = icmp eq i32 %2, 0
  br i1 %tobool5.not, label %land.end, label %land.rhs

if.then1:                                         ; preds = %if.then
  store i32 0, i32* @a, align 4
  br label %for.inc11

land.rhs:                                         ; preds = %for.body4.preheader
  %3 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 4), align 1
  %tobool7 = icmp ne i32 %3, 0
  br label %land.end

land.end:                                         ; preds = %land.rhs, %for.body4.preheader
  %4 = phi i1 [ false, %for.body4.preheader ], [ %tobool7, %land.rhs ]
  %conv = zext i1 %4 to i32
  store i32 %conv, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 4), align 1
  %5 = load i32, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 2), align 1
  %tobool5.not.1 = icmp eq i32 %5, 0
  br i1 %tobool5.not.1, label %land.end.1, label %land.rhs.1

land.rhs.1:                                       ; preds = %land.end
  %tobool7.1 = icmp ne i32 %2, 0
  br label %land.end.1

land.end.1:                                       ; preds = %land.rhs.1, %land.end
  %6 = phi i1 [ false, %land.end ], [ %tobool7.1, %land.rhs.1 ]
  %conv.1 = zext i1 %6 to i32
  store i32 %conv.1, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 3), align 1
  br i1 %tobool5.not.2, label %land.end.2, label %land.rhs.2

land.rhs.2:                                       ; preds = %land.end.1
  %tobool7.2 = icmp ne i32 %5, 0
  br label %land.end.2

land.end.2:                                       ; preds = %land.rhs.2, %land.end.1
  %7 = phi i1 [ false, %land.end.1 ], [ %tobool7.2, %land.rhs.2 ]
  %conv.2 = zext i1 %7 to i32
  store i32 %conv.2, i32* getelementptr inbounds ([5 x i32], [5 x i32]* @c, i64 0, i64 2), align 1
  store i32 0, i32* @a, align 4
  br label %for.inc11

for.inc11:                                        ; preds = %land.end.2, %for.body, %if.then1
  %dec12 = add nsw i32 %1, -1
  %tobool.not = icmp eq i32 %dec12, 0
  br i1 %tobool.not, label %for.cond.for.end13_crit_edge, label %for.body

for.cond.for.end13_crit_edge:                     ; preds = %for.inc11
  store i32 0, i32* @b, align 4
  br label %for.end13

for.end13:                                        ; preds = %for.cond.for.end13_crit_edge, %entry
  ret i32 undef
}

The reason the order of the loads changes is this different structure/internal representation of the Loop (that implies different order of the Values in the vector that represents the candidates for promotion).

@djtodoro
Copy link
Collaborator

Adding --enable-new-pm=0 to the opt I see the same LLVM IR code for both with and without debug info (so the problem does not exist with the legacy PM).

@djtodoro
Copy link
Collaborator

The weird thing is that, in the debug info case, it somehow changes the loop form (but the LLVM IR Function is the same) outside the LICM Pass itself (may be in the LoopPassManager) from %for.body<header>,%if.then,%land.rhs,%land.end,%if.then1,%for.inc11<latch><exiting>,%for.body4.preheader,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2 to %for.body<header>,%if.then,%for.body4.preheader,%land.rhs,%land.end,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2,%if.then1,%for.inc11<latch><exiting>

@fhahn
Copy link
Contributor

fhahn commented Jul 7, 2022

The weird thing is that, in the debug info case, it somehow changes the loop form (but the LLVM IR Function is the same) outside the LICM Pass itself (may be in the LoopPassManager) from %for.body

,%if.then,%land.rhs,%land.end,%if.then1,%for.inc11,%for.body4.preheader,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2 to %for.body,%if.then,%for.body4.preheader,%land.rhs,%land.end,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2,%if.then1,%for.inc11

Hmm, is there a material difference or is the order of the blocks just different here? The order of the blocks may not necessarily be deterministic and is basically meaningless, as long as the identified blocks (header, latch , exiting) are the same.

It might be possible that we hit a limit in some utility due to the extra dbg instructions.

@djtodoro
Copy link
Collaborator

djtodoro commented Jul 7, 2022

The weird thing is that, in the debug info case, it somehow changes the loop form (but the LLVM IR Function is the same) outside the LICM Pass itself (may be in the LoopPassManager) from %for.body,%if.then,%land.rhs,%land.end,%if.then1,%for.inc11,%for.body4.preheader,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2 to %for.body,%if.then,%for.body4.preheader,%land.rhs,%land.end,%land.rhs.1,%land.end.1,%land.rhs.2,%land.end.2,%if.then1,%for.inc11

Hmm, is there a material difference or is the order of the blocks just different here? The order of the blocks may not necessarily be deterministic and is basically meaningless, as long as the identified blocks (header, latch , exiting) are the same.

The identified blocks are different.

It might be possible that we hit a limit in some utility due to the extra dbg instructions.

May be something like that... I suspect that something around the NewPM is problematic, I will try to find some time to take a look again into this. I guess it would be interesting to check code-coverage for both execution, to see where the difference is.

@djtodoro
Copy link
Collaborator

@fhahn this bug has been fixed recently. After we started checking if the CFGAnalyses was preserved within ADCEPass, we cannot see this anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants