Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Presence of dbg.declare affects result from DeadStoreElimination #58285

Closed
mikaelholmen opened this issue Oct 11, 2022 · 26 comments
Closed

Presence of dbg.declare affects result from DeadStoreElimination #58285

mikaelholmen opened this issue Oct 11, 2022 · 26 comments

Comments

@mikaelholmen
Copy link
Collaborator

mikaelholmen commented Oct 11, 2022

llvm commit: ac47db6
Reproduce with:

opt -passes="default<O2>" -S -o - bbi-74540_2.ll

Result:

define i16 @m() local_unnamed_addr #0 {
[...]
if.then.i.i:                                      ; preds = %entry
  store i32 1, ptr getelementptr inbounds ([1 x [1 x [1 x i32]]], ptr @g, i64 1, i64 0, i64 0, i64 0), align 1
  %t7.i.i = load i16, ptr @h, align 1
  %0 = and i16 %t7.i.i, 1
  store i16 %0, ptr @h, align 1
  %tobool.not.i.i = icmp eq i16 %0, 0
  br i1 %tobool.not.i.i, label %j.exit, label %if.then7.i.i

if.then7.i.i:                                     ; preds = %if.then.i.i
  %t8.i.i = load i64, ptr @i, align 1
  %inc.i.i = add i64 %t8.i.i, 1
  store i64 %inc.i.i, ptr @i, align 1
  br label %j.exit

j.exit:                                           ; preds = %entry.j.exit_crit_edge, %if.then.i.i, %if.then7.i.i

However, if we comment out the call to the dbg.declare in the input, we iinstead get:

define i16 @m() local_unnamed_addr #0 {
[...]
if.then.i.i:                                      ; preds = %entry
  store i32 1, ptr getelementptr inbounds ([1 x [1 x [1 x i32]]], ptr @g, i64 1, i64 0, i64 0, i64 0), align 1
  %t7.i.i = load i16, ptr @h, align 1
  %0 = and i16 %t7.i.i, 1
  store i16 %0, ptr @h, align 1
  br label %j.exit

j.exit:                                           ; preds = %if.then.i.i, %entry.j.exit_crit_edge

So with the dbg.declare present, the bb if.then7.i.i and its contents are left in the code.
bbi-74540_2.ll.gz

@mikaelholmen
Copy link
Collaborator Author

mikaelholmen commented Oct 11, 2022

I've done some debugging and found the following:

I see a difference with/without debuginfo in that the dbg.declare is actually removed by the ADCE pass, which then makes the debug and non-debug-info cases look the same(!).
But following that, DSE then removes the contents of the if.then7.i.i block in the non-debug case, but in the debug case (where the input to DSE looks the same), the contents of if.then7.i.i are kept.

The problem seems to be related to DSE having some limit(s) where it cuts of searches.
In this case the debug variance goes away if we increase

static cl::opt<unsigned> MemorySSAUpwardsStepLimit(
    "dse-memoryssa-walklimit", cl::init(90), cl::Hidden,
    cl::desc("The maximum number of steps while walking upwards to find "
             "MemoryDefs that may be killed (default = 90)"));

So, what I think happens is that GVN does a rewrite of the code and something happens with internal data structures (memoryssa?). Then when we get to DSE, we get a different iteration order, and we get

  trying to get dominating access
   visiting 18 = MemoryDef(17) (  store i64 %inc.i.i, ptr @i, align 1)
   ...  hit walker step limit

but when we don't have the dbg.declare we instead examine 18 earlier and get:

  trying to get dominating access
   visiting 18 = MemoryDef(17) (  store i64 %inc.i.i, ptr @i, align 1)
  Checking for reads of 18 = MemoryDef(17) (  store i64 %inc.i.i, ptr @i, align 1)
   20 = MemoryPhi({if.then7.i.i,18},{if.then.i.i,17},{crit_edge,15})
    ... adding PHI uses
   19 = MemoryDef(20) (  store i64 %conv2, ptr @i, align 1)
    ... skipping killing def/dom access
 Checking if we can kill 18 = MemoryDef(17) (  store i64 %inc.i.i, ptr @i, align 1)
DSE: Remove Dead Store:
  DEAD:   store i64 %inc.i.i, ptr @i, align 1
  KILLER:   store i64 %conv2, ptr @i, align 1

So in a quite complicated and sort of unlucky way, the dbg.declare indirectly affects what code we get.

@fhahn
Copy link
Contributor

fhahn commented Oct 11, 2022

@mikaelholmen is it possible you attached the wrong file? It's named bbi-74541.ll not bbi-74540_2.ll like in the command you shared. The one you attached optimizes to unreachable.

@mikaelholmen
Copy link
Collaborator Author

mikaelholmen commented Oct 12, 2022

@fhahn: Yes, I messed up, thanks!
Now attaching the right file I hope
bbi-74540_2.ll.gz

Also updated the original post to contain the right file.

@fhahn
Copy link
Contributor

fhahn commented Oct 12, 2022

This is an interesting one. I think the issue is that ADCE invalidates MSSA due to making a change. This means with the dbg intrinsic, DSE will use a freshly computed version of MSSA while otherwise it re-uses the one computed earlier, which may have some optimizations applied.

Not sure yet what the best fix would be here.

@fhahn
Copy link
Contributor

fhahn commented Oct 12, 2022

Slightly reduced reproducer: https://llvm.godbolt.org/z/b6f1a34EM

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 22, 2022

@llvm/issue-subscribers-debuginfo

@mikaelholmen
Copy link
Collaborator Author

This is an interesting one. I think the issue is that ADCE invalidates MSSA due to making a change. This means with the dbg intrinsic, DSE will use a freshly computed version of MSSA while otherwise it re-uses the one computed earlier, which may have some optimizations applied.

Would it be ok to not invalidate existing analyses in case ADCE only has removed dbg intrinsics? That would at least be an easy fix. But maybe we really need to invalidate even in that case?

@dwblaikie
Copy link
Collaborator

(shooting from the hip/driveby commentary)

The problem seems to be related to DSE having some limit(s) where it cuts of searches.

That limit should be applied ignoring debug info intrinsics (eg: should be N non-dbg instructions - there are iterators that expose only these non-dbg instructions)

Would it be ok to not invalidate existing analyses in case ADCE only has removed dbg intrinsics?

Probably ADCE should'nt touch the dbg intrinsics if it isn't touching any code?

@mikaelholmen
Copy link
Collaborator Author

Probably ADCE should'nt touch the dbg intrinsics if it isn't touching any code?

Thanks @dwblaikie! Maybe so. I actually tried that first but got a little put off when I saw that in e.g. @f() in
test/Transforms/MergeFunc/mergefunc-preserve-debug-info.ll such a change would leave

         36:  call void @llvm.dbg.value(metadata i32 1, metadata !21, metadata !DIExpression()), !dbg !32 
         37:  call void @llvm.dbg.value(metadata i32 9, metadata !23, metadata !DIExpression()), !dbg !32 
         38:  call void @llvm.dbg.declare(metadata ptr undef, metadata !34, metadata !DIExpression()), !dbg !35 
         39:  call void @llvm.dbg.declare(metadata ptr undef, metadata !36, metadata !DIExpression()), !dbg !37 
         40:  call void @llvm.dbg.value(metadata i32 9, metadata !38, metadata !DIExpression()), !dbg !32 

which would otherwise be removed. So I was worried about what kind of bloating would occur if ADCE wouldn't be able to remove them.
But perhaps it's the right thing to do anyway?

@dwblaikie
Copy link
Collaborator

Maybe - I'm not 100% sure either way.

Might be that the right solution to all those dbg intrinsics is for them to be cleaned up closer to when they became dead?

DSE will use a freshly computed version of MSSA while otherwise it re-uses the one computed earlier, which may have some optimizations applied.

That bit's confusing - recomputing an analysis versus using an old one shouldn't produce different results. So an optimization removing only dbg intrinsics and reporting "yeah, I changed stuff", causing an analysis to be invalidated and recomputed, shouldn't change the result of some future non-debug optimization?

@nikic
Copy link
Contributor

nikic commented Feb 16, 2023

That bit's confusing - recomputing an analysis versus using an old one shouldn't produce different results.

That's the ideal, but the actual requirement is just the the analysis produces correct results, though they might be better or worse than recomputing from scratch. For some analyses, there isn't even any sensible notion of what the "right" results are, because they are query-order dependent (such as SCEV or LVI, not sure whether this affects MemorySSA).

@dwblaikie
Copy link
Collaborator

That bit's confusing - recomputing an analysis versus using an old one shouldn't produce different results.

That's the ideal, but the actual requirement is just the the analysis produces correct results, though they might be better or worse than recomputing from scratch. For some analyses, there isn't even any sensible notion of what the "right" results are, because they are query-order dependent (such as SCEV or LVI, not sure whether this affects MemorySSA).

Ah, fascinating. Not sure what to do with that, then - yeah, not sure what the right call is to remove debug info IR - respond with "this didn't change the IR" perhaps on the rationale that that response should only account for non-debug-info IR... - or some other path.

@mikaelholmen
Copy link
Collaborator Author

Thanks for the discussion and explanations!

There are probably several other different solutions to this as well but the most obvious to me look like either

  1. Let ADCE remove debug intrinsics as it does today, but only invalidate analyses if some non-debug instruction was removed as well.
    or
  2. Only let ADCE remove debug intrinsics if it also removes some non-debug instruction.

If 1) is correct/allowed I think prefer that, but I don't know if it is?
I've at least managed to run a lot of tests downstream with 1) without seeing problems but that's no real proof.

What do you say?

@bjope
Copy link
Collaborator

bjope commented Feb 17, 2023

That bit's confusing - recomputing an analysis versus using an old one shouldn't produce different results.

That's the ideal, but the actual requirement is just the the analysis produces correct results, though they might be better or worse than recomputing from scratch. For some analyses, there isn't even any sensible notion of what the "right" results are, because they are query-order dependent (such as SCEV or LVI, not sure whether this affects MemorySSA).

That was a new thing for me as well. Is that requirement written down somewhere?

All these times when I've tried to narrow down pass dependency problems with stale analyses etc, and having tried to bisect by adding invalidations of analysis in various places in the pipeline to find out when the IR differs. All those times I've really had no guarantees that different IR really would indicate a bug or not. Interesting.

@bjope
Copy link
Collaborator

bjope commented Feb 17, 2023

Debug info is of course special as we want to get same codegen independent of debug info or not.
But still, why would ADCE invalidate analyses like MemorySSA if only removing a dbg intrinsic. And similarly, if there is a dead or-instruction that it removes I guess that won't impact MemorySSA.

So given the news that re-computing MemorySSA isn't guaranteed to give same result as before, then I assume that we either want to invalidate it more consistently regardless of what a pass like ADCE is doing (to make debugging easier and avoiding depending on what earlier passes did) , or we should only invalidate the analysis when doing something that makes the analysis invalid.

I mean, if we introduce some analysis that is about debug info., then we can't just say that ADCE did nothing and all analyses should be kept. Unless the pass manager is changed to return two statuses "did not change anything relate to non-dbg" vs "did not change anything related to dbg".

@mikaelholmen
Copy link
Collaborator Author

Thanks for the discussion and explanations!

There are probably several other different solutions to this as well but the most obvious to me look like either

1. Let ADCE remove debug intrinsics as it does today, but only invalidate analyses if some non-debug instruction was removed as well.
   or

2. Only let ADCE remove debug intrinsics if it also removes some non-debug instruction.

If 1) is correct/allowed I think prefer that, but I don't know if it is? I've at least managed to run a lot of tests downstream with 1) without seeing problems but that's no real proof.

What do you say?

Since we see this problem fairly often in downstream testing I'd like to try to move it forward in some way.

Do you think 1) above is correct/allowed?
If we think it is I can put up a patch for that to at least get a short term fix even if the ideal solution is to do something more elaborate.

@dwblaikie
Copy link
Collaborator

@adrian-prantl @JDevlieghere for debug info thoughts
@aeubanks for pass manager thoughts - how should we handle passes wanting to clean up debug info, while remaining debug-info invariant for the non-debug instructions/analysis resulrts/etc?

@aeubanks
Copy link
Contributor

given the "analyses just need to be correct, but may be different across computations" state of LLVM, I don't think there's a principled way of handling this; it'll have to be case by case to ensure that removing debug intrinsics causes the same analysis computation as not having debug intrinsics

if we just care about MemorySSA, the pass can just say that it preserves MemorySSA if it only removed debug intrinsics (assuming MemorySSA doesn't care about debug intrinsics).

if other analyses run into similar issues, we can introduce another AnalysisSetKey, say DebugInfoInvariantAnalyses, (e.g. PreservedAnalyses::preserveSet<CFGAnalyses>() preserves all analyses that only care about control flow), then have analyses like MemorySSA not be invalidated when that DebugInfoInvariantAnalyses is preserved

but the "don't touch debug intrinsics unless other instructions are modified" approach seems better to me

@pogo59
Copy link
Collaborator

pogo59 commented Feb 28, 2023

In the long run, there's an ongoing effort to eliminate debug-info instructions entirely.
But in the meantime, yeah, don't touch debug intrinsics unless real instructions are modified.

@mikaelholmen
Copy link
Collaborator Author

Thanks everyone.
I put up
https://reviews.llvm.org/D145051
as an attempt to do something about this.

mikaelholmen added a commit that referenced this issue Mar 3, 2023
…oved

We now limit ADCE to only remove debug intrinsics if it does something else
that would invalidate cached analyses anyway.
As we've seen in
 #58285
throwing away cached analysis info when only debug instructions are removed
can lead to different code when debug info is present or not present.

Differential Revision: https://reviews.llvm.org/D145051
@mikaelholmen
Copy link
Collaborator Author

Closing as I pushed a fix for this in 8aa9ab3
[ADCE] Only remove debug intrinsics if non debug instructions are removed

Thanks!

@mikaelholmen
Copy link
Collaborator Author

Reverted the fix in f5097ed

Revert "[ADCE] Only remove debug intrinsics if non debug instructions are removed"

This reverts commit 8aa9ab336889ae2eb8e4188036faeb151379ab7b.

Reverting due to compile-time regressions as pointed out in
 https://reviews.llvm.org/D145051#4166656
E.g.
 "In particular tramp3d-v4 with debuginfo regressed by 15%."

@mikaelholmen mikaelholmen reopened this Mar 3, 2023
@mikaelholmen
Copy link
Collaborator Author

Since I reverted the first fix due to compile-time regressions we need anyother way forward here.

@nikic suggested

I think what we can do instead is to mark MemorySSA as preserved if only debuginfo intrinsics have been removed. (This is
safe, MSSA explicitly never tracks debuginfo intrinsics.)

in https://reviews.llvm.org/D145051#4166656

Does anyone object to that or is that the way to go? @aeubanks ?

@aeubanks
Copy link
Contributor

aeubanks commented Mar 6, 2023

that's fine as a workaround especially if there are efforts to remove debug info instructions altogether

(although it would be good to know exactly why that patch caused major compile time regressions)

@mikaelholmen
Copy link
Collaborator Author

Another attempt here then:
https://reviews.llvm.org/D145478

mikaelholmen added a commit that referenced this issue Mar 8, 2023
As we've seen in
 #58285
throwing away MemorySSA when only debug instructions are removed can lead
to different code when debug info is present or not present.

This is the second attempt at fixing the problem. The first one was
 https://reviews.llvm.org/D145051
which was reverted due to a 15% compile time regression for tramp3d-v4
in NewPM-ReleaseLTO-g.
mikaelholmen added a commit that referenced this issue Mar 8, 2023
As we've seen in
 #58285
throwing away MemorySSA when only debug instructions are removed can lead
to different code when debug info is present or not present.

This is the second attempt at fixing the problem. The first one was
 https://reviews.llvm.org/D145051
which was reverted due to a 15% compile time regression for tramp3d-v4
in NewPM-ReleaseLTO-g.

Differential Revision: https://reviews.llvm.org/D145478
@mikaelholmen
Copy link
Collaborator Author

I pushed a fix in b9337b1

[ADCE] Preserve MemorySSA if only debug instructions are removed

As we've seen in
 https://github.com/llvm/llvm-project/issues/58285
throwing away MemorySSA when only debug instructions are removed can lead
to different code when debug info is present or not present.

This is the second attempt at fixing the problem. The first one was
 https://reviews.llvm.org/D145051
which was reverted due to a 15% compile time regression for tramp3d-v4
in NewPM-ReleaseLTO-g.

Differential Revision: https://reviews.llvm.org/D145478

so I'll close this again and hope it can stay that way. :)

Thanks for the help everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants