Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo][DivRemPairs] Wrong branch execution caused by hoisting an instruction #85078

Open
Apochens opened this issue Mar 13, 2024 · 12 comments

Comments

@Apochens
Copy link
Contributor

This bug caused by hoisting the instruction DivInst to the conditional predecessor without dropping its debug location. Here is the buggy line.

Please check this @SLTozer @dwblaikie . If this is confirmed, I would like to give the patch to it.

int func(int a, int b) {
    int rem = a % b;
    int ret = 3;
    if (rem == 42) {
        ret = a / b;
    } 
    return ret;
}

int main() {
    func(4, 2);
}
$ clang -S -emit-llvm -Xclang -disable-O0-optnone main.c -o main.ll -g
$ opt -S -passes=mem2reg,div-rem-pairs -debug main.ll -o main-opt.ll
$ clang main-opt.ll
$ mylldb a.out 
(lldb) target create "a.out"
Current executable set to '/data1/llvm-test/DivRemPairs/a.out' (x86_64).
(lldb) b func
Breakpoint 1: where = a.out`func + 9 at main.c:2:17, address = 0x0000000000001139
(lldb) r
Process 2794232 launched: '/data1/llvm-test/DivRemPairs/a.out' (x86_64)
Process 2794232 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000555555555139 a.out`func(a=4, b=2) at main.c:2:17
   1    int func(int a, int b) {
-> 2        int rem = a % b;
   3        int ret = 3;
   4        if (rem == 42) {
   5            ret = a / b;
   6        } 
   7        return ret;
(lldb) s
Process 2794232 stopped
* thread #1, name = 'a.out', stop reason = step in
    frame #0: 0x0000555555555142 a.out`func(a=4, b=2) at main.c:5:17
   2        int rem = a % b;
   3        int ret = 3;
   4        if (rem == 42) {
-> 5            ret = a / b;
   6        } 
   7        return ret;
   8    }
(lldb) v rem
(int) rem = 0
@llvmbot
Copy link

llvmbot commented Mar 13, 2024

@llvm/issue-subscribers-debuginfo

Author: Shan Huang (Apochens)

This bug caused by hoisting the instruction `DivInst` to the conditional predecessor without dropping its debug location. Here is the [buggy line](https://github.com/llvm/llvm-project/blob/9fa866020395b215ece6140c2fedc7c31950272c/llvm/lib/Transforms/Scalar/DivRemPairs.cpp#L298).

Please check this @SLTozer @dwblaikie . If this is confirmed, I would like to give the patch to it.

int func(int a, int b) {
    int rem = a % b;
    int ret = 3;
    if (rem == 42) {
        ret = a / b;
    } 
    return ret;
}

int main() {
    func(4, 2);
}
$ clang -S -emit-llvm -Xclang -disable-O0-optnone main.c -o main.ll -g
$ opt -S -passes=mem2reg,div-rem-pairs -debug main.ll -o main-opt.ll
$ clang main-opt.ll
$ mylldb a.out 
(lldb) target create "a.out"
Current executable set to '/data1/llvm-test/DivRemPairs/a.out' (x86_64).
(lldb) b func
Breakpoint 1: where = a.out`func + 9 at main.c:2:17, address = 0x0000000000001139
(lldb) r
Process 2794232 launched: '/data1/llvm-test/DivRemPairs/a.out' (x86_64)
Process 2794232 stopped
* thread #<!-- -->1, name = 'a.out', stop reason = breakpoint 1.1
    frame #<!-- -->0: 0x0000555555555139 a.out`func(a=4, b=2) at main.c:2:17
   1    int func(int a, int b) {
-&gt; 2        int rem = a % b;
   3        int ret = 3;
   4        if (rem == 42) {
   5            ret = a / b;
   6        } 
   7        return ret;
(lldb) s
Process 2794232 stopped
* thread #<!-- -->1, name = 'a.out', stop reason = step in
    frame #<!-- -->0: 0x0000555555555142 a.out`func(a=4, b=2) at main.c:5:17
   2        int rem = a % b;
   3        int ret = 3;
   4        if (rem == 42) {
-&gt; 5            ret = a / b;
   6        } 
   7        return ret;
   8    }
(lldb) v rem
(int) rem = 0

@SLTozer
Copy link
Contributor

SLTozer commented Mar 13, 2024

This looks like a bug - for both the div and the rem it looks like we're necessarily hoisting them past a conditional branch. Though is div/rem is another pair of instructions that could trap? I'm not sure if this is a case where we'd consider the debug locations to be worth preserving - it doesn't help that we don't have a formal "rule" about when to choose accurate backtraces over accurate stepping.

@jryans
Copy link
Member

jryans commented Mar 13, 2024

it doesn't help that we don't have a formal "rule" about when to choose accurate backtraces over accurate stepping.

I don't want to totally derail this issue, but I've been thinking lately we may need an additional "line table tuning" option so the user could specify their use case ("debugging", "profiling", etc.) which may then allow for different choices in how we produce the line table. Anyway, I'll save further discussion for a separate thread some day. 🙂

@jmorse
Copy link
Member

jmorse commented Mar 13, 2024

I'd guess that a % b dominating the division is relied upon to guarantee that b is nonzero. However, I'd also guess that there are inputs where the mod/rem instruction can be subsequently DCE'd and cause crashy inputs to crash on the wrong line, which would be infuriating (though probably rare).

@Apochens as you're reporting a lot of these faults, would you be able to group them under an umbrella bug. On bugzilla we used to be able to connect bugs as blocking/depending, which might not be possible on github, but we can put a reference to a relevant bug into a comment. That'll help verifying/testing when there's a plan to fix this.

@Apochens
Copy link
Contributor Author

It seems I am trapped into the war between stepping and backtraces. 😂

@Apochens as you're reporting a lot of these faults, would you be able to group them under an umbrella bug. On bugzilla we used to be able to connect bugs as blocking/depending, which might not be possible on github, but we can put a reference to a relevant bug into a comment. That'll help verifying/testing when there's a plan to fix this.

@jmorse I am willing to do this. Are there some examples that could help me do this properly?

@SLTozer
Copy link
Contributor

SLTozer commented Mar 14, 2024

I don't want to totally derail this issue, but I've been thinking lately we may need an additional "line table tuning" option so the user could specify their use case ("debugging", "profiling", etc.) which may then allow for different choices in how we produce the line table.

I agree, this is something that I may be working on in the near future - there are competing concerns for line information that could be better tailored to each consumer's needs. There aren't actually that many callsites where we update DebugLocs on existing instructions AFAICT, so this could be done in the not-too-distant future.

@dwblaikie
Copy link
Collaborator

This looks like a bug - for both the div and the rem it looks like we're necessarily hoisting them past a conditional branch. Though is div/rem is another pair of instructions that could trap? I'm not sure if this is a case where we'd consider the debug locations to be worth preserving - it doesn't help that we don't have a formal "rule" about when to choose accurate backtraces over accurate stepping.

I don't want to totally derail this issue, but I've been thinking lately we may need an additional "line table tuning" option so the user could specify their use case ("debugging", "profiling", etc.) which may then allow for different choices in how we produce the line table. Anyway, I'll save further discussion for a separate thread some day.

I've missed a step here - I thought we did have a fairly well formalized process here: https://llvm.org/docs/HowToUpdateDebugInfo.html

And that there isn't a tradeoff between accurate backtraces and accurate stepping - the goal is not to show code as reachable that is not reachable. That's as true for human users as it is for machine uses - at least that's been the principle of LLVM's debug info line information for some time now.

How does 'line table tuning'/backtraces-versus-stepping uncertainty apply in this case?

I do agree that it's unfortunate when we might hoist code that subsequently crashed (merge a load from two distinct branches of an if/else, then the load crashes/asan-fails) and can't tell the user either of the lines that caused it. But the alternative of possibly telling them the wrong one seems problematic too. As we've discussed - it'd be valuable to have some way to describe a location that is speculated or sunk or otherwise "this is where the instruction came from but not where it ended up" - both for human users (eg: a crash report, so it could include this nuance in the message) and for machine uses (eg: a profiler, so it could know to ignore such a location). But in the absence of such an ability to differentiate, I don't think it'd be suitable to have this information - for humans or machines - it'd be too misleading. And I thought we were all pretty well in agreement on that for some time/that was/is LLVM's philosophy... - so I'm a bit confused.

@SLTozer
Copy link
Contributor

SLTozer commented Mar 19, 2024

You're correct, the rules for updating debug info do not seek to preserve debug locations for instructions on the basis of backtrace quality. We've recently been looking at debug location updates with these concerns in mind and have tried to give more weight towards preserving locations when they may be important for backtraces (mostly memory instructions). I think there's value in this, but as discussed we may be able to introduce improvements that would make that idea redundant.

Even before that I think @jryans point about adding a flag would be a more suitable solution for what we want; adding a flag that requests that LLVM preserve debug locations for instructions that may trap allows a user who specifically wants improved crash info to get that without introducing incorrect stepping/profiling information by default (I'd be happy to open a patch for that since I've already done an appreciable amount of the work for the purpose of instrumenting dropped debug locations). With that in mind, I'm fine to say we should follow the rules and drop debug locations for these hoists.

@jryans
Copy link
Member

jryans commented Mar 19, 2024

For this issue, following the existing rules is likely the right thing to do, and they are of course the current policy, so any potential changes should have a larger discussion (e.g. RFC).

My comment suggesting a flag for use cases that are sometimes in conflict was coming from a more general place than just this issue alone (hence my worry about derailing things here). It would be better to discuss that further in its own issue / PR / RFC where multiple justifying examples can be provided and debated.

@adrian-prantl
Copy link
Collaborator

adrian-prantl commented Mar 20, 2024

The idea of a separate tuning mode profiling is something that has come up before.

Depending on what the two alternative locations are, it may be possible to just introduce a bit in the line table: E.g., if the the options are "no location" and "some location", we could have a "backtrace-only" bit that instructs profilers and stepping to ignore this location.

@SLTozer
Copy link
Contributor

SLTozer commented Mar 21, 2024

Depending on what the two alternative locations are, it may be possible to just introduce a bit in the line table

I agree - I've been looking at debug line issues recently and it seems like if we want to preserve as much line information as possible then we need to be able to describe this case, and nothing in the current DWARF line table is suitable; I'm currently trying to estimate the cost-benefit of making that change, so I may open a discussion for this later.

@dwblaikie
Copy link
Collaborator

Yes, a bit is basically what's needed - though I'd hope/suggest that this isn't a "profiling V interactive debugging" separation - even for human debugger users, I think the distinction is important (that's been how we've framed things so far - it was valuable to Sony to reduce noisy stepping, but also I think beyond that it's useful for users to avoid incorrect conclusions about reachability/truth of conditions, etc). It's as confusing to users as it would be to profilers - the code isn't actually reachable.

So hopefully if/when we add some kind of flag for this, debuggers, sanitizers, etc can render this to a user in such a way that users have a better chance of not being confused by the reachability/truthiness of any condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants