Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flake in TestZerothFrame.py #96685

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

kendalharland
Copy link
Contributor

@kendalharland kendalharland commented Jun 25, 2024

This test is currently flaky on a local Windows amd64 build. The reason is that it relies on the order of process.threads but this order is nondeterministic:

If we print lldb's inputs and outputs while running, we can see that the breakpoints are always being set correctly, and always being hit:

runCmd: breakpoint set -f "main.c" -l 2
output: Breakpoint 1: where = a.out`func_inner + 1 at main.c:2:9, address = 0x0000000140001001

runCmd: breakpoint set -f "main.c" -l 7
output: Breakpoint 2: where = a.out`main + 17 at main.c:7:5, address = 0x0000000140001021

runCmd: run
output: Process 52328 launched: 'C:\workspace\llvm-project\llvm\build\lldb-test-build.noindex\functionalities\unwind\zeroth_frame\TestZerothFrame.test_dwarf\a.out' (x86_64)
Process 52328 stopped
* thread #1, stop reason = breakpoint 1.1
    frame #0: 0x00007ff68f6b1001 a.out`func_inner at main.c:2:9
   1    void func_inner() {
-> 2        int a = 1;  // Set breakpoint 1 here
                ^
   3    }
   4
   5    int main() {
   6        func_inner();
   7        return 0; // Set breakpoint 2 here

However, sometimes the backtrace printed in this test shows that the process is stopped inside NtWaitForWorkViaWorkerFactory from ntdll.dll:

Backtrace at the first breakpoint:
frame #0: 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
frame #1: 0x00007ffecc74585e ntdll.dll`RtlClearThreadWorkOnBehalfTicket + 862
frame #2: 0x00007ffecc3e257d kernel32.dll`BaseThreadInitThunk + 29
frame #3: 0x00007ffecc76af28 ntdll.dll`RtlUserThreadStart + 40

When this happens, the test fails with an assertion error that the stopped thread's zeroth frame's current line number does not match the expected line number. This is because the test is looking at the wrong thread: process.threads[0].

If we print the list of threads each time the test is run, we notice that threads are sometimes in a different order, within process.threads:

Thread 0: thread #4: tid = 0x9c38, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #2: tid = 0xa950, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 2: thread #1: tid = 0xab18, 0x00007ff64bc81001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 3: thread #3: tid = 0xc514, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

Thread 0: thread #3: tid = 0x018c, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #1: tid = 0x85c8, 0x00007ff7130c1001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 2: thread #2: tid = 0xf344, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 3: thread #4: tid = 0x6a50, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

Use self.thread() to consistently select the correct thread, instead.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Jun 25, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jun 25, 2024

@llvm/pr-subscribers-lldb

Author: Kendal Harland (kendalharland)

Changes

This test is currently flaky on a local Windows amd64 build.

If we print lldb's inputs and outputs while running, we can see that the breakpoints are always being set correctly, and always being hit:

runCmd: breakpoint set -f "main.c" -l 2
output: Breakpoint 1: where = a.out`func_inner + 1 at main.c:2:9, address = 0x0000000140001001

runCmd: breakpoint set -f "main.c" -l 7
output: Breakpoint 2: where = a.out`main + 17 at main.c:7:5, address = 0x0000000140001021

runCmd: run
output: Process 52328 launched: 'C:\workspace\llvm-project\llvm\build\lldb-test-build.noindex\functionalities\unwind\zeroth_frame\TestZerothFrame.test_dwarf\a.out' (x86_64)
Process 52328 stopped
* thread #<!-- -->1, stop reason = breakpoint 1.1
    frame #<!-- -->0: 0x00007ff68f6b1001 a.out`func_inner at main.c:2:9
   1    void func_inner() {
-&gt; 2        int a = 1;  // Set breakpoint 1 here
                ^
   3    }
   4
   5    int main() {
   6        func_inner();
   7        return 0; // Set breakpoint 2 here

However, sometimes the backtrace printed in this test shows that the process is stopped inside NtWaitForWorkViaWorkerFactory from ntdll.dll:

Backtrace at the first breakpoint:
frame #<!-- -->0: 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
frame #<!-- -->1: 0x00007ffecc74585e ntdll.dll`RtlClearThreadWorkOnBehalfTicket + 862
frame #<!-- -->2: 0x00007ffecc3e257d kernel32.dll`BaseThreadInitThunk + 29
frame #<!-- -->3: 0x00007ffecc76af28 ntdll.dll`RtlUserThreadStart + 40

If we print the list of threads each time the test is run, we notice that threads are sometimes in a different order, within process.threads:

Thread 0: thread #<!-- -->4: tid = 0x9c38, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #<!-- -->2: tid = 0xa950, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 2: thread #<!-- -->1: tid = 0xab18, 0x00007ff64bc81001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 3: thread #<!-- -->3: tid = 0xc514, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

Thread 0: thread #<!-- -->3: tid = 0x018c, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #<!-- -->1: tid = 0x85c8, 0x00007ff7130c1001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 2: thread #<!-- -->2: tid = 0xf344, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 3: thread #<!-- -->4: tid = 0x6a50, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

We're interested in whichever thread is executing a.out. If we print more info, we can see that this thread always has an IndexID of 1 regardless of where it appars in process.threads:

Thread 0: thread #<!-- -->3: tid = 0x2474, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 3
Thread 1: thread #<!-- -->2: tid = 0x2864, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 2
Thread 2: thread #<!-- -->4: tid = 0x9c90, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 4
Thread 3: thread #<!-- -->1: tid = 0xebbc, 0x00007ff643331001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 1

By switching from process.GetThreadAtIndex to
process.GetThreadByIndex we consistently retrieve the correct thread.

This raises a few more questions that might lead to a different followup solution:

  1. Why does our target thread always have an IndexID of 1? Why not 0 or any other value?
  2. Why is process.threads non-deterministically ordered?

Full diff: https://github.com/llvm/llvm-project/pull/96685.diff

1 Files Affected:

  • (modified) lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py (+4-3)
diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
index f4e883d314644..7e4078bbe887f 100644
--- a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
+++ b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
@@ -53,14 +53,15 @@ def test(self):
         process = target.LaunchSimple(None, None, self.get_process_working_directory())
         self.assertTrue(process, VALID_PROCESS)
 
-        thread = process.GetThreadAtIndex(0)
+        thread = process.GetThreadByIndexID(1)
         if self.TraceOn():
             print("Backtrace at the first breakpoint:")
             for f in thread.frames:
                 print(f)
+
         # Check that we have stopped at correct breakpoint.
         self.assertEqual(
-            process.GetThreadAtIndex(0).frame[0].GetLineEntry().GetLine(),
+            thread.frame[0].GetLineEntry().GetLine(),
             bp1_line,
             "LLDB reported incorrect line number.",
         )
@@ -70,7 +71,7 @@ def test(self):
         # 'continue' command.
         process.Continue()
 
-        thread = process.GetThreadAtIndex(0)
+        thread = process.GetThreadByIndexID(1)
         if self.TraceOn():
             print("Backtrace at the second breakpoint:")
             for f in thread.frames:

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of assuming the thread's index or index ID, it might be a good idea to have a little helper method that can look through the inferior's threads to find the thread with the property that we care about. In this case, we're looking for the thread that's stopped in a specific module (a.out in this case).

Concurrent execution and thread order is not guaranteed to be stable between runs, even if in practice they end up being that way on some platforms. The less we can rely on specific indices or IDs that "should" be correct, the better.

@kendalharland
Copy link
Contributor Author

kendalharland commented Jun 25, 2024

Instead of assuming the thread's index or index ID, it might be a good idea to have a little helper method that can look through the inferior's threads to find the thread with the property that we care about. In this case, we're looking for the thread that's stopped in a specific module (a.out in this case).

Concurrent execution and thread order is not guaranteed to be stable between runs, even if in practice they end up being that way on some platforms. The less we can rely on specific indices or IDs that "should" be correct, the better.

Sounds reasonable. I'll work that out with a helper method and resend for review once I upload a new commit. @bulbazord any idea how I can access the module from the SBThread object, to compare it against the name a.out? I'm having trouble finding the right API.

@JDevlieghere
Copy link
Member

Sounds reasonable. I'll work that out with a helper method and resend for review once I upload a new commit. @bulbazord any idea how I can access the module from the SBThread object, to compare it against the name a.out? I'm having trouble finding the right API.

The module is a property of the target, so you could do:

process = thread.GetProcess()
target = process.GetTarget()
module = target.GetModuleAtIndex(0)

or you can create an ExecutionContext from the thread, and get the target from that:

exe_ctx = SBExecutionContext(thread)
exe_ctx.target.GetModuleAtIndex(0)

@labath
Copy link
Collaborator

labath commented Jun 26, 2024

The thread that you care about is stopped at a breakpoint, right?
If so, then you can use the lldbutil.get_threads_stopped_at_breakpoint utility to find it...

@kendalharland kendalharland force-pushed the kendal/fix-test-zeroeth-frame branch 2 times, most recently from 1ca25a2 to 836b5ee Compare June 27, 2024 16:57
@kendalharland
Copy link
Contributor Author

kendalharland commented Jun 27, 2024

The thread that you care about is stopped at a breakpoint, right? If so, then you can use the lldbutil.get_threads_stopped_at_breakpoint utility to find it...

Thanks, this eventually led to me what I think is the most obvious solution: We can just set the breakpoints in this test while holding onto to the SBBreakpoint objects, and then fetch the associated thread from those breakpoints later, and compare the thread's frames' current line numbers.

Thanks @JDevlieghere and @bulbazord for the additional suggestions!

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine, although looking at the test case again, I think that even simply replacing process.GetThreadAtIndex(0) with self.thread() should be enough. self.thread() returns the "selected" thread and lldb should always be selecting the thread which has stopped "for a reason" (e.g. a breakpoint) and not a random (or first) thread in the process.

@kendalharland
Copy link
Contributor Author

This should be fine, although looking at the test case again, I think that even simply replacing process.GetThreadAtIndex(0) with self.thread() should be enough. self.thread() returns the "selected" thread and lldb should always be selecting the thread which has stopped "for a reason" (e.g. a breakpoint) and not a random (or first) thread in the process.

This also works. Agreed it's much better. Ty for the suggestion! Updated.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Are you able to merge this on your own?

@kendalharland
Copy link
Contributor Author

Looks good. Are you able to merge this on your own?

Nope, I'll need someone with write access to do it.

Copy link

github-actions bot commented Jul 2, 2024

✅ With the latest revision this PR passed the Python code formatter.

@labath
Copy link
Collaborator

labath commented Jul 2, 2024

Looks good. Are you able to merge this on your own?

Nope, I'll need someone with write access to do it.

Sounds good. Can you just patch the formatter changes in, and then I'll submit it.

@bulbazord, if you don't respond, I'm going to assume that new version addresses your comments as well.

@bulbazord
Copy link
Member

Yes, looks good to me. Thanks for taking the time! :)

@kendalharland
Copy link
Contributor Author

Sure thing, I hadn't hooked up the formatter yet so I'll run it and reupload. Thanks for the reviews!

@kendalharland kendalharland force-pushed the kendal/fix-test-zeroeth-frame branch 2 times, most recently from e5f11c6 to b880f6d Compare July 3, 2024 18:41
@kendalharland
Copy link
Contributor Author

Applied formatter patch and rebased onto main

@kendalharland kendalharland force-pushed the kendal/fix-test-zeroeth-frame branch 2 times, most recently from 401e4a8 to 0ec149a Compare July 3, 2024 19:09
This test is relying on the order of `process.threads` which is
nondeterministic. By selecting the thread based on whether it is
stopped at our breakpoint we can reliably select the correct one.
@kendalharland kendalharland force-pushed the kendal/fix-test-zeroeth-frame branch from 0ec149a to e3715f9 Compare July 3, 2024 19:10
@kendalharland
Copy link
Contributor Author

kendalharland commented Jul 3, 2024

Apologies, I pushed the wrong commits to this PR and just had the fix them up. Should be good to go.

@kendalharland kendalharland requested a review from labath July 3, 2024 23:06
@kendalharland
Copy link
Contributor Author

I'll need help merging this since I don't have write access to the repo.

@labath labath merged commit 4c23625 into llvm:main Jul 8, 2024
6 checks passed
Copy link

github-actions bot commented Jul 8, 2024

@kendalharland Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@kendalharland kendalharland deleted the kendal/fix-test-zeroeth-frame branch July 8, 2024 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants