Work around TestConcurrentMany* flakiness in a more principled way
The flakiness on our local machines seems to come from a race in the kernel
between task_suspend and the creation of the Mach exceptions for the threads
that hit breakpoints. The debugserver code is written with the assumption
that the kernel will be able to provide us with all the exceptions for a
given task once task_suspend returns. On machines with higher core counts,
this seems not to be the case. The first batch of exceptions we get after
task_suspend does not contain exceptions for all the threads that have hit
a breakpoint, so they get misreported in the first stop packet.

Adding a 1ms timeout to the call that retrieves the batch of exceptions
seems to work around the issue reliably on our machines, and it shouldn't
impact standard debugging scenarios too much (a stop will incur an additional
1ms delay). We'll be talking to the kernel team to figure out the right
contract for those APIs.

This patch also reverts part of Jonas' previous workaround for the
issue (r370785).

llvm-svn: 370916
fredriss committed Sep 4, 2019
1 parent 5465875 commit cc5b509
Showing 2 changed files with 1 addition and 2 deletions.
1 change: 0 additions & 1 deletion lldb/packages/Python/lldbsuite/test/make/pseudo_barrier.h
@@ -7,7 +7,6 @@ static inline void pseudo_barrier_wait(pseudo_barrier_t &barrier) {
   --barrier;
   while (barrier > 0)
     std::this_thread::yield();
-  std::this_thread::sleep_for(std::chrono::milliseconds(100));
 }
 
 static inline void pseudo_barrier_init(pseudo_barrier_t &barrier, int count) {
2 changes: 1 addition & 1 deletion lldb/tools/debugserver/source/MacOSX/MachTask.mm
@@ -754,7 +754,7 @@ static void get_threads_profile_data(DNBProfileDataScanType scanType,
   // to get all currently available exceptions for this task
   err = exception_message.Receive(
       mach_task->ExceptionPort(),
-      MACH_RCV_MSG | MACH_RCV_INTERRUPT | MACH_RCV_TIMEOUT, 0);
+      MACH_RCV_MSG | MACH_RCV_INTERRUPT | MACH_RCV_TIMEOUT, 1);
 } else if (periodic_timeout > 0) {
   // We need to stop periodically in this loop, so try and get a mach
   // message with a valid timeout (ms)
