race_condition failing randomly #9212

Open
0xFFFC0000 opened this issue Mar 2, 2024 · 1 comment

Comments

0xFFFC0000 commented Mar 2, 2024

I have recently noticed that the race_condition test fails randomly. Here is a list of runs that have failed; some of these are even documentation-only PRs:

  1. https://github.com/monero-project/monero/actions/runs/8032901697/job/21944925602
  2. https://github.com/monero-project/monero/actions/runs/8070978087/job/22050319484
  3. https://github.com/monero-project/monero/actions/runs/8121455963/job/22200074675
  4. https://github.com/monero-project/monero/actions/runs/8031392481/job/21941541359
  5. https://github.com/monero-project/monero/actions/runs/8031412684/job/21942084680
  6. https://github.com/monero-project/monero/actions/runs/8070978087/job/22050319484
  7. https://github.com/monero-project/monero/actions/runs/8032901697/job/21944925602
  8. https://github.com/monero-project/monero/actions/runs/7993111361/job/21828933742

In the meantime, I have left my PC running unit_tests indefinitely under the debugger with these parameters:

--gtest_break_on_failure --gtest_filter=-apply_permutation*:AddressFromTXT*:base58*:bulletproofs*:decompose_amount_into_digits_test*:device*:checkpoints_is_alternative_block_allowed*:bulletproof*:canonical_amounts*:bootstrap_node_selector*:chacha8*:block_reward_and_last_block_weights*:block_reward_and_current_block_weight*:block_reward_and_already_generated_coins*:socks*:DNSResolver*:DNS_PUBLIC*:DNS*:uri*:test_epee_connection*:ringct*:zmq*:reader_writer_lock*:sha256*:multisig*:logging*:select_outputs --gtest_repeat=-1

But so far no luck.
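For readers unfamiliar with the filter syntax used above: in GoogleTest, a `--gtest_filter` value that starts with `-` excludes every `:`-separated pattern that follows it. The helper below is a minimal sketch of how such an exclusion filter could be assembled from a list of patterns. The function name `make_exclusion_filter` is hypothetical and is not part of GoogleTest or Monero.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a GoogleTest negative filter, i.e. a leading
// '-' followed by ':'-separated patterns that are all excluded from the run.
std::string make_exclusion_filter(const std::vector<std::string>& patterns) {
    std::string filter = "--gtest_filter=-";
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        if (i != 0) {
            filter += ':';  // patterns are separated by a colon
        }
        filter += patterns[i];
    }
    return filter;
}
```

Combined with `--gtest_repeat=-1`, this keeps rerunning only the tests that are not excluded until one of them fails.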

If anyone has any information on this, I would appreciate it if you shared it.

0xFFFC0000 commented Mar 2, 2024

This bug happens rarely. It occurs when join throws an exception (due to low-level blocking of the thread) and therefore fails to end the thread here [1]. In that case the worker thread is still alive, and the error happens once execution leaves the scope [2].

In the cases where the race_condition error happens, io_context.stopped() returns false, showing that io_context has not stopped, as the screenshot shows.

image

In an ordinary run, where there are no errors, io_context.stopped() returns true. Look at the screenshot at the end of this comment.
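The failure mode described above can be illustrated with a minimal sketch that uses only the C++ standard library. It is an analogy, not the Monero code: `fake_loop` is a hypothetical stand-in for io_context, and `std::thread` is assumed here in place of whatever thread type the node server actually uses, whose behavior may differ. The sketch shows the clean path, where the loop is asked to stop, the worker is joined, and both conditions hold before the thread object leaves scope.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an event loop such as io_context: it runs until
// asked to stop, then records that it has stopped.
struct fake_loop {
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> has_stopped{false};

    void run() {
        while (!stop_requested.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        has_stopped.store(true);
    }
    void stop() { stop_requested.store(true); }
    bool stopped() const { return has_stopped.load(); }
};

// Returns true if the worker was joined and the loop reports stopped
// before the std::thread object goes out of scope.
bool run_and_shutdown_cleanly() {
    fake_loop loop;
    std::thread worker([&loop] { loop.run(); });

    loop.stop();            // ask the loop to finish before joining
    if (worker.joinable()) {
        worker.join();      // std::thread::join may throw std::system_error
    }
    // A std::thread destroyed while still joinable calls std::terminate,
    // so the thread must no longer be joinable when it leaves scope.
    assert(!worker.joinable());
    return !worker.joinable() && loop.stopped();
}
```

If `join` threw before completing, `worker` would still be joinable and `loop.stopped()` could still be false at scope exit, which matches the state seen in the failing runs.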

This one-line PR will solve this problem [3].

  1. 0xFFFC0000@c0b0742#diff-0058af414c774eb3a4a62a610caba46f573f690c45a4a5dda734e9d7bcefad4aR1115
  2. https://stackoverflow.com/a/7989043
  3. Dev/0xfffc/fix race condition node server 0xFFFC0000/monero#6

This is debug info from a correct run, which does not produce the race_condition failure:

image
