Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix thread status synchronization in thread_list_test #7825

Closed
wants to merge 1 commit into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Dec 31, 2020

The test was flaky because the BG threads could increase
running_count_ up to job_count_ before applying their thread status
updates. Then the test thread would see non-deterministic results when
counting threads with each status. The fix is to acquire mutex in test
thread so it sees running_count_ and thread status updated atomically.
I think simply reordering the two updates would have been insufficient
since the thread status update uses memory_order_relaxed. This change
happens to also eliminate an undesirable sleep loop.

Test Plan: injected sleeps to verify the failure repros before this PR and does not
repro after.

The test was flaky because the BG threads could increase
`running_count_` up to `job_count_` before applying their thread status
updates. Then the test thread would see non-deterministic results when
counting threads with each status. The fix is to acquire mutex in test
thread so it sees `running_count_` and thread status updated atomically.
I think simply reordering the two updates would have been insufficient
since the thread status update uses `memory_order_relaxed`. This change
happens to also eliminate an undesirable sleep loop.

Test Plan: injected sleeps to verify the failure repros before this PR and does not
repro after.
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from jay-zhuang January 4, 2021 07:14
@ajkr
Copy link
Contributor Author

ajkr commented Jan 4, 2021

The test failure is unrelated:

g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
make: *** [db/db_basic_test.o] Error 4

Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@ajkr
Copy link
Contributor Author

ajkr commented Jan 4, 2021

Thanks for the review!

@facebook-github-bot
Copy link
Contributor

@ajkr merged this pull request in 61e3244.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
The test was flaky because the BG threads could increase
`running_count_` up to `job_count_` before applying their thread status
updates. Then the test thread would see non-deterministic results when
counting threads with each status. The fix is to acquire mutex in test
thread so it sees `running_count_` and thread status updated atomically.
I think simply reordering the two updates would have been insufficient
since the thread status update uses `memory_order_relaxed`. This change
happens to also eliminate an undesirable sleep loop.

Pull Request resolved: facebook#7825

Test Plan:
injected sleeps to verify the failure repros before this PR and does not
repro after.

Reviewed By: jay-zhuang

Differential Revision: D25742409

Pulled By: ajkr

fbshipit-source-id: 926a2223fe856e20bc4c0c27df6736ee5cb02c97
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants