fix top block locking race condition #485
d_state is intended to track the running state of the top block:
* the initial state is IDLE
* start() sets the state to RUNNING
* stop() sets the state to IDLE
* the state becomes IDLE when all work is done (requires a call to wait())
* after unlock(), execution starts (continues) only if the state is RUNNING (otherwise nothing happens)
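The rules above can be sketched as a small state holder. This is only an illustration, not the GNU Radio implementation: the class `TopBlockSketch`, the method `wait_done()` (standing in for wait() returning once all work is done), and the `executions()` counter are hypothetical names. The one point it tries to show is that every read and write of d_state happens while d_mutex is held.

```cpp
#include <mutex>

// Hypothetical stand-in for the top block's run-state bookkeeping.
enum class RunState { IDLE, RUNNING };

class TopBlockSketch {
public:
    // start(): IDLE -> RUNNING, and execution begins.
    void start() {
        std::lock_guard<std::mutex> guard(d_mutex);
        d_state = RunState::RUNNING;
        ++d_executions;
    }

    // stop(): state goes back to IDLE.
    void stop() {
        std::lock_guard<std::mutex> guard(d_mutex);
        d_state = RunState::IDLE;
    }

    // Models wait() returning after all work is done: state becomes IDLE.
    void wait_done() {
        std::lock_guard<std::mutex> guard(d_mutex);
        d_state = RunState::IDLE;
    }

    // unlock(): resume execution only if the state is RUNNING.
    void unlock() {
        std::lock_guard<std::mutex> guard(d_mutex);
        if (d_state == RunState::RUNNING) {
            ++d_executions; // execution is restarted
        }
        // otherwise: do nothing
    }

    RunState state() const {
        std::lock_guard<std::mutex> guard(d_mutex);
        return d_state;
    }

    // Number of times execution was started or restarted (illustration only).
    int executions() const {
        std::lock_guard<std::mutex> guard(d_mutex);
        return d_executions;
    }

private:
    mutable std::mutex d_mutex;          // protects everything below
    RunState d_state = RunState::IDLE;   // initial state is IDLE
    int d_executions = 0;
};
```

In this sketch, calling unlock() on a block that was never started, or that was stopped, leaves `executions()` unchanged, which is the "do nothing otherwise" rule.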
I'd like you to write out the issues that this proposed patch fixes, how they happen, and how these changes address them.
The first patch has an extensive comment which describes how d_state is expected to behave (I would appreciate any comment if my expectations are wrong) and fixes its behaviour accordingly. This patch can be cherry-picked alone; there is no QA attached to it. What the comment does not mention is that it also fixes the case where d_state is changed while d_mutex is unlocked. This is wrong because d_state can then be modified by multiple threads concurrently, which results in undefined behaviour. It should be noted that the probability of this race condition is pretty low, almost negligible, but in my opinion it is still wrong.
The second patch is QA. It fails in a more or less undefined manner because it tests a case where the race condition occurs, and how the race propagates depends on some hidden state; sometimes it does not fail at all, just by lucky coincidence (is the QA working for you?).
The third patch fixes the race condition tested by the QA in the previous patch. Care should be taken here because I did not draw a flow graph of all possible states. This patch fixes a single case of the race condition, which can be described in this way.
The fix adds a new state which is modified while d_mutex is locked and indicates whether the job state should be checked again. I hope the description above can be understood. I admit there should be a test for each possible workflow, or at least each possible workflow should be checked for race conditions. I did neither of those, but if you insist I can try to write a mathematical proof of whether some race conditions can or cannot happen.
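One possible reading of the "check again" idea, as a rough sketch only: the waiting thread joins the workers without holding the mutex, then takes the mutex and looks at a flag that tells it whether execution was restarted in the meantime. Everything here is an assumption made for illustration and is not taken from the patch: the names `WaitSketch`, `d_recheck`, `restart_from_unlock()`, and the `join_workers` callback are hypothetical.

```cpp
#include <functional>
#include <mutex>

// Hypothetical names throughout; not the code from the patch.
enum class JobState { IDLE, RUNNING };

class WaitSketch {
public:
    void start() {
        std::lock_guard<std::mutex> guard(d_mutex);
        d_state = JobState::RUNNING;
    }

    // Models unlock() restarting execution while another thread is in wait().
    // The flag is only ever modified with d_mutex held.
    void restart_from_unlock() {
        std::lock_guard<std::mutex> guard(d_mutex);
        if (d_state == JobState::RUNNING) {
            d_recheck = true;
        }
    }

    // join_workers stands in for blocking until the worker threads finish.
    // It is called WITHOUT the mutex held. Returns how many times it joined.
    int wait(const std::function<void()>& join_workers) {
        int joins = 0;
        for (;;) {
            join_workers();
            ++joins;
            std::lock_guard<std::mutex> guard(d_mutex);
            if (d_recheck) {
                // Execution was restarted while we were joining: check again.
                d_recheck = false;
                continue;
            }
            d_state = JobState::IDLE; // all work is really done
            return joins;
        }
    }

    JobState state() const {
        std::lock_guard<std::mutex> guard(d_mutex);
        return d_state;
    }

private:
    mutable std::mutex d_mutex;
    JobState d_state = JobState::IDLE;
    bool d_recheck = false;
};
```

If the callback triggers `restart_from_unlock()` during the first join, `wait()` joins a second time before setting the state to IDLE; without the flag it would report IDLE while work had been restarted.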
Any possible progress here? Need more info or something?
We're going to be doing a systematic review of this area of code before making further changes to it.
We're going to be making stable release 3.7.9 soon. After that, I'd like to walk through with you where you see issues here, what fixes you are proposing, and how to implement tests that pass/fail on these fixes. In the meantime, I am closing this PR.
(ranging from an invalid value to a segfault); what is more unfortunate, it sometimes passes (I'm not sure why)