-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix for race condition in AbstractAsyncWriter #565
Conversation
Can you run the test suite a sufficient number of times to convince yourself that the race is gone, and post the results here? |
@droazen @d-cameron @tfenne can you have a look at this and confirm (or debunk) that a) there was a race condition, b) this fixes it and c) suggest how to test it? |
@droazen the htsjdk test suite is insufficient here and it never did fail with this error |
@akiezun I meant run the GATK4 test suite repeatedly (with this patch) |
It would also be good if we could craft a direct htsjdk test that fails (even intermittently) with the same error. |
can you make me a snaphot build of htsjdk? |
@akiezun You can temporarily modify GATK4's |
@akiezun I followed the discussion in the issue, and I think this will do the trick. The other thing you could do to further protect things would be to remove the assertion in the |
@tfenne thanks. I added queue draining as a second commit. can you look at it? |
Looks good to me @akiezun . 👍 |
thanks @tfenne. Unless someone objects, I will squash and merge by EOB tomorrow (~24 hours). I have not been able to write a test that reproduces this (either in htjdk or gatk). But I have successfully run GATK4 test suite with this update. |
There's still a data loss race condition but it's between write() and close(). A thread already in write() can add an element to the queue once close() has set isClosed and this additional element may or may not get written by the safety flush that was added. This could be fixed by putting write() and close() in synchronized blocks, or just left as-is and documentation added to say that the class does not expose a thread-safe API and if you call write() and close() at the same time, the write may or may not get written. Given the synchronous implementations of the concrete writers are not thread-safe and are likely to die much more horribly if you do concurrent write() and close() there's very little gain from removing the write() close() race condition as a multi-threaded user would be wrapping their own synchronisation around it anyway. |
You're right @d-cameron, and I would go with the second suggestion. The goal of this class and it's subclasses has never been to create writers that can be invoked from multiple threads - just like the non async |
Agreed, I'll add documentation that this class (write and close methods) is On Tuesday, April 12, 2016, Tim Fennell notifications@github.com wrote:
Sent from Gmail Mobile |
be684be
to
7a14b01
Compare
Description
Fixes #564 by reverting the order of conditions in the while loop and making checking them effectively atomic (ie nothing can change the values of once the
while
condition starts being evaluated