Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for race condition in AbstractAsyncWriter #565

Merged
merged 1 commit into from
Apr 13, 2016

Conversation

akiezun
Copy link
Contributor

@akiezun akiezun commented Apr 11, 2016

Description

Fixes #564 by reverting the order of conditions in the while loop and making checking them effectively atomic (ie nothing can change the values of once the while condition starts being evaluated

@droazen
Copy link
Contributor

droazen commented Apr 11, 2016

Can you run the test suite a sufficient number of times to convince yourself that the race is gone, and post the results here?

@akiezun
Copy link
Contributor Author

akiezun commented Apr 11, 2016

@droazen @d-cameron @tfenne can you have a look at this and confirm (or debunk) that a) there was a race condition, b) this fixes it and c) suggest how to test it?

@droazen
Copy link
Contributor

droazen commented Apr 11, 2016

@akiezun I've added my comments to #564

@akiezun
Copy link
Contributor Author

akiezun commented Apr 11, 2016

@droazen the htsjdk test suite is insufficient here and it never did fail with this error

@droazen
Copy link
Contributor

droazen commented Apr 11, 2016

@akiezun I meant run the GATK4 test suite repeatedly (with this patch)

@droazen
Copy link
Contributor

droazen commented Apr 11, 2016

It would also be good if we could craft a direct htsjdk test that fails (even intermittently) with the same error.

@akiezun
Copy link
Contributor Author

akiezun commented Apr 11, 2016

can you make me a snaphot build of htsjdk?

@droazen
Copy link
Contributor

droazen commented Apr 11, 2016

@akiezun You can temporarily modify GATK4's build.gradle in your branch to point to a custom local htsjdk build.

@tfenne
Copy link
Member

tfenne commented Apr 11, 2016

@akiezun I followed the discussion in the issue, and I think this will do the trick. The other thing you could do to further protect things would be to remove the assertion in the close() method and replace it with code to drain any remaining items, once the writer thread is dead, from the queue and write them using writeSynchronously().

@akiezun
Copy link
Contributor Author

akiezun commented Apr 11, 2016

@tfenne thanks. I added queue draining as a second commit. can you look at it?

@tfenne
Copy link
Member

tfenne commented Apr 11, 2016

Looks good to me @akiezun . 👍

@akiezun
Copy link
Contributor Author

akiezun commented Apr 11, 2016

thanks @tfenne. Unless someone objects, I will squash and merge by EOB tomorrow (~24 hours). I have not been able to write a test that reproduces this (either in htjdk or gatk). But I have successfully run GATK4 test suite with this update.

@d-cameron
Copy link
Contributor

There's still a data loss race condition but it's between write() and close(). A thread already in write() can add an element to the queue once close() has set isClosed and this additional element may or may not get written by the safety flush that was added.

This could be fixed by putting write() and close() in synchronized blocks, or just left as-is and documentation added to say that the class does not expose a thread-safe API and if you call write() and close() at the same time, the write may or may not get written.

Given the synchronous implementations of the concrete writers are not thread-safe and are likely to die much more horribly if you do concurrent write() and close() there's very little gain from removing the write() close() race condition as a multi-threaded user would be wrapping their own synchronisation around it anyway.

@tfenne
Copy link
Member

tfenne commented Apr 12, 2016

You're right @d-cameron, and I would go with the second suggestion. The goal of this class and it's subclasses has never been to create writers that can be invoked from multiple threads - just like the non async FastqWriter and SAMFileWriter, the expectation is that the writer is owned and used by a single thread. No harm in documenting that more clearly.

@akiezun
Copy link
Contributor Author

akiezun commented Apr 12, 2016

Agreed, I'll add documentation that this class (write and close methods) is
to be called from a single thread.

On Tuesday, April 12, 2016, Tim Fennell notifications@github.com wrote:

You're right @d-cameron https://github.com/d-cameron, and I would go
with the second suggestion. The goal of this class and it's subclasses has
never been to create writers that can be invoked from multiple threads -
just like the non async FastqWriter and SAMFileWriter, the expectation is
that the writer is owned and used by a single thread. No harm in
documenting that more clearly.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#565 (comment)

Sent from Gmail Mobile

@akiezun akiezun force-pushed the ak_raceInAbstractAsyncWriter branch from be684be to 7a14b01 Compare April 13, 2016 01:59
@akiezun akiezun merged commit ca2a042 into master Apr 13, 2016
@akiezun akiezun deleted the ak_raceInAbstractAsyncWriter branch April 13, 2016 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants