KITE-1053: Fix int overflow bug in FS writer. #403
@mkwhitacre, could you review this? It fixes an issue where too many records written from a single reducer will cause data files to be discarded instead of committed.
Change looks fine. For my own knowledge can you provide a link to the code that checks if any records were written?
Yeah, the check is here: https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/filesystem/FileSystemWriter.java#L196 When debug logging is on, we see that the else case is taken with the log message:
Since the value of count doesn't really matter (it is mostly used for logging), would tracking whether data was written as a boolean be safer, to avoid any overflow errors? This would be in addition to the count value for debug purposes.
@mkwhitacre, yes and no. Yes, it would technically be safer. But I don't think Java can handle files that are larger than Long.MAX_VALUE bytes, which means that if you were overflowing a long record count, you'd also be unable to write anything to the file. In addition, check out #386, which adds size- and time-based file rolling to the internal writers. That requires having the count, so I'd rather not remove it only to add it back later.
Ok thanks. #386 and the ability to specify number of writers per partition should hopefully help me eliminate some custom code to try and more evenly distribute data across shuffles and still end up with appropriately sized files.
@mkwhitacre, great! Feel free to review that one, too. So this one is good to go?
+1
Keeping the number of records written in an int caused a bug where writing more than Integer.MAX_VALUE records (~2B) would overflow the counter and the check to see whether any records had been written would fail because count is less than 0. The fix is to use a long.
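For anyone following along, here is a minimal sketch of the overflow described above. It is not the actual FileSystemWriter code: the class name `RecordCounterSketch` and the helpers `incrementInt`, `incrementLong`, and `shouldCommit` are hypothetical, and the `count > 0` check is only an assumption about the shape of the real test.

```java
// Hypothetical illustration only; names and the exact check are assumptions,
// not taken from Kite's FileSystemWriter.
class RecordCounterSketch {

    // Old behaviour: the record counter is an int, so incrementing past
    // Integer.MAX_VALUE silently wraps around to a negative value.
    static int incrementInt(int count) {
        return count + 1;
    }

    // Fixed behaviour: the same increment on a long keeps counting upward.
    static long incrementLong(long count) {
        return count + 1;
    }

    // Assumed shape of the "were any records written?" check that decides
    // whether a data file is committed or discarded.
    static boolean shouldCommit(long count) {
        return count > 0;
    }

    public static void main(String[] args) {
        int intCount = incrementInt(Integer.MAX_VALUE);
        long longCount = incrementLong((long) Integer.MAX_VALUE);

        System.out.println(intCount);                 // -2147483648
        System.out.println(shouldCommit(intCount));   // false: file would be discarded
        System.out.println(longCount);                // 2147483648
        System.out.println(shouldCommit(longCount));  // true: file would be committed
    }
}
```

With the int counter, the first record past Integer.MAX_VALUE makes the count negative, so a check of this kind takes the discard branch even though data was written.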
Thanks for reviewing this, @mkwhitacre!