
KITE-1053: Fix int overflow bug in FS writer. #403

Merged 1 commit into kite-sdk:master on Jul 28, 2015

Conversation

@rdblue (Contributor) commented Jul 27, 2015

Keeping the number of records written in an int caused a bug: writing more
than Integer.MAX_VALUE records (~2.1 billion) overflows the counter, so the
check for whether any records had been written fails because the count is
negative. The fix is to use a long.
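For illustration, here is a minimal sketch of the failure mode (method names are hypothetical; this is not the actual FileSystemWriter code). Casting the true record count to int reproduces the wraparound that ~2.1B increments of an int counter would cause, since Java int arithmetic wraps modulo 2^32:

```java
public class OverflowDemo {

    // Buggy version: the counter is an int, so it wraps negative once the
    // number of records written exceeds Integer.MAX_VALUE.
    static boolean shouldCommitWithIntCounter(long recordsWritten) {
        int count = (int) recordsWritten; // simulates the wrapped int counter
        return count > 0;                 // the commit-vs-discard check
    }

    // Fixed version: the counter is a long; Long.MAX_VALUE (~9.2e18) is far
    // beyond any realistic record count, so the check stays correct.
    static boolean shouldCommitWithLongCounter(long recordsWritten) {
        long count = recordsWritten;
        return count > 0;
    }

    public static void main(String[] args) {
        long written = (long) Integer.MAX_VALUE + 1_000_000L; // > ~2.1B records
        // Wrapped counter is negative, so the check fails and the file
        // would be discarded even though records were written.
        System.out.println(shouldCommitWithIntCounter(written));  // false
        // With a long counter the same count passes the check.
        System.out.println(shouldCommitWithLongCounter(written)); // true
    }
}
```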

@rdblue (Contributor, Author) commented Jul 27, 2015

@mkwhitacre, could you review this? It fixes an issue where too many records written from a single reducer will cause data files to be discarded instead of committed.

@mkwhitacre (Contributor) commented

Change looks fine. For my own knowledge, can you provide a link to the code that checks whether any records were written?

@rdblue (Contributor, Author) commented Jul 27, 2015

Yeah, the check is here: https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/filesystem/FileSystemWriter.java#L196

When debug logging is on, we see that the else case is taken with the log message: DEBUG [main] org.kitesdk.data.spi.filesystem.FileSystemWriter: Discarded hdfs://.../.33c9684a-f4e9-432b-a003-2efaccfaa0d4.avro.tmp (-1969937604 entities)

@mkwhitacre (Contributor) commented

Since the value of count doesn't really matter (it's mostly used for logging), would tracking whether data was written as a boolean be safer, avoiding any overflow errors? The count could be kept alongside it for debugging purposes.

@rdblue (Contributor, Author) commented Jul 27, 2015

@mkwhitacre, yes and no. Yes, it would technically be safer. But I don't think Java can handle files larger than Long.MAX_VALUE bytes, which means that if you were overflowing the long record count, you'd also be unable to write anything to the file.

In addition, check out #386 that adds size and time-based file rolling to the internal writers. That requires having the count so I'd rather not remove it to add it back later.

@mkwhitacre (Contributor) commented

OK, thanks. #386 and the ability to specify the number of writers per partition should hopefully let me eliminate some custom code that tries to distribute data more evenly across shuffles while still producing appropriately sized files.

@rdblue (Contributor, Author) commented Jul 27, 2015

@mkwhitacre, great! Feel free to review that one, too.

So this one is good to go?

@mkwhitacre (Contributor) commented

+1

@rdblue force-pushed the KITE-1053-fix-int-overflow branch from 6430420 to 5fe56f3 on Jul 28, 2015
@rdblue merged commit 5fe56f3 into kite-sdk:master on Jul 28, 2015
@rdblue (Contributor, Author) commented Jul 28, 2015

Thanks for reviewing this, @mkwhitacre!
