
Query sometimes fails with "Failed to delete temporary file" when writing to a sorted (sorted_by) bucketed Hive table on S3 #2296

Closed
grantatspothero opened this issue Dec 17, 2019 · 7 comments · Fixed by #2990
Labels: bug

grantatspothero (Contributor) commented Dec 17, 2019

Right now the sorted_by table property in the Hive connector does not work well when the tables are stored on S3. SortingFileWriter.java writes many sorted temporary files to S3 (to bound memory use during sorting) and then merges them together. This is a problem right now because:

  1. From a performance perspective, writing tmp files to s3 is slower than local disk.
  2. From a correctness perspective, SortingFileWriter.mergeFiles does not work reliably on S3 because S3 lacks read-after-write consistency for file deletes. Specifically, these lines:
    https://github.com/prestosql/presto/blob/master/presto-hive/src/main/java/io/prestosql/plugin/hive/SortingFileWriter.java#L226-L229

will sometimes error out because S3 temporarily reports that a just-deleted file still exists.

See the stack trace below, reproducing problem 2:

io.prestosql.spi.PrestoException: Failed to write temporary file: s3://spothero-science-dev/segment_local/processed/grant_test_bucketing_sorting_limit/.tmp-sort.000007_0_20191217_190758_08542_eedb5.77
	at io.prestosql.plugin.hive.SortingFileWriter.writeTempFile(SortingFileWriter.java:251)
	at io.prestosql.plugin.hive.SortingFileWriter.combineFiles(SortingFileWriter.java:200)
	at io.prestosql.plugin.hive.SortingFileWriter.writeSorted(SortingFileWriter.java:186)
	at io.prestosql.plugin.hive.SortingFileWriter.commit(SortingFileWriter.java:139)
	at io.prestosql.plugin.hive.HiveWriter.commit(HiveWriter.java:86)
	at io.prestosql.plugin.hive.HivePageSink.doFinish(HivePageSink.java:191)
	at io.prestosql.plugin.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
	at io.prestosql.plugin.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
	at io.prestosql.plugin.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
	at io.prestosql.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:80)
	at io.prestosql.plugin.hive.HivePageSink.finish(HivePageSink.java:182)
	at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:74)
	at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:193)
	at io.prestosql.operator.Driver.processInternal(Driver.java:397)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_314____20191210_204803_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.UncheckedIOException: java.io.IOException: Failed to delete temporary file: s3://spothero-science-dev/segment_local/processed/grant_test_bucketing_sorting_limit/.tmp-sort.000007_0_20191217_190758_08542_eedb5.76
	at io.prestosql.plugin.hive.SortingFileWriter.mergeFiles(SortingFileWriter.java:236)
	at io.prestosql.plugin.hive.SortingFileWriter.lambda$combineFiles$2(SortingFileWriter.java:200)
	at io.prestosql.plugin.hive.SortingFileWriter.writeTempFile(SortingFileWriter.java:245)
	... 26 more
Caused by: java.io.IOException: Failed to delete temporary file: s3://spothero-science-dev/segment_local/processed/grant_test_bucketing_sorting_limit/.tmp-sort.000007_0_20191217_190758_08542_eedb5.76
	at io.prestosql.plugin.hive.SortingFileWriter.mergeFiles(SortingFileWriter.java:231)
	... 28 more

A short-term solution to just the S3 consistency problem is to skip the existence check after the delete and instead check the boolean returned by fileSystem.delete.
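A minimal sketch of that change, with `java.nio.file.Files.deleteIfExists` as a local stand-in for Hadoop's `FileSystem.delete` (the class name and helper are made up for illustration; the real fix would live in SortingFileWriter.mergeFiles):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteCheckSketch {
    // Trust the boolean returned by delete() instead of re-checking
    // existence afterwards: an eventually consistent store like S3 may
    // still list a file that was just successfully deleted.
    static void deleteTempFile(Path tempFile) throws IOException {
        // Files.deleteIfExists is a local stand-in for FileSystem.delete.
        if (!Files.deleteIfExists(tempFile)) {
            throw new IOException("Failed to delete temporary file: " + tempFile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sort-sketch", ".tmp");
        deleteTempFile(tmp);
        System.out.println("deleted: " + !Files.exists(tmp));
    }
}
```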

A longer term solution to both problems might be to use local disk for spilling tmp files instead of s3. This would improve performance as well as avoid consistency problems with s3.
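To illustrate the local-spill idea, here is a hypothetical, self-contained sketch (LocalSpillSketch and its helpers are made up; a second temp directory stands in for S3, and an in-memory Collections.sort stands in for a streaming k-way merge): spill sorted runs to local disk, merge them locally, and write only the final file to the object store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LocalSpillSketch {
    // Sketch: merge locally spilled runs, then "upload" only the final
    // sorted file (s3Stub is a local directory standing in for S3).
    static Path mergeAndUpload(Path spillDir, Path s3Stub) throws IOException {
        List<String> rows = new ArrayList<>();
        try (var runs = Files.list(spillDir)) {
            for (Path run : runs.toList()) {
                rows.addAll(Files.readAllLines(run));
                Files.delete(run); // local delete: no S3 consistency caveats
            }
        }
        Collections.sort(rows); // stand-in for a streaming k-way merge
        return Files.write(s3Stub.resolve("final.sorted"), rows);
    }

    public static void main(String[] args) throws IOException {
        Path spill = Files.createTempDirectory("spill");
        Path s3 = Files.createTempDirectory("s3-stub");
        Files.write(spill.resolve("run.0"), List.of("a", "c"));
        Files.write(spill.resolve("run.1"), List.of("b", "d"));
        Path out = mergeAndUpload(spill, s3);
        System.out.println(Files.readAllLines(out)); // [a, b, c, d]
    }
}
```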

@findepi and I talked about this in a Slack thread here:
https://prestosql.slack.com/archives/CGB0QHWSW/p1576484582050100?thread_ts=1576180076.157800&cid=CGB0QHWSW

electrum (Member) commented:

I think it's fine if we simply remove the check, per the Hadoop FS specification

@findepi findepi assigned grantatspothero and unassigned findepi Dec 18, 2019
@findepi findepi changed the title Better support for sorted_by table property in hive connector on s3 Query sometimes fails with "Failed to delete temporary file" when writing to a sorted (sorted_by) bucketed Hive table on S3 Dec 18, 2019
@findepi findepi added the bug Something isn't working label Dec 18, 2019
findepi (Member) commented Dec 18, 2019

@grantatspothero can I assign this to you?
I've split this issue into two since, per @electrum's comment, the query failure fix is much simpler than the performance improvement we can get.
Bullet 1 is covered by #2301

grantatspothero (Contributor, Author) commented:

@findepi 👍 I can remove the unnecessary file checks and test sorting on s3.

ddrinka (Member) commented Mar 3, 2020

@grantatspothero I'm still getting exceptions as you described above on master. Were you able to submit a PR for the fix you described?

grantatspothero (Contributor, Author) commented Mar 4, 2020

@ddrinka I got distracted and never got a chance; thanks for opening a PR. You might want to audit the whole SortingFileWriter: I remember there was potentially at least one other place where a similar read-after-write consistency problem could occur.

ddrinka added a commit to ddrinka/presto that referenced this issue Mar 5, 2020
ddrinka (Member) commented Mar 16, 2020

@electrum Anything I can do on the PR to get this in the next release? I'm not able to run with bucketed sorting at all without this change.

gauravphoenix commented Apr 14, 2020

I learned about this issue while discussing #3410. I tried setting hive.writer-sort-buffer-size to a high value such as 512MB on a high-memory machine. Although I no longer get the query memory size exceeded error, the insert takes forever to finish, the bottleneck being the upload of temporary files.

To make things worse, when the insert query is aborted, the .tmp-sort... files uploaded to the target S3 bucket are left behind, and there is no "rm -rf" feature on S3 :(

Just as a data point, the machine in question has 256 GB RAM but the Presto JVM is only using ~9 GB of memory, which points to the severe bottleneck created by S3 temp file uploads.
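For reference, the workaround described above is a catalog-level setting; a sketch of what that looks like, assuming a standard deployment layout where the Hive catalog lives at etc/catalog/hive.properties:

```properties
# A larger sort buffer means fewer spilled temporary files,
# at the cost of more memory per writer.
hive.writer-sort-buffer-size=512MB
```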

electrum pushed a commit that referenced this issue Jul 1, 2020
@findepi findepi mentioned this issue Jul 3, 2020