Writing to S3 sometimes results in corrupted files #10710
Comments
this is very suspicious
Sounds like #9715 is related then
Good to hear that. That also means you were testing with two builds, i.e. local modifications.
If the revert helps, I vote for reverting as the initial fix and then reworking the original improvement.
No longer considered a release blocker, as #10716 has been merged.
@sshkvar can you please confirm the current master no longer exhibits the problem?
We had the same problem with corrupt Parquet and ORC files. In our case, the CTAS in Iceberg failed with an error.
Thanks @sshkvar. Very much appreciate your help here.
Add a test `testInsertIntoPartitionedTableLargeFiles` to exercise multiple code paths of S3 streaming upload, with an upload part size of 5 MB (a sketch of these paths follows below):
1. file size <= 5 MB (shortcut to direct upload)
2. file size > 5 MB but <= 10 MB (which triggered #10710)
3. file size > 10 MB
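For readers unfamiliar with the upload logic, here is a minimal sketch of the branching those three cases exercise, assuming a 5 MB part size. This is not the actual `TrinoS3FileSystem` code; the helper methods are hypothetical stand-ins for the underlying S3 calls.

```java
// Sketch of the streaming-upload decision; not the actual Trino code.
public class StreamingUploadSketch
{
    private static final long PART_SIZE = 5L * 1024 * 1024; // 5 MB

    public void finish(long totalBytes)
    {
        if (totalBytes <= PART_SIZE) {
            // Case 1: the whole file fits in one part, so skip multipart
            // entirely and issue a single PutObject.
            directUpload(totalBytes);
        }
        else {
            // Cases 2 and 3: split the stream into 5 MB parts. Case 2
            // (> 5 MB but <= 10 MB) produces exactly two parts and was the
            // path that triggered this issue; case 3 produces three or more.
            long partCount = (totalBytes + PART_SIZE - 1) / PART_SIZE;
            for (long part = 1; part <= partCount; part++) {
                uploadPart(part);
            }
            completeMultipartUpload();
        }
    }

    private void directUpload(long bytes) { /* single PutObject call */ }

    private void uploadPart(long partNumber) { /* UploadPart call */ }

    private void completeMultipartUpload() { /* CompleteMultipartUpload call */ }
}
```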
Hi, we have migrated to Trino 367 and ran into an issue with CREATE TABLE AS SELECT in Iceberg (potentially all connectors that use TrinoS3FileSystem are affected).
We are doing a CTAS in Iceberg from another table (~482,013,195 records).
Afterwards, when we try to query the created table, we get an exception.
After some investigation, we found that for some reason we have multiple versions of the Parquet files. This is strange, because each Parquet file has a UUID in its name and we had just created this table.
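If the bucket has versioning enabled, one way to confirm that a freshly written object really has multiple versions is to list object versions directly. A sketch using the AWS SDK for Java v1 (the SDK the legacy TrinoS3FileSystem is built on); the bucket name and prefix are hypothetical placeholders for the table's data location:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3VersionSummary;
import com.amazonaws.services.s3.model.VersionListing;

public class ListParquetVersions
{
    public static void main(String[] args)
    {
        String bucket = "my-bucket";                 // placeholder
        String prefix = "warehouse/my_table/data/";  // placeholder

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        VersionListing listing = s3.listVersions(bucket, prefix);
        for (S3VersionSummary summary : listing.getVersionSummaries()) {
            // A just-created file with a UUID name should have exactly one
            // version; more than one means the object was rewritten.
            System.out.printf("%s version=%s size=%d latest=%s%n",
                    summary.getKey(),
                    summary.getVersionId(),
                    summary.getSize(),
                    summary.isLatest());
        }
    }
}
```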
You can find a screenshot in this thread in Slack.
Additional retries of the table creation showed us that the first version of a Parquet file always has the same size, 16 MB, which is equal to `s3MultipartMinFileSize`.
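For context, `s3MultipartMinFileSize` is backed by the `hive.s3.multipart.min-file-size` catalog property of the legacy S3 file system, whose 16 MB default matches the size observed here; the companion part-size setting defaults to 5 MB, the part size used in the test above. Shown as a reminder of the defaults, not as a suggested change:

```properties
# Legacy TrinoS3FileSystem multipart settings (defaults shown)
hive.s3.multipart.min-file-size=16MB
hive.s3.multipart.min-part-size=5MB
```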
We then checked all changes in the `io.trino.plugin.hive.s3` package and found this one: 7383734. We reverted this change and it helped; we no longer had corrupted Parquet files in Iceberg tables.
Important note: the issue is intermittent; sometimes the table is created without any problem.