Fixed about ParallelMixMultipartUpload #1313
Conversation
Can we add any tests which failed before but succeed with this PR?
This bug led to two consequences: a fatal EntityTooSmall (HTTP 400/EIO) error, and unnecessary downloads that degrade performance.
Regarding the first case, I think I can identify the test case that caused the error and incorporate it into the tests, so I will try to prepare it.
I added test code for this case. When this test runs in my local environment against AWS S3, the old code (before this PR) fails. I'm not sure whether the EntityTooSmall error also occurs on S3-compatible services other than AWS.
I understand the S3Proxy.
@gaul Thanks, I will wait for the new s3proxy. :-)
Relevant Issue (if applicable)
n/a (or some issues)
Details
The following bugs were found in the processing of mixed multipart upload.
Details of the bug
For example, suppose there is a 500MB file that does not yet exist in the s3fs cache.
In this case, if the user performs the following operations on the target file, the result differs for each operation.
This works correctly.
It fails with a 400 HTTP error (EIO).
It fails with a 400 HTTP error (EIO).
It also unnecessarily downloads data before uploading, which reduces performance.
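For reference, here is a minimal sketch of the kind of in-place write that exercises this code path. The mountpoint, object name, and offset are hypothetical, and the exact operations and offsets of the failing cases above are not reproduced here; this only illustrates the pattern of modifying a small range of a large, uncached file on an s3fs mount.

```cpp
// Minimal sketch of the failing scenario, assuming an s3fs mountpoint at
// /mnt/s3 and an existing ~500MB object "big.bin" that is NOT in the local
// s3fs cache.  The offset below is only illustrative.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main()
{
    const char* path   = "/mnt/s3/big.bin";        // hypothetical mount + object
    const off_t offset = 250LL * 1024 * 1024;      // write somewhere in the middle

    int fd = open(path, O_WRONLY);                 // open without truncating
    if(fd < 0){ perror("open"); return 1; }

    char buf[4096];
    memset(buf, 'x', sizeof(buf));

    // A small in-place write forces s3fs to update only part of the object.
    // On flush/close, s3fs can choose a "mixed" multipart upload: unchanged
    // ranges are copied server-side, changed ranges are uploaded.
    if(pwrite(fd, buf, sizeof(buf), offset) != (ssize_t)sizeof(buf)){
        perror("pwrite");
        close(fd);
        return 1;
    }

    // Before this PR, the flush here could fail with EIO because the
    // multipart request was rejected by S3 (HTTP 400, EntityTooSmall).
    if(close(fd) != 0){
        perror("close");
        return 1;
    }
    printf("write and flush succeeded\n");
    return 0;
}
```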
These fatal errors are EntityTooSmall errors, which S3 returns when a part of a multipart upload other than the last one is smaller than the 5 MiB minimum part size.
Cause
In the processing of mixed multipart uploads, there was a mistake in the calculation of the Copy and Upload ranges.
The upload may happen to complete normally, but when the file is not in the cache it fails depending on the write position.
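As a hedged illustration of the failure mode: a miscalculated Upload (or Copy) range that falls below the 5 MiB floor produces EntityTooSmall. The helper below is hypothetical and not part of s3fs; only the 5 MiB constant comes from S3's documented multipart limits.

```cpp
// Illustrative only: why a wrongly sized part is rejected.
// S3 requires every part of a multipart upload except the last one to be
// at least 5 MiB; a miscalculated range can fall below this floor.
#include <cstdint>
#include <cstdio>

static const int64_t MIN_MULTIPART_PART_SIZE = 5LL * 1024 * 1024;  // 5 MiB (S3 limit)

// Hypothetical helper (not in s3fs): true if S3 would reject a non-final
// part of this size with EntityTooSmall.
static bool would_fail_entity_too_small(int64_t part_size, bool is_last_part)
{
    return !is_last_part && part_size < MIN_MULTIPART_PART_SIZE;
}

int main()
{
    // A 4 KiB dirty range wrongly turned into its own (non-final) part:
    printf("%d\n", would_fail_entity_too_small(4 * 1024, false));                  // 1 -> rejected
    // The same range padded out to 5 MiB by downloading surrounding bytes first:
    printf("%d\n", would_fail_entity_too_small(MIN_MULTIPART_PART_SIZE, false));   // 0 -> accepted
    return 0;
}
```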
Fixes
Made the following fixes.
Download and upload range processing
In the case of a mixed multipart upload, s3fs downloads the insufficient area in advance because of the minimum upload part size.
For this reason, it is necessary to calculate the Copy and Upload areas.
However, there were some bugs in this logic.
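As a rough sketch of the splitting this logic has to perform: dirty ranges become Upload parts, are grown to the 5 MiB floor by downloading adjacent bytes, and the remaining gaps become server-side Copy parts. This is not the actual PageList::GetPageListsForMultipartUpload implementation; the Range type and split_for_mixed_upload function are hypothetical, and for simplicity it ignores that non-final Copy parts must also satisfy the minimum part size.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range { int64_t offset; int64_t size; };             // hypothetical page type

static const int64_t MIN_PART_SIZE = 5LL * 1024 * 1024;     // S3 minimum for non-final parts

// Hypothetical sketch: split a file into Upload ranges (modified data) and
// Copy ranges (unchanged data copied server-side via UploadPartCopy).
static void split_for_mixed_upload(int64_t file_size,
                                   std::vector<Range> dirty,   // sorted, non-overlapping
                                   std::vector<Range>& copy_list,
                                   std::vector<Range>& upload_list)
{
    copy_list.clear();
    upload_list.clear();

    // Grow each dirty range to at least MIN_PART_SIZE, clamped to the file;
    // the extra bytes must be downloaded before they can be re-uploaded.
    for(Range& r : dirty){
        if(r.size < MIN_PART_SIZE){
            int64_t start = std::max<int64_t>(0, r.offset - (MIN_PART_SIZE - r.size) / 2);
            int64_t end   = std::min(file_size, start + MIN_PART_SIZE);
            start         = std::max<int64_t>(0, end - MIN_PART_SIZE);
            r.offset = start;
            r.size   = end - start;
        }
    }
    // Merge grown ranges that now touch or overlap into single upload parts.
    for(const Range& r : dirty){
        if(!upload_list.empty() &&
           r.offset <= upload_list.back().offset + upload_list.back().size){
            int64_t end = std::max(upload_list.back().offset + upload_list.back().size,
                                   r.offset + r.size);
            upload_list.back().size = end - upload_list.back().offset;
        }else{
            upload_list.push_back(r);
        }
    }
    // Everything between upload ranges is copied server-side.
    int64_t pos = 0;
    for(const Range& u : upload_list){
        if(u.offset > pos){
            copy_list.push_back(Range{pos, u.offset - pos});
        }
        pos = u.offset + u.size;
    }
    if(pos < file_size){
        copy_list.push_back(Range{pos, file_size - pos});
    }
}

int main()
{
    std::vector<Range> copy, upload;
    // One 4 KiB dirty page in the middle of a 500 MiB file that is not cached.
    split_for_mixed_upload(500LL * 1024 * 1024,
                           {Range{250LL * 1024 * 1024, 4096}}, copy, upload);
    // upload now holds a single >= 5 MiB range; copy holds the ranges before and after it.
    return 0;
}
```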
To fix this, the following functions that performed these processes were changed or deleted.
It was rewritten significantly.
The function name (PageList::GetPageListsForMultipartUpload) and its arguments have also been changed.
It has been absorbed into PageList::GetPageListsForMultipartUpload, is no longer needed, and has been removed.
It has been absorbed into PageList::GetPageListsForMultipartUpload, is no longer needed, and has been removed.
It has been modified so that local functions operating on fdcache_list_t are prepared, and it now only calls them.
This has been changed in line with the changes to GetPageListsForMultipartUpload.
Other changes were made, such as calling ParallelMixMultipartUploadRequest.
Others
Impact
In mixed multipart uploads, EIO (HTTP response code 400) problems may have the same cause as the one fixed by this PR.
Also, there are cases where performance is poor, which this bug may also have been affecting. (Performance gets worse because the bug causes s3fs to download ranges that do not need to be downloaded.)