Upload fails after server to server redirect with cache #2104
Comments
I have the same issue here with 4.3.0 on Ubuntu 20.04, see https://groups.google.com/g/iROD-Chat/c/LhSPiQ4t0fs
When I was testing your issue, I was not using two servers. That is likely why I could not reproduce it. I will try this with multiple servers.
I have reproduced this issue. With wire logging enabled I noticed we were getting a huge Content-Length being sent to UploadPart. I believe this happens when the part size > 2^31 - 1 and it is due to an int64_t being converted to an int and then printed as an int64_t. This will require a change to libs3. Please try the following workarounds and tell me if these work for you (both set in the resource context string):
Note that I am getting a timeout error after 2 minutes when I try the second option. I am not sure why that is the case as each thread is still actively sending data. This might be a second issue.
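The integer truncation described in the comment above can be illustrated with a minimal sketch. This is not the libs3 code: the helper names and the 3 GiB part size are hypothetical, and the final reinterpretation as an unsigned 64-bit value is only an assumption about how a negative truncated size could end up looking like a huge Content-Length.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def truncate_to_int32(value: int) -> int:
    """Mimic a C conversion of a 64-bit size to a 32-bit signed int.

    ctypes integer types do not check for overflow, so the value wraps
    the same way a two's complement narrowing conversion would.
    """
    return ctypes.c_int32(value).value


def as_uint64(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value.

    Assumption: this is one way a negative size could be rendered as a
    very large number; the actual formatting in libs3 is not shown in
    the thread.
    """
    return ctypes.c_uint64(value).value


if __name__ == "__main__":
    part_size = 3 * 1024**3  # hypothetical 3 GiB part, above 2^31 - 1
    truncated = truncate_to_int32(part_size)
    print("part size:          ", part_size)
    print("fits in 32-bit int: ", part_size <= INT32_MAX)
    print("after truncation:   ", truncated)
    print("seen as unsigned 64:", as_uint64(truncated))
```

Any part size at or below 2^31 - 1 passes through the conversion unchanged, which is consistent with the problem only appearing for large files.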
excellent. what is the default
Thank you for the explanation and the formula. It will make it easier to know what to test for, when needed. Yes, setting the Was it also possible to reproduce the successful exit code for the failed 45GB file upload?
I have not reproduced that. I will keep an eye open for it.
The default is 10.
Ah, very good - please get that into the README somewhere as part of one of these tweaks/commits. Thanks.
@JustinKyleJames - Please close if complete. Thanks!
Closing
Bug Report
iRODS Version, OS and Version
OS: Ubuntu 18.04
iRODS: 4.2.11
What did you try to do?
Upload (iput) 20+ GB files to a coordinating replication parent resource with S3 resource children
Expected behavior
A successful operation, or an exit status that indicates failure in case of error
Observed behavior (including steps to reproduce, if applicable)
Steps to reproduce:
The situation is most likely highly related to "Uploads fail after server to server redirect" #1980
In the issue, it is mentioned that:
But the issue seems to be triggered in our environment for file sizes above 20 GB, even if a cache file is created during the upload.
Most importantly, for file sizes above 45 GB, the error is silent: the return status code is 0 (success).
We get the following line in the iCAT logs, but nothing in the S3 server logs:
If the iput is performed directly on the S3 server to an S3 resource (without a coordinating replication parent), it works fine.
But with a coordinating replication parent, it fails:
Note: We are already planning to move in production to the cacheless_detached mode, which seems to solve the issue in our environment.