400 RequestTimeout - s3cmd is not re-trying #737

Closed

jblachly opened this issue May 8, 2016 · 6 comments

Comments


jblachly commented May 8, 2016

s3cmd version 1.6.1
SunOS cloudbackup 5.11 joyent_20160428T170316Z i86pc i386 i86pc Solaris

When doing an initial sync of about 100k files to S3:

s3cmd sync /path/to/photos s3://photo-archive-bucket/

.. plenty of successes for > 12 hours (averaging 600 kB/sec) ...
then:

WARNING: photos/Photos/20 - Mexico - Group1.jpg: Owner groupname not known. Storing GID=100 instead.
upload: 'photos/Photos/20 - Mexico - Group1.jpg' -> 's3://photo-archive-bucket/photos/Photos/20 - Mexico - Group1.jpg'  [15078 of 53083]
313221 of 313221   100% in   33s     9.00 kB/s  done
ERROR: S3 error: 400 (RequestTimeout): Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

I found the following (very old) issue in AWS' official repository:
aws/aws-sdk-js#281
It seems that, in most cases, the S3 error "400 RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed." is related to an incorrect Content-Length header. However, I had not run with --debug, so I cannot confirm this.
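
As I understand that theory, the failure mode would be a request whose Content-Length header promises more body bytes than the client actually writes, so the server sits waiting for the rest of the body until its idle timeout fires. A minimal sketch of that mismatch (purely illustrative; the host and path are placeholders, and this is not how s3cmd builds its requests):

import http.client

# Purely illustrative: placeholder host/path, not an s3cmd code path.
conn = http.client.HTTPConnection('httpbin.org', 80, timeout=30)
conn.putrequest('PUT', '/put')
conn.putheader('Content-Length', '313221')  # promise 313221 bytes of body...
conn.endheaders()
conn.send(b'x' * 1000)                      # ...but write only 1000 of them
# The server is now waiting for the remaining bytes; S3 reportedly answers
# this situation with 400 RequestTimeout once its idle timeout expires.
try:
    response = conn.getresponse()
    print(response.status, response.reason)
except Exception as exc:
    print('request failed:', exc)
finally:
    conn.close()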

Importantly (and this is the reason I am filing this issue), I also found the following on the s3tools mailing list:
https://sourceforge.net/p/s3tools/mailman/message/25713745/
This indicates that the code had been fixed to retry on 400 status codes.

Indeed, looking at the s3cmd source code:
https://github.com/s3tools/s3cmd/blob/master/S3/S3.py#L1289
It should be retrying (up to 5 times in total, since the S3 class initializes _max_retries to 5), but it is not. Why not?

The 9.00 kB/s rate (my other files all transmitted at > 600 kB/sec) suggests that perhaps I had an intermittent internet connectivity issue. Even so, s3cmd should have retried several more times.

Looking at the code path, this could happen if:

  1. retries is initialized as 0 -- I don't see this happening, but I could have missed it
  2. if err.code in [ 'BadDigest', 'OperationAborted', 'TokenRefreshRequired', 'RequestTimeout' ]: evaluates to false, which would then skip the if try_retry: block and fall through to raise S3Error(response).

Condition 2 could happen if the response did not have a 'data' key, or if 'data' was empty (a rough sketch of this path follows the link below):
https://github.com/s3tools/s3cmd/blob/master/S3/Exceptions.py#L52
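
To make condition 2 concrete, here is a rough sketch of the path as I read it. This is not the actual s3cmd source (extract_code() and the response dict shape are stand-ins for illustration), but it shows how an empty 400 response body would leave err.code unset, make the membership test false, and skip the retry:

# Simplified sketch, NOT the real s3cmd code.
RETRYABLE_CODES = ['BadDigest', 'OperationAborted',
                   'TokenRefreshRequired', 'RequestTimeout']

class S3Error(Exception):
    def __init__(self, response):
        self.status = response.get('status')
        self.code = None
        data = response.get('data') or b''
        if data:
            # stand-in for the real XML error parsing in Exceptions.py
            self.code = extract_code(data)

def extract_code(data):
    # crude stand-in: pull the text between <Code> and </Code>, if present
    text = data.decode('utf-8', errors='replace')
    start, end = text.find('<Code>'), text.find('</Code>')
    return text[start + len('<Code>'):end] if 0 <= start < end else None

def send_with_retries(do_request, retries=5):
    response = do_request()
    if response.get('status', 0) >= 400:
        err = S3Error(response)
        try_retry = retries > 0 and err.code in RETRYABLE_CODES
        if try_retry:
            return send_with_retries(do_request, retries - 1)
        # condition 2: empty body -> err.code stays None -> try_retry is
        # False, so we fall straight through to the raise without retrying
        raise err
    return response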

Any other suggestions for debugging?


jblachly commented May 8, 2016

Oh, one additional clue.
After re-running the same sync command, s3cmd scans all files, local and remote, and then resumes. Interestingly, even though the last line of the previous run indicated "100%" had been uploaded, the first file uploaded in the new run is that same file.

WARNING: photos/Photos/20 - Mexico - Group1.jpg: Owner groupname not known. Storing GID=100 instead.
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
upload: 'photos/Photos/20 - Mexico - Group1.jpg' -> 's3://photo-archive-bucket/photos/Photos/20 - Mexico - Group1.jpg'  [1 of 37540]
313221 of 313221   100% in    0s   605.93 kB/s  done

This time the re-run only lasted about 30 minutes; when it aborted again, I went to my S3 bucket and found that the last file (an entirely different ~250k photograph) had not been stored in the bucket.

Next time I'll run with --debug


jblachly commented May 9, 2016

OK, I captured a failure with --debug. Is there any security risk in pasting it here (since it contains all the x-amz headers, etc.) if I scrub out my bucket name?


fviard commented May 9, 2016

Thank you for your investigation of the issue.
The only things you need to take care of are your access key, secret key, and token, which are printed with the configuration at the beginning of the log.
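
For example, something along these lines should be enough to redact those values before pasting (assuming they appear as access_key / secret_key / access_token lines in the configuration dump; adjust the pattern to whatever your log actually shows):

import re
import sys

# Redact lines like "access_key = ..." in an s3cmd --debug log.
# The key names here are an assumption; extend the pattern if your log
# shows the credentials under other names.
SENSITIVE = re.compile(r'^(\s*(access_key|secret_key|access_token)\s*=\s*).*$',
                       re.IGNORECASE)

with open(sys.argv[1]) as log:
    for line in log:
        sys.stdout.write(SENSITIVE.sub(r'\1<redacted>', line))

Run it as: python scrub.py s3cmd-debug.log > scrubbed.log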


fviard commented Sep 2, 2016

@jblachly Were you able to fix your issue? Otherwise don't hesitate to send the log here so we can have a look at it.


jblachly commented Sep 4, 2016

@fviard This only seems to happen during long uploads; I wasn't able (back in May) to re-capture the failure in a debug log, and my scheduled s3cmd syncs don't write debug logs.

I will try to do another large-scale sync by hand and capture it in a log. Thanks for your patience.
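
Something like this is what I have in mind for the manual run (assuming the --debug output goes to stderr, so it can be captured without mixing it into the normal output):

s3cmd sync --debug /path/to/photos s3://photo-archive-bucket/ 2> s3cmd-debug.log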

@jblachly

stale; will re-test in future
