Hung Connections leading to RequestTimeTooSkewed #24

Open · loe opened this issue Jun 23, 2010 · 21 comments

@loe commented Jun 23, 2010

I am trying to use right_aws + right_http_connection to keep one persistent connection per process and reduce the overhead of dealing with S3.

I've got a module in lib/onehub.rb that keeps the connection and bucket objects.
https://gist.github.com/7ef90619fbd331479c6a

Then from my models I'll call something like Onehub.bucket.put to upload the file in a background task, with the idea that this should be a persistent connection since these background workers are simply uploaders.
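
For reference, the kind of module described might look roughly like this. This is a minimal sketch only, since the gist is not reproduced here; the bucket name and credential constants are placeholders:

```ruby
require 'right_aws'

module Onehub
  # Memoize one S3 connection and bucket per process so background
  # workers can reuse the same connection across uploads.
  def self.s3
    @s3 ||= RightAws::S3.new(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  end

  def self.bucket
    @bucket ||= s3.bucket('example-bucket')
  end
end

# Usage from a background worker (key and path are placeholders):
# Onehub.bucket.put('some/key', File.read(path))
```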

What I get quite frequently is 'hung' sockets. The socket doesn't get written to for > 15 minutes, but eventually this recovers (maybe related to a thread not getting scheduled?). The problem is that the request is signed, and we've now left the 15-minute grace period that S3 will tolerate, so I get the exception: RightAws::AwsError: RequestTimeTooSkewed: The difference between the request time and the current time is too large.

Here is a backtrace:
https://gist.github.com/26ddd66d2cc5de223c9c

Is there a better way to handle a per-process persistent connection? Is this some subtle threading issue where the thread that writes isn't being scheduled by the interpreter? I am not using this in a multi-threaded environment. Is this because S3 hangs up after 60 seconds but the library expects the connection to still be open?

We diagnosed the issue by instrumenting the PUT operations and dumping to a log file, but could never create a case that reliably reproduced it.

@yodal commented Jun 24, 2010

Have been suffering from the same issue here.

@loe (Author) commented Jun 24, 2010

The hung sockets? I keep thinking this is related to Green Threading but I have no way to reproduce it!

@yodal commented Aug 3, 2010

I can't reproduce it either, yet. We are getting a 'RequestTimeTooSkewed' error around once a day. Have you found a workaround?

@cdunn commented Aug 16, 2010

I'm in the same boat. Anyone have any solutions?

@basex commented Jan 18, 2011

Has anyone found a solution to this? My server's system time is kept up to date and I still get this error many times a day.

@loe (Author) commented Jan 18, 2011

I switched to aws-s3 and have had no problems. There is something in the threading code that causes hangs, and when the request is retried the headers are not regenerated.

@basex commented Jan 18, 2011

I'm using a folder structure + EU buckets, and neither is well supported by aws-s3. =/

@konstantin-dzreev (Contributor) commented Jan 20, 2011

Hi,

Is the RequestTimeTooSkewed you are discussing an HTTP 403 RequestTimeTooSkewed error?

If yes, can you just rescue it and retry the upload attempt?

If no, can you please describe the error (HTTP status code and message)?
(s3interface.last_response.code and s3interface.last_response.body)
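
For illustration, a rescue-and-retry around the upload might look roughly like this (a sketch only; Onehub.bucket, key, and data refer to the earlier example, and the single retry is an arbitrary choice):

```ruby
attempts = 0
begin
  attempts += 1
  Onehub.bucket.put(key, data)
rescue RightAws::AwsError => e
  # Retry once on the skew error; re-raise anything else.
  raise unless e.message =~ /RequestTimeTooSkewed/
  retry if attempts < 2
  raise
end
```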

Thanks

@loe (Author) commented Jan 20, 2011

Yes, RequestTimeTooSkewed is the result of a 403, but it happens because RightAws gets into a state that prevents any data from being written to the socket.

At some point it just hangs, and when it eventually resumes (it takes about 15 minutes!) it retries with the same headers as the original request, which is now outside Amazon's acceptable time window, so it throws the 403. Rescue -> retry works, but why does it hang for 15 minutes in the first place?

@vivienschilis commented Jan 31, 2011

I have the same problem. My platform uploads several GB per day, and I hit the problem a dozen times a day (using the threaded option).

@konstantin-dzreev (Contributor) commented Jan 31, 2011

Hi,

right_aws does not support multi-threading (and we no longer offer that option). If you need multiple threads, you must have a RightAws::S3 or RightAws::S3Interface instance per thread; once created, a RightAws::S3 instance must only be used in the thread it was created in.

Please make sure you do this and that you do not access one RightAws::S3 instance from different threads.
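
One way to enforce this is to keep the instance in thread-local storage, for example (a sketch; the credential constants are placeholders):

```ruby
require 'right_aws'

# Each thread lazily creates and reuses its own interface,
# so no instance is ever shared across threads.
def s3_interface
  Thread.current[:s3_interface] ||=
    RightAws::S3Interface.new(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
end
```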

@vivienschilis commented Jan 31, 2011

I actually don't use threads and keep getting this error.
Those 15 minutes seem to correspond to the 900000 milliseconds returned by AWS:

<MaxAllowedSkewMilliseconds>900000</MaxAllowedSkewMilliseconds>

How can I log what is going on during those 15 minutes?
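
One option is to wrap each PUT with timing, logging, and a hard timeout so a stall becomes visible right away (a sketch only; the 120-second limit, log file, and Onehub.bucket/key/data names are illustrative, not part of right_aws):

```ruby
require 'timeout'
require 'logger'

logger = Logger.new('s3_put.log')
started = Time.now

begin
  # Give up on a single PUT after 2 minutes instead of waiting ~15.
  Timeout.timeout(120) { Onehub.bucket.put(key, data) }
  logger.info "PUT #{key} finished in #{(Time.now - started).round}s"
rescue Timeout::Error
  logger.warn "PUT #{key} stalled for #{(Time.now - started).round}s"
  raise
end
```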

@conradchu commented Feb 18, 2011

Yup, finally got the same error today. Very annoying.

@konstantin-dzreev (Contributor) commented Feb 22, 2011

Hi all,

As far as I can see, some Python users run into this error as well, using Python's boto library: http://www.saltycrane.com/blog/2010/04/using-python-timeout-decorator-uploading-s3

I'm not sure what we can fix here, because we use Ruby's 'net/https' library. Are you 101% sure there is no time-sync issue between your boxes and Amazon?

In any case, any help with debugging this is very much appreciated!

Konstantin

@vivienschilis commented Feb 22, 2011

I think it's an S3 problem when you don't specify the endpoint.
The problem is that RightAws waits 15 minutes before noticing that the request failed (maybe due to right_http_connection, which retries with the same headers without closing and reopening the socket?).

I have switched to Fog and I don't have any problems.

@conradchu commented Feb 22, 2011

Actually, I got it working again. I realized the system time of my Xen instances was drifting from the actual time because ntpd wasn't running.
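
For anyone who wants to rule this out quickly, one way to measure drift is to compare the local clock against the Date header in an S3 response (a rough sketch, independent of right_aws):

```ruby
require 'net/http'
require 'time'
require 'uri'

# Compare the local clock with the Date header S3 sends back.
response = Net::HTTP.get_response(URI('http://s3.amazonaws.com/'))
remote_time = Time.httpdate(response['Date'])
skew = (Time.now - remote_time).abs
puts "Clock skew vs S3: #{skew.round} seconds"
```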

@bradly commented Apr 7, 2011

We are getting this error in our app from some calls that are made from our Delayed Job queue.

@konstantin-dzreev (Contributor) commented Apr 8, 2011

Please make sure that the box performing the requests does not have system-time issues (ntpd, etc.).

@bradly commented Apr 11, 2011

We do not have any issues with time or ntpd. I think it may be due to delayed job running as a daemon, but I'm not sure.

@conradchu commented Apr 11, 2011

@bradly, you'll want to check how the time is being evaluated by delayed_job. Since delayed_job uses YAML to serialize the ActiveRecord object, there is an outstanding YAML bug we've found that might affect your time.

https://rails.lighthouseapp.com/projects/8994/tickets/340-yaml-activerecord-serialize-and-date-formats-problem

@ericmason commented Mar 25, 2013

I'm having the same issue. Anyone find a work-around?
