Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeouts for transport.dialtls and http.client #22

Closed
wants to merge 7 commits into from
Closed

add timeouts for transport.dialtls and http.client #22

wants to merge 7 commits into from

Conversation

c3mb0
Copy link
Contributor

@c3mb0 c3mb0 commented May 2, 2016

Workaround for #17 and #20.

Discussed in detail in #20.

@coveralls
Copy link

coveralls commented May 2, 2016

Coverage Status

Coverage decreased (-3.6%) to 96.386% when pulling e02b1a0 on c3mb0:dialTimeoutPatch into 729daff on sideshow:master.

@imhoffd
Copy link
Contributor

imhoffd commented May 2, 2016

It would probably be a lot better to just allow users to directly modify the HTTPClient's transport. Faking "optional" parameters like this is not a clean solution.

@imhoffd
Copy link
Contributor

imhoffd commented May 2, 2016

For example, I believe right now we could do this:

c := apns2.Client(cert)
c.HTTPClient.Transport.DialTLS = func....

So I think a PR isn't even needed.

@c3mb0
Copy link
Contributor Author

c3mb0 commented May 2, 2016

@dwieeb I also thought about setting the dialer via c.HTTPClient.Transport, but it returns the RoundTripper instead of Transport.

@imhoffd
Copy link
Contributor

imhoffd commented May 2, 2016

RoundTripper is just an interface (see below where the DefaultTransport is constructed). Maybe I'm not understanding the problem. I haven't tried this, yet.

@c3mb0 c3mb0 changed the title add optional timeout parameter for newclient add tlshandshake timeout for client transport May 3, 2016
@imhoffd
Copy link
Contributor

imhoffd commented May 3, 2016

I'm not certain it's a good idea to set this default in the actual library without any kind of retry logic. I feel like this should just be documented describing the problem with example code on how to set the tls timeout.

@c3mb0
Copy link
Contributor Author

c3mb0 commented May 3, 2016

I believe timing out should be handled on package level. Other libraries in other languages (such as lowdown) handle timeouts internally. This way, even if no further steps are taken by the package user, at least system resources are freed and running goroutines are finalized.

Since redialing does not guarantee a connection, retrying to send the push is an implementation decision and should be left to the user.

Edit: After some thought, I partially take that back. It might make sense to cover some basic retry scenarios, such as retrying x times or retrying for x minutes. However, I think these features should be available under a different function such as PushTimes and PushFor, and not be so transparent about the intent.

@sideshow Any thoughts?

@zjx20
Copy link

zjx20 commented May 4, 2016

@c3mb0 First of all, thank you for the analysis and the PR! I applied your latest changes to my program, but the issue #20 was still there. However, if only the first two commits of this PR were applied, the problem goes away.

@zjx20
Copy link

zjx20 commented May 4, 2016

However, if only the first two commits of this PR were applied, the problem goes away.

@c3mb0 Oh no, that's not true... I hit the issue again.

@c3mb0
Copy link
Contributor Author

c3mb0 commented May 4, 2016

@zjx20 Strange, it is working consistently for us. Which version of Go are you using? configureTransport requires Go 1.6 or later.

@zjx20
Copy link

zjx20 commented May 4, 2016

@c3mb0 As I observed, it is not a handshake problem in my case. If a connection (from apns2 to apple server) become idle (e.g. keep alive for 10 minutes without sending any request), and then we use it to send requests again, the apple server just refuse to read incoming data. I could see the "Send-Q" column from netstat keep growing for each request.

A workaround for my case is to specify a Timeout to http.Client.

It is hard to tell, maybe your changes are necessary as well.

@coveralls
Copy link

coveralls commented May 6, 2016

Coverage Status

Coverage decreased (-2.4%) to 97.619% when pulling 90994d9 on c3mb0:dialTimeoutPatch into 729daff on sideshow:master.

@c3mb0
Copy link
Contributor Author

c3mb0 commented May 6, 2016

@zjx20 As you mentioned, lack of http.Client can also cause hangs.

I know there has been a lot of back and forth, but I'm reverting TLSHandshakeTimeout back to DialTLS for backwards compability and adding a timeout for http.Client for those rare cases where it can cause an infinite halt.

This is the setup we are running in production and it seems to handle all faulty requests so far.

@c3mb0 c3mb0 changed the title add tlshandshake timeout for client transport add timeouts for transport.dialtls and http.client May 6, 2016
@Nenblereeyw
Copy link

Does this patch need to be made compatible with go 1.4.3?

@sideshow
Copy link
Owner

sideshow commented May 11, 2016

Thanks for this, I think we are close.

The x/net/http2 package has deleted pre-Go1.5 request cancellation (hence the build fail), as it was buggy, so I am thinking of removing support for 1.4.3 and making 1.5 the minimum now that 1.6 is stable.

Also the Timeout property on the http.Client is go 1.6+ so we probably need to do something conditional for 1.5

@c3mb0
Copy link
Contributor Author

c3mb0 commented May 11, 2016

After a few days of usage, it seems like an http.Client timeout of 1s is a bit too aggressive. I've increased it to 5s and added a retry function.

@coveralls
Copy link

coveralls commented May 11, 2016

Coverage Status

Coverage decreased (-2.2%) to 97.802% when pulling 8e63fea on c3mb0:dialTimeoutPatch into 729daff on sideshow:master.

@coveralls
Copy link

coveralls commented May 11, 2016

Coverage Status

Coverage decreased (-2.2%) to 97.802% when pulling c99fad4 on c3mb0:dialTimeoutPatch into 729daff on sideshow:master.

@Nenblereeyw
Copy link

Nenblereeyw commented May 13, 2016

I think there is limited benefit to having a timeout that is even this aggressive.

I would recommend 20 seconds - the default linux TCP connection timeout.

Setting the timeout to be 3 seconds means that someday when apple's apns service is at capacity where the 3 second timeout is triggered then any apns client with a higher timeout will be able to deliver push notifications, and any apns client with a lower timeout will be not be able to deliver any push notifications at all.

@imhoffd
Copy link
Contributor

imhoffd commented May 16, 2016

Can we build the retry logic into the actual client? That way we wouldn't need an additional method in the API.

It would just be:

client := apns2.NewClient()
client.RetryAttempts = 10
client.RetryDuration = 20 * time.Second
client.Push(notification)

The retry logic would be built in to the Push method, checking for zero on RetryAttempts.

@imhoffd
Copy link
Contributor

imhoffd commented May 16, 2016

@sideshow What would need to be done, exactly, for 1.5? Wouldn't it not compile?

@sideshow sideshow mentioned this pull request May 31, 2016
@sideshow
Copy link
Owner

Hey @c3mb0 thanks - I cherry picked your code and pushed back another pull request here #25. This is essentially your code with the timeouts set as vars.

I agree that its bad having to set defaults for the HTTPClient, but I think its important out of the box for this to only return one of two things - A success or a failure. Without these defaults sadly theres a third shitty case as we know where it just doesn't return which is not good. I think this patch is an OK middle ground because at least the defaults can be configured.

I have left them pretty large because being HTTP there is always going to be issues with latency, proxys, etc. As @chimpmaster72 said, the default linux TCP connection is 20seconds.

As far as the retry logic goes, my thoughts are that this should be a higher level package which handles retries, queing, channels etc.

@sideshow
Copy link
Owner

Hey @dwieeb in answer to your question;

In 1.5 I was getting net/http: Client Transport of type *http2.Transport doesn't support CancelRequest; Timeout not supported. Im not getting this error anymore though.

Also, For go 1.4 looks like some of the internals in x/net/http2 have been changed and they removed the request cancellation in golang/net@b797637
So, I removed support for go 1.4 because its 3 versions ago.

@sideshow
Copy link
Owner

Closing this in favour of the recently merged timeout code.
We can come back to the retry logic as part of a higher level queuing package.

@sideshow sideshow closed this Jun 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants