
send() can hang due to http2 client.request() apparently having no timeout #24

Closed
TysonAndre opened this issue Sep 30, 2020 · 10 comments

@TysonAndre

TysonAndre commented Sep 30, 2020

It seems like many of the connection timeout and socket timeout customizations were removed, and if there are networking issues, I'm concerned that calls to send() could hang forever (until the server runs out of memory?).

  • Memory usage rose, probably because this was an express app waiting for send() to finish.
  • I was seeing similar issues in Node 12, but I think the spikes in send() request duration did not last arbitrarily long. There was a refactoring of the http2 server to change the timeout from 120 seconds to no timeout, but I can't see why the client would be affected.

https://nodejs.org/api/http2.html#http2_http2_connect_authority_options_listener mentions that options are passed to net.connect.
https://nodejs.org/api/net.html#net_socket_settimeout_timeout_callback (That doesn't look like the right mechanism - I probably want the timeout to apply to requests rather than to socket inactivity.)
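For the connection itself, a session-level inactivity timeout might be enough. This is a minimal sketch (not node-apn's actual code) assuming the session is created with http2.connect(); the 60-second value and the host are just placeholders:

const http2 = require('http2');

// Connect to the push endpoint (host is illustrative).
const client = http2.connect('https://api.push.apple.com');

// If there is no activity on the whole session for 60 seconds, destroy it so
// pending requests fail fast instead of hanging, and reconnect lazily later.
client.setTimeout(60000, () => {
  client.destroy(new Error('http2 session inactivity timeout'));
});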

Similar to node-apn#665
Mentioned in #23


https://nodejs.org/api/http2.html#http2_http2stream_settimeout_msecs_callback may be what I want added to ensure requests eventually time out - there are examples in the documentation. Instead of NGHTTP2_CANCEL, I may want to close the client and destroy it asynchronously if requests hang for too long.

I'm assuming that each request calling req.setTimeout gets its own timeout - otherwise it could get extended indefinitely.

const http2 = require('http2');
const client = http2.connect('http://example.org:8000');
const { NGHTTP2_CANCEL } = http2.constants;
const req = client.request({ ':path': '/' });

// Cancel the stream if there's no activity after 5 seconds
// Instead of NGHTTP2_CANCEL, I want to close the client and destroy it asynchronously if requests hang for too long.
// Not sure what new errors that would cause this to need to handle.
// This should also use the logger to indicate that node is cancelling the stream because it took too long.
// 5 seconds may be too low for a default for a wide variety of use cases - consider using a higher timeout and making this configurable
req.setTimeout(5000, () => req.close(NGHTTP2_CANCEL));
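
A rough sketch of that alternative (untested against real APNs; the timeout value and the logger call are placeholders): cancel the stream on timeout, then tear the whole client down asynchronously so the next send() gets a fresh connection.

// On timeout, cancel the stream, log it, and destroy the session on the next
// tick so this request's 'close'/'error' handlers run first; other requests
// pending on the same session will then error out rather than hang.
req.setTimeout(5000, () => {
  logger('http2 stream timed out; cancelling and destroying the session');
  req.close(NGHTTP2_CANCEL);
  setImmediate(() => client.destroy(new Error('apn request timed out')));
});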
@petitchevalroux

I can confirm our worker is hanging, doing nothing.

The whole "write" part is a bit messy; maybe look at https://github.com/AndrewBarba/apns2/blob/master/lib/http2-client.js as a replacement for client.js. To me it looks cleaner ;)

(Although session.setTimeout is still missing)

@TysonAndre
Author

TysonAndre commented Oct 1, 2020

@petitchevalroux Refer to https://github.com/AndrewBarba/apns2/issues/33 - there are quite a few unaddressed review comments, and that file does call setTimeout but seems to introduce different potential failure modes (e.g. it can establish multiple connections to APNs concurrently in connect() but only use one).
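
For reference, a sketch of one way to avoid that concurrent-connect failure mode - cache the in-flight connect promise so parallel callers share a single session. This is illustrative only, not the apns2 or node-apn implementation:

const http2 = require('http2');

class SingleSessionClient {
  constructor(authority) {
    this._authority = authority;
    this._session = null;     // established session, if any
    this._connecting = null;  // in-flight connect promise, if any
  }

  connect() {
    if (this._session && !this._session.destroyed) {
      return Promise.resolve(this._session);
    }
    if (this._connecting) {
      // A connect is already in progress; reuse it instead of opening a
      // second connection to APNs.
      return this._connecting;
    }
    this._connecting = new Promise((resolve, reject) => {
      const session = http2.connect(this._authority);
      session.once('connect', () => resolve(session));
      session.once('error', reject);
    }).then(
      (session) => { this._session = session; this._connecting = null; return session; },
      (err) => { this._connecting = null; throw err; },
    );
    return this._connecting;
  }
}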

You may be interested in #27 - a few unit tests of failure modes were added, but I haven't tested this against the real APNs servers for a prolonged period of time yet.

@petitchevalroux

No, I haven't seen this problem, but it's a great find; @trevh3's comments sound interesting.
I dove into http2 for the first time yesterday, so I don't understand everything yet. We will check #27 and potentially test its behavior next week.

At the moment we are not sure that the timeout is the real problem, but from my first read of client.js and this issue, it's my first guess. We enabled debug mode in production to see if we can get more information on the "worker hanging" effect.

Thanks for your feedback ;)

@trx222

trx222 commented Oct 5, 2020

I also do not understand why the problem is back now - and we hit it really often. We have actually implemented our own watchdog... it waits 10 seconds, then creates a new APN Provider and continues. But we are sending batches to 100 receivers and I do not know which of them have received the notification and which have not.
We will now try the new version, but I think the timeout alone will not solve the problem. We are already looking into other ways to send - with curl or whatever.
The version worked for months - but now it happens on nearly every message.
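
A sketch of bounding the wait without recreating the Provider each time, assuming node-apn's provider.send(notification, tokens) promise API; the 10-second value mirrors the watchdog described above. When send() does resolve, the { sent, failed } lists show which receivers got the notification:

// Race send() against a watchdog timer so a hung request rejects after `ms`
// instead of blocking forever. This only bounds the hang, it does not fix it.
function sendWithWatchdog(provider, notification, tokens, ms = 10000) {
  let timer;
  const watchdog = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('send() watchdog timeout')), ms);
  });
  return Promise.race([provider.send(notification, tokens), watchdog])
    .finally(() => clearTimeout(timer));
}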

btw, I posted in the Parse forum ( https://community.parseplatform.org/t/node-apn-problem-send-hangs/871/5 ), but it's cool that issues are now enabled here.

@TysonAndre
Author

We have actually implemented our own watchdog... it waits 10 seconds, then creates a new APN Provider and continues. But we are sending batches to 100 receivers and I do not know which of them have received the notification and which have not.

@trx222 frequently opening and closing connections to APNs may make the problem worse - APNs may treat that as a malfunctioning client or a denial-of-service attempt and block connections from your IP address. (I forget where I read that.)
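
For what it's worth, a minimal sketch of keeping one long-lived node-apn Provider and reusing it for every batch (key path, IDs and option values are illustrative; this does not by itself fix the hang):

const apn = require('apn');

// Create the provider once at startup and reuse its APNs connection for every
// batch, per the APNs guidance about keeping connections open.
const provider = new apn.Provider({
  token: { key: 'AuthKey.p8', keyId: 'ABC123DEFG', teamId: 'TEAM123456' },
  production: true,
});

async function pushBatch(notification, tokens) {
  const { sent, failed } = await provider.send(notification, tokens);
  return { sent, failed };
}

// Only shut down when the process is done sending for good.
// provider.shutdown();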

@trevh3

trevh3 commented Oct 6, 2020

You saw it in Apple's APNs documentation:

Keep your connections with APNs open across multiple notifications; do not repeatedly open and close connections. APNs treats rapid connection and disconnection as a denial-of-service attack. You should leave a connection open unless you know it will be idle for an extended period of time—for example, if you only send notifications to your users once a day, it is acceptable practice to use a new connection each day.

@trx222

trx222 commented Oct 6, 2020

Yes, sure - I know that part, and we only do a limited number of retries (3 to 5 times) and then we give up... But what else should I do - I cannot completely stop sending in that case. We have 30 parallel processes for sending anyway to speed things up - otherwise it takes too long to transmit everything.

But the problem also happens if we send to just 300 receivers in one process... it makes no difference.

@davimacedo
Member

Guys, could you try with this branch? #29

It seems to be working better in my tests, but it would be good to have more feedback before merging.

@trx222

trx222 commented Oct 6, 2020

I will setup a test tomorrow.

@rileyweber

I think this can be closed now with the release of 4.0.0?
