Regression in idle socket handling #24980

Open
timcosta opened this Issue Dec 12, 2018 · 7 comments

@timcosta
Contributor

timcosta commented Dec 12, 2018

  • Version: v8.14.0
  • Platform: macOS 10.13.3, node:carbon LTS docker image
  • Subsystem: http

I'd like to report a possible regression introduced to the http module between versions 8.9.4 and 8.14.0.

Sockets that are opened but left idle (no data transferred) for more than 40 seconds are closed immediately once data transmission finally begins.

Reproduction available here: https://github.com/timcosta/node_tcp_regression_test

We ran a tcpdump on our servers that were 504ing, and saw that node is responding with an ACK followed almost immediately by a duplicate ACK with an additional RST on the same socket.

Timeouts are set to 60 seconds on the client (AWS ELB) and 2 minutes on the node server (hapi.js).

I'm filing this as a node core issue because the error can be reproduced with both hapi and the bare node http module, as can be seen in this travis build: https://travis-ci.com/timcosta/node_tcp_regression_test/builds/94440224

According to travis, the error is intermittent on versions 8.14.0, 10.14.2, and 11.4.0, but the build consistently passes on v8.9.4, which leads me to believe there is a possible regression.
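
The gist of the reproduction looks roughly like the sketch below (port 3000, localhost, and the 45-second idle period are assumptions for illustration; the linked repository is the actual test):

```js
const http = require('http');
const net = require('net');

const server = http.createServer((req, res) => res.end('ok'));
server.setTimeout(2 * 60 * 1000); // server-side socket timeout: 2 minutes

server.listen(3000, () => {
  const socket = net.connect(3000, 'localhost');
  socket.on('error', (err) => console.error('client error:', err.message));
  socket.on('close', () => console.log('client socket closed'));
  socket.on('data', (chunk) => console.log(chunk.toString()));

  // Open the connection but send nothing for 45 seconds, then issue a request.
  // On the affected versions the server resets the connection as soon as data
  // arrives; on v8.9.4 the request completes normally.
  setTimeout(() => {
    socket.write('GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n');
  }, 45 * 1000);
});
```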

cc: @jtymann @dstreby

@lpinca

Member

lpinca commented Dec 12, 2018

Could this be caused by eb43bc0?

@lpinca lpinca added the http label Dec 12, 2018

@timcosta

Contributor Author

timcosta commented Dec 12, 2018

Seems likely, @lpinca. The timings and behavior match up. That change appears to have broken setups where an AWS ELB with default settings fronts a node.js back end, though, since this timeout is lower than the ELB default of 60 seconds.

@lpinca

Member

lpinca commented Dec 12, 2018

I see. That change was part of a security release, so it didn't go through the normal release cycle. I'm not sure why 40 seconds was chosen as the default value, but it can be customised.

@timcosta

Contributor Author

timcosta commented Dec 12, 2018

Hm, okay. I'd propose the default value be changed to something greater than 60 seconds, as that's the default timeout for ELBs, and this change likely broke node in its default configuration behind ELBs for more than just us.

@lpinca

Member

lpinca commented Dec 15, 2018

cc: @mcollina

@mcollina

Member

mcollina commented Dec 15, 2018

Currently we start waiting for the headers as soon as we receive a connection; however, we could start on the first byte instead, which would solve the issue at hand. I'll see if I can code something up that addresses this.

Note that this is configurable via server.headersTimeout (https://nodejs.org/api/http.html#http_server_headerstimeout), so you can increase it beyond 60s to solve your immediate issue.

We picked 40 seconds because it is the default Apache uses.
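
For anyone hitting this behind an ELB, a minimal sketch of bumping the limit as described above (the 65-second value is just an example chosen to exceed the ELB's 60-second default):

```js
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// headersTimeout is in milliseconds; the default under discussion is 40000 (40s).
// Raising it above the ELB's 60-second idle timeout avoids the resets.
server.headersTimeout = 65 * 1000;

server.listen(3000);
```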

cc @nodejs/lts @MylesBorins

@MylesBorins MylesBorins self-assigned this Dec 18, 2018
