
Socket leakage on tls handshake timeout in https protocol #3251

Closed
gugu opened this issue Aug 11, 2021 · 12 comments · Fixed by #3254

Comments

@gugu
Contributor

gugu commented Aug 11, 2021

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure it has not already been reported

Fastify version

3.19.2

Plugin version

3.0.0

Node.js version

16.6.1

Operating system

Linux

Operating system version (e.g. 20.04, 11.3, 10)

alpine-3.13.5

Description

I've switched my code from koa to fastify in a single region. After the switch I started to get EMFILE errors once uptime reaches about 24 hours.
To debug the issue I added a job that runs lsof every 2 minutes. This is the output:

  | 2021-08-11T13:43:42.999+03:00 | node 6 node 75u sock 0,8 0t0 24642 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 79u sock 0,8 0t0 24737 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 84u sock 0,8 0t0 30347 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 86u sock 0,8 0t0 25148 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 87u sock 0,8 0t0 24837 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 92u sock 0,8 0t0 33185 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 93u sock 0,8 0t0 28289 protocol: TCP
  | 2021-08-11T13:43:42.999+03:00 | node 6 node 98u sock 0,8 0t0 28314 protocol: TCP
... hundreds of them ...

When I tracked the history of a single socket across different lsof runs, I got this:

timestamp,message
1628624501275,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (FIN_WAIT1)
1628624621276,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628624741286,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628624861284,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628624981274,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628625101280,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628625221279,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628625341277,node      7 node   99u     IPv4             999238      0t0     TCP 172.31.42.211:8443->xxx.xxx.xxx.xxx:49758 (CLOSING)
1628625461284,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
1628625581286,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
1628625701274,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
1628625821280,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
1628625941287,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
1628626061287,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
1628626181284,node      7 node   99u     sock                0,8      0t0  999238 protocol: TCP
... last line repeats till the death of the process

As you can see, this zombie socket (with id 999238) was initially an https socket handled by fastify. The history does not include all states (ESTABLISHED and so on) because a snapshot was saved only every 2 minutes.

At first I blamed the proxywrap module (which had also worked with koa), but removing proxywrap did not change anything.

Steps to Reproduce

I don't have steps to reproduce yet. All my attempts to reproduce have failed (in production it is still an issue). I think it is related to very slow EDGE mobile connections, but that is a guess.

Here is my app config:

const app = fastifyFactory({
        serverFactory: (handler, opts) => {
            let server: http.Server;
            if (!httpsMode) {
                server = http.createServer((req: any, res: any) => {
                    handler(req, res);
                }) as http.Server;
            } else {
                server = https.createServer({
                    SNICallback: createSNICallback(),
                    key: selfSignedKey,
                    cert: selfSignedCert,
                    handshakeTimeout: 10000,
                }, handler);
                server.setTimeout(10000);
                server.on('error', serverErrorHandler);
                server.on('tlsClientError', serverErrorHandler);

                return server as https.Server;
            }
            server.keepAliveTimeout = opts.keepAliveTimeout as number;
            server.setTimeout(opts.connectionTimeout as number);
            return server;
        },
        keepAliveTimeout: 0,
        connectionTimeout: 10000,
        logger: false,
        trustProxy: [
            ...
        ]
    });

Expected Behavior

All sockets need to be closed.

@mcollina
Member

You will need to set keepAliveTimeout to something higher than 0 - that will definitely close the sockets.
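
For instance (a sketch with illustrative values):

const fastify = require('fastify');

// A non-zero keepAliveTimeout so idle keep-alive sockets are eventually closed;
// 5000 is the default value mentioned later in this thread.
const app = fastify({
    keepAliveTimeout: 5000,
    connectionTimeout: 10000
});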

@gugu
Contributor Author

gugu commented Aug 11, 2021

Yes, I agree it is incorrect and I've changed it to the default 5000, but it did not help - I add Connection: close to every response and there is a general timeout of 10000.

@mcollina
Member

I don't think Fastify or Node.js are doing anything with the Connection: close header.

Could you upload a set of scripts to completely reproduce the problem?

@gugu
Contributor Author

gugu commented Aug 12, 2021

Found the source of the leakage. For some reason the https server does not close client connections when the handshake timeout is reached; it just emits a clientError event.

Here is an example:

const fastify = require('fastify');
const good = fastify({ connectionTimeout: 10000 })
good.listen(12121)
const bad = fastify({ connectionTimeout: 10000, https: {handshakeTimeout: 10000} })
bad.listen(12122)

then run nc 127.0.0.1 12121 and nc 127.0.0.1 12122 in different shells and do nothing.

lsof:

node    638 kostenko   18u     IPv4             933154      0t0      TCP localhost:12121 (LISTEN)
node    638 kostenko   19u     IPv4             933155      0t0      TCP localhost:12122 (LISTEN)
node    638 kostenko   20u     IPv4             936058      0t0      TCP localhost:12121->localhost:45860 (ESTABLISHED)
node    638 kostenko   21u     IPv4             936014      0t0      TCP localhost:12122->localhost:48572 (ESTABLISHED)

lsof after 10 seconds:

node    638 kostenko   18u     IPv4             933154      0t0      TCP localhost:12121 (LISTEN)
node    638 kostenko   19u     IPv4             933155      0t0      TCP localhost:12122 (LISTEN)
node    638 kostenko   21u     IPv4             936014      0t0      TCP localhost:12122->localhost:48572 (ESTABLISHED)

@gugu
Contributor Author

gugu commented Aug 12, 2021

This code:

const fastify = require('fastify');
const good = fastify({ connectionTimeout: 10000 })
good.listen(12121)
const bad = fastify({ connectionTimeout: 10000, https: { handshakeTimeout: 10000 } })
bad.server.on('clientError', (err, socket) => {
    console.error(err);
    socket.end();
})
bad.listen(12122)

returns output

Error [ERR_TLS_HANDSHAKE_TIMEOUT]: TLS handshake timeout
    at new NodeError (node:internal/errors:363:5)
    at TLSSocket._handleTimeout (node:_tls_wrap:902:22)
    at Object.onceWrapper (node:events:471:28)
    at TLSSocket.emit (node:events:365:28)
    at TLSSocket.Socket._onTimeout (node:net:475:8)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7) {
  code: 'ERR_TLS_HANDSHAKE_TIMEOUT'
}

but lsof still shows an established connection:

node    1568 kostenko   18u     IPv4             936246      0t0      TCP localhost:12121 (LISTEN)
node    1568 kostenko   19u     IPv4             936247      0t0      TCP localhost:12122 (LISTEN)
node    1568 kostenko   20u     IPv4             936253      0t0      TCP localhost:12122->localhost:48584 (ESTABLISHED)

@gugu
Contributor Author

gugu commented Aug 12, 2021

Calling socket.destroy() instead of socket.end() in onClientError solves the problem. I have no idea why https behaves this way, but I think it would be great either to update the documentation or to add a clientError handler to the server.
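
For example (a sketch based on the repro above):

const fastify = require('fastify');
const bad = fastify({ connectionTimeout: 10000, https: { handshakeTimeout: 10000 } })
bad.server.on('clientError', (err, socket) => {
    console.error(err);
    // socket.end() leaves the half-open socket (and its file descriptor) behind;
    // socket.destroy() actually releases it.
    socket.destroy();
})
bad.listen(12122)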

@climba03003
Member

climba03003 commented Aug 12, 2021

Calling socket.destroy() instead of socket.end() in onClientError solves the problem. I have no idea why https behaves this way, but I think it would be great either to update the documentation or to add a clientError handler to the server.

I think we already have a clientError handler attached to the server.
https://github.com/fastify/fastify/blob/main/fastify.js#L400

The default one only calls .end() when the socket is writable.
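
Roughly, its shape is (a paraphrase for illustration, not the exact fastify source):

// Paraphrased shape of the default clientError handler: reply and end the
// socket, but only when it is still writable, so a socket that errored before
// becoming writable (e.g. a TLS handshake timeout) is never torn down.
function defaultClientErrorHandler (err, socket) {
    if (socket.writable) {
        socket.end('HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n');
    }
}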

@gugu
Contributor Author

gugu commented Aug 12, 2021

@climba03003 yes, you are right. In this use case socket.writable is false. I propose calling socket.destroy() if the socket is not writable.

@gugu gugu changed the title Socket leakage [investigation in progress] Socket leakage on tls handshake timeout in https protocol Aug 12, 2021
@gugu
Contributor Author

gugu commented Aug 12, 2021

Monkey-patching fastify and removing the onClientError handler also solves the problem.

@Eomm
Member

Eomm commented Aug 12, 2021

Monkey-patching fastify and removing the onClientError handler also solves the problem.

Was it not possible to set a custom clientErrorHandler?

@climba03003 yes, you are right. In this use case socket.writable is false. I propose calling socket.destroy() if the socket is not writable.

This seems good 👍🏽 Great spot!

@gugu
Contributor Author

gugu commented Aug 12, 2021

Was it not possible to set a custom clientErrorHandler?

It is possible; I just wanted to check that removing fastify's error handler also solves the problem.

Here is my new clientErrorHandler:

export const clientErrorHandler = (err: ServerError | null, socket: Socket): void => {
    logger.error({ message: err?.message, destroyed: socket.destroyed, writable: socket.writable });
    if (socket.destroyed) {
        return;
    } else if (socket.writable) {
        socket.end();
    } else {
        socket.destroy();
    }
};

Instead of comparing err.code to ECONNRESET, I check socket.destroyed.
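
A sketch of wiring it in through the clientErrorHandler server option (the import path here is hypothetical):

const fastify = require('fastify');
// Hypothetical module that exports the handler shown above.
const { clientErrorHandler } = require('./client-error-handler');

// Passing clientErrorHandler here replaces fastify's default handler instead
// of monkey-patching it.
const app = fastify({
    https: { handshakeTimeout: 10000 },
    connectionTimeout: 10000,
    clientErrorHandler
});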

gugu added a commit to gugu/fastify that referenced this issue Aug 12, 2021
1. Added a check for the writable state of the socket before trying to end the request gracefully
2. Node.js updated their official documentation, so I replaced the link to the issue with a link to their official documentation

Fixes fastify#3251
@gugu
Contributor Author

gugu commented Aug 12, 2021

More investigation.
Adding a clientErrorHandler removes the standard Node error handler, which you can see here:

https://github.com/nodejs/node/blob/6ca23d7846cb47e84fd344543e394e50938540be/lib/_http_server.js#L666
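
That built-in fallback behaves roughly like this (a paraphrase, not the actual _http_server.js code; the function name is made up):

// When no 'clientError' listener handles the event, Node writes a minimal
// error response if the socket is still writable and then always destroys
// the socket, so its file descriptor is released either way.
function nodeDefaultClientErrorFallback (err, socket) {
    if (socket.writable) {
        socket.write('HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n');
    }
    socket.destroy(err);
}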
