Skip to content

Commit d352dfd

Browse files
authored
usockets: don't defer TLS close when close_notify flush fails (#30368)
### What `ssl_handle_shutdown` in `packages/bun-usockets/src/crypto/openssl.c` returned 0 ("wait for the peer") when `SSL_shutdown` failed with `SSL_ERROR_WANT_WRITE` / `WANT_READ` during a graceful close (`force_fast_shutdown == 0`). Change it to return 1 so the raw socket closes (and `SSL_free` runs) immediately. ### Why `SSL_shutdown` allocates BoringSSL's `ssl->s3->write_buffer` to hold the encoded `close_notify` alert. If the BIO write fails (kernel buffer full, peer already gone), `SSL_shutdown` returns -1 with `WANT_WRITE`. The old code returned 0 from `ssl_handle_shutdown`, which told `us_internal_ssl_close` to leave the fd open and wait for the peer. That deferral is correct for the `SSL_shutdown() == 0` case (alert flushed, waiting for the peer's reply — see the comment in `us_internal_ssl_close`). It's wrong here: the alert never went out, `SSL_SENT_SHUTDOWN` is already set on the first call, and once `us_internal_ssl_is_shut_down` is true `on_writable`/`on_data` short-circuit without ever re-dispatching the queued alert. There is no retry path, so the socket stays in limbo holding `s->ssl` and the write buffer until *some* other event arrives — which may never happen. This shows up as an intermittent LSan failure on the Debian x64-asan lane in `test/js/node/http/node-https-checkServerIdentity.test.ts`: the spawned child calls `server.close()` and exits right after the request emits `error`, so the lingering accepted socket never gets another event and `SSL_free` never runs. ``` Direct leak of 417 byte(s) in 1 object(s) allocated from: bssl::SSLBuffer::EnsureCap vendor/boringssl/ssl/ssl_buffer.cc:72 bssl::do_tls_write vendor/boringssl/ssl/s3_pkt.cc:194 bssl::tls_dispatch_alert vendor/boringssl/ssl/s3_pkt.cc:373 SSL_shutdown vendor/boringssl/ssl/ssl_lib.cc:1039 ssl_handle_shutdown packages/bun-usockets/src/crypto/openssl.c:821 us_internal_ssl_close packages/bun-usockets/src/crypto/openssl.c:871 us_internal_ssl_on_data ``` This is a recurring pre-existing flake on `main` — see retrigger commits `8722a10109`, `13a267eabb`, `ded11f3fb7`. ### Behavior - The unsent `close_notify` is best-effort. Closing without it produces an abrupt FIN, which is indistinguishable from a half-close to the peer (TLS 1.3 doesn't require it; TLS 1.2 implementations tolerate it). - The deferred-graceful-close path (`SSL_shutdown() == 0` → wait for the peer's `close_notify`) is unchanged. - `force_fast_shutdown == 1` (forceful close from `_destroy()` / abort) already returned 1 here, so no change. ### Testing - `bun bd test test/js/node/http/node-https-checkServerIdentity.test.ts` — 4 pass, 0 fail. - `bun bd test test/js/node/tls/node-tls-connect.test.ts test/js/bun/net/socket.test.ts` — 57 pass, 3 skip, 1 fail. The single failure (`setSession() should not leak the SSL_SESSION returned by d2i_SSL_SESSION`, RSS-growth threshold 40 MiB exceeded at ~293 MiB) reproduces identically on unmodified `main` in a macOS debug build — pre-existing, unrelated to this change. - The LSan leak itself is Linux-ASAN-only and timing-dependent; not reproducible on macOS, so there is no deterministic regression test.
1 parent 58fcdad commit d352dfd

1 file changed

Lines changed: 13 additions & 1 deletion

File tree

packages/bun-usockets/src/crypto/openssl.c

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -828,7 +828,19 @@ static int ssl_handle_shutdown(struct us_socket_t *s, int force_fast_shutdown) {
828828
return 1;
829829
}
830830
if (err == SSL_ERROR_WANT_READ || err == SSL_ERROR_WANT_WRITE) {
831-
return force_fast_shutdown ? 1 : 0;
831+
/* The close_notify could not be flushed (BIO write failed: kernel
832+
* buffer full or peer already gone). There is no retry path —
833+
* SSL_SENT_SHUTDOWN is already set, so on_writable/on_data
834+
* short-circuit through is_shut_down and never re-dispatch the
835+
* alert. Returning 0 here would keep s->ssl (and BoringSSL's
836+
* write_buffer holding the encoded alert) alive until the next
837+
* socket event, which may never arrive — observed as an LSan
838+
* leak in node-https-checkServerIdentity.test.ts when the child
839+
* exits right after server.close(). The deferred-close contract
840+
* documented in us_internal_ssl_close only applies to the
841+
* SSL_shutdown()==0 case where the alert *was* flushed; here it
842+
* never went out, so close now. */
843+
return 1;
832844
}
833845
s->ssl_fatal_error = 1;
834846
return 1;

0 commit comments

Comments
 (0)