fix(proxy): reduce SSL connection overhead by setting TCP_NODELAY #15

Open
nik-localstack wants to merge 1 commit into master from pnx-768-tcp-nodelay-ssl-proxy

@nik-localstack nik-localstack commented May 14, 2026

Summary

  • Set TCP_NODELAY on both the client-facing socket and the proxy-to-PostgreSQL socket
  • Disables Nagle's algorithm, which holds back small packets until earlier data is acknowledged; combined with delayed ACKs this was stalling small writes for up to 40ms, the opposite of what a request/response protocol needs

Background

SSL connections through the proxy showed ~3x latency overhead compared to no-SSL for workloads that open many short-lived connections (the customer-reported pattern). Root cause analysis showed the overhead was entirely in connection setup, not per-query processing (a single reused connection had no measurable difference between SSL and no-SSL).

PostgreSQL connection startup is a rapid exchange of small messages (auth, parameter status, ready-for-query). With SSL there are even more round trips (SSLRequest → "S" → TLS handshake → startup). Nagle's algorithm was delaying each of these small packets, compounding the latency.
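
The SSLRequest step above is the first of those small messages. As a minimal sketch of recognizing it, assuming the proxy has already read the first 8 bytes from the client: `is_ssl_request` is a hypothetical helper, not a function from this repository. The message layout (an Int32 length of 8 followed by the Int32 code 80877103) comes from the PostgreSQL frontend/backend protocol.

```python
import struct

# SSLRequest is an 8-byte message: Int32 length (always 8), then Int32 request code.
SSL_REQUEST_CODE = 80877103  # equals (1234 << 16) | 5679


def is_ssl_request(first_bytes: bytes) -> bool:
    """Return True if the first 8 bytes from a client form an SSLRequest.

    Hypothetical helper for illustration only.
    """
    if len(first_bytes) < 8:
        return False
    length, code = struct.unpack("!II", first_bytes[:8])
    return length == 8 and code == SSL_REQUEST_CODE
```

The server answers this message with a single byte (`S` or `N`), so the exchange consists entirely of tiny writes, which is the traffic shape Nagle's algorithm tends to delay.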

TCP_NODELAY is the standard setting for interactive protocol proxies. libpq and JDBC both set it unconditionally.
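
For reference, a minimal sketch of setting the option and checking that it took effect. The helper names `disable_nagle` and `nagle_disabled` are hypothetical and only used for illustration; the `setsockopt` call itself is the same one this PR adds.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn on TCP_NODELAY, which disables Nagle's algorithm for this socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    # Some platforms return a non-zero value other than 1, so compare against 0.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The option can be set before the socket is connected, which is why it works on a freshly created upstream socket as well as on an accepted client socket.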

Results

Measured with 101 connections × 3 queries each:

|        | no-SSL | SSL | Delta (SSL vs no-SSL) |
|--------|--------|-----|-----------------------|
| Before | 3s     | 9s  | +6s                   |
| After  | 3s     | 5s  | +2s                   |
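
As a back-of-the-envelope reading of these totals, the SSL overhead can be spread across the 101 connections. This is only a rough sketch: it assumes the overhead is divided evenly per connection, and the inputs are the whole-second totals above, so the results are approximate. The function name is hypothetical.

```python
CONNECTIONS = 101  # from the measurement above


def per_connection_overhead_ms(ssl_seconds: float, no_ssl_seconds: float,
                               connections: int = CONNECTIONS) -> float:
    """Average extra time per connection attributable to SSL, in milliseconds."""
    return (ssl_seconds - no_ssl_seconds) * 1000 / connections


before = per_connection_overhead_ms(9, 3)  # about 59.4 ms per connection
after = per_connection_overhead_ms(5, 3)   # about 19.8 ms per connection
```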

Related readings

https://en.wikipedia.org/wiki/Nagle%27s_algorithm
https://brooker.co.za/blog/2024/05/09/nagle.html

@nik-localstack nik-localstack self-assigned this May 14, 2026
@nik-localstack nik-localstack marked this pull request as ready for review May 14, 2026 19:16
Set TCP_NODELAY on both the client-facing and proxy-to-PostgreSQL sockets
to disable Nagle's algorithm. PostgreSQL's connection startup involves rapid
small-message exchanges (auth, parameter status, ready-for-query), and with
SSL there are additional round trips for the SSLRequest handshake. Nagle's
buffering was delaying these small packets by up to 40ms each, compounding
into significant latency for workloads that open many short-lived connections.

Measured improvement on 101 connections x 3 queries: SSL overhead reduced
from +6s to +2s vs no-SSL baseline. Per-query overhead with connection reuse
is unaffected (remains ~0s).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nik-localstack nik-localstack force-pushed the pnx-768-tcp-nodelay-ssl-proxy branch from d74531f to c6bd2b3 Compare May 15, 2026 11:11
Member

@cloutierMat cloutierMat left a comment

Before we merge this, I think we should measure the impact on larger queries as well. Nagle's algorithm reduces the overhead of sending many small packets, so disabling it should improve small data transfers, but are we losing anything on larger ones?

Is it safe to disable it during the SSL handshake and re-enable it afterwards to get the best of both worlds?

Note: Don't forget to update the version and changelog so that we can publish from it.
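
On the mechanics of the toggle idea: TCP_NODELAY can be changed at any point in a socket's lifetime, so scoping it to one phase is possible. The following is a minimal sketch of that, assuming the handshake runs inside the `with` block; `nodelay_during` is a hypothetical helper and is not part of this PR. Whether toggling is worthwhile compared to leaving the option on is a measurement question that this sketch does not answer.

```python
import contextlib
import socket


@contextlib.contextmanager
def nodelay_during(sock: socket.socket):
    """Enable TCP_NODELAY for the duration of the block, then restore the old value.

    Hypothetical helper for illustration only.
    """
    previous = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        yield sock
    finally:
        # The socket may have been closed inside the block; ignore that case.
        with contextlib.suppress(OSError):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY,
                            1 if previous else 0)
```

Usage would look like `with nodelay_during(clientsocket): ...` around the startup exchange, with Nagle's algorithm back in effect once the block exits.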

Comment thread postgresql_proxy/proxy.py

# Accept the raw connection
clientsocket, address = sock.accept()
clientsocket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
Member

Question: Would there be a point to enable only for ssl?

Author

I don't think so. Nagle's delay applies to any small-packet exchange, not just SSL. SSL makes the improvement more obvious because of the extra SSLRequest round trip, but auth and ready-for-query messages are small on every connection.

Comment thread postgresql_proxy/proxy.py
redirect_config = self.instance_config.redirect

pg_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
pg_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
Member

Have you measured the impact on larger queries? Will we lose significant performance?

Author

I ran a small benchmark with TCP_NODELAY=1 vs Nagle across payload sizes from 1B to 10MB:

(mean end-to-end latency for a single SELECT query)

| Payload | TCP_NODELAY=1 | Nagle  |
|---------|---------------|--------|
| 1 B     | 0.22ms        | 0.24ms |
| 100 KB  | 0.69ms        | 0.69ms |
| 1 MB    | 6.75ms        | 6.68ms |
| 10 MB   | 634ms         | 628ms  |

No meaningful difference for larger queries.
The benchmark did surface a pre-existing bug where large payloads could cause connection hangs. I will open a follow-up PR for this.

Author

Follow-up PR for the large payload hang: #16
