Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: TLS should dynamically tune record size for better HTTP performance #14376

Closed
tombergan opened this issue Feb 18, 2016 · 10 comments
Closed
Milestone

Comments

@tombergan
Copy link
Contributor

@tombergan tombergan commented Feb 18, 2016

Currently, the crypto/tls package sends all data records using a max payload size of 16 KB, which is the maximum allowed by the TLS spec. This is optimal for throughput-sensitive applications, but not for latency-sensitive applications, such as web browsing, because the entire 16KB record must be downloaded and decrypted before any of the plaintext can be processed by the application layer. On slow 2G connections this can easily add a 500-1000ms latency to web page load time, since the browser cannot start fetching subresources until that first 16KB record has been fully decrypted.

A simple heuristic that works well in practice is to use small records for the first ~1MB of data, then switch to 16KB records for subsequent data, then switch back to smaller records after the connection goes idle. Ideally, the smaller records should fit exactly within one TCP packet.

For more background reading, see:
https://www.igvita.com/2013/10/24/optimizing-tls-record-size-and-buffering-latency/
http://chimera.labs.oreilly.com/books/1230000000545/ch04.html#TLS_RECORD_SIZE
https://issues.apache.org/jira/browse/TS-2503

I have an implementation sketch here:
https://go-review.googlesource.com/#/c/19591/

I can demonstrate the latency improvement in a simple test environment:

  • Go HTTPS server
  • Chrome client browser
  • Simulated cellular network (bandwidth = ~200kbps)

Without this change, I notice a ~500ms delay between the time that Chrome receives the first byte of the HTTP response and the time Chrome sends the first request for a subresource. With this change, that difference drops to ~10ms.

If the general approach looks good, I can mail the change for review. +@bradfitz and +@agl for TLS expertise.

@bradfitz bradfitz added this to the Go1.7 milestone Feb 18, 2016
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Feb 18, 2016

Great! I have TODOs for investigating this but I didn't get it done in time for Go 1.6. I'm glad you're already on it.

I'll be able to start reviewing & helping next week. (in India and mostly occupied for the next couple days)

@ianlancetaylor ianlancetaylor changed the title TLS should dynamically tune record size for better HTTP performance net/http: TLS should dynamically tune record size for better HTTP performance Feb 18, 2016
@mikioh mikioh changed the title net/http: TLS should dynamically tune record size for better HTTP performance crypto/tls: TLS should dynamically tune record size for better HTTP performance Feb 18, 2016
@mikioh

This comment has been minimized.

Copy link
Contributor

@mikioh mikioh commented Feb 18, 2016

I overlooked this issue and left the following comment on CL 19591.

i think it would be better to open a new issue for discussion.
to be honest, i'm ambivalent about this because codel already exists and tcp_notsent_lowat is available on latest linux and darwin kernels. i'm wondering how much this tweak would be effective as compared with codel+unsent sockbuf controlling.

@tombergan

This comment has been minimized.

Copy link
Contributor Author

@tombergan tombergan commented Feb 18, 2016

to be honest, i'm ambivalent about this because codel already exists and tcp_notsent_lowat is available on latest linux and darwin kernels. i'm wondering how much this tweak would be effective as compared with codel+unsent sockbuf controlling.

Unless I completely misunderstood, I think we're talking about different problems. This issue is about framing at the TLS level -- a poor choice of TLS-level framing can force a slow client to download large TLS records, which increases time-to-first-decrypted-byte. We can't solve this issue at the TCP level because TCP has no visibility into TLS record boundaries, so TCP_NOTSENT_LOWAT won't help. I'm not familiar with CoDeL, but it also looks too low-level to solve this issue.

@mikioh

This comment has been minimized.

Copy link
Contributor

@mikioh mikioh commented Feb 18, 2016

Thanks for the explanation. If I understand correctly, you are trying to make trade with loss caused by split for making mobile or high-latency TLS speakers happy, and not trying to utilize hierarchical buffers along a path comprised of high latency links. If so, automatic record size adjustment/clamping based on TCP MSS (or path/link IP MTU) might make low-latency TLS speakers frown unless we provide a control knob.

@tombergan

This comment has been minimized.

Copy link
Contributor Author

@tombergan tombergan commented Feb 18, 2016

Are you proposing adding a new knob to tls.Config, DisableDynamicRecordSizing? I'd be fine with that. In theory the dynamic sizing parameters could also be knobs -- RecordSizeIdleTimeout (default = 1s) and RecordSizeBoostThreshold (default = 1MB) -- though I don't see a strong use case.

@igrigorik

This comment has been minimized.

Copy link

@igrigorik igrigorik commented Feb 19, 2016

In theory the dynamic sizing parameters could also be knobs -- RecordSizeIdleTimeout (default = 1s) and RecordSizeBoostThreshold (default = 1MB) -- though I don't see a strong use case.

I do think it's reasonable to provide these knobs. E.g. if you're building a download server that serves large payloads that are not stream-processed, then you can disable dynamic sizing and always output 16KB records to reduce framing + CPU overhead.

@tombergan

This comment has been minimized.

Copy link
Contributor Author

@tombergan tombergan commented Feb 19, 2016

I was thinking that DisableDynamicRecordSizing=true would always use the largest possible records (16KB). This should cover the download server case. Do you know of a use case where you want to fine-tune either the 1s idle timeout or the 1MB boost threshold? AFAICT, Google servers have used these constants for multiple years without change, suggesting they are almost always good enough.

@igrigorik

This comment has been minimized.

Copy link

@igrigorik igrigorik commented Feb 19, 2016

Ah, I guess that works. I know some other servers do expose explicit control over these, but I can't point to concrete examples of where and why the defaults are being overriden in production.

@mikioh

This comment has been minimized.

Copy link
Contributor

@mikioh mikioh commented Feb 19, 2016

@tombergan,

Yup, I think a new API is necessary for adapting to various circumstances. I'd leave the API design including what's the best default behavior from Go 1.7 for other people.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 25, 2016

CL https://golang.org/cl/19591 mentions this issue.

@gopherbot gopherbot closed this in ae00df1 Mar 12, 2016
@golang golang locked and limited conversation to collaborators Mar 13, 2017
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
Currently, if a client of crypto/tls (e.g., net/http, http2) calls
tls.Conn.Write with a 33KB buffer, that ends up writing three TLS
records: 16KB, 16KB, and 1KB. Slow clients (such as 2G phones) must
download the first 16KB record before they can decrypt the first byte.
To improve latency, it's better to send smaller TLS records. However,
sending smaller records adds overhead (more overhead bytes and more
crypto calls), which slightly hurts throughput.

A simple heuristic, implemented in this change, is to send small
records for new connections, then boost to large records after the
first 1MB has been written on the connection.

Fixes golang#14376

Change-Id: Ice0f6279325be6775aa55351809f88e07dd700cd
Reviewed-on: https://go-review.googlesource.com/19591
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tom Bergan <tombergan@google.com>
Reviewed-by: Adam Langley <agl@golang.org>
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
Currently, if a client of crypto/tls (e.g., net/http, http2) calls
tls.Conn.Write with a 33KB buffer, that ends up writing three TLS
records: 16KB, 16KB, and 1KB. Slow clients (such as 2G phones) must
download the first 16KB record before they can decrypt the first byte.
To improve latency, it's better to send smaller TLS records. However,
sending smaller records adds overhead (more overhead bytes and more
crypto calls), which slightly hurts throughput.

A simple heuristic, implemented in this change, is to send small
records for new connections, then boost to large records after the
first 1MB has been written on the connection.

Fixes golang#14376

Change-Id: Ice0f6279325be6775aa55351809f88e07dd700cd
Reviewed-on: https://go-review.googlesource.com/19591
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tom Bergan <tombergan@google.com>
Reviewed-by: Adam Langley <agl@golang.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.