
Limit fragmentation in write_vectored #1640

Merged: 3 commits, Feb 16, 2024

Conversation

@Keksoj (Contributor) commented Nov 29, 2023

We (Wonshtrum and myself) work at Clever Cloud on a custom-made reverse proxy called Sōzu. A year ago, we switched from using both OpenSSL and Rustls to being Rustls-only, and we are very happy about it. We appreciate the security orientation of Rustls, as well as its native Rust nature.

Our issue

In Sōzu, our reverse proxy, we strive to optimize system calls and throughput. Here is the syscall we found with strace when transmitting a simple HTTPS response to a client:

writev(
    11,
    [
        {iov_base="\27\3\3\0\31\210H\371\252\25w\304\275\346u\27\371\334g\345\244\371\331\203c\302\356\324i\367", iov_len=30},
        {iov_base="\27\3\3\0\22\366IR\236\357\374\37\30\310E\330Xr\3249\24r0", iov_len=23},
        {iov_base="\27\3\3\0\24\326\337\223\356R\360\37\215\343a\2L\236\30\24D\31\10\363\16", iov_len=25},
        {iov_base="\27\3\3\0\22\347\305\255\371?\224\6\212\325\345\250*a\250\321i\366\266", iov_len=23},
        {iov_base="\27\3\3\0\23\37\24Ex\334\vHp\2758\207\27G\306\16\226\253\320\240", iov_len=24},
        {iov_base="\27\3\3\0\23\23\362q\352\307\303\364\312]\361(\227<\17\334Y\333g\310", iov_len=24},
        {iov_base="\27\3\3\0\37\216\332\374\376\303H\206\35f\334\310\252\375\341\366\224\261*\367\30mg\\\230dbh"..., iov_len=36},
        {iov_base="\27\3\3\0\23-\312\31\v\353\212\303\265\231H\371\356%\317JX\32Tg", iov_len=24},
        {iov_base="\27\3\3\0\23d\356\0276\3236\252\206^5\346=\234F\2\200\33\314>", iov_len=24},
        {iov_base="\27\3\3\0\23U\30\37x7\5d\304/ZY\274\25\17;\276t]\216", iov_len=24},
        {iov_base="\27\3\3\0\25\240\371q\2416Z\202\6\35\311|\203Bai-\20\217=\3516", iov_len=26},
        {iov_base="\27\3\3\0\23\220\350\323\321bx\321\21:\35\203?\257\313)\200\364E\377", iov_len=24},
        {iov_base="\27\3\3\0.\226\254\30\240\315\307o\250\243b`Q\n\17\226\333\347\256[\331\324#~/\240\206,"..., iov_len=51},
        {iov_base="\27\3\3\0\23\251\207:\350?/\252Z\317\322\0017\24E\256\5*\342\224", iov_len=24},
        {iov_base="\27\3\3\0\30$w0!\205\310\276\372k\204?\375k\334\334\363\314\\!\355\266\0\362\33", iov_len=29},
        {iov_base="\27\3\3\0\23\257\277rj;\3\376\251\10\310\37\265\365\303\216\260\345\361\366", iov_len=24},
        {iov_base="\27\3\3\0+\341\0313\277'\223\3\272W&\316\306\360BYI9\225Vh\347t\3}\303\321\343"..., iov_len=48},
        {iov_base="\27\3\3\0\23\247\33\26M\255%\351\214F\254\177\250+z\253]\17F\325", iov_len=24},
        {iov_base="\27\3\3\0\23fm\262t\334\271>\226\225ITia\250,\31^\37\f", iov_len=24},
        {iov_base="\27\3\3\0\33S\25/zM'\343\302L/l\\\1\22\250M\0AL\357\304\36\227\317\322Q\365", iov_len=32}
    ],
    20
) = 563

What we see here is a number of small pieces of data, mostly headers, each encoded into its own TLS frame. The \27\3\3\0 pattern marks the beginning of a TLS frame.
This repetition is a waste of bandwidth.
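As a back-of-envelope check of that waste (our own accounting, assuming TLS 1.3 with an AEAD cipher: a 5-byte record header, a 1-byte inner content type, and a 16-byte tag, i.e. 22 bytes of overhead per record):

```rust
// Assumed per-record cost for TLS 1.3 with an AEAD cipher:
// 5-byte header + 1-byte inner content type + 16-byte tag.
const RECORD_OVERHEAD: usize = 5 + 1 + 16;

/// Wire size when every chunk becomes its own TLS record.
pub fn wire_size_fragmented(chunk_lens: &[usize]) -> usize {
    chunk_lens.iter().map(|len| len + RECORD_OVERHEAD).sum()
}

/// Wire size when chunks are coalesced into records of at most
/// `max_fragment` plaintext bytes.
pub fn wire_size_coalesced(chunk_lens: &[usize], max_fragment: usize) -> usize {
    let total: usize = chunk_lens.iter().sum();
    let records = ((total + max_fragment - 1) / max_fragment).max(1);
    total + records * RECORD_OVERHEAD
}
```

Under this model, the first three records above (30, 23 and 25 bytes) carry only 8, 1 and 3 bytes of application data, and coalescing 123 payload bytes into one record gives exactly 145 bytes on the wire.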

Our investigation

The system call above is due to how the Writer's write_vectored method encodes the chunks one by one.

In our case, our reverse proxy creates a number of smaller chunks. Half of the throughput is monopolized by TLS headers.

Our proposal

We could have solved our issue on the proxy side, by accumulating chunks and calling write() instead of calling write_vectored, but we seek to avoid dynamic memory allocation and copying.

In this pull request on Rustls, instead of treating the chunks individually in write_vectored, we suggest passing them down together, in a BorrowedPayload form, until they are encrypted into one single TLS frame (or several, if the payload exceeds MAX_FRAGMENT_SIZE). The new BorrowedPayload type does not copy the data; it manipulates slices of bytes.
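A minimal sketch of the idea, as hypothetical code (simplified, not the actual type added in this PR): a payload that borrows several chunks, can be truncated at a fragment boundary without moving any bytes, and is copied only once, into the encrypter's output buffer:

```rust
/// Sketch of a borrowed, possibly fragmented payload (hypothetical,
/// simplified compared to the type in the PR).
pub struct BorrowedPayload<'a> {
    chunks: Vec<&'a [u8]>,
}

impl<'a> BorrowedPayload<'a> {
    pub fn new(chunks: Vec<&'a [u8]>) -> Self {
        Self { chunks }
    }

    /// Total number of payload bytes across all chunks.
    pub fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    /// Keep only the first `max` bytes (e.g. MAX_FRAGMENT_SIZE) by
    /// re-slicing the boundary chunk; no bytes are moved.
    pub fn truncate(&mut self, max: usize) {
        let mut remaining = max;
        let mut keep = 0;
        for chunk in self.chunks.iter_mut() {
            if remaining == 0 {
                break;
            }
            if chunk.len() > remaining {
                *chunk = &chunk[..remaining];
            }
            remaining -= chunk.len();
            keep += 1;
        }
        self.chunks.truncate(keep);
    }

    /// The single copy, performed once the destination buffer is known.
    pub fn copy_to(&self, out: &mut Vec<u8>) {
        for chunk in &self.chunks {
            out.extend_from_slice(chunk);
        }
    }
}
```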

Syscall improvement

This is the same response as above: a simple "hi" and some headers.

writev(
    14,
    [
        {iov_base="\27\3\3\0\214\376h\202\316\333w\252\25\24\375\354\254\24p\343\320\320\311\21\340\315B\367\327\342\225\340"..., iov_len=145}
    ],
    1
) = 145

We can see that there is now a single iov_base (corresponding to one TLS frame) instead of the twenty before.
The syscall has diminished in size from 563 bytes to 145.

Benchmarks

Our modification reduces bandwidth usage significantly when benchmarking our reverse proxy.

These benchmarks run on a simple desktop machine, with 4 backends to route HTTPS traffic to. The backends reply with a simple "hi" and some headers.

Sōzu + Rustls 0.21.9

bombardier -c 450 -n 100000 https://localhost:8443/api -l

Statistics        Avg      Stdev        Max
  Reqs/sec      7973.06    4220.70   18732.68
  Latency       56.68ms    69.91ms      1.15s
  Latency Distribution
     50%    51.98ms
     75%    54.38ms
     90%    56.90ms
     95%    58.64ms
     99%    61.33ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     5.01MB/s

Sōzu + Rustls (this branch)

bombardier -c 450 -n 100000 https://localhost:8443/api -l                                             
Bombarding https://localhost:8443/api with 100000 request(s) using 450 connection(s)

Statistics        Avg      Stdev        Max
  Reqs/sec      8616.52    5132.64   22453.38
  Latency       52.58ms    64.24ms      1.04s
  Latency Distribution
     50%    47.85ms
     75%    50.40ms
     90%    53.53ms
     95%    55.80ms
     99%    76.39ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     2.00MB/s

Discussion

The strace and the benchmark suggest that our addition to the Rustls codebase reduces bandwidth use and improves overall performance in a meaningful way. Requests per second and latency are the key metrics for us.

Diving into Rustls, we found a number of memory allocations that could be avoided. If you would like, we would be more than happy to work with you in this direction.

(edited for clarity)

@Keksoj Keksoj changed the title create type BorrowedPayload Limit fragmentation in write_vectored Nov 29, 2023
@Keksoj Keksoj force-pushed the borrowed-payload branch 3 times, most recently from b79a446 to cc4d000 Compare November 29, 2023 11:41
codecov bot commented Nov 29, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (1cdb10f) 95.94% compared to head (71746d6) 95.95%.

Files Patch % Lines
rustls/src/msgs/message.rs 96.75% 7 Missing ⚠️
rustls/src/conn.rs 88.23% 2 Missing ⚠️
rustls/src/crypto/cipher.rs 0.00% 2 Missing ⚠️
rustls/src/common_state.rs 97.43% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1640    +/-   ##
========================================
  Coverage   95.94%   95.95%            
========================================
  Files          81       81            
  Lines       18590    18827   +237     
========================================
+ Hits        17837    18065   +228     
- Misses        753      762     +9     


@Wonshtrum (Contributor) commented:
On the topic of allocation/copy, we put a lot of effort into Sōzu to handle HTTP messages (parsing and proxying) with zero copies in most cases. Very few dynamic allocations are necessary either, as Sōzu preallocates and reuses a pool of buffers.

Analyzing the Rustls codebase, we found that in order to encrypt a payload, the entire plaintext message is copied by MessageEncrypter::encrypt, then the entire encrypted message is copied by OpaqueMessage::encode. Both copies require a memory allocation.

As explained in this PR description, aggregating the vector chunks on our side would mean a third allocation and copy. Instead, we placed the responsibility of aggregation on the copy already performed by MessageEncrypter::encrypt. This copy could also be avoided if the ciphers accepted fragmented payloads, but that would be a lot more complicated.

On the other hand, OpaqueMessage::encode only prepends the 5-byte "TLS header". Allocating and copying an entire payload (up to 16 KiB) seems wasteful and could be avoided entirely if the payload built by MessageEncrypter::encrypt accounted for the 5-byte prelude. OpaqueMessage::encode would then only have to fill in the reserved space.

My remarks are not directly related to this PR, but to the overall talk on memory allocation. As @Keksoj said we are eager to contribute and would happily make this change as well.
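The reserved-prelude suggestion can be sketched like this (a toy model: the encryption itself is elided, and both functions are hypothetical stand-ins for MessageEncrypter::encrypt and OpaqueMessage::encode):

```rust
/// Length of the TLS record header the payload reserves up front.
const HEADER_LEN: usize = 5;

/// Stand-in for MessageEncrypter::encrypt: build the output with a
/// 5-byte gap at the front. Real code would encrypt; here we just copy.
pub fn encrypt_with_prelude(plaintext: &[u8]) -> Vec<u8> {
    let mut buf = vec![0u8; HEADER_LEN];
    buf.extend_from_slice(plaintext);
    buf
}

/// Stand-in for OpaqueMessage::encode: fill the reserved bytes
/// (content type, legacy version, length) in place, with no copy.
pub fn fill_header(buf: &mut [u8]) {
    let payload_len = (buf.len() - HEADER_LEN) as u16;
    buf[0] = 0x17; // ContentType::ApplicationData
    buf[1] = 0x03; // legacy record version 0x0303
    buf[2] = 0x03;
    buf[3..5].copy_from_slice(&payload_len.to_be_bytes());
}
```

The second allocation and whole-payload copy disappear: "encoding" becomes five in-place byte writes.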

@djc (Member) commented Nov 29, 2023

On the other hand, OpaqueMessage::encode only prepends the 5-byte "TLS header". Allocating and copying an entire payload (up to 16 KiB) seems wasteful and could be avoided entirely if the payload built by MessageEncrypter::encrypt accounted for the 5-byte prelude. OpaqueMessage::encode would then only have to fill in the reserved space.

My remarks are not directly related to this PR, but to the overall talk on memory allocation. As @Keksoj said we are eager to contribute and would happily make this change as well.

Thanks for the investigation! We will review your changes shortly -- meanwhile, I just wanted to acknowledge that we'd be happy to review a PR for this, too.

@paolobarbolini (Contributor) left a comment

I, too, am interested in this change. I've left a couple of comments (mostly nitpicks).

Review comments (now outdated and resolved) on rustls/src/msgs/fragmenter.rs (two threads) and rustls/src/msgs/message.rs.
@Keksoj (Contributor, author) commented Dec 4, 2023

Thanks for the reviews! We appreciate the warm reception.

We reproduced the CI's icount tests locally and found a lot of variance. Should we take them into account?

@ctz (Member) commented Dec 4, 2023

Allocating and copying an entire payload (up to 16 KiB) seems wasteful and could be completely avoided if the payload built by MessageEncrypter::encrypt accounted for the 5-byte prelude. OpaqueMessage::encode would only have to fill in the reserved space.

I did a quick try of that, and it improves our send-direction transfer benchmarks by almost 20% in some cases:

Scenario Baseline Candidate Diff
transfer_no_resume_ring_1.2_rsa_aes_server 57033726 46061663 ✅ -10972063 (-19.24%)
transfer_no_resume_ring_1.3_rsa_aes_server 57175449 46178199 ✅ -10997250 (-19.23%)
transfer_no_resume_ring_1.3_ecdsa_aes_server 57155204 46178565 ✅ -10976639 (-19.20%)
transfer_no_resume_aws_lc_rs_1.3_ecdsa_aes_server 57272697 46282632 ✅ -10990065 (-19.19%)
transfer_no_resume_ring_1.3_rsa_chacha_server 91325257 80328043 ✅ -10997214 (-12.04%)
transfer_no_resume_ring_1.3_ecdsa_chacha_server 91304163 80328311 ✅ -10975852 (-12.02%)
transfer_no_resume_aws_lc_rs_1.3_ecdsa_chacha_server 91413941 80438723 ✅ -10975218 (-12.01%)

So this is certainly worth doing!

@cpu (Member) commented Dec 13, 2023

@rustls-benchmarking bench

(experimenting with our new benchmarking integration)

rustls-benchmarking bot commented Dec 13, 2023

Benchmark results

Instruction counts

Significant differences

There are no significant instruction count differences

Other differences

Scenario Baseline Candidate Diff Threshold
handshake_tickets_aws_lc_rs_1.2_rsa_aes_server 4590737 4560942 -29795 (-0.65%) 3.16%
handshake_no_resume_aws_lc_rs_1.2_rsa_aes_server 12293739 12325094 31355 (0.26%) 1.01%
handshake_tickets_aws_lc_rs_1.2_rsa_aes_client 4564442 4574736 10294 (0.23%) 0.98%
handshake_no_resume_aws_lc_rs_1.3_ecdsap384_aes_client 8663122 8646583 -16539 (-0.19%) 0.93%
handshake_tickets_aws_lc_rs_1.3_ecdsap384_aes_client 30491449 30544172 52723 (0.17%) 0.47%
handshake_session_id_ring_1.2_rsa_aes_server 4367526 4374272 6746 (0.15%) 0.53%
handshake_no_resume_aws_lc_rs_1.3_rsa_chacha_server 12704027 12723415 19388 (0.15%) 0.94%
handshake_tickets_ring_1.3_ecdsap256_aes_server 43835781 43896990 61209 (0.14%) 0.26%
handshake_no_resume_aws_lc_rs_1.3_rsa_aes_server 12671839 12686304 14465 (0.11%) 0.71%
handshake_tickets_aws_lc_rs_1.3_rsa_aes_server 32863297 32899864 36567 (0.11%) 0.49%
handshake_no_resume_ring_1.3_ecdsap256_chacha_client 3899263 3895001 -4262 (-0.11%) 0.20%
handshake_tickets_aws_lc_rs_1.3_rsa_chacha_server 32824782 32860266 35484 (0.11%) 0.60%
handshake_tickets_ring_1.3_ecdsap384_chacha_server 43819484 43863472 43988 (0.10%) 0.25%
handshake_tickets_ring_1.2_rsa_aes_server 4832447 4837241 4794 (0.10%) 0.57%
handshake_tickets_aws_lc_rs_1.3_ecdsap384_aes_server 32834033 32864817 30784 (0.09%) 0.20%
handshake_session_id_aws_lc_rs_1.3_rsa_chacha_server 32582475 32612657 30182 (0.09%) 0.42%
handshake_tickets_ring_1.3_ecdsap384_aes_server 43871501 43835516 -35985 (-0.08%) 0.21%
handshake_session_id_aws_lc_rs_1.3_rsa_aes_client 30484458 30509201 24743 (0.08%) 0.20%
handshake_session_id_aws_lc_rs_1.2_rsa_aes_server 4071168 4067907 -3261 (-0.08%) 4.06%
handshake_session_id_aws_lc_rs_1.3_ecdsap384_aes_client 30332658 30356395 23737 (0.08%) 0.45%
transfer_no_resume_aws_lc_rs_1.3_ecdsap384_aes_client 57967203 58011946 44743 (0.08%) 0.20%
handshake_session_id_ring_1.3_ecdsap384_chacha_server 43580015 43611159 31144 (0.07%) 0.20%
handshake_tickets_ring_1.3_ecdsap256_chacha_server 43837838 43807608 -30230 (-0.07%) 0.20%
handshake_tickets_aws_lc_rs_1.3_ecdsap256_chacha_server 32820502 32842802 22300 (0.07%) 0.27%
handshake_session_id_ring_1.2_rsa_aes_client 4471669 4474635 2966 (0.07%) 0.63%
handshake_tickets_ring_1.3_rsa_chacha_client 42325016 42352942 27926 (0.07%) 0.20%
handshake_no_resume_ring_1.3_ecdsap256_aes_client 3894383 3896878 2495 (0.06%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_rsa_aes_server 57167091 57130506 -36585 (-0.06%) 0.33%
handshake_no_resume_ring_1.3_ecdsap256_aes_server 2127138 2128477 1339 (0.06%) 0.20%
handshake_tickets_aws_lc_rs_1.3_ecdsap384_chacha_server 32829016 32849434 20418 (0.06%) 0.32%
transfer_no_resume_aws_lc_rs_1.3_rsa_chacha_server 91374867 91326136 -48731 (-0.05%) 0.20%
transfer_no_resume_ring_1.2_rsa_aes_server 56972538 57001453 28915 (0.05%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap384_chacha_server 91348845 91394313 45468 (0.05%) 0.20%
handshake_session_id_aws_lc_rs_1.3_rsa_chacha_client 30474239 30489165 14926 (0.05%) 0.20%
handshake_tickets_ring_1.3_rsa_aes_client 42359337 42379877 20540 (0.05%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap256_aes_server 57137375 57163211 25836 (0.05%) 0.20%
transfer_no_resume_ring_1.3_ecdsap384_aes_server 57091508 57117296 25788 (0.05%) 0.20%
handshake_tickets_aws_lc_rs_1.3_ecdsap256_aes_client 30496059 30509400 13341 (0.04%) 0.20%
transfer_no_resume_ring_1.3_rsa_aes_server 57088057 57112306 24249 (0.04%) 0.20%
transfer_no_resume_ring_1.3_ecdsap256_aes_server 57128902 57152722 23820 (0.04%) 0.26%
handshake_tickets_ring_1.3_ecdsap384_chacha_client 42133767 42151271 17504 (0.04%) 0.20%
handshake_session_id_ring_1.3_ecdsap256_aes_client 41995600 42011751 16151 (0.04%) 0.20%
handshake_tickets_aws_lc_rs_1.3_ecdsap256_aes_server 32837319 32849600 12281 (0.04%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_ecdsap256_aes_client 3343377 3342141 -1236 (-0.04%) 0.20%
handshake_session_id_aws_lc_rs_1.3_ecdsap256_chacha_server 32610835 32622453 11618 (0.04%) 0.21%
transfer_no_resume_aws_lc_rs_1.2_rsa_aes_server 57153803 57174060 20257 (0.04%) 0.27%
handshake_tickets_ring_1.3_rsa_chacha_server 43899232 43883683 -15549 (-0.04%) 0.20%
handshake_tickets_ring_1.3_ecdsap256_chacha_client 42138658 42152912 14254 (0.03%) 0.20%
handshake_tickets_ring_1.2_rsa_aes_client 4737980 4739541 1561 (0.03%) 0.75%
handshake_session_id_ring_1.3_ecdsap256_chacha_client 41953755 41967230 13475 (0.03%) 0.20%
handshake_session_id_ring_1.3_ecdsap384_chacha_client 41950209 41963615 13406 (0.03%) 0.20%
handshake_session_id_aws_lc_rs_1.3_ecdsap256_aes_client 30311632 30321289 9657 (0.03%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap384_chacha_client 92455342 92426064 -29278 (-0.03%) 0.20%
handshake_session_id_aws_lc_rs_1.3_ecdsap256_aes_server 32627438 32637402 9964 (0.03%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap256_chacha_server 91348275 91374836 26561 (0.03%) 0.20%
transfer_no_resume_ring_1.3_rsa_chacha_server 91251928 91277084 25156 (0.03%) 0.20%
transfer_no_resume_ring_1.3_ecdsap256_chacha_server 91254388 91279184 24796 (0.03%) 0.20%
transfer_no_resume_ring_1.3_ecdsap384_chacha_server 91256571 91281248 24677 (0.03%) 0.20%
handshake_tickets_ring_1.3_rsa_aes_server 43916979 43928539 11560 (0.03%) 0.20%
handshake_session_id_ring_1.3_ecdsap384_aes_server 43665269 43676635 11366 (0.03%) 0.20%
handshake_tickets_ring_1.3_ecdsap384_aes_client 42184600 42173794 -10806 (-0.03%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap384_aes_server 57178748 57164564 -14184 (-0.02%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_ecdsap384_chacha_client 8667499 8665359 -2140 (-0.02%) 1.29%
handshake_session_id_aws_lc_rs_1.3_ecdsap384_chacha_server 32602433 32610354 7921 (0.02%) 0.30%
handshake_tickets_aws_lc_rs_1.3_ecdsap256_chacha_client 30495316 30502290 6974 (0.02%) 0.20%
handshake_session_id_ring_1.3_rsa_chacha_client 42140074 42149621 9547 (0.02%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_ecdsap384_chacha_server 4265002 4265934 932 (0.02%) 0.20%
handshake_session_id_ring_1.3_rsa_aes_client 42197529 42206375 8846 (0.02%) 0.20%
handshake_session_id_aws_lc_rs_1.3_rsa_aes_server 32663301 32670065 6764 (0.02%) 0.46%
handshake_session_id_aws_lc_rs_1.3_ecdsap384_aes_server 32633008 32639516 6508 (0.02%) 0.20%
handshake_session_id_ring_1.3_rsa_chacha_server 43600412 43606868 6456 (0.01%) 0.20%
handshake_session_id_ring_1.3_rsa_aes_server 43653759 43659934 6175 (0.01%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_ecdsap256_chacha_server 1886159 1885898 -261 (-0.01%) 0.20%
handshake_tickets_ring_1.3_ecdsap256_aes_client 42181225 42175780 -5445 (-0.01%) 0.20%
handshake_tickets_aws_lc_rs_1.3_rsa_chacha_client 30700103 30704058 3955 (0.01%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap256_aes_client 57965185 57971116 5931 (0.01%) 0.20%
handshake_session_id_aws_lc_rs_1.3_ecdsap256_chacha_client 30306814 30309866 3052 (0.01%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_ecdsap256_chacha_client 3346308 3345976 -332 (-0.01%) 0.28%
handshake_session_id_ring_1.3_ecdsap384_aes_client 42004809 42008854 4045 (0.01%) 0.20%
handshake_no_resume_aws_lc_rs_1.2_rsa_aes_client 3153790 3153498 -292 (-0.01%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_ecdsap256_aes_server 1882598 1882752 154 (0.01%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_rsa_chacha_client 3367417 3367692 275 (0.01%) 0.20%
handshake_session_id_aws_lc_rs_1.2_rsa_aes_client 4238519 4238860 341 (0.01%) 0.93%
handshake_no_resume_ring_1.2_rsa_aes_client 4441422 4441773 351 (0.01%) 0.20%
handshake_no_resume_ring_1.3_ecdsap384_aes_server 13738133 13737096 -1037 (-0.01%) 0.20%
handshake_session_id_ring_1.3_ecdsap256_aes_server 43658394 43655136 -3258 (-0.01%) 0.20%
handshake_session_id_aws_lc_rs_1.3_ecdsap384_chacha_client 30308895 30306829 -2066 (-0.01%) 0.35%
handshake_no_resume_aws_lc_rs_1.3_ecdsap384_aes_server 4260530 4260785 255 (0.01%) 0.20%
transfer_no_resume_ring_1.3_rsa_aes_client 57949782 57952359 2577 (0.00%) 0.20%
handshake_session_id_ring_1.3_ecdsap256_chacha_server 43591462 43593100 1638 (0.00%) 0.20%
transfer_no_resume_ring_1.3_ecdsap256_chacha_client 92382921 92386348 3427 (0.00%) 0.20%
handshake_no_resume_aws_lc_rs_1.3_rsa_aes_client 3356580 3356466 -114 (-0.00%) 0.20%
handshake_no_resume_ring_1.2_rsa_aes_server 12045856 12046244 388 (0.00%) 0.20%
handshake_tickets_aws_lc_rs_1.3_rsa_aes_client 30710953 30710030 -923 (-0.00%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_rsa_aes_client 57989385 57991068 1683 (0.00%) 0.20%
transfer_no_resume_ring_1.3_ecdsap384_aes_client 57950111 57951530 1419 (0.00%) 0.20%
transfer_no_resume_ring_1.3_rsa_chacha_client 92390590 92388348 -2242 (-0.00%) 0.20%
handshake_no_resume_ring_1.3_rsa_chacha_server 12250178 12250463 285 (0.00%) 0.20%
handshake_tickets_aws_lc_rs_1.3_ecdsap384_chacha_client 30483095 30483640 545 (0.00%) 0.30%
handshake_no_resume_ring_1.3_rsa_aes_server 12240049 12240250 201 (0.00%) 0.20%
handshake_no_resume_ring_1.3_rsa_aes_client 4537988 4538061 73 (0.00%) 0.20%
handshake_no_resume_ring_1.3_rsa_chacha_client 4547847 4547779 -68 (-0.00%) 0.20%
transfer_no_resume_ring_1.3_ecdsap256_aes_client 57950661 57949876 -785 (-0.00%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_rsa_chacha_client 92443304 92444522 1218 (0.00%) 0.20%
transfer_no_resume_ring_1.3_ecdsap384_chacha_client 92385644 92386860 1216 (0.00%) 0.20%
transfer_no_resume_aws_lc_rs_1.3_ecdsap256_chacha_client 92438276 92439269 993 (0.00%) 0.20%
handshake_no_resume_ring_1.3_ecdsap256_chacha_server 2131421 2131403 -18 (-0.00%) 0.21%
transfer_no_resume_ring_1.2_rsa_aes_client 57811354 57810914 -440 (-0.00%) 0.20%
handshake_no_resume_ring_1.3_ecdsap384_aes_client 35451509 35451391 -118 (-0.00%) 0.20%
transfer_no_resume_aws_lc_rs_1.2_rsa_aes_client 68432293 68432434 141 (0.00%) 0.20%
handshake_no_resume_ring_1.3_ecdsap384_chacha_client 35454465 35454437 -28 (-0.00%) 0.20%
handshake_no_resume_ring_1.3_ecdsap384_chacha_server 13740738 13740732 -6 (-0.00%) 0.20%

Wall-time

Significant differences

⚠️ There are significant wall-time differences

Scenario Baseline Candidate Diff Threshold
handshake_session_id_ring_1.3_ecdsap256_chacha 6.83 ms 6.92 ms ⚠️ 0.09 ms (1.38%) 1.21%
handshake_tickets_ring_1.3_ecdsap256_chacha 6.83 ms 6.92 ms ⚠️ 0.09 ms (1.31%) 1.26%
handshake_session_id_ring_1.3_rsa_chacha 7.46 ms 7.55 ms ⚠️ 0.09 ms (1.25%) 1.23%
handshake_tickets_ring_1.3_rsa_chacha 7.47 ms 7.56 ms ⚠️ 0.09 ms (1.14%) 1.11%

Other differences

Scenario Baseline Candidate Diff Threshold
handshake_tickets_aws_lc_rs_1.3_ecdsap256_aes 5.38 ms 5.45 ms 0.08 ms (1.40%) 2.49%
handshake_session_id_aws_lc_rs_1.3_ecdsap256_aes 5.36 ms 5.44 ms 0.07 ms (1.36%) 2.24%
handshake_tickets_aws_lc_rs_1.3_ecdsap256_chacha 5.38 ms 5.45 ms 0.07 ms (1.28%) 2.43%
handshake_session_id_aws_lc_rs_1.3_ecdsap384_aes 6.08 ms 6.15 ms 0.07 ms (1.16%) 2.19%
handshake_session_id_aws_lc_rs_1.3_ecdsap256_chacha 5.35 ms 5.41 ms 0.06 ms (1.16%) 1.71%
handshake_tickets_aws_lc_rs_1.3_ecdsap384_aes 6.10 ms 6.17 ms 0.07 ms (1.13%) 1.91%
handshake_tickets_aws_lc_rs_1.3_rsa_chacha 6.39 ms 6.46 ms 0.07 ms (1.09%) 1.87%
handshake_tickets_aws_lc_rs_1.3_rsa_aes 6.38 ms 6.45 ms 0.07 ms (1.08%) 2.03%
handshake_session_id_aws_lc_rs_1.3_rsa_aes 6.37 ms 6.44 ms 0.07 ms (1.05%) 1.92%
handshake_session_id_ring_1.3_ecdsap384_chacha 9.92 ms 10.03 ms 0.10 ms (1.05%) 1.18%
handshake_tickets_aws_lc_rs_1.3_ecdsap384_chacha 6.10 ms 6.16 ms 0.06 ms (1.02%) 1.90%
handshake_session_id_ring_1.3_ecdsap256_aes 6.86 ms 6.93 ms 0.07 ms (0.99%) 1.22%
handshake_session_id_ring_1.3_rsa_aes 7.50 ms 7.57 ms 0.07 ms (0.98%) 1.00%
handshake_tickets_ring_1.3_ecdsap256_aes 6.88 ms 6.94 ms 0.07 ms (0.96%) 1.07%
handshake_session_id_aws_lc_rs_1.3_ecdsap384_chacha 6.05 ms 6.11 ms 0.06 ms (0.96%) 1.40%
handshake_tickets_ring_1.3_ecdsap384_chacha 9.94 ms 10.03 ms 0.09 ms (0.95%) 1.00%
handshake_tickets_ring_1.3_rsa_aes 7.51 ms 7.58 ms 0.07 ms (0.92%) 1.09%
handshake_session_id_aws_lc_rs_1.3_rsa_chacha 6.35 ms 6.41 ms 0.06 ms (0.89%) 1.23%
handshake_tickets_ring_1.3_ecdsap384_aes 9.98 ms 10.06 ms 0.08 ms (0.82%) 1.00%
handshake_session_id_ring_1.3_ecdsap384_aes 9.95 ms 10.03 ms 0.08 ms (0.80%) 1.00%
handshake_tickets_aws_lc_rs_1.2_rsa_aes 2.35 ms 2.36 ms 0.01 ms (0.53%) 1.56%
handshake_no_resume_ring_1.3_ecdsap384_chacha 3.61 ms 3.62 ms 0.01 ms (0.37%) 1.00%
handshake_session_id_ring_1.2_rsa_aes 1.75 ms 1.74 ms -0.01 ms (-0.37%) 1.56%
handshake_no_resume_ring_1.3_ecdsap384_aes 3.61 ms 3.62 ms 0.01 ms (0.30%) 1.00%
handshake_no_resume_aws_lc_rs_1.3_ecdsap256_aes 479.17 µs 477.81 µs -1.36 µs (-0.28%) 3.35%
handshake_no_resume_aws_lc_rs_1.3_ecdsap256_chacha 478.40 µs 477.13 µs -1.27 µs (-0.27%) 3.78%
handshake_no_resume_ring_1.2_rsa_aes 1.08 ms 1.07 ms -0.00 ms (-0.25%) 1.15%
handshake_no_resume_ring_1.3_ecdsap256_chacha 507.41 µs 508.64 µs 1.24 µs (0.24%) 2.26%
handshake_no_resume_aws_lc_rs_1.3_ecdsap384_aes 1.19 ms 1.19 ms -0.00 ms (-0.23%) 1.00%
transfer_no_resume_aws_lc_rs_1.2_rsa_aes 5.86 ms 5.87 ms 0.01 ms (0.17%) 3.26%
handshake_tickets_ring_1.2_rsa_aes 1.84 ms 1.84 ms 0.00 ms (0.14%) 1.54%
handshake_session_id_aws_lc_rs_1.2_rsa_aes 2.19 ms 2.19 ms 0.00 ms (0.12%) 1.00%
transfer_no_resume_ring_1.3_rsa_aes 7.30 ms 7.29 ms -0.01 ms (-0.11%) 3.30%
transfer_no_resume_ring_1.3_ecdsap384_chacha 16.53 ms 16.54 ms 0.02 ms (0.11%) 1.71%
transfer_no_resume_ring_1.3_ecdsap384_aes 9.83 ms 9.84 ms 0.01 ms (0.11%) 2.08%
handshake_no_resume_ring_1.3_rsa_aes 1.08 ms 1.08 ms -0.00 ms (-0.10%) 1.00%
transfer_no_resume_ring_1.2_rsa_aes 7.22 ms 7.21 ms -0.01 ms (-0.10%) 2.60%
transfer_no_resume_ring_1.3_rsa_chacha 14.00 ms 14.02 ms 0.01 ms (0.09%) 2.08%
handshake_no_resume_aws_lc_rs_1.3_ecdsap384_chacha 1.18 ms 1.18 ms -0.00 ms (-0.09%) 1.35%
handshake_no_resume_aws_lc_rs_1.3_rsa_chacha 1.41 ms 1.41 ms 0.00 ms (0.08%) 1.00%
transfer_no_resume_ring_1.3_ecdsap256_chacha 13.42 ms 13.43 ms 0.01 ms (0.08%) 1.96%
handshake_no_resume_aws_lc_rs_1.3_rsa_aes 1.42 ms 1.42 ms -0.00 ms (-0.06%) 1.00%
handshake_no_resume_aws_lc_rs_1.2_rsa_aes 1.36 ms 1.37 ms 0.00 ms (0.04%) 1.47%
transfer_no_resume_aws_lc_rs_1.3_ecdsap256_aes 4.93 ms 4.93 ms 0.00 ms (0.04%) 7.05%
transfer_no_resume_aws_lc_rs_1.3_ecdsap384_chacha 14.13 ms 14.13 ms 0.00 ms (0.03%) 2.19%
handshake_no_resume_ring_1.3_rsa_chacha 1.09 ms 1.09 ms 0.00 ms (0.02%) 1.00%
transfer_no_resume_aws_lc_rs_1.3_rsa_chacha 14.36 ms 14.37 ms 0.00 ms (0.02%) 1.72%
transfer_no_resume_ring_1.3_ecdsap256_aes 6.72 ms 6.72 ms 0.00 ms (0.01%) 3.60%
transfer_no_resume_aws_lc_rs_1.3_ecdsap384_aes 5.65 ms 5.65 ms 0.00 ms (0.01%) 5.60%
handshake_no_resume_ring_1.3_ecdsap256_aes 509.30 µs 509.27 µs -0.04 µs (-0.01%) 2.32%
transfer_no_resume_aws_lc_rs_1.3_ecdsap256_chacha 13.42 ms 13.42 ms -0.00 ms (-0.00%) 2.29%
transfer_no_resume_aws_lc_rs_1.3_rsa_aes 5.87 ms 5.87 ms -0.00 ms (-0.00%) 4.46%


@cpu (Member) commented Dec 15, 2023

Have you folks taken a look at #1597 ? I haven't done a deep dive on your branch yet but I think it would be important to make sure what you're proposing and what's in the queue in #1597 can be harmonized.

@pvdrz flagging this branch as one you might be interested in as well.

@cpu cpu self-assigned this Jan 29, 2024
@cpu (Member) commented Jan 29, 2024

Have you folks taken a look at #1597 ? I haven't done a deep dive on your branch yet but I think it would be important to make sure what you're proposing and what's in the queue in #1597 can be harmonized.

@Keksoj @Wonshtrum I'm still interested in your take on this. #1597 has landed in main since I left this comment.

We would be interested in resurrecting this work but it looks like there are changes necessary (in particular in the deframer layer). I'm starting to look through what might need to be adjusted but if you had time to take a look as well your input would be appreciated.

@cpu (Member) commented Jan 30, 2024

Here's a branch I started to try and rebase this on main: https://github.com/cpu/rustls/tree/cpu-borrowed-payload

I made some minor changes along the way, none of which feel especially substantial:
  • The BorrowedPayload type from this branch is now named BorrowedPlainPayload - there's already a BorrowedPayload type from the ferrous work used for a non-owning payload in the context of opaque messages. It might make sense to name this BorrowedOpaquePayload to emphasize this, but I've avoided that change for now and renamed the new type.
  • I reordered the members of BorrowedPlainPayload to match style guide conventions
  • I employed more liberal use of Self within member BorrowedPlainPayload fns.
  • Unit tests for BorrowedPlainPayload were moved to a tests mod at the end of message.rs.
  • I implemented Into<Vec<u8>> for BorrowedPlainPayload - it already offered a to_vec() fn so this is just ergonomics.
  • Throughout I tried to explicitly write lifetimes instead of eliding them, to match style guide convention.
  • I inlined a size binding in the write_vectored return

Most importantly, as it stands now my branch doesn't build because of the deframer.rs drift I mentioned in my previous comment w.r.t #1597. I haven't been able to square the RawSlice arrangement that was already flagged as problematic in previous reviews with the new borrowed plaintext payload representation where the data is spread across multiple independently borrowed slices. I've separately tried (and failed) to refactor the deframer code to try and wrangle out the RawSlice parts, so this feels like I'm hitting the same wall.

I'm very open to the possibility I'm completely on the wrong track here but as it stands I'm stuck and could use guidance.

@ctz (Member) commented Jan 31, 2024

Just a shower thought: should we change pub fn encrypt in impl<Data> WriteTraffic<'_, Data> so that application_data is &[&[u8]] instead?

At the current time, once we do #1723 we would have to implement io::Write::write_vectored by calling encrypt multiple times [1], which would regress the behaviour desired in this PR.

edit: this is probably not a request for a change to this PR.

Footnotes

  [1] I am ignoring allocating and copying.
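For illustration only, the call shape such a signature change would enable (a hypothetical trait, not the Rustls API): write_vectored hands all of its buffers to a single encrypt call, which emits one record:

```rust
/// Hypothetical vectored-encrypt interface (not the real Rustls API).
pub trait VectoredEncrypt {
    /// Consume all chunks into one record; returns plaintext bytes taken.
    fn encrypt_vectored(&mut self, application_data: &[&[u8]]) -> usize;
}

/// Toy implementation that only counts, to show the call shape.
pub struct CountingEncrypter {
    pub records_emitted: usize,
}

impl VectoredEncrypt for CountingEncrypter {
    fn encrypt_vectored(&mut self, application_data: &[&[u8]]) -> usize {
        self.records_emitted += 1; // one record for all chunks
        application_data.iter().map(|c| c.len()).sum()
    }
}
```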

@Wonshtrum (Contributor) commented:

Hello @cpu, we finished your attempt at rebasing on main, and ran into the same problem you did.
We managed to get it to compile, but while pattern matching we had to leave some branches handling BorrowedPlainPayload unimplemented:

https://github.com/Wonshtrum/rustls/blob/f31f35d910a40e36c65654dbf97eafa1829b82bb/rustls/src/msgs/message.rs#L412-L427
https://github.com/Wonshtrum/rustls/blob/f31f35d910a40e36c65654dbf97eafa1829b82bb/rustls/src/msgs/deframer.rs#L164-L169

In both cases, they should be truly unreachable. This stems from the fact that we designed BorrowedPlainPayload as an outbound type, but it is currently used (through BorrowedPlainMessage) for inbound data as well. As such, BorrowedPlainPayload can hold references to fragmented regions of memory (through its Multiple variant), which makes sense for outbound data: it comes from the user and may be spread out in memory (this was our original complaint about write_vectored). But it does not make sense for incoming frames, which are guaranteed to be a single contiguous slice of bytes coming from the Deframer's internal buffer.
So in both cases there is no proper way to handle the Multiple variant, and by design we are guaranteed never to have to.

Coming to this realization, I think we have four options:

  1. keep it as it is; it's a bit awkward, but we can document it
  2. explicitly encode this dichotomy between inbound and outbound in the types: we would have an InboundBorrowedPayload and an OutboundBorrowedPayload, and potentially similar types for messages
  3. use a single "powerful" Payload type, merging our BorrowedPlainMessage with the current Payload, and explicitly allow the payload to be fragmented everywhere, even in the code handling inbound data
  4. give up 😱

In my opinion, options 2 and 3 are the most elegant, but they are a lot of work for a feature originally meant to be well delimited.
We would love to have a chat about this with you folks. Do you hang out on Discord?
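For illustration, option 2 above could look roughly like this (hypothetical names; not necessarily the types that eventually landed): inbound payloads are always one contiguous slice out of the deframer's buffer, while outbound payloads may span many user-provided chunks:

```rust
/// Inbound: always a single contiguous slice from the deframer's buffer.
pub struct InboundBorrowedPayload<'a>(pub &'a [u8]);

/// Outbound: possibly many user-provided chunks, as in write_vectored.
pub struct OutboundBorrowedPayload<'a> {
    pub chunks: Vec<&'a [u8]>,
}

impl InboundBorrowedPayload<'_> {
    pub fn len(&self) -> usize {
        self.0.len()
    }
}

impl OutboundBorrowedPayload<'_> {
    pub fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }
}
```

With the split, inbound code never has to match a Multiple variant that cannot occur, so the unimplemented branches disappear by construction.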

@cpu (Member) commented Jan 31, 2024

Hi @Wonshtrum, thanks for taking a look at this.

This stems from the fact that we designed BorrowedPlainPayload as an outbound type but it is currently used (through BorrowedPlainMessage) for inbound data.

💡 that does seem like the crux of the issues we were both running into. This is a good insight.

Coming to this realization I think we have 4 different options:

Of the four options presented (well, three, since I don't think we should give up yet 😆), option 2 seems the most reasonable to me, but I'm curious what others think. I think option 1 should be a last resort, and option 3 seems like it might be more complexity than is merited.

We would love to have a chat about this with you folks. Do you hang out on Discord?

Absolutely, you can find us in the "crates" Discord djc created, in the #rustls channel.

@djc
Member

djc commented Feb 1, 2024

I would also gravitate towards option 2, at least as an initial approach. If we find there's a lot of duplication, we can always merge them again into something more like option 3. But I think it will also be helpful, in terms of understanding, to figure out what the needs on the receive side are vs. the needs on the send side (I'm not sure whether "receive"/"send" or "inbound"/"outbound" is the clearer framing; I have a slight preference for the former).

@Wonshtrum
Contributor

Wonshtrum commented Feb 1, 2024

I also think option 2 is the cleaner one. I quickly looked into the code: it seems BorrowedPlainMessage is the only type that holds a BorrowedPlainPayload, and BorrowedPlainMessage is only held by Decrypted and Deframed, which should both be inbound. So I would make the current BorrowedPlainMessage exclusively an inbound type holding a single slice of u8, and create a distinct outbound equivalent holding the current BorrowedPlainPayload, used in all the write/encrypt/send code.
I may have overlooked something, but this seems like the only duplication needed.

@Keksoj
Contributor Author

Keksoj commented Feb 1, 2024

I agree with you all, and on top of that, option 2 seems relatively doable.

As for the naming, I would go for Inbound- and Outbound- rather than Send- and Receive-, because we need to indicate the direction of the payload rather than the action we perform on it.

But maybe there are other namings available. Maybe Incoming- and Outgoing-?

@cpu
Member

cpu commented Feb 1, 2024

@rustls-benchmarking bench

@cpu
Member

cpu commented Feb 1, 2024

The last push looks good to me 👍 I think the branch needs a rustfmt pass to fix the one CI task that failed.

Member

@cpu cpu left a comment


Thanks again for picking this branch up and iterating with me. I think we're just about ready to follow up with some other folks for review.

Beyond the comments I left in my review I think we should clean up some of the intermediate work in the commit history. I think the right end state would be two commits:

  1. The first, doing the BorrowedPlainMessage split into BorrowedPlainMessageInbound/BorrowedPlainMessageOutbound leaving the existing code as-is as much as possible and just introducing the new trait/types. I think this commit message should touch on the rationale for the change, e.g. that we want to handle the payload differently for one of the two w.r.t chunks.
  2. The second, implementing the borrowed payload/chunker. I think this commit message should keep the existing comment from the original first commit in this branch, but also be augmented with a bit more of the context from the PR description. It doesn't need to have all the strace output but I think it's important to mention the performance motivation.

Does that make sense to you?

rustls/src/msgs/message.rs (7 outdated review threads, resolved)
rustls/src/msgs/deframer.rs (1 outdated review thread, resolved)
@Keksoj
Contributor Author

Keksoj commented Feb 2, 2024

We certainly intend to edit the commit history. We thought of having two commits:

  1. chunked payload
  2. Inbound/outbound

But it is more elegant the other way around, as you suggest. Thanks for the review! We're pretty excited about this!

rustls/src/conn.rs (outdated review thread, resolved)
@Wonshtrum
Contributor

I rebased on main.
For my part, I consider this PR done.

@djc
Member

djc commented Feb 13, 2024

Note: this change has nothing to do with my initial worries, it's just an optimization to avoid 2 bound checks.

I think this diff link is showing the opposite change of what you actually meant. I like the actual change, though!

@djc
Member

djc commented Feb 13, 2024

Technically, none of those points break any invariants of OutboundChunks, I just don't particularly like when a state has multiple valid ways to be represented (because it usually means multiple code paths to handle them). We could constrain OutboundChunks to always instantiate the simplest variant, or we could reduce the number of possible representations (by removing Empty whose only purpose is to be simple to create, or even remove Single, whose purpose is to avoid a double pointer indirection).

I don't have a very strong opinion, but removing Empty in favor of an empty slice in Single sounds kinda appealing? Otherwise it looks like OutboundChunks provides a decent abstraction so it doesn't feel like also removing Single in favor of having a Multi with some otherwise enforced invariants would be a great improvement.
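For reference, a sketch of what OutboundChunks could look like once Empty is dropped in favor of an empty slice in Single (the field layout is assumed from the snippets later in this thread; this is not the exact PR code):

```rust
#[derive(Clone, Debug)]
enum OutboundChunks<'a> {
    /// One contiguous chunk; an empty slice doubles as the empty payload.
    Single(&'a [u8]),
    /// Several chunks, with `start`/`end` cursors delimiting the live bytes.
    Multiple {
        chunks: &'a [&'a [u8]],
        start: usize,
        end: usize,
    },
}

impl<'a> OutboundChunks<'a> {
    /// With `Empty` removed, an empty payload is just an empty `Single`.
    fn new_empty() -> Self {
        Self::Single(&[])
    }

    fn is_empty(&self) -> bool {
        match self {
            Self::Single(chunk) => chunk.is_empty(),
            Self::Multiple { start, end, .. } => start == end,
        }
    }
}
```

Representing emptiness as `Single(&[])` means one fewer variant to match everywhere, at the cost of the empty state having the same shape as a normal chunk.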

@djc djc marked this pull request as ready for review February 13, 2024 12:07
@cpu cpu removed their assignment Feb 13, 2024
rustls/src/msgs/message.rs (outdated review thread, resolved)
@Wonshtrum
Contributor

Wonshtrum commented Feb 14, 2024

I think this diff link is showing the opposite change of what you actually meant. I like the actual change, though!

Unfortunately, this is the right diff: I swapped the enumerate for a plain range loop. This micro-optimization works because the plain loop strongly couples the index to the slice's length, whereas, due to how it is written in the standard library, the enumerated index is (from the compiler's point of view) completely independent of the slice, so any indexing into the slice must be bounds-checked.
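A standalone illustration of the idea (not the PR code; whether a bounds check is actually elided depends on the compiler version and optimization level):

```rust
// Range-loop indexing: the loop condition guarantees `i < slice.len()`,
// which the optimizer can use to elide the bounds check on `slice[i]`.
fn sum_range(slice: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..slice.len() {
        total += slice[i];
    }
    total
}

// Enumerate-based indexing: the index comes from the iterator, so when it
// is used to index the slice separately, the compiler may not be able to
// prove it is in range and may keep a bounds check.
fn sum_enumerate(slice: &[u64]) -> u64 {
    let mut total = 0;
    for (i, _) in slice.iter().enumerate() {
        total += slice[i];
    }
    total
}
```

Both functions compute the same result; only the generated assembly may differ.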

I don't have a very strong opinion, but removing Empty in favor of an empty slice in Single sounds kinda appealing? Otherwise it looks like OutboundChunks provides a decent abstraction so it doesn't feel like also removing Single in favor of having a Multi with some otherwise enforced invariants would be a great improvement.

I removed Empty from OutboundChunks. The code can remain essentially the same (removing just the corresponding match arms). But I wanted to try two variations with fewer invariants.

First I removed the "at least two slices in Multiple" invariant. This simplifies split_at, which now avoids "downgrading" to Single. The private with_cursors constructor can be removed. copy_to_vec has to be modified to handle the case where Multiple holds only one chunk, with both cursors in it.

pub fn copy_to_vec(&self, vec: &mut Vec<u8>) {
    match self {
        Self::Single(chunk) => vec.extend_from_slice(chunk),
        Self::Multiple { chunks, start, end } => {
            let mut size = 0;
            let last = chunks.len() - 1;
            for (i, chunk) in chunks.iter().enumerate() {
                let start = if i == 0 { *start } else { 0 };
                let end = if i == last { end - size } else { chunk.len() };
                vec.extend_from_slice(&chunk[start..end]);
                size += chunk.len();
            }
        }
    }
}
pub fn split_at(&self, mid: usize) -> (Self, Self) {
    match self {
        Self::Single(chunk) => {
            if chunk.len() <= mid {
                return (self.clone(), Self::Single(&[]));
            }
            (Self::Single(&chunk[..mid]), Self::Single(&chunk[mid..]))
        }
        Self::Multiple { chunks, start, end } => {
            let mut size = 0;
            for i in 0..chunks.len() {
                if size + chunks[i].len() >= (mid + start) {
                    return (
                        Self::Multiple { chunks: &chunks[..=i], start: *start, end: start + mid },
                        // rebase both cursors: the right half's first chunk is chunks[i],
                        // which begins `size` bytes into the concatenation
                        Self::Multiple { chunks: &chunks[i..], start: start + mid - size, end: end - size },
                    );
                }
                size += chunks[i].len();
            }
            (self.clone(), Self::new_empty())
        }
    }
}

Then I removed the "start must be in the first chunk" and "end must be in the last chunk" invariants. The only invariant this version assumes is that "start is smaller than end". This simplifies split_at again, which is now O(1), as it only has to change the cursors of Multiple. Again, copy_to_vec gains a bit of complexity, as we no longer know which chunks the cursors point into.

pub fn copy_to_vec(&self, vec: &mut Vec<u8>) {
    match self {
        Self::Single(chunk) => vec.extend_from_slice(chunk),
        Self::Multiple { chunks, start, end } => {
            let mut size = 0;
            for chunk in *chunks {
                let psize = size;
                let len = chunk.len();
                size += len;
                if size <= *start || psize >= *end { continue; }
                let start = if psize < *start { start - psize } else { 0 };
                let end = if end - psize < len { end - psize } else { len };
                vec.extend_from_slice(&chunk[start..end]);
            }
        }
    }
}
pub fn split_at(&self, mid: usize) -> (Self, Self) {
    match self {
        Self::Single(chunk) => {
            if chunk.len() <= mid {
                return (self.clone(), Self::Single(&[]));
            }
            (Self::Single(&chunk[..mid]), Self::Single(&chunk[mid..]))
        }
        Self::Multiple { chunks, start, end } => (
            Self::Multiple { chunks, start: *start, end: start + mid },
            Self::Multiple { chunks, start: start + mid, end: *end },
        )
    }
}

It's hard to create a representative benchmark for OutboundChunks, but these two proposals seem a little faster than the current implementation. The second produces smaller assembly, but they seem equivalent in performance.
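As a rough idea of what such a micro-benchmark could look like with only the standard library (copy_all is a hypothetical stand-in for copy_to_vec; this is a sketch, not the harness used here, and results vary a lot with chunk count and size):

```rust
use std::time::{Duration, Instant};

/// Concatenate all chunks into one contiguous vector,
/// a stand-in for the `copy_to_vec` variants being compared.
fn copy_all(chunks: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(chunks.iter().map(|c| c.len()).sum());
    for chunk in chunks {
        out.extend_from_slice(chunk);
    }
    out
}

/// Run `f` `iters` times and return the total wall time.
/// Crude, but enough to compare variants of the same routine.
fn bench<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}
```

`std::hint::black_box` around the result inside the closure keeps the optimizer from deleting the work being measured.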

@djc
Member

djc commented Feb 14, 2024

It's hard to create a representative benchmark for OutboundChunks, but these two propositions seem a little faster than the current implementation. The second produces a smaller assembly, but they seem equivalent in performance.

This sounds good to me! Thanks for diving in some more. Maybe split_at() for Single can be (Self::Single(&chunk[..Ord::min(mid, chunk.len())]), Self::Single(&chunk[mid..]))? Avoids a bit of repetition.

@Wonshtrum Wonshtrum force-pushed the borrowed-payload branch 2 times, most recently from 9c8197e to de1fdf7 Compare February 15, 2024 07:02
@djc
Member

djc commented Feb 15, 2024

@Wonshtrum one more clippy issue to address!

Also looks like our external types workflow check broke, so we'll need to look into that (but no need to block this PR, I think).

@Wonshtrum
Contributor

Sorry about that.
Fixed and rebased on main.

@cpu
Member

cpu commented Feb 15, 2024

Also looks like our external types workflow check broke, so we'll need to look into that (but not need to block this PR I think).

This was fixed in main w/ #1791

I think we probably just need ctz to do another pass on this branch and then it's ready for merge? I suspect based on OOB conversations this will be doable early next week 🤞

@djc djc enabled auto-merge February 15, 2024 22:22
Wonshtrum and others added 3 commits February 16, 2024 08:56
Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
ConnectionCommon<T>::write_vectored was implemented by processing
each chunk, fragmenting them and wrapping each fragment in an
OutboundMessage before encrypting and sending it as a separate TLS frame.
For very fragmented payloads this generates a lot of very small payloads
with most of the data being TLS headers.

OutboundChunks can contain an arbitrary number of fragmented chunks.
This allows write_vectored to process all its chunks at once,
fragmenting them in place if needed and wrapping them in an
OutboundMessage. All the chunks are merged in a contiguous vector
(taking advantage of an already existing copy) before being encrypted
and sent as a single TLS frame.

Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
Co-Authored-By: Emmanuel Bosquet <bjokac@gmail.com>
Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
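The merging strategy the commit message describes can be sketched in isolation: flatten an arbitrary list of chunks into fragments of at most the TLS plaintext limit, each fragment being one contiguous buffer that becomes one record (fragment_chunks is a hypothetical helper illustrating the idea, not the PR's API):

```rust
/// Maximum plaintext bytes per TLS record (2^14, per the TLS specification).
const MAX_FRAGMENT_LEN: usize = 16 * 1024;

/// Flatten scattered chunks into contiguous fragments of at most `max_len`
/// bytes each. Every returned Vec would become one TLS record, instead of
/// one record per input chunk.
fn fragment_chunks(chunks: &[&[u8]], max_len: usize) -> Vec<Vec<u8>> {
    let mut fragments = Vec::new();
    let mut current = Vec::with_capacity(max_len);
    for chunk in chunks {
        let mut rest = *chunk;
        while !rest.is_empty() {
            // Copy as much of this chunk as fits in the current fragment.
            let take = (max_len - current.len()).min(rest.len());
            current.extend_from_slice(&rest[..take]);
            rest = &rest[take..];
            if current.len() == max_len {
                fragments.push(std::mem::take(&mut current));
                current = Vec::with_capacity(max_len);
            }
        }
    }
    if !current.is_empty() {
        fragments.push(current);
    }
    fragments
}
```

With this shape, a writev of many tiny IoSlices produces a handful of full-size records rather than one undersized record (plus header overhead) per slice.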
auto-merge was automatically disabled February 16, 2024 07:56

Head branch was pushed to by a user without write access

@djc djc enabled auto-merge February 16, 2024 09:01
@djc djc added this pull request to the merge queue Feb 16, 2024
Merged via the queue into rustls:main with commit cf09842 Feb 16, 2024
23 checks passed
@djc
Member

djc commented Feb 16, 2024

@Wonshtrum thanks for all your work on this!

Labels
performance_enhancement Pull requests that should improve performance