Limit fragmentation in write_vectored
#1640
Conversation
Force-pushed the write_vectored branch from b79a446 to cc4d000 (compare)
Codecov Report (Attention)

Additional details and impacted files:

@@            Coverage Diff            @@
##              main    #1640    +/-  ##
========================================
  Coverage    95.94%   95.95%
========================================
  Files           81       81
  Lines        18590    18827    +237
========================================
+ Hits         17837    18065    +228
- Misses         753      762      +9

☔ View full report in Codecov by Sentry.
On the topic of allocation/copy: we put a lot of effort into Sozu to handle HTTP messages (parsing and proxying) with zero copy in most cases. Very few dynamic allocations are necessary either, as Sozu preallocates and reuses a pool of buffers.

Analyzing the Rustls codebase, we found that in order to encrypt a payload, the entire plaintext message is copied. As explained in this PR description, aggregating the vector chunks on our side would mean a third allocation and copy. Instead, we placed the responsibility of aggregation on that existing copy.

My remarks are not directly related to this PR, but to the overall discussion on memory allocation. As @Keksoj said, we are eager to contribute and would happily make this change as well.
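Sozu's actual pool is not shown in this thread, but the pattern described above, preallocating fixed-size buffers and reusing them instead of allocating per message, can be sketched minimally like this (all names here are illustrative, not Sozu's real API):

```rust
// Minimal buffer-pool sketch (illustrative only, not Sozu's implementation):
// reuse preallocated fixed-size buffers instead of allocating per message.
pub struct BufferPool {
    free: Vec<Vec<u8>>,
    buf_size: usize,
}

impl BufferPool {
    /// Preallocate `count` buffers of `buf_size` bytes each.
    pub fn new(count: usize, buf_size: usize) -> Self {
        Self {
            free: (0..count).map(|_| vec![0u8; buf_size]).collect(),
            buf_size,
        }
    }

    /// Take a buffer from the pool, falling back to allocation only
    /// when the pool is exhausted.
    pub fn checkout(&mut self) -> Vec<u8> {
        self.free.pop().unwrap_or_else(|| vec![0u8; self.buf_size])
    }

    /// Return a buffer so a later checkout can reuse its allocation.
    pub fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        buf.resize(self.buf_size, 0);
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new(2, 16_384);
    let buf = pool.checkout();
    assert_eq!(pool.free.len(), 1);
    pool.release(buf);
    assert_eq!(pool.free.len(), 2);
    println!("buffers are reused, not reallocated");
}
```

The point of the pattern is that steady-state traffic performs no heap allocation at all: every checkout after warm-up reuses a buffer released earlier.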
Thanks for the investigation! We will review your changes shortly -- meanwhile, I just wanted to acknowledge that we'd be happy to review a PR for this, too.
I'm also interested in this change. I've left a couple of comments (mostly nitpicks).
Force-pushed from cc4d000 to 175a26d (compare)
Thanks for the reviews! We appreciate your reception. We reproduced the CI's icount tests locally and found a lot of variance. Should we take them into account?
I did a quick try of that, and it improves our send-direction transfer benchmarks by almost 20% in some cases:
So this is certainly worth doing!
@rustls-benchmarking bench (experimenting with our new benchmarking integration)
Benchmark results

Instruction counts: there are no significant instruction count differences.
Wall-time: significant differences and other differences reported (details collapsed).
Additional information: checkout details (collapsed).
@Keksoj @Wonshtrum I'm still interested in your take on this. #1597 has landed in main since I left this comment. We would be interested in resurrecting this work but it looks like there are changes necessary (in particular in the deframer layer). I'm starting to look through what might need to be adjusted but if you had time to take a look as well your input would be appreciated.
Here's a branch I started to try and rebase this on main: https://github.com/cpu/rustls/tree/cpu-borrowed-payload

I made some minor changes along the way, none of which feel especially substantial:
Most importantly, as it stands now my branch doesn't build. I'm very open to the possibility I'm completely on the wrong track here, but as it stands I'm stuck and could use guidance.
Force-pushed from 175a26d to f31f35d (compare)
Just a shower thought: should we change this?

At the current time, once we do #1723 we would have to implement it there as well.

edit: this is probably not a request for a change to this PR.
Hello @cpu, we finished your attempt at rebasing on main, all the while encountering the same problem that you have:

https://github.com/Wonshtrum/rustls/blob/f31f35d910a40e36c65654dbf97eafa1829b82bb/rustls/src/msgs/message.rs#L412-L427

In both cases, they should be truly unreachable. This stems from how we designed it in the first place.

Coming to this realization, I think we have 4 different options:
In my opinion, options 2 and 3 are the most elegant ones, but it's a lot of work for a feature originally meant to be well delimited.
Hi @Wonshtrum, thanks for taking a look at this.
💡 that does seem like the crux of the issues we were both running into. This is a good insight.
Of the four options presented (well, 3 since I don't think we should give up yet 😆) I think option 2 seems the most reasonable to me, but I'm curious what others think. I think option 1 should be a last resort, and option 3 seems like it might be more complexity than is merited.
Absolutely, you can find us in the "crates" Discord Djc created, in the #rustls channel.
I would also gravitate towards option 2, probably at least as an initial approach. If we find there's a lot of duplication, we can always merge them again into something more like option 3? But I think it will be helpful in terms of understanding, too, to figure out what the needs on the receive side are vs the needs on the send side (not sure between "receive" vs "send" and "inbound" vs "outbound" which is the clearer framing? I have a slight preference for the former).
I also think option 2 is the cleaner one. I quickly looked into the code; it seems there is only one
I agree with you all, and on top of that, option 2 seems relatively doable. As for the naming, I would go with those, but maybe there are other namings available.
@rustls-benchmarking bench
The last push looks good to me 👍 I think the branch needs a
Force-pushed from c1dd941 to 6c2bbed (compare)
Thanks again for picking this branch up and iterating with me. I think we're just about ready to follow up with some other folks for review.
Beyond the comments I left in my review I think we should clean up some of the intermediate work in the commit history. I think the right end state would be two commits:

- The first, doing the `BorrowedPlainMessage` split into `BorrowedPlainMessageInbound`/`BorrowedPlainMessageOutbound`, leaving the existing code as-is as much as possible and just introducing the new trait/types. I think this commit message should touch on the rationale for the change, e.g. that we want to handle the payload differently for one of the two w.r.t. chunks.
- The second, implementing the borrowed payload/chunker. I think this commit message should keep the existing comment from the original first commit in this branch, but also be augmented with a bit more of the context from the PR description. It doesn't need to have all the `strace` output but I think it's important to mention the performance motivation.
Does that make sense to you?
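For illustration, the direction-specific split described above could look roughly like this. Only the type names `BorrowedPlainMessageInbound`/`BorrowedPlainMessageOutbound` come from the discussion; the fields and methods here are assumptions, not rustls' actual API:

```rust
// Illustrative sketch of the proposed split by direction. The fields and
// the len() helper are assumptions for illustration, not the real API.

/// Inbound: the deframer hands the record layer one contiguous slice.
pub struct BorrowedPlainMessageInbound<'a> {
    pub payload: &'a [u8],
}

/// Outbound: the payload may be scattered over several borrowed chunks,
/// e.g. the buffers passed to write_vectored.
pub struct BorrowedPlainMessageOutbound<'a> {
    pub chunks: &'a [&'a [u8]],
}

impl BorrowedPlainMessageOutbound<'_> {
    /// Total payload length across all chunks.
    pub fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }
}

fn main() {
    let inbound = BorrowedPlainMessageInbound { payload: b"hello" };
    let parts: [&[u8]; 2] = [b"hel", b"lo"];
    let outbound = BorrowedPlainMessageOutbound { chunks: &parts };
    // Same logical payload, two different representations:
    assert_eq!(inbound.payload.len(), outbound.len());
    println!("inbound and outbound types carry the same payload");
}
```

The appeal of option 2 is visible even in this toy version: the "payload must be contiguous" assumption lives only in the inbound type, so the outbound path never has to fake it.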
We certainly intend to edit the commit history. We thought of having two commits in the opposite order, but it is more elegant the other way around, as you suggest. Thanks for the review! We're pretty excited about this!
I rebased on main.
I think this diff link is showing the opposite change of what you actually meant. I like the actual change, though!
I don't have a very strong opinion, but removing
Unfortunately, this is the right diff; I changed the
I removed the invariants in two steps.

First I removed the "at least two slices in Multiple" invariant. This simplifies `copy_to_vec` and `split_at`:

```rust
pub fn copy_to_vec(&self, vec: &mut Vec<u8>) {
    match self {
        Self::Single(chunk) => vec.extend_from_slice(chunk),
        Self::Multiple { chunks, start, end } => {
            let mut size = 0;
            let last = chunks.len() - 1;
            for (i, chunk) in chunks.iter().enumerate() {
                let start = if i == 0 { *start } else { 0 };
                let end = if i == last { end - size } else { chunk.len() };
                vec.extend_from_slice(&chunk[start..end]);
                size += chunk.len();
            }
        }
    }
}

pub fn split_at(&self, mid: usize) -> (Self, Self) {
    match self {
        Self::Single(chunk) => {
            if chunk.len() <= mid {
                return (self.clone(), Self::Single(&[]));
            }
            (Self::Single(&chunk[..mid]), Self::Single(&chunk[mid..]))
        }
        Self::Multiple { chunks, start, end } => {
            let mut size = 0;
            for i in 0..chunks.len() {
                if size + chunks[i].len() >= (mid + start) {
                    // Offsets in each half are relative to its sliced chunks.
                    return (
                        Self::Multiple { chunks: &chunks[..=i], start: *start, end: start + mid },
                        Self::Multiple { chunks: &chunks[i..], start: start + mid - size, end: end - size },
                    );
                }
                size += chunks[i].len();
            }
            (self.clone(), Self::new_empty())
        }
    }
}
```

Then I removed the "start must be in the first chunk" and "end must be in the last chunk" invariants. The only invariant this version assumes is that start is smaller than end. This simplifies both functions again:

```rust
pub fn copy_to_vec(&self, vec: &mut Vec<u8>) {
    match self {
        Self::Single(chunk) => vec.extend_from_slice(chunk),
        Self::Multiple { chunks, start, end } => {
            let mut size = 0;
            for chunk in *chunks {
                let psize = size;
                let len = chunk.len();
                size += len;
                if size <= *start || psize >= *end {
                    continue;
                }
                let start = if psize < *start { start - psize } else { 0 };
                let end = if end - psize < len { end - psize } else { len };
                vec.extend_from_slice(&chunk[start..end]);
            }
        }
    }
}

pub fn split_at(&self, mid: usize) -> (Self, Self) {
    match self {
        Self::Single(chunk) => {
            if chunk.len() <= mid {
                return (self.clone(), Self::Single(&[]));
            }
            (Self::Single(&chunk[..mid]), Self::Single(&chunk[mid..]))
        }
        Self::Multiple { chunks, start, end } => (
            Self::Multiple { chunks, start: *start, end: start + mid },
            Self::Multiple { chunks, start: start + mid, end: *end },
        ),
    }
}
```

It's hard to create a representative benchmark for this.
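To sanity-check the relaxed-invariant approach, here is a self-contained sketch. The enum definition is reconstructed from context for illustration (the discussion only shows the methods), and it assumes `start`/`end` are absolute offsets into the concatenation of all chunks, with `start <= end` as the sole invariant:

```rust
// Standalone sketch of the relaxed-invariant chunk container discussed
// above. The enum itself is reconstructed for illustration.
#[derive(Clone)]
enum Chunks<'a> {
    Single(&'a [u8]),
    Multiple { chunks: &'a [&'a [u8]], start: usize, end: usize },
}

impl<'a> Chunks<'a> {
    fn copy_to_vec(&self, vec: &mut Vec<u8>) {
        match self {
            Self::Single(chunk) => vec.extend_from_slice(chunk),
            Self::Multiple { chunks, start, end } => {
                let mut size = 0;
                for chunk in *chunks {
                    let psize = size;
                    let len = chunk.len();
                    size += len;
                    // Skip chunks entirely before start or after end.
                    if size <= *start || psize >= *end {
                        continue;
                    }
                    // Clamp the copied window to this chunk.
                    let lo = if psize < *start { start - psize } else { 0 };
                    let hi = if end - psize < len { end - psize } else { len };
                    vec.extend_from_slice(&chunk[lo..hi]);
                }
            }
        }
    }

    fn split_at(&self, mid: usize) -> (Chunks<'a>, Chunks<'a>) {
        match self {
            Self::Single(chunk) => {
                if chunk.len() <= mid {
                    return (self.clone(), Self::Single(&[]));
                }
                (Self::Single(&chunk[..mid]), Self::Single(&chunk[mid..]))
            }
            // No re-slicing needed: both halves share the chunk list and
            // only the absolute start/end offsets move.
            Self::Multiple { chunks, start, end } => (
                Self::Multiple { chunks: *chunks, start: *start, end: start + mid },
                Self::Multiple { chunks: *chunks, start: start + mid, end: *end },
            ),
        }
    }
}

fn main() {
    let parts: [&[u8]; 3] = [b"GET ", b"/index", b" HTTP"];
    let all = Chunks::Multiple { chunks: &parts, start: 0, end: 15 };
    let (left, right) = all.split_at(6);
    let (mut l, mut r) = (Vec::new(), Vec::new());
    left.copy_to_vec(&mut l);
    right.copy_to_vec(&mut r);
    assert_eq!(l, b"GET /i");
    assert_eq!(r, b"ndex HTTP");
    println!("split works across chunk boundaries");
}
```

Note how `split_at` on `Multiple` is allocation-free and O(1): it never touches the chunk list, which is exactly what dropping the positional invariants buys.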
This sounds good to me! Thanks for diving in some more. Maybe
Force-pushed from 9c8197e to de1fdf7 (compare)
@Wonshtrum one more clippy issue to address! Also looks like our external-types workflow check broke, so we'll need to look into that (but it doesn't need to block this PR, I think).
Force-pushed from de1fdf7 to e9bfe40 (compare)
Sorry about that.
This was fixed in main w/ #1791. I think we probably just need ctz to do another pass on this branch and then it's ready for merge? I suspect based on OOB conversations this will be doable early next week 🤞
Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
ConnectionCommon<T>::write_vectored was implemented by processing each chunk, fragmenting them and wrapping each fragment in an OutboundMessage before encrypting and sending it as a separate TLS frame. For very fragmented payloads this generates a lot of very small payloads, with most of the data being TLS headers.

OutboundChunks can contain an arbitrary amount of fragmented chunks. This allows write_vectored to process all its chunks at once, fragmenting them in place if needed and wrapping them in an OutboundMessage. All the chunks are merged into a contiguous vector (taking advantage of an already existing copy) before being encrypted and sent as a single TLS frame.

Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
Co-Authored-By: Emmanuel Bosquet <bjokac@gmail.com>
Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
Head branch was pushed to by a user without write access
e9bfe40
to
71746d6
Compare
@Wonshtrum thanks for all your work on this!
We, Wonshtrum and myself, work at Clever Cloud on a custom-made reverse proxy called Sōzu. We switched from using both OpenSSL and Rustls to being Rustls-only, one year ago, and we are very happy about it. We appreciate the security orientation of Rustls, as well as its native Rust nature.
Our issue
In Sōzu, our reverse proxy, we strive to optimize system calls and throughput. Here is the syscall we found with strace when transmitting a simple HTTPS response to a client:
What we see here is a number of small pieces of data, mostly headers, being encoded into TLS frames one at a time. The \27\3\3\0 pattern is the beginning of a TLS frame. This repetition is a waste of bandwidth.
Our investigation
The system call above is due to how the Writer's write_vectored method encodes the chunks one by one. In our case, our reverse proxy creates a number of smaller chunks, and half of the throughput is monopolized by TLS headers.
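To see why small records are so costly, consider the fixed per-record overhead. For TLS 1.3 with the common AEAD suites, each record carries a 5-byte header, a 1-byte inner content type, and a 16-byte authentication tag, i.e. 22 bytes regardless of payload size (other suites and protocol versions differ slightly). A rough back-of-the-envelope sketch:

```rust
// Rough estimate: fixed bytes added per TLS 1.3 record with a common
// AEAD suite (5-byte record header + 1-byte inner type + 16-byte tag).
const PER_RECORD_OVERHEAD: usize = 5 + 1 + 16;

/// Fraction of the wire bytes that is overhead when `payload` bytes are
/// split across `records` TLS records.
fn overhead_ratio(payload: usize, records: usize) -> f64 {
    let overhead = records * PER_RECORD_OVERHEAD;
    overhead as f64 / (payload + overhead) as f64
}

fn main() {
    // 200 bytes of HTTP headers sent as 10 tiny records vs one record:
    println!("10 records: {:.0}% overhead", 100.0 * overhead_ratio(200, 10));
    println!(" 1 record:  {:.0}% overhead", 100.0 * overhead_ratio(200, 1));
}
```

With ten tiny records, more than half of the wire bytes are framing, which matches the "half of the throughput is monopolized by TLS headers" observation; aggregating into one record drops the overhead to roughly a tenth.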
Our proposal
We could have solved our issue on the proxy side, by accumulating chunks and calling write() instead of write_vectored, but we seek to avoid dynamic memory allocation and copying.

In this pull request on Rustls, instead of treating the chunks individually in write_vectored, we suggest passing them down together in a BorrowedPayload form, until they are encrypted into a single TLS frame (or several if the payload exceeds MAX_FRAGMENT_SIZE). This new type, BorrowedPayload, does not copy the data but manipulates slices of bytes.

Syscall improvement
This is the same response as above: a simple "hi" and some headers.
We can see that there is only one iov_base (corresponding to one TLS frame), instead of many before. The syscall has diminished in size, from 563 to 145.
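A payload larger than the maximum fragment size still has to be split, but only into roughly len/max records rather than one record per input chunk. A sketch of that arithmetic (16384 is the TLS maximum plaintext fragment length from RFC 8446 §5.1; the constant name is illustrative):

```rust
// TLS caps plaintext fragments at 16 KiB (2^14 bytes, RFC 8446 §5.1).
const MAX_FRAGMENT_LEN: usize = 16_384;

/// Number of TLS records needed to carry `payload_len` bytes once all
/// chunks are aggregated first (at least one record).
fn records_aggregated(payload_len: usize) -> usize {
    payload_len.div_ceil(MAX_FRAGMENT_LEN).max(1)
}

fn main() {
    // A small response (e.g. 500 bytes spread over many chunks) fits
    // in a single record after aggregation:
    assert_eq!(records_aggregated(500), 1);
    // A 100 KiB payload still needs 7 records:
    assert_eq!(records_aggregated(102_400), 7);
    println!("record count depends on total length, not on chunk count");
}
```

This is why the syscall above shrinks to one iovec for a small response: the record count now scales with total payload length, not with how fragmented the input happened to be.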
Benchmarks
Our modification reduces bandwidth usage significantly when benchmarking our reverse proxy.
These benchmarks run on a simple desktop machine, with 4 backends to route HTTPS traffic to. The backends reply with a simple "hi" and some headers.
Sōzu + Rustls 0.21.9
Sōzu + Rustls (this branch)
Discussion
The strace and the benchmark suggest that our addition to the rustls codebase diminishes bandwidth use and improves overall performance in a meaningful way. Requests per second, and latency, are the key metrics for us.
Diving into Rustls, we found a number of memory allocations that could be avoided. If you would like, we would be more than happy to work with you in this direction.
(edited for clarity)