-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make the decode / decrypt pipeline non-allocating (groundwork) #1597
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #1597 +/- ##
==========================================
- Coverage 96.16% 95.96% -0.20%
==========================================
Files 80 80
Lines 17337 18138 +801
==========================================
+ Hits 16672 17407 +735
- Misses 665 731 +66 ☔ View full report in Codecov by Sentry. |
44da081
to
91fb30f
Compare
@cpu @ctz @djc could I get a quick sanity check on the direction here? the main commits that affect the end-user and/or internal interfaces are:
TL;DR make things the bulk of commits that will reduce allocations will be fairly mechanical like the 'reduce allocs around ServerCertDetails ' one thoroughly minimizing the internal API in an incremental fashion is going to take a lot of commits but I'll break out a smaller PR out of this one that only does the interface changes now, I mainly did this to enable in-place (non-allocating) decoding / decryption in the so far, I have only removed allocations from -- what my uneducated guess of what the biggest contributors are like -- application-data and certificates and I'm seeing improvements of this magnitude (ran these locally; I've filtered out the noisy
(+) I intend to use this |
ACK'ing this but wanted to mention it might be a little bit. We're focused on 0.22 release prep and Ctz is AFK for a little bit. The high-level vision sounds good to me, but I think input from Djc and Ctz will be most valuable here. My experience in this domain is more limited. |
91fb30f
to
305ea4e
Compare
I, instead, trimmed down this PR and it's ready for review. I have added a high level description of the changes in the PR description. removing the bulk of the allocations will be need in a separate / subsequent PR. (the MSRV seems to not like the function/struct update syntax used in this PR 😢 ) |
96b20d6
to
d686cdb
Compare
@rustls-benchmarking bench |
Benchmark resultsInstruction countsSignificant differencesClick to expand
Other differencesClick to expand
Wall-timeSignificant differencesClick to expand
Other differencesClick to expand
Additional informationCheckout details:
|
c0d9ee8
to
1bcf479
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not really comfortable with some of the things that are happening here. In particular the work around the RawSlice
which has some invariants which are neither enforced by the type system nor clearly documented, and some complexity around the AppendPayload
trait and impls that it's not obvious are needed.
(Informed by my experience working on #1179.)
4893608
to
49a5f47
Compare
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
Co-authored-by: Jorge Aparicio <jorge.aparicio@ferrous-systems.com>
49a5f47
to
35e4117
Compare
Pulling some discussion out of a resolved review thread:
|
this PR lays the groundwork for making the decoding - decryption - decoding pipeline "non-allocating" AKA removing as many allocations as possible
at a very high level that is enabled by these changes
MessageDeframer::pop
performs decoding - decryption in-place and yields borrowed data (slices) to the next layerCodec
gains a lifetime parameter'a
to allow itsread
method to return maximally borrowed data (&'a
) fromReader<'a>
-- this lets the post-decryption decoding to also be non-allocatingPayload
becomes astd::Cow
-like enum to support carrying either borrowed or owned data --HandshakePayload
and other types indirectly becomeCow
-like as wellState::handle
is tweaked to allow its return type hold borrowed data contained in themessage
parameterto showcase how this enables the removal of allocations from the pipeline, this PR makes decoding & decryption of application-data as well as certificates (
CertificateChain
) non-allocatingremoving the rest of the allocations that exist in the pipeline will be done in a subsequent PR
breaking changes:
MessageDecrypter::decrypt
. actual implementations don't have to change much, as you can see in the diff, but a breaking change is still a breaking change.this PR builds on top of #1595
I think it's worthwhile to explain why
Payload
and many of theState
implementers need to becomeCow
-like and cannot contain just references.In very simplified terms, you can think of the process of advancing
Connection
's internal state machine as the following function: (this applies to both the bufferedConnection
andUnbufferedConnection
APIs)each time this function is called, several records are decoded and decrypted from the buffered incoming TLS data (the
buffer
parameter in the above snippet). as mentioned above,Deframer::pop
now yields slices with lifetime'a
. eachmessage
s that's fed into the state machine through theState::handle
method can change the concrete state type behind thecurrent
trait object; each of those intermediate states can hold the'a
borrowed data inmessage
thanks to theState::handle
change done in this PR.eventually, the
deframer
is exhausted (e.g. the user may need to read data from the socket and append it tobuffer
) and it's time to return from the function. however, just after exiting thewhile
loopcurrent
may contain borrowed'a
content. it's not possible to store that borrowed content inConnection
because itsstate
field has a'static
constraint. that's where we are forced to erase the'a
lifetime by switching the true state behindcurrent
into its owned variant; this heap allocates any borrowed content. it's worth noting thatinto_owned
it's a no-operation in the case the state type does not contain any borrowed data.so yes, there's still one potential allocation in this function call but all of the many decoding and decryption events that happen within this function call becoming non-allocating; it also becomes possible to move between states -- within a single function call -- without needing to allocate borrowed content.