This repository has been archived by the owner on Oct 30, 2019. It is now read-only.
Standardisation of traits for (buffered) IO #53
Labels: WG async/await (issues relevant to the async subgroup)
There is a lot of fragmentation in this space. This is an especially big problem for buffering, where we should be aiming for all libraries to be able to share their buffers to reduce copying. However, it seems that traits for IO in general are still not really a solved problem. Most proposals seem to come down to a choice between (Async)(Buf)Read/Write-based and Stream/Sink-based solutions. We need to evaluate the pros and cons of each of these and come to a decision, so that we can start using the same generic traits and patterns for IO across different crates and achieve consistency. I'm going to briefly go through the current solutions, and some that are being developed (please correct me if any of the code below is wrong), then we can compare and contrast them to find the best option.

Options
AsyncRead / AsyncWrite
Very minimal. Mirrors the standard library, so also very familiar. Can only work with bytes, which limits it to relatively low-level operations. Can work without allocation, but does require copying of bytes.
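A minimal sketch of what this looks like, assuming a simplified poll-based trait modeled on the poll_read method of the futures crate's AsyncRead. PollRead, MemReader and noop_waker are hypothetical names used only for illustration. The caller owns the buffer and picks its size; the reader copies into it, so no allocation happens but every byte is copied once.

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical, simplified poll-based read trait (modeled on AsyncRead::poll_read).
pub trait PollRead {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>>;
}

/// Hypothetical in-memory reader: always ready, so it never needs the waker.
pub struct MemReader {
    data: Vec<u8>,
    pos: usize,
}

impl MemReader {
    pub fn new(data: &[u8]) -> Self {
        MemReader { data: data.to_vec(), pos: 0 }
    }
}

impl PollRead for MemReader {
    fn poll_read(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        // MemReader is Unpin, so we can take a plain &mut to it.
        let this = self.get_mut();
        let remaining = &this.data[this.pos..];
        // The caller's buffer length is the upper limit on this read.
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]); // the one copy
        this.pos += n;
        Poll::Ready(Ok(n))
    }
}

/// Builds a waker that does nothing; enough to poll a reader that is always ready.
pub fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions dereference the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```

Because MemReader is always ready, it never registers the waker; a reader backed by a real socket would store cx.waker() and return Poll::Pending when no data is available yet.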
Stream / Sink
Generic over more than just IO. Can work with things that aren't bytes. Needs ownership of the data being sent through, which for IO applications will typically mean allocations are required. IO types don't implement these traits directly; you'd need to create wrappers such as Framed.
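To illustrate the ownership point, here is a sketch that uses a plain synchronous Iterator as a stand-in for Stream. LineFramed is a hypothetical wrapper in the spirit of Framed, not its actual API. Each item is a freshly allocated Vec<u8>, and the IO type only becomes a stream of frames once it is wrapped.

```rust
use std::io::{self, BufRead};

/// Hypothetical Framed-style wrapper. Iterator stands in for Stream here:
/// each item is an owned Vec<u8>, so every frame is a separate allocation.
pub struct LineFramed<R> {
    inner: R,
}

impl<R: BufRead> LineFramed<R> {
    pub fn new(inner: R) -> Self {
        LineFramed { inner }
    }
}

impl<R: BufRead> Iterator for LineFramed<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut frame = Vec::new(); // a fresh owned buffer per frame
        match self.inner.read_until(b'\n', &mut frame) {
            Ok(0) => None, // end of input
            Ok(_) => {
                // Strip the delimiter so the frame holds only the payload.
                if frame.last() == Some(&b'\n') {
                    frame.pop();
                }
                Some(Ok(frame))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
```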
AsyncBufRead
This isn't a fully fledged idea yet, but I found myself using AsyncRead and a BytesMut to roughly this effect a lot in my HTTP parsing crate. Allows for all the benefits of AsyncRead, as well as the advantages of buffering: increased performance and not needing to worry too much about over-reading.

BufStream
More IO-focussed than Stream on its own. It still looks like it will require ownership of the bytes being sent in most cases. Also, the caller cannot choose how many bytes are read at a time.
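A sketch of the consequence for callers, again with a synchronous Iterator standing in for BufStream and with hypothetical names (ChunkSource, read_message): the source decides the chunk size, so a caller looking for one newline-terminated message can receive bytes belonging to the next one and has to carry them somewhere itself.

```rust
/// Hypothetical BufStream-like source: it, not the caller, decides chunk size.
pub struct ChunkSource {
    data: Vec<u8>,
    pos: usize,
    chunk: usize,
}

impl ChunkSource {
    pub fn new(data: &[u8], chunk: usize) -> Self {
        assert!(chunk > 0, "chunk size must be non-zero");
        ChunkSource { data: data.to_vec(), pos: 0, chunk }
    }
}

impl Iterator for ChunkSource {
    type Item = Vec<u8>; // each chunk is an owned allocation

    fn next(&mut self) -> Option<Vec<u8>> {
        if self.pos >= self.data.len() {
            return None;
        }
        let end = (self.pos + self.chunk).min(self.data.len());
        let out = self.data[self.pos..end].to_vec();
        self.pos = end;
        Some(out)
    }
}

/// Reads one '\n'-terminated message. Chunk boundaries are not under the
/// caller's control, so any bytes after the delimiter come back as leftover,
/// and the caller must hand them to whatever parses the next message.
/// Returns None if the source ends before a delimiter is seen.
pub fn read_message<I: Iterator<Item = Vec<u8>>>(
    source: &mut I,
) -> Option<(Vec<u8>, Vec<u8>)> {
    let mut acc = Vec::new();
    for chunk in source {
        acc.extend_from_slice(&chunk);
        if let Some(i) = acc.iter().position(|&b| b == b'\n') {
            let leftover = acc.split_off(i + 1);
            return Some((acc, leftover));
        }
    }
    None
}
```

With an 8-byte chunk size and the input "HELLO\nWORLD\n", the first call returns "HELLO\n" plus a leftover of "WO" that no longer lives in the source.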
Comparison
Firstly, BufStream and Stream appear very similar; BufStream is simply more specialised for IO than Stream. As we are trying to find a good trait for IO functions to operate on, I think we can consider only BufStream for reading, and perhaps a similar BufSink for writing.

For our Read APIs, we are then looking at AsyncRead vs AsyncBufRead vs BufStream.

From an API consumer's perspective, the main difference between these is who chooses how much reading is done, and when.
AsyncRead consumers can choose an upper limit on how many bytes they receive per call, but must set that limit carefully: if a call reads past the end of the current message, the next attempt to read from the reader will find the beginning of its message already gone.

AsyncBufRead consumers can choose an upper limit on how many bytes they receive per call, and if they accidentally read too many bytes they can simply choose not to consume them.

BufStream consumers have no control over how many bytes arrive with each poll, and must adapt their code to handle receiving extra bytes that are not part of the message they are trying to parse. This is especially difficult because we would need a generic interface for passing these unwanted bytes either back into the stream or on to the next function that reads from the BufStream.

While an API consumer doing something simple could make all three APIs work, in more complex cases AsyncBufRead has definite advantages. Consider a server listening for two different types of message on a single port, e.g. HTTP requests and WebSocket connections. It must be able to read exactly one HTTP request from the reader, and immediately afterwards begin reading either further HTTP requests or WebSocket frames. It is therefore necessary that no excess bytes are consumed from the reader while parsing the HTTP request: they would be missing from the start of the next message, and it is not yet known which parser will handle that message.

From a library designer's perspective, the difference between these is how closely they mirror the read APIs provided by the operating system, and therefore how much overhead, in both performance and complexity, is needed to emulate the given API on top of OS read sources.

AsyncRead can mirror the OS API exactly, with no issues at all.

AsyncBufRead needs to extend AsyncRead with a buffer implementation, but can do so without much complexity by using crates such as bytes.

BufStream would need to wrap an AsyncRead-like API with something that allocates buffers and then emits them. This would not have high costs.

All three of these cases are reasonably straightforward and have limited performance costs. There is no real disadvantage to any solution from a library designer's perspective.
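For a sense of how little is involved in the AsyncBufRead case, here is a sketch of the buffering layer over the synchronous std::io::Read trait. std::io::BufReader already provides this for synchronous IO; SimpleBufReader is a hypothetical hand-rolled version that uses a plain Vec<u8> instead of the bytes crate, with fill_buf and consume mirroring the method pair of std::io::BufRead.

```rust
use std::io::{self, Read};

/// Hypothetical minimal buffered reader: fill_buf/consume on top of plain Read.
pub struct SimpleBufReader<R> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,    // start of the unconsumed bytes
    filled: usize, // end of the valid bytes
}

impl<R: Read> SimpleBufReader<R> {
    pub fn with_capacity(capacity: usize, inner: R) -> Self {
        SimpleBufReader { inner, buf: vec![0; capacity], pos: 0, filled: 0 }
    }

    /// Returns the buffered bytes, issuing one read on the inner reader
    /// only when everything previously buffered has been consumed.
    pub fn fill_buf(&mut self) -> io::Result<&[u8]> {
        if self.pos >= self.filled {
            self.filled = self.inner.read(&mut self.buf)?;
            self.pos = 0;
        }
        Ok(&self.buf[self.pos..self.filled])
    }

    /// Marks amt bytes as used; anything not consumed stays for the next caller.
    pub fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.filled);
    }
}
```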
From a performance perspective, the main issues are how many read calls are performed to parse a message, how much allocation is needed, and how much copying of data occurs.
AsyncRead will require lots of read calls, and requires that data is copied out of the reader once, into the buffer provided. No allocation is needed for AsyncRead.

AsyncBufRead will require minimal read calls, and requires that data is copied into a buffer once, out of the inner reader. AsyncBufRead would need reallocations (growing its buffer) more than fresh allocations.

BufStream will require the minimum number of read calls*, and requires that data is read into owned buffers (apparently some optimisations may be possible that avoid copying memory here). Several small allocations are probably necessary for BufStream.

* The caller of the BufStream API has no control over how many bytes come in per read call, meaning that while it may be possible to read in fewer calls, it is not possible to prevent excess reading.

For callers of Write APIs, we are looking at AsyncWrite vs BufSink.

The Sink and AsyncWrite APIs are very similar, the only difference being whether attempting a write is one step or two (check that it is ready, then perform the write). I think the extra complexity of Sink makes it potentially harder to misuse, but as the APIs are so similar, I think we should base our decision on consistency with the read API we choose.

Summary
In most ways, all three APIs could be used to achieve the same results. However, for reading exactly up to the end of a message (and no further), AsyncBufRead is the only viable solution so far.

Therefore, I'm currently leaning towards recommending that we use AsyncBufRead (or in some cases AsyncRead, with an impl provided to bridge the two) and AsyncWrite (with a buffered alternative, similar to the standard library) for IO work, and standardise on making crates generic over these traits.
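The case the summary turns on, reading exactly one message and leaving the rest for whichever parser comes next, can be sketched with the synchronous std::io::BufRead trait, whose fill_buf/consume pair an AsyncBufRead would presumably mirror in polled form, and std::io::BufReader as the bridge from a plain Read. read_head is a hypothetical helper: it consumes bytes only up to the end of the HTTP header block, so the WebSocket frame bytes that follow are still available from the same reader.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads one HTTP-style header block (up to and including
/// "\r\n\r\n") and consumes exactly those bytes, leaving the rest in the reader.
pub fn read_head<R: BufRead>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut head = Vec::new();
    loop {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "incomplete head"));
        }
        // Scan byte by byte so we never take more than the head itself.
        let mut used = 0;
        let mut done = false;
        for &b in available {
            head.push(b);
            used += 1;
            if head.ends_with(b"\r\n\r\n") {
                done = true;
                break;
            }
        }
        reader.consume(used); // bytes after the head stay buffered
        if done {
            return Ok(head);
        }
    }
}
```

Scanning byte by byte keeps the sketch short; a real parser would search the buffered slice in bulk, but the key property, calling consume only for bytes that belong to the message, stays the same.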