ReadAsync doesn't return until output buffer filled or stream ends #32

LinusKardellInfobric · 2024-02-20T09:12:51Z

I'm trying to use this library for adding compression to a network protocol, and the idea I had is to simply wrap the protocol in a ZSTD-compressed stream. Each time an end sends a message, it would flush out a compressed block. But I'm running into the problem that DecompressionStream.ReadAsync doesn't exit return the data it has already decompressed even if no more data is currently available on the inner stream. ReadAsync only exits once output.pos >= output.size, or innerStream.ReadAsync returns 0 (when the stream has ended). So there seems to be no way fetch all currently available data from a DecompressionStream when reading from a never-ending stream (unless you read one byte at a time from the DecompressionStream, but that would presumably be inefficient), and there seems to be no other way to access streaming decompression in this library without using unsafe code.

I don't see any clear, clean way of having it check if there is any more available data (there is no generic DataAvailable on Stream, and no way to use timeout or cancellation with NetworkStream), so the only solutions I can think of would be to:

Have ReadAsync not await innerStream.ReadAsync if it already has some decompressed data, and instead possibly leave it in progress to be picked up by a subsequent call. Something like:

        private Task<int> readInProgress;

…


         var output = new ZSTD_outBuffer_s { pos = 0, size = (nuint)buffer.Length };
         while (output.pos < output.size)
         {
            if (input.pos >= input.size)
            {
               int bytesRead;
               var task = readInProgress != null ? readInProgress : innerStream.ReadAsync(inputBuffer, 0, inputBufferSize, cancellationToken);
               int bytesRead = 0;
               if (output.pos == 0)
               {
                  bytesRead = await task.ConfigureAwait(false);
                  task = null;
               }
               else if (task.IsCompleted)
               {
                  bytesRead = task.Result;
                  task = null;
               }

               readInProgress = task;

               if (bytesRead == 0 && task == null)
               {
                  if (checkEndOfStream && lastDecompressResult != 0)
                     throw new EndOfStreamException("Premature end of stream");

                  break;
               }

               input.size = (nuint)bytesRead;
               input.pos = 0;
            }

            lastDecompressResult = DecompressStream(ref output, buffer.Span);
         }

Though that's just something I quickly threw together, I haven't tested it, and Read would also need to be modified in order for mixed Sync/Async read to be safe, and I have no idea how to deal with that in Dispose.

Have ReadAsync never call innerStream.ReadAsync when output.pos > 0 (which means it would call innerStream.ReadAsync at most once on each call). Would have the drawback that ReadAsync might not fill the output buffer even if there is more data available.

Provide a thin, non-stream-based, safe wrapper around Decompressor.DecompressStream. Something like:

        public (nuint Status, nuint InputPos, nuint OutputPos) DecompressStream(Span<byte> inputBuffer, Span<byte> outputBuffer)
        {
            fixed (byte* inputBufferPtr = inputBuffer)
            fixed (byte* outputBufferPtr = outputBuffer)
            {
                var input = new ZSTD_inBuffer_s { size = (nuint)inputBuffer.Length, pos = 0, src = inputBufferPtr };
                var output = new ZSTD_outBuffer_s { size = (nuint)outputBuffer.Length, pos = 0, dst = outputBufferPtr };
                var returnValue = DecompressStream(ref input, ref output);
                return (returnValue, input.pos, output.pos);
            }
        }

The text was updated successfully, but these errors were encountered:

oleg-st · 2024-02-20T11:14:05Z

I like the second option, we can increase the inputBuffer via bufferSize if we want to get more data in ReadAsync.

LinusKardellInfobric · 2024-02-20T11:42:53Z

I think in that option ReadAsync would also need to start out by calling DecompressStream without reading any new input, to check if there is any data still buffered in the decompression context ~~or in inputBuffer~~ which didn't fit in output on the previous call (at least if the previous call ended with output.pos == output.size).

LinusKardellInfobric · 2024-02-20T12:02:38Z

Maybe something like (again, something I just quickly scratched together without testing):

            EnsureNotDisposed();

            var output = new ZSTD_outBuffer_s { pos = 0, size = (nuint)buffer.Length };

            while (true)
            {
                // Check if we already have some data available to decompress
                lastDecompressResult = DecompressStream(ref output, buffer.Span);
                if (output.pos > 0)
                {
                    return (int)output.pos;
                }

                // Otherwise, read some more data
                if (input.pos >= input.size)
                {
                    int bytesRead;
                    if ((bytesRead = await innerStream.ReadAsync(inputBuffer, 0, inputBufferSize, cancellationToken)
                         .ConfigureAwait(false)) == 0)
                    {
                        if (checkEndOfStream && lastDecompressResult != 0)
                        {
                            throw new EndOfStreamException("Premature end of stream");
                        }

                        return 0;
                    }

                    input.size = (nuint)bytesRead;
                    input.pos = 0;
                }
            }

oleg-st · 2024-02-20T12:13:44Z

input.pos >= input.size checks that all input is consumed and we need to read more

LinusKardellInfobric · 2024-02-20T12:42:15Z

True, but if I understand correctly it is still possible that the ZSTD_DCtx_s has consumed all compressed data from input, but could not fit all the decompressed data in output on the previous call.

oleg-st · 2024-02-20T13:19:33Z

We can add this before ReadAsync:

// frame is completely decoded so flush it before next read
if (lastDecompressResult == 0 && output.pos > 0) {
    break;
}

LinusKardellInfobric · 2024-02-20T13:46:24Z

I'm afraid I don't see the purpose of checking if a frame has ended there.

oleg-st · 2024-02-20T13:53:31Z

A frame ends at the flush/close of the compression stream, so a flush in compression will cause a flush in decompression.

LinusKardellInfobric · 2024-02-20T14:06:14Z

Flushing the compression would end the current block (ZSTD_e_flush), but not necessarily end the current frame (ZSTD_e_end). And it is still possible that when a frame ends the sender immediately starts a new frame (so the start of the new frame is received at the same time as the end of the previous frame), or that data remains buffered in the ZSTD_DCtx_s.

LinusKardellInfobric · 2024-02-20T17:04:22Z

Should I make a pull request with my solution?

oleg-st · 2024-02-20T17:47:18Z

Flush before ReadAsync then?

if (output.pos > 0) {
    break;
}

LinusKardellInfobric mentioned this issue Feb 21, 2024

Return available data from stream, before waiting for more #33

Merged

oleg-st closed this as completed in #33 Feb 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReadAsync doesn't return until output buffer filled or stream ends #32

ReadAsync doesn't return until output buffer filled or stream ends #32

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024 •

edited

Loading

LinusKardellInfobric commented Feb 20, 2024 •

edited

Loading

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024

LinusKardellInfobric commented Feb 20, 2024 •

edited

Loading

oleg-st commented Feb 20, 2024

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024

LinusKardellInfobric commented Feb 20, 2024 •

edited

Loading

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024

ReadAsync doesn't return until output buffer filled or stream ends #32

ReadAsync doesn't return until output buffer filled or stream ends #32

Comments

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024 • edited Loading

LinusKardellInfobric commented Feb 20, 2024 • edited Loading

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024

LinusKardellInfobric commented Feb 20, 2024 • edited Loading

oleg-st commented Feb 20, 2024

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024

LinusKardellInfobric commented Feb 20, 2024 • edited Loading

LinusKardellInfobric commented Feb 20, 2024

oleg-st commented Feb 20, 2024

oleg-st commented Feb 20, 2024 •

edited

Loading

LinusKardellInfobric commented Feb 20, 2024 •

edited

Loading

LinusKardellInfobric commented Feb 20, 2024 •

edited

Loading

LinusKardellInfobric commented Feb 20, 2024 •

edited

Loading