[Proposal] Support ReadOnlySequence as input #61

ycrumeyrolle · 2019-02-03T12:47:42Z

The Encode / Decode operations support a ReadOnlySpan<byte> as data input.
It would be interesting to also support a ReadOnlySequence<byte> as data input for chunked data blocks.

The text was updated successfully, but these errors were encountered:

gfoidl · 2019-02-05T13:58:35Z

Thanks for the proposal. I like the idea, but have to think about it a bit more, as the same / similar could apply to the "output".

gfoidl · 2019-03-01T13:44:52Z

@ycrumeyrolle for your proposal what do you suggest for the "output"-side?
With ROSequence the input can be "endless", but I have no good idea where to write the encoded result.

If it tends to have overloads for pipelines, I'll rather push this to the consumer of this packages, as he knows where to read / write the data.

If well find a good solution for the output side, then your proposal would be nice to have.

ycrumeyrolle · 2019-03-02T13:31:52Z

Basically I would expect the same usage than the current implementation, I mean a Span as output.

I was thinking something like this:

public static OperationStatus Decode(in ReadOnlySequence<byte> base64Url, Span<byte> data, out int bytesConsumed, out int bytesWritten)
{
    SequencePosition position = default;
    Span<byte> span = data;
    base64Url.TryGet(ref position, out ReadOnlyMemory<byte> previousSegment, advance: true);
    int totalConsumed = 0;
    int totalWritten = 0;
    OperationStatus status;
    while (base64Url.TryGet(ref position, out ReadOnlyMemory<byte> segment, advance: true))
    {
        status = _base64.Decode(previousSegment.Span, span, out bytesConsumed, out bytesWritten, isFinalBlock: false);
        totalConsumed += bytesConsumed;
        totalWritten += bytesWritten;
        if (status != OperationStatus.NeedMoreData)
        {
            return status;
        }

        span = data.Slice(totalWritten);
        previousSegment = segment;
    }

    status  = _base64.Decode(previousSegment.Span, span, out bytesConsumed, out bytesWritten, isFinalBlock: true);
    bytesConsumed += totalConsumed;
    bytesWritten += totalWritten;
    return status;
}

gfoidl · 2019-03-02T15:18:25Z

How do you know how much space to allocate for data (output)?

ycrumeyrolle · 2019-03-02T15:38:14Z

You can assume the ROSequence is not modifiable and its length will not change anymore.

Otherwise, you may have to use a IBufferWriter instead of a Span.

gfoidl · 2019-03-02T17:00:08Z

Ah, yes that obvious for ROSequence. Thanks (and shame on me).

In your example what is base64Url.TryGet?

https://github.com/dotnet/corefx/issues/34761 shows an example where IBufferWriter is used as type for the destination. I'm torn back and forth whether to use Span or IBufferWriter.

Span can be used with any kind of buffers (arrays, pipelines, etc.). But the user has to manage the right size itself.
With IBufferWriter the implementation could handle this for the user. And I assume that ROSequence-usage is with pipelines...

So this would result in this API:

using System;
using System.Buffers;

namespace gfoidl.Base64
{
    public interface IBase64
    {
        int GetEncodedLength(int sourceLength);
        int GetMaxDecodedLength(int encodedLength);
        int GetDecodedLength(ReadOnlySpan<byte> encoded);
        int GetDecodedLength(ReadOnlySpan<char> encoded);
        OperationStatus Encode(ReadOnlySpan<byte> data, Span<byte> encoded, out int consumed, out int written, bool isFinalBlock = true);
        OperationStatus Encode(ReadOnlySpan<byte> data, Span<char> encoded, out int consumed, out int written, bool isFinalBlock = true);
+       OperationStatus Encode(ReadOnlySequence<byte> data, IBufferWriter<byte> encoded, out long consumed, out long written);
+       OperationStatus Encode(ReadOnlySequence<byte> data, IBufferWriter<char> encoded, out long consumed, out long written);
        OperationStatus Decode(ReadOnlySpan<byte> encoded, Span<byte> data, out int consumed, out int written, bool isFinalBlock = true);
        OperationStatus Decode(ReadOnlySpan<char> encoded, Span<byte> data, out int consumed, out int written, bool isFinalBlock = true);
+       OperationStatus Decode(ReadOnlySequence<byte> encoded, IBufferWriter<byte> data, out long consumed, out long written);
+       OperationStatus Decode(ReadOnlySequence<char> encoded, IBufferWriter<byte> data, out long consumed, out long written);
        string Encode(ReadOnlySpan<byte> data);
        byte[] Decode(ReadOnlySpan<char> encoded);
    }
}

Note that consumed and written are given as long, because for pipelines these could overflow when they are ints.

Are the char overloads actually needed?

ycrumeyrolle · 2019-03-04T08:13:12Z

Should you put this on the interface or simply as en extension method ?
The Core B64 encoder exposes the primitives, and the rest is extensions.

I do not see any use case of char overload. The main use case I see is in AspNetCore by getting the data from the BodyPipe,which exposes a ReadOnlySequence<byte>

If you use a IBufferWriter, you allow usage of any writer, but it require to implement this interface.

You can simply propose another extension method that internally wrap a Span into a IBufferWriter.
Edit: you can't wrap a Span...
Or... 2 extensions methods with both ?

ycrumeyrolle · 2019-03-04T08:24:05Z

Oh... I was faced to a major problem I did not expect...
The B64 decoder require that input length %4==0, and it cannot be respected by the ROSequence.

Do you think it is possible to evolve the current implementation that integrate the remaining of a previous decoding, decode the remaining + the first bytes, then decode the rest of the span?

public static OperationStatus Decode(ReadOnlySpan<byte> span, ReadOnlySpan<byte> remaining, Span<byte> data, out int bytesConsumed, out int bytesWritten)
{
   if (!remaining.IsEmpty)
   {
     var offset = DecodeRemaining(remaining, span, data);
     span = span.Slice(offset);
   }

  // continue decoding
}

And of course without perf impact? :)

gfoidl · 2019-03-04T14:41:14Z

or simply as en extension method ?

Hm, extension methods seem to be the better way.

The B64 decoder require that input length %4==0, and it cannot be respected by the ROSequence.

I think it's possible to do this (with slicing and second call to decode -- or like that).
Good that you captured this, as we can construct a test for that case (then).

As extension-method per #61 (comment)

gfoidl · 2019-04-05T10:43:19Z

FYI: it's not forgotten, just no time for this at the moment.

gfoidl · 2019-09-18T10:13:35Z

My last comment still applies 😉

Now I'd like to wait for the API review of https://github.com/dotnet/corefx/issues/41166, so these APIs here should be aligned to the API in corefx.

One point I like there, is that SequencePosition is used instead of outing the written data.

As extension-method per #61 (comment)

gfoidl · 2019-11-09T11:46:13Z

From the motivation for this issue (The main use case I see is in AspNetCore by getting the data from the BodyPipe, which exposes a ReadOnlySequence<byte>) in a first step the API should look like:

public static class ReadOnlySequenceExtensions
{
    public static long Encode(this Base64? encoder, in ReadOnlySequence<byte> data, IBufferWriter<byte>? base64Writer);
    public static bool TryDecode(this Base64? encoder, in ReadOnlySequence<byte> base64, IBufferWriter<byte>? dataWriter, out long consumed);
}

Notes:
Encode is safe, as no failure status should occur, thus the long return gives the bytes consumed.

TryDecode has to check for invalid data, hence it is a Try-methods, as I want it to not throw. consumed is "returned" via out.
If invalid data occurs, the return will be false, otherwise true. In any case as much data as possible will be consumed.

The span-bases overloads are left out at this step, but these can be added later if there is some demand for it.

gfoidl · 2019-11-09T11:56:29Z

Ahh, I changed my mind again 😉
Though it's easy to get from consumed to written, it may be more convenient to expose written too.

So the API is

public static void Encode(this Base64? encoder, in ReadOnlySequence<byte> data, IBufferWriter<byte>? writer, out long consumed, out long written);
public static bool TryDecode(this Base64? encoder, in ReadOnlySequence<byte> base64, IBufferWriter<byte>? writer, out long consumed, out long written);

As extension-method per #61 (comment)

* API defined As extension-method per #61 (comment) * Tests * Implementation * ReadMe and demo updated * Demo for ASP.NET Core BodyWriter * Added comment for hacky example [skip ci] * Added benchmark * Update source/gfoidl.Base64/ThrowHelper.cs Co-Authored-By: Yann Crumeyrolle <ycrumeyrolle@free.fr> * Whitespace * xml doc comments

gfoidl added the enhancement New feature or request label Feb 5, 2019

gfoidl added this to the Future milestone Feb 5, 2019

gfoidl added a commit that referenced this issue Mar 12, 2019

API defined

7b8ad73

As extension-method per #61 (comment)

gfoidl added a commit that referenced this issue Nov 8, 2019

API defined

299be27

As extension-method per #61 (comment)

gfoidl added a commit that referenced this issue Nov 9, 2019

API defined

fce80dd

As extension-method per #61 (comment)

gfoidl modified the milestones: Future, v1.1.0 Nov 9, 2019

gfoidl mentioned this issue Nov 9, 2019

Support ReadOnlySequence as input #106

Merged

gfoidl self-assigned this Nov 9, 2019

gfoidl closed this as completed in #106 Nov 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] Support ReadOnlySequence as input #61

[Proposal] Support ReadOnlySequence as input #61

ycrumeyrolle commented Feb 3, 2019

gfoidl commented Feb 5, 2019

gfoidl commented Mar 1, 2019

ycrumeyrolle commented Mar 2, 2019

gfoidl commented Mar 2, 2019

ycrumeyrolle commented Mar 2, 2019

gfoidl commented Mar 2, 2019

ycrumeyrolle commented Mar 4, 2019

ycrumeyrolle commented Mar 4, 2019

gfoidl commented Mar 4, 2019

gfoidl commented Apr 5, 2019

gfoidl commented Sep 18, 2019

gfoidl commented Nov 9, 2019

gfoidl commented Nov 9, 2019

[Proposal] Support ReadOnlySequence as input #61

[Proposal] Support ReadOnlySequence as input #61

Comments

ycrumeyrolle commented Feb 3, 2019

gfoidl commented Feb 5, 2019

gfoidl commented Mar 1, 2019

ycrumeyrolle commented Mar 2, 2019

gfoidl commented Mar 2, 2019

ycrumeyrolle commented Mar 2, 2019

gfoidl commented Mar 2, 2019

ycrumeyrolle commented Mar 4, 2019

ycrumeyrolle commented Mar 4, 2019

gfoidl commented Mar 4, 2019

gfoidl commented Apr 5, 2019

gfoidl commented Sep 18, 2019

gfoidl commented Nov 9, 2019

gfoidl commented Nov 9, 2019