Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Support ReadOnlySequence as input #61

Closed
ycrumeyrolle opened this issue Feb 3, 2019 · 13 comments · Fixed by #106
Closed

[Proposal] Support ReadOnlySequence as input #61

ycrumeyrolle opened this issue Feb 3, 2019 · 13 comments · Fixed by #106
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@ycrumeyrolle
Copy link
Contributor

The Encode / Decode operations support a ReadOnlySpan<byte> as data input.
It would be interesting to also support a ReadOnlySequence<byte> as data input for chunked data blocks.

@gfoidl
Copy link
Owner

gfoidl commented Feb 5, 2019

Thanks for the proposal. I like the idea, but have to think about it a bit more, as the same / similar could apply to the "output".

@gfoidl gfoidl added the enhancement New feature or request label Feb 5, 2019
@gfoidl gfoidl added this to the Future milestone Feb 5, 2019
@gfoidl
Copy link
Owner

gfoidl commented Mar 1, 2019

@ycrumeyrolle for your proposal what do you suggest for the "output"-side?
With ROSequence the input can be "endless", but I have no good idea where to write the encoded result.

If it tends to have overloads for pipelines, I'll rather push this to the consumer of this packages, as he knows where to read / write the data.

If well find a good solution for the output side, then your proposal would be nice to have.

@ycrumeyrolle
Copy link
Contributor Author

Basically I would expect the same usage than the current implementation, I mean a Span as output.

I was thinking something like this:

public static OperationStatus Decode(in ReadOnlySequence<byte> base64Url, Span<byte> data, out int bytesConsumed, out int bytesWritten)
{
    SequencePosition position = default;
    Span<byte> span = data;
    base64Url.TryGet(ref position, out ReadOnlyMemory<byte> previousSegment, advance: true);
    int totalConsumed = 0;
    int totalWritten = 0;
    OperationStatus status;
    while (base64Url.TryGet(ref position, out ReadOnlyMemory<byte> segment, advance: true))
    {
        status = _base64.Decode(previousSegment.Span, span, out bytesConsumed, out bytesWritten, isFinalBlock: false);
        totalConsumed += bytesConsumed;
        totalWritten += bytesWritten;
        if (status != OperationStatus.NeedMoreData)
        {
            return status;
        }

        span = data.Slice(totalWritten);
        previousSegment = segment;
    }

    status  = _base64.Decode(previousSegment.Span, span, out bytesConsumed, out bytesWritten, isFinalBlock: true);
    bytesConsumed += totalConsumed;
    bytesWritten += totalWritten;
    return status;
}

@gfoidl
Copy link
Owner

gfoidl commented Mar 2, 2019

How do you know how much space to allocate for data (output)?

@ycrumeyrolle
Copy link
Contributor Author

You can assume the ROSequence is not modifiable and its length will not change anymore.

Otherwise, you may have to use a IBufferWriter instead of a Span.

@gfoidl
Copy link
Owner

gfoidl commented Mar 2, 2019

Ah, yes that obvious for ROSequence. Thanks (and shame on me).

In your example what is base64Url.TryGet?

https://github.com/dotnet/corefx/issues/34761 shows an example where IBufferWriter is used as type for the destination. I'm torn back and forth whether to use Span or IBufferWriter.

Span can be used with any kind of buffers (arrays, pipelines, etc.). But the user has to manage the right size itself.
With IBufferWriter the implementation could handle this for the user. And I assume that ROSequence-usage is with pipelines...

So this would result in this API:

using System;
using System.Buffers;

namespace gfoidl.Base64
{
    public interface IBase64
    {
        int GetEncodedLength(int sourceLength);
        int GetMaxDecodedLength(int encodedLength);
        int GetDecodedLength(ReadOnlySpan<byte> encoded);
        int GetDecodedLength(ReadOnlySpan<char> encoded);
        OperationStatus Encode(ReadOnlySpan<byte> data, Span<byte> encoded, out int consumed, out int written, bool isFinalBlock = true);
        OperationStatus Encode(ReadOnlySpan<byte> data, Span<char> encoded, out int consumed, out int written, bool isFinalBlock = true);
+       OperationStatus Encode(ReadOnlySequence<byte> data, IBufferWriter<byte> encoded, out long consumed, out long written);
+       OperationStatus Encode(ReadOnlySequence<byte> data, IBufferWriter<char> encoded, out long consumed, out long written);
        OperationStatus Decode(ReadOnlySpan<byte> encoded, Span<byte> data, out int consumed, out int written, bool isFinalBlock = true);
        OperationStatus Decode(ReadOnlySpan<char> encoded, Span<byte> data, out int consumed, out int written, bool isFinalBlock = true);
+       OperationStatus Decode(ReadOnlySequence<byte> encoded, IBufferWriter<byte> data, out long consumed, out long written);
+       OperationStatus Decode(ReadOnlySequence<char> encoded, IBufferWriter<byte> data, out long consumed, out long written);
        string Encode(ReadOnlySpan<byte> data);
        byte[] Decode(ReadOnlySpan<char> encoded);
    }
}

Note that consumed and written are given as long, because for pipelines these could overflow when they are ints.

Are the char overloads actually needed?

@ycrumeyrolle
Copy link
Contributor Author

Should you put this on the interface or simply as en extension method ?
The Core B64 encoder exposes the primitives, and the rest is extensions.

I do not see any use case of char overload. The main use case I see is in AspNetCore by getting the data from the BodyPipe,which exposes a ReadOnlySequence<byte>

If you use a IBufferWriter, you allow usage of any writer, but it require to implement this interface.

You can simply propose another extension method that internally wrap a Span into a IBufferWriter.
Edit: you can't wrap a Span...
Or... 2 extensions methods with both ?

@ycrumeyrolle
Copy link
Contributor Author

Oh... I was faced to a major problem I did not expect...
The B64 decoder require that input length %4==0, and it cannot be respected by the ROSequence.

Do you think it is possible to evolve the current implementation that integrate the remaining of a previous decoding, decode the remaining + the first bytes, then decode the rest of the span?

public static OperationStatus Decode(ReadOnlySpan<byte> span, ReadOnlySpan<byte> remaining, Span<byte> data, out int bytesConsumed, out int bytesWritten)
{
   if (!remaining.IsEmpty)
   {
     var offset = DecodeRemaining(remaining, span, data);
     span = span.Slice(offset);
   }

  // continue decoding
}

And of course without perf impact? :)

@gfoidl
Copy link
Owner

gfoidl commented Mar 4, 2019

or simply as en extension method ?

Hm, extension methods seem to be the better way.

The B64 decoder require that input length %4==0, and it cannot be respected by the ROSequence.

I think it's possible to do this (with slicing and second call to decode -- or like that).
Good that you captured this, as we can construct a test for that case (then).

gfoidl added a commit that referenced this issue Mar 12, 2019
As extension-method per #61 (comment)
@gfoidl
Copy link
Owner

gfoidl commented Apr 5, 2019

FYI: it's not forgotten, just no time for this at the moment.

@gfoidl
Copy link
Owner

gfoidl commented Sep 18, 2019

My last comment still applies 😉

Now I'd like to wait for the API review of https://github.com/dotnet/corefx/issues/41166, so these APIs here should be aligned to the API in corefx.

One point I like there, is that SequencePosition is used instead of outing the written data.

gfoidl added a commit that referenced this issue Nov 8, 2019
As extension-method per #61 (comment)
@gfoidl
Copy link
Owner

gfoidl commented Nov 9, 2019

From the motivation for this issue (The main use case I see is in AspNetCore by getting the data from the BodyPipe, which exposes a ReadOnlySequence<byte>) in a first step the API should look like:

public static class ReadOnlySequenceExtensions
{
    public static long Encode(this Base64? encoder, in ReadOnlySequence<byte> data, IBufferWriter<byte>? base64Writer);
    public static bool TryDecode(this Base64? encoder, in ReadOnlySequence<byte> base64, IBufferWriter<byte>? dataWriter, out long consumed);
}

Notes:
Encode is safe, as no failure status should occur, thus the long return gives the bytes consumed.

TryDecode has to check for invalid data, hence it is a Try-methods, as I want it to not throw. consumed is "returned" via out.
If invalid data occurs, the return will be false, otherwise true. In any case as much data as possible will be consumed.

The span-bases overloads are left out at this step, but these can be added later if there is some demand for it.

@gfoidl
Copy link
Owner

gfoidl commented Nov 9, 2019

Ahh, I changed my mind again 😉
Though it's easy to get from consumed to written, it may be more convenient to expose written too.

So the API is

public static void Encode(this Base64? encoder, in ReadOnlySequence<byte> data, IBufferWriter<byte>? writer, out long consumed, out long written);
public static bool TryDecode(this Base64? encoder, in ReadOnlySequence<byte> base64, IBufferWriter<byte>? writer, out long consumed, out long written);

gfoidl added a commit that referenced this issue Nov 9, 2019
As extension-method per #61 (comment)
@gfoidl gfoidl modified the milestones: Future, v1.1.0 Nov 9, 2019
@gfoidl gfoidl self-assigned this Nov 9, 2019
gfoidl added a commit that referenced this issue Nov 17, 2019
* API defined

As extension-method per #61 (comment)

* Tests

* Implementation

* ReadMe and demo updated

* Demo for ASP.NET Core BodyWriter

* Added comment for hacky example

[skip ci]

* Added benchmark

* Update source/gfoidl.Base64/ThrowHelper.cs

Co-Authored-By: Yann Crumeyrolle <ycrumeyrolle@free.fr>

* Whitespace

* xml doc comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants