Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/binary: let Write and Read recognize common interfaces #29222

Closed
elagergren-spideroak opened this issue Dec 13, 2018 · 8 comments

Comments

Projects
None yet
5 participants
@elagergren-spideroak
Copy link

commented Dec 13, 2018

Proposal: let binary.Write and binary.Read recognize common interfaces

Author(s): Eric Lagergren
Last updated: 2018-12-14

Abstract

Currently, binary.Write and binary.Read rejects struct fields, array elements, and top-level slices that don't have a fixed length. This limitation can make serializing structs time-consuming and error-prone. This proposal would remove this limitation, provided the non-fixed length type implements one of two interfaces.

Background

Attempting to use binary.Write or binary.Read to (de)serialize a struct, array, or top-level slice that contains a non-fixed length type (for example, []byte) as a field or element results in an error. This is explicitly stated in the docs:

Data must be a fixed-size value or a slice of fixed-size values, or a pointer to such data.

Example 1, struct.
Example 2, array.

I originally encountered this limitation when I needed to use an API that required serialized structs. For example:

type Header struct {
    Magic uint32
    Len   uint32
}

type T struct {
    Header Header
    Data   []byte // len(Data) == Header.Len
}

var b bytes.Buffer
err := binary.Write(&b, binary.LittleEndian, T{ ... })
if err != nil {
    // something ...
}
UseAPIThatIDontControl(b.Bytes())

Since T.Data does not have a fixed length, the call to binary.Write results in an error.

In order to serialize T, I have to create a custom method:

func (t T) serialize() []byte {
    var b bytes.Buffer
    binary.Write(&b, binary.LittleEndian, t.Header) // This works, since Header only has fixed-length fields
    b.Write(t.Data)
    return b.Bytes()
}

UseAPIThatIDontControl((T{ ... }).serialize())

This specific case (T) is quite simple. However, I have at least 5 other types, some of which look like this:

type G struct {
    Header Header
    Key1 []byte
    X [20]uint32
    Y [10]uint32
    Key1 []byte
    D [4]uint32
    ...
}

Creating a serialize method for G is time-consuming and error-prone:

func (g G) serialize() []byte {
    // Error checking and field validation elided for brevity
    var b bytes.Buffer
    binary.Write(&b, binary.LittleEndian, g.Header)
    binary.Write(&b, binary.LittleEndian, g.X)
    binary.Write(&b, binary.LittleEndian, g.Y)
    b.Write(g.Key1) // oops, this is out of order
    b.Write(g.Key2)
    binary.Write(&b, binary.LittleEndian, g.Q)
    return b.Bytes()
}

Proposal

I propose we add two interfaces that allow user-specified encodings to and from []byte for types that don't have a fixed length.

We need not use an already-existing interface, but encoding.BinaryMarshaler and encoding.BinaryUnmarshaler are good candidates.

Rationale

This proposal provides a simple way to serialize more complex structs, arrays, and top-level slices—something that has historically been time-consuming, error prone, and/or required code generation.

Compatibility

The change is Go 1 compatible if the interfaces are only checked after determining that the type does not have a fixed length. This is similar to: #12146

Implementation

TBD

Open issues (if applicable)

This idea has been mentioned before by Russ: #478 (comment)

@ianlancetaylor ianlancetaylor changed the title binary: let Write and Read recognize common interfaces encoding/binary: let Write and Read recognize common interfaces Dec 13, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

To be specific, it sounds like you are suggesting that we add these lines to start of encoding.Write:

    if m, ok := data.(encoding.BinaryMarshaler); ok {
        data, err := m.MarshalBinary()
        if err != nil {
            return err
        }
        _, err := w.Write(data)
        return err
    }

Also add similar lines to binary.Read.

This doesn't seem too useful at top level, since you could just do that yourself. In particular I don't see how it helps your example. You would not be able to implement the MarshalBinary method by calling encoding.Write, because that would just lead directly to an endless recursion.

The place where this could be useful is when you have a struct containing some fields that binary.Write can handle directly and some fields for which you define a MarshalBinary method. I don't know how common that case is.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

Also I note that the possibility that some types implement their MarshalBinary method by calling binary.Write means that this would probably be a breaking change for some programs.

@ericlagergren

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

To be specific, it sounds like you are suggesting that we add these lines to start of encoding.Write:

Or some equivalent, yes.

This doesn't seem too useful at top level, since you could just do that yourself.

Could you expand on this? I for sure can handle each field myself, but again that's time-consuming and error-prone.

In particular I don't see how it helps your example. You would not be able to implement the MarshalBinary method by calling encoding.Write, because that would just lead directly to an endless recursion.

The method name (MarshalBinary) on T was just a typo in the example, ignore the name and the recursion.

Also I note that the possibility that some types implement their MarshalBinary method by calling binary.Write means that this would probably be a breaking change for some programs.

This proposal could definitely be interface independent as well. It's not married to MarshalBinary or UnmarshalBinary.

(PS: same person, different GitHub account.)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

This doesn't seem too useful at top level, since you could just do that yourself.

Could you expand on this? I for sure can handle each field myself, but again that's time-consuming and error-prone.

At top level you can just do your own check for BinaryMarshaler before calling binary.Write.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

This is an API change and a potentially breaking one, so turning it into a proposal.

@ianlancetaylor ianlancetaylor changed the title encoding/binary: let Write and Read recognize common interfaces proposal: encoding/binary: let Write and Read recognize common interfaces Dec 13, 2018

@gopherbot gopherbot added this to the Proposal milestone Dec 13, 2018

@gopherbot gopherbot added the Proposal label Dec 13, 2018

@ericlagergren

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

At top level you can just do your own check for BinaryMarshaler before calling binary.Write.

That doesn’t scale well, though. The biggest benefit to Write and Read is being able to serialize and entire type in one swoop. When one of the struct’s fields can’t be serialized, you’re forced to either serialize each field by hand or resort to code generation.

Imagine if JSONMarshaler didn’t exist.

@elagergren-spideroak

This comment has been minimized.

Copy link
Author

commented Dec 14, 2018

I've updated the proposal to make it more succinct and clarified unclear parts.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 19, 2018

The doc comment for this package says:

Package binary implements simple translation between numbers and byte
sequences and encoding and decoding of varints.

Numbers are translated by reading and writing fixed-size values. A
fixed-size value is either a fixed-size arithmetic type (bool, int8, uint8,
int16, float32, complex64, ...) or an array or struct containing only
fixed-size values.

...

This package favors simplicity over efficiency. Clients that require
high-performance serialization, especially for large data structures, should
look at more advanced solutions such as the encoding/gob package or protocol
buffers.

Everything about Read and Write is meant for fixed-size values, not arbitrary values with custom marshalers and so on. It looks like I did say in 2010 that I would be interested in proposals to extend that, but at this point it seems like binary is doing well as is and that more advanced uses should go to json or gob or something else. Let's not add more complexity here.

@rsc rsc closed this Dec 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.