Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding: add ScalarMarshaler and ScalarUnmarshaler #56235

Open
dsnet opened this issue Oct 14, 2022 · 25 comments
Open

encoding: add ScalarMarshaler and ScalarUnmarshaler #56235

dsnet opened this issue Oct 14, 2022 · 25 comments

Comments

@dsnet
Copy link
Member

dsnet commented Oct 14, 2022

There are often types that are fundamentally opaque wrappers around a scalar type. One such example is atomic.Bool and friends. In order to get these types to operate well with serialization libraries, they would need to implement one of the Marshaler or Unmarshaler interfaces (see #54582).

The current interfaces in the "encoding" package fall short:

  • encoding.TextMarshaler and encoding.TextUnmarshaler only treat the type as if it were a string, and so can't natively represent a boolean or number.
  • json.Marshaler and json.Unmarshaler is specific to JSON, which means that only "encoding/json" benefits and not other encoding libraries (in stdlib such as "encoding/xml" or third-party packages).
  • The MarshalText or MarshalJSON always allocates, which is rather heavy-weight for something that is encoding a scalar.

To work around these detriments, I propose the addition of the following interface into the "encoding" package:

type ScalarMarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    MarshalScalar() (T, error)
}
type ScalarUnmarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    UnmarshalScalar(T) error
}

The type list specifies exactly 5 types, which matches the superset kind of all the scalar kinds in Go:

  • int8, int16, int32, int, etc. are not included since they can be represented as int64. When unmarshaling, the implementation can return an error if the provided int64 is out of range.
  • We do not declare the constraint as using the underlying kind (i.e., not ~bool | ~int64 | ~uint64 | ~float64 | ~complex128) to keep the possible set of concrete interfaces a finite set that encoding packages can actually check for.
  • string is not included because 1) a string is technically not a "scalar" type in that it can represent an infinite number of values, while scalar pertains to a finite set of possible values (i.e., following some scale), and 2) because we already provide TextMarshaler and TextUnmarshaler, which are more or less the same thing.

Many serialization formats have first-class support booleans and various numeric types, so this interface provides a way to express that fact.

It is a feature (not a bug) that the 5 possible representations of this interface all use the same method name. That is, for a given type, it can only choose to implement scalar representation for one of the scalar kinds. We do not need to worry what happens when a type implements both MarshalScalar() (int64, error) and MarshalScalar() (float64, error) at the same time.

If MarshalJSON and MarshalText and MarshalScalar are all present, it is the discretion of the encoding package to choose the precedence order. I recommend MarshalJSON taking precedence over MarshalText, taking precedence over MarshalScalar.

It is up to an encoding package to decide which of the scalar kinds it wants to support. For example, "encoding/json" would support scalar marshaling for bool, int64, uint64, and float64, but reject complex128 since JSON has no representation of complex numbers. This matches the existing "encoding/json" behavior where it can't serialize complex128.

\cc @johanbrandhorst @mvdan

@dsnet
Copy link
Member Author

dsnet commented Oct 14, 2022

A search through all public modules known by the module proxy showed no implementations of the MarshalScalar and UnmarshalScalar methods, so there should be no backwards compatibility concerns.

@mvdan
Copy link
Member

mvdan commented Oct 14, 2022

Overall the proposal sounds like a good idea to me. The only surprising aspect is that MarshalText would usually take precedence over MarshalScalar. If I have a type which could marshal itself as either an integer or a string, I don't think I could implement both methods correctly, as one would always win.

I guess I could instead implement MarshalJSON, but that is specific to a single encoding format.

@DeedleFake
Copy link

The only surprising aspect is that MarshalText would usually take precedence over MarshalScalar.

I think that's up to the encoding package in question. The proposal seems to just be suggesting that precedence for the encoding/json package specifically.

@mvdan
Copy link
Member

mvdan commented Oct 14, 2022

Whichever way the precedence ends up between the two, I think implementing the case I mention above with MarshalText and MarshalScalar would not be possible. Maybe that limitation by design is fine - I am simply pointing it out.

@dsnet
Copy link
Member Author

dsnet commented Oct 20, 2022

In my original proposal I argued against the inclusion of sub-64-bit integers since the value can be represented by the larger width kind. However, allowing smaller bit widths may be helpful for formats that encode smaller width integers differently.

Analysis:

  • Almost every text-based encoding in existence permits formatting numbers in decimal, where the exact bit-width is not directly expressed (i.e., you cannot distinguish between a 0 that came from an int8 versus a 0 from an int64).
  • Any binary encodings that use arbitrary precision integers (e.g., varints) are not affected.
  • Any binary encodings that use fixed-width representation are affected.

Thus, there is some utility (for formats like BSON) in supporting the smaller-width numeric kinds (e.g., int8, int16, int32, etc.). However, it is unclear whether the benefit is worth the complexity. Adding various bit widths will expand the set of scalar types by 8 (i.e., int8, int16, int32, uint8, uint16, uint32, float32, complex64). This will greatly increase the complexity of encoding libraries that need to check for these.

If we don't expose the width here, that information can always be exposed in a side-channel mechanism. For example:

type MyStruct struct {
    NumProcessed atomic.Int32 `bson:",int32"`
}

I'm still in favor of excluding the smaller width types, but wanted to call this caveat out.

@dsnet
Copy link
Member Author

dsnet commented Oct 20, 2022

I thought more about precedence order between MarshalBinary, MarshalText, and MarshalScalar. I think we should deliberately stay silent on the issue and leave it to each encoding package to decide what is best.

Consider the following example:

// Color is a finite enumeration of supported colors.
type Color int

const (
	Red   Color = 1
	Green Color = 2
	Blue  Color = 3
)

func (c Color) MarshalText() ([]byte, error)
func (c Color) MarshalScalar() (int64, error)
func (c *Color) UnmarshalText(b []byte) error
func (c *Color) UnmarshalScalar(v int64) error

For most text-based serializations (e.g., JSON), you probably want to use MarshalText over MarshalScalar.
For most binary-based serializations (e.g., CBOR), you probably want to use MarshalScalar over MarshalText.

When unmarshaling, you could imagine JSON using UnmarshalText if the input is a JSON string, and UnmarshalScalar if the input is a JSON number. This is precisely how the JSON specification for protocol buffers works, where enumerations are marshaled in its textual form, but permit unmarshaling from either the textual form or numeric forms.

@rsc
Copy link
Contributor

rsc commented Oct 26, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@mvdan
Copy link
Member

mvdan commented Oct 26, 2022

I'm still in favor of excluding the smaller width types, but wanted to call this caveat out.

I wonder, can we decide to add sub-64-bit integer types later? It's not clear to me whether that would be considered a breaking change given the type parameter.

I think we should deliberately stay silent on the issue and leave it to each encoding package to decide what is best.

That sounds like the right choice to me. As an implementer, I want to offer the encoding methods which makes sense for my type. I don't want to dictate which one should always be used over another. An encoding package can inspect which methods I implement and make a reasonable decision about which one to use, like yous ay.

@dsnet
Copy link
Member Author

dsnet commented Oct 27, 2022

I wonder, can we decide to add sub-64-bit integer types later? It's not clear to me whether that would be considered a breaking change given the type parameter.

I'm not sure, but I think so?

\cc @ianlancetaylor

@zephyrtronium
Copy link
Contributor

zephyrtronium commented Oct 27, 2022

The only fashion by which constraints containing union elements are contravariant with their constituents is in the operation set. I.e., if adding a type to a union element creates a smaller operation set, then programs using that constraint can break. There are two reasons why this doesn't apply here:

  • The constraint isn't given an exported name, so no other code can actually vary with the union. Programs which want to be generic over all types implementing the proposed interfaces need to list the types again themselves, but adding to our list doesn't affect theirs.
  • Because bool is included, the operation set is already close to minimal. The only operation supported by both bool and any numeric type is comparison, which is supported by every numeric type. So, adding another numeric type can't change the operation set.

If we remove both of these properties, it is possible to write programs which fail to compile when adding another type to the union: https://go.dev/play/p/fbi6-RzzspL, https://go.dev/play/p/CGnAxdndZSA

(It is weird to think of untyped literals as "operations." I twisted myself up a couple times writing this.)

@ianlancetaylor
Copy link
Contributor

I agree that in a case like this adding additional terms to the type union in the constraint should be fine.

@rsc
Copy link
Contributor

rsc commented Nov 2, 2022

It sounds like people are generally happy with

package encoding

type ScalarMarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    MarshalScalar() (T, error)
}
type ScalarUnmarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    UnmarshalScalar(T) error
}

What do we do after that? Specifically, it seems like we can either add support for them to marshalers (like encoding/gob), or we can add implementations to existing types in the standard library, but we can't do both, because that would likely change the wire format generated by some of these packages. So which do we do?

It sounds like above all we want to add them to the marshalers (like gob and json (and xml?)) and then after that, we can add these methods to types that gob and json (and xml?) currently cannot marshal at all and therefore have no existing wire format. This would include atomic.Uint32 and so on.

Do I have that right?

Also, is encoding/json really going to do 10 extra dynamic interface satisfaction checks on every value it marshals and unmarshals? Will we notice the expense?

@Merovius
Copy link
Contributor

Merovius commented Nov 2, 2022

@rsc Technically speaking adding them to the marshalers would be a backwards incompatible change, as types (not just in the stdlib) currently could have those methods and would then change how they are encoded.

Mayhaps something like json.Decoder.UseNumber? That would be strictly backwards compatible and allow to implement the interfaces for types in the stdlib.

Or we ignore the theoretical breakages in lieu of evidence of practical ones.

@Merovius
Copy link
Contributor

Merovius commented Nov 2, 2022

Also, is encoding/json really going to do 10 extra dynamic interface satisfaction checks on every value it marshals and unmarshals? Will we notice the expense?

I think encoding/json can do one check if a method of the name exists and only do the interface-satisfaction check in that case.

@dsnet
Copy link
Member Author

dsnet commented Nov 2, 2022

@Merovius, per #56235 (comment), there were no detected methods of the proposed name according to all code known by the module proxy.

I think encoding/json can do one check if a method of the name exists and only do the interface-satisfaction check in that case.

That's my plan for implementing this. The marshal path for "json" already has a sync.Map that keys from reflect.Type to encoder functions. There's an open issue to do the equivalent for unmarshal (#16212).

@Merovius
Copy link
Contributor

Merovius commented Nov 2, 2022

@dsnet Fair enough (Next time I chide someone for not reading the entire discussion before commenting, feel free to remind me of this 🤦)

@dsnet
Copy link
Member Author

dsnet commented Nov 2, 2022

No worries at all. The throughput of comments on golang/go issues and discussions is astonishing and unreasonable to expect every person to be at the same level of state.

@rsc
Copy link
Contributor

rsc commented Nov 9, 2022

Moving this to likely accept, but it seems like we should wait for Go 1.21 to make sure we land the encoder updates and the new implementations of these methods together.

@dsnet
Copy link
Member Author

dsnet commented Nov 9, 2022

we should wait for Go 1.21 to make sure we land the encoder updates and the new implementations of these methods together.

Agreed.

To be clear, acceptance of this proposal implies concurrent support within relevant "encoding/..." packages?

This would include the following packages:

  • "encoding/json"
  • "encoding/xml"
  • "encoding/gob"

It is unclear whether to add support in "encoding/asn1" since that package currently does not support encoding.BinaryMarshaler or encoding.TextMarshaler even though there are clear use-cases for it. It seems odd to partial support for some encoding interfaces rather than others.

@dsnet
Copy link
Member Author

dsnet commented Nov 9, 2022

Given the existing lack of support for encoding interface in "asn1", I propose we leave the package unchanged. The package does have MarshalWithParams and UnmarshalWithParams functions where we can add a special parameter that instructs the package to respect encoding interfaces. That can be a separate proposal in the future.

@rsc
Copy link
Contributor

rsc commented Nov 9, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Nov 16, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: encoding: add ScalarMarshaler and ScalarUnmarshaler encoding: add ScalarMarshaler and ScalarUnmarshaler Nov 16, 2022
@rsc rsc removed this from the Proposal milestone Nov 16, 2022
@rsc rsc added this to the Backlog milestone Nov 16, 2022
@ydnar ydnar mentioned this issue Dec 10, 2023
15 tasks
ydnar added a commit to ydnar/codec that referenced this issue Dec 12, 2023
@gopherbot
Copy link

Change https://go.dev/cl/553176 mentions this issue: encoding/json: handle encoding.ScalarMarshaler and encoding.ScalarUnmarshaler

@gopherbot
Copy link

Change https://go.dev/cl/553175 mentions this issue: encoding: add ScalarMarshaler and ScalarUnmarshaler

@zephyrtronium
Copy link
Contributor

I find this proposal valuable and didn't see any open CLs mentioned here, so I sent in CL 553175 defining the interfaces and CL 553176 implementing their use in encoding/json. I can do the same for encoding/xml and encoding/gob soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests

8 participants