Description
Right now unions have these restrictions:
- No more than one class type per union
- No more than one array type per union
- No more than one map type per union
- No more than one enum type per union
- Unions cannot contain both a class and a map type
- Unions cannot contain both an enum and the string primitive type
Some of these are more easily lifted than others. The enum restrictions are easy, for example, provided that the enums don't overlap. If they do, we could bail, or just go by whichever is first in the list. If the string primitive type is in the mix as well, use it if the string we got isn't part of any of the enums.
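A minimal sketch of that enum rule, assuming two made-up enums (`COLORS` and `SIZES`) and a hypothetical tagged result type; none of these names come from the existing code. Enums are checked in list order, so an overlapping value goes to whichever enum is first, and the plain string case is the fallback:

```typescript
// Hypothetical enums for illustration only.
const COLORS = ["red", "green", "blue"] as const;
const SIZES = ["small", "medium", "large"] as const;

type Color = (typeof COLORS)[number];
type Size = (typeof SIZES)[number];

// Assumed tagged representation of the union Color | Size | string.
type ColorOrSizeOrString =
  | { kind: "color"; value: Color }
  | { kind: "size"; value: Size }
  | { kind: "string"; value: string };

function classifyString(s: string): ColorOrSizeOrString {
  // Check enums in declaration order: on overlap, the first enum wins.
  if ((COLORS as readonly string[]).indexOf(s) !== -1) {
    return { kind: "color", value: s as Color };
  }
  if ((SIZES as readonly string[]).indexOf(s) !== -1) {
    return { kind: "size", value: s as Size };
  }
  // Not a member of any enum: fall back to the string primitive.
  return { kind: "string", value: s };
}
```

Bailing on overlapping enums instead would be a check at code-generation time, not in this runtime function.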
The one-array and one-map restrictions can be solved by deserializing the first array element and checking which type it is, i.e. if we have to handle A[] | B[] we parse the first element, and if it's an A we have an A[]. There are cases where things are more complicated, in particular:
- An `int[]` vs. a `double[]` can be disambiguated by any element, i.e. `[1,2,3.1415]` looks like an `int[]` at first, until we get to the last element.
- An empty array or map is undecidable. Do we add a separate union case for that? Do we pick the first array or map type in the `oneOf` sequence?
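Both complications can be sketched in a few lines. This assumes a hypothetical tagged result type and models the empty array as its own `undecided` case, which is only one of the two options raised above. Note that the whole array has to be scanned, because any element can turn an apparent `int[]` into a `double[]`:

```typescript
// Assumed tagged representation; "undecided" is the separate empty-array case.
type NumberArray =
  | { kind: "int[]"; values: number[] }
  | { kind: "double[]"; values: number[] }
  | { kind: "undecided"; values: [] };

function classifyNumberArray(values: number[]): NumberArray {
  if (values.length === 0) {
    // Nothing to inspect, so the element type cannot be decided.
    return { kind: "undecided", values: [] };
  }
  // Every element must be checked: a single non-integer makes it a double[].
  const allInts = values.every((v) => Number.isInteger(v));
  return allInts ? { kind: "int[]", values } : { kind: "double[]", values };
}
```

One caveat of using TypeScript for the illustration: after `JSON.parse`, `3.0` and `3` are the same number, so this check is coarser than what a typed target language would see.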
The class type case is the most complicated one. Here are some difficulties:
- There might be cases where the data matches more than one type, such as when empty arrays or maps, or optional properties are involved, or enums and strings.
- The element that breaks the ambiguity between types can come late. Take, for example, classes `A: { foo: string }` and `B: { foo: string, x: bool }`. Until we have seen `x`, or the end of the object, we don't know whether we're dealing with an `A` or `B`.
- There can be ambiguous elements in addition to late ambiguity-breaking types. For example: `A: { arr: int[] }` and `B: { arr: double[], x: bool }`. If we see `arr` first we have to deserialize it into a `double[]` and then, if we don't see `x`, convert it into an `int[]`. Similar cases apply to maps, enums, and even class types that allow ambiguity (imagine, for example, the `arr` properties instead of being arrays, being two classes that differ in a single property which is an enum in one, a string in the other).
- There could be more than one disambiguating element. For example: `A: { x: int, y: double }` and `B: { x: double, y: int }`. You can't decide between these two cases until you've seen both `x` and `y`.
In two of our target languages, Elm and TypeScript, none of this is an issue. TypeScript only does validation of types, so when it handles a union it just checks that the type satisfies one of the cases. Elm is somewhat similar, in that it adopts the inefficient but very generic deserialization strategy of just trying all possible cases one by one, and using the first one that works.
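The "try every case, take the first that works" strategy can be sketched as a small decoder combinator. This is an illustration in TypeScript, not Elm's actual decoder API; `Decoder`, `firstMatch`, and the `tag` fields are names made up for the example, and it only works because the JSON is already available as an in-memory value that each case can re-inspect:

```typescript
// A decoder returns the decoded value, or undefined if the JSON does not fit.
type Decoder<T> = (json: unknown) => T | undefined;

// Try each decoder in order and use the first one that succeeds.
function firstMatch<T>(decoders: Decoder<T>[]): Decoder<T> {
  return (json) => {
    for (const decode of decoders) {
      const result = decode(json);
      if (result !== undefined) return result;
    }
    return undefined;
  };
}

// The classes from the example above; the tag field is an assumption.
interface A { tag: "A"; foo: string }
interface B { tag: "B"; foo: string; x: boolean }

function isRecord(json: unknown): json is Record<string, unknown> {
  return typeof json === "object" && json !== null && !Array.isArray(json);
}

const decodeA: Decoder<A> = (json) => {
  if (!isRecord(json)) return undefined;
  const foo = json["foo"];
  return typeof foo === "string" ? { tag: "A", foo } : undefined;
};

const decodeB: Decoder<B> = (json) => {
  if (!isRecord(json)) return undefined;
  const foo = json["foo"];
  const x = json["x"];
  return typeof foo === "string" && typeof x === "boolean"
    ? { tag: "B", foo, x }
    : undefined;
};

// B is tried first: every valid B also satisfies A's decoder,
// so with A first the B case would never be reached.
const decodeAorB = firstMatch<A | B>([decodeB, decodeA]);
```

The order sensitivity shown in the last line is the same "data matches more than one type" problem listed above; trying all cases hides it but does not remove it.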
In many of the other frameworks we don't really have that choice - they don't let us restart reading the JSON for each case (C++ does, because, like Elm, it deserializes the JSON into an intermediate, dynamically typed, form first). In addition to that, I'm not fond of that kind of inefficiency.
I don't think writing a code generator for class types that's super smart and able to disambiguate at the earliest point is feasible. A pretty generic and feasible way to do this is to generate two types for each union type:
- The user-facing one that's nice and exactly what you expect.
- A messy internal one that is the element-wise union of the classes, i.e. what you get right now. We can already deserialize this, and once we have the result, we can run a simple disambiguation over it and construct the correct nice user-facing type out of it.
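As a rough sketch of that two-type approach, take the earlier example `A: { arr: int[] }` and `B: { arr: double[], x: bool }`. The names `Messy` and `disambiguate`, the `kind` tags, and the specific rule order are assumptions for illustration, not what the generator emits today:

```typescript
// User-facing types: nice and exactly what you expect.
interface A { kind: "A"; arr: number[] }             // arr is an int[]
interface B { kind: "B"; arr: number[]; x: boolean } // arr is a double[]

// Internal type: the element-wise union of A and B.
// arr is read as double[] (the wider type), and x becomes optional.
interface Messy { arr: number[]; x?: boolean }

// Runs after the whole object has been deserialized into Messy.
function disambiguate(m: Messy): A | B {
  if (m.x !== undefined) {
    // Only B has x, so its presence settles the question.
    return { kind: "B", arr: m.arr, x: m.x };
  }
  if (m.arr.every((v) => Number.isInteger(v))) {
    // No x and all elements are integers: narrow double[] back to int[].
    return { kind: "A", arr: m.arr };
  }
  // No x, but arr has non-integers: fits neither A nor B.
  throw new Error("value matches neither A nor B");
}
```

Because the decision is made only once the whole object is available, it doesn't matter how late `x` shows up in the input or how many properties are needed to break the tie.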