Support more generic unions #493

@schani

Right now unions have these restrictions:

  • No more than one class type per union
  • No more than one array type per union
  • No more than one map type per union
  • No more than one enum type per union
  • Unions cannot contain both a class and a map type
  • Unions cannot contain both an enum and the string primitive type

Some of these are more easily lifted than others. The enum restrictions, for example, are easy to lift, provided that the enums don't overlap. If they do, we could bail, or just go with whichever enum comes first in the list. If the string primitive type is in the mix as well, we use it when the string we got isn't part of any of the enums.
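As a rough sketch of the enum case: the two enums below (`COLORS`, `SIZES`) and the `classifyString` helper are made up for illustration and are not what quicktype generates. The enums are tried in declaration order, so the first one wins on overlap, and the plain string case is the fallback.

```typescript
// Hypothetical enums, for illustration only.
const COLORS = ["red", "green", "blue"] as const;
const SIZES = ["small", "large"] as const;

type Color = (typeof COLORS)[number];
type Size = (typeof SIZES)[number];

// Union of two enums plus the string primitive type.
type EnumOrString =
    | { kind: "color"; value: Color }
    | { kind: "size"; value: Size }
    | { kind: "string"; value: string };

function classifyString(s: string): EnumOrString {
    // Enums are checked in declaration order, so the first match wins on overlap.
    if ((COLORS as readonly string[]).indexOf(s) !== -1) {
        return { kind: "color", value: s as Color };
    }
    if ((SIZES as readonly string[]).indexOf(s) !== -1) {
        return { kind: "size", value: s as Size };
    }
    // No enum contains the value: fall back to the plain string case.
    return { kind: "string", value: s };
}
```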

The one-array and one-map restrictions can be solved by deserializing the first array element and checking which type it is, i.e. if we have to handle A[] | B[] we parse the first element, and if it's an A we have an A[]. There are cases where things are more complicated, in particular:

  • An int[] vs. a double[] can be disambiguated by any element, not just the first: [1,2,3.1415] looks like an int[] until we get to the last element.

  • An empty array or map is undecidable. Do we add a separate union case for that? Do we pick the first array or map type in the oneOf sequence?
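The two complications above can be illustrated with a small sketch. The function name and the "undecided" result are assumptions made for this example; the issue leaves open what to actually do with empty arrays.

```typescript
// Possible outcomes when choosing between int[] and double[].
type NumberArrayKind = "int[]" | "double[]" | "undecided";

function classifyNumberArray(values: number[]): NumberArrayKind {
    // An empty array gives us nothing to decide on.
    if (values.length === 0) {
        return "undecided";
    }
    // Any single non-integer element forces double[], so every element must be inspected.
    for (const v of values) {
        if (Math.floor(v) !== v) {
            return "double[]";
        }
    }
    return "int[]";
}
```

Note that in JavaScript a JSON number such as 1.0 parses to the same value as 1, so this check only sees the numeric value, not how it was written.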

The class type case is the most complicated one. Here are some difficulties:

  • The data might match more than one type, for example when empty arrays or maps, optional properties, or enums and strings are involved.

  • The element that breaks the ambiguity between types can come late. Take, for example, classes A: { foo: string } and B: { foo: string, x: bool }. Until we have seen x, or the end of the object, we don't know whether we're dealing with an A or B.

  • There can be ambiguous elements in addition to late ambiguity-breaking elements. For example: A: { arr: int[] } and B: { arr: double[], x: bool }. If we see arr first we have to deserialize it into a double[] and then, if we don't see x, convert it into an int[]. Similar cases apply to maps, enums, and even class types that allow ambiguity (imagine, for example, that the arr properties are not arrays but two classes that differ in a single property, which is an enum in one and a string in the other).

  • There could be more than one disambiguating element. For example: A: { x: int, y: double } and B: { x: double, y: int }. You can't decide between these two cases until you've seen both x and y.

In two of our target languages, Elm and TypeScript, none of this is an issue. TypeScript only does validation of types, so when it handles a union it just checks that the value satisfies one of the cases. Elm is somewhat similar, in that it adopts the inefficient but very generic deserialization strategy of just trying all possible cases one by one, and using the first one that works.
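The try-each-case strategy might look roughly like this when written out in TypeScript. The `Decoder` type, `firstMatch`, and the two classes are hypothetical names for this sketch, reusing the A/B example from above. Order matters here: since every valid B also passes as an A, the more specific decoder is listed first.

```typescript
// Hypothetical decoder type: returns undefined when the JSON does not match.
type Decoder<T> = (json: unknown) => T | undefined;

interface A { kind: "A"; foo: string }
interface B { kind: "B"; foo: string; x: boolean }

function isRecord(v: unknown): v is Record<string, unknown> {
    return typeof v === "object" && v !== null && !Array.isArray(v);
}

const decodeA: Decoder<A> = json => {
    if (!isRecord(json)) return undefined;
    const foo = json["foo"];
    return typeof foo === "string" ? { kind: "A", foo } : undefined;
};

const decodeB: Decoder<B> = json => {
    if (!isRecord(json)) return undefined;
    const foo = json["foo"];
    const x = json["x"];
    return typeof foo === "string" && typeof x === "boolean"
        ? { kind: "B", foo, x }
        : undefined;
};

// Try each case in order and use the first one that works.
function firstMatch<T>(decoders: Decoder<T>[], json: unknown): T | undefined {
    for (const decode of decoders) {
        const result = decode(json);
        if (result !== undefined) return result;
    }
    return undefined;
}

// B is tried before A because it is the more specific of the two.
const decodeUnion = (json: unknown): A | B | undefined =>
    firstMatch<A | B>([decodeB, decodeA], json);
```

This only works because the whole JSON value is available as a dynamically typed tree, so each decoder can start over from the top.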

In many of the other frameworks we don't really have that choice: they don't let us restart reading the JSON for each case (C++ does, because, like Elm, it deserializes the JSON into an intermediate, dynamically typed form first). In addition to that, I'm not fond of that kind of inefficiency.

I don't think writing a code generator for class types that's super smart and able to disambiguate at the earliest point is feasible. A pretty generic and feasible way to do this is to generate two types for each union type:

  1. The user-facing one that's nice and exactly what you expect.
  2. A messy internal one that is the element-wise union of the classes, i.e. what you get right now. We can already deserialize this, and once we have the result, we can run a simple disambiguation over it and construct the correct nice user-facing type out of it.
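A minimal sketch of the two-type idea, using the A: { arr: int[] } and B: { arr: double[], x: bool } example from above. The `Merged` type and the `disambiguate` function are names invented for this illustration, and the `kind` tags are just one possible way to present the user-facing union.

```typescript
// User-facing types: what the user expects to get.
interface A { kind: "A"; arr: number[] }              // arr holds integers only
interface B { kind: "B"; arr: number[]; x: boolean }  // arr may hold any numbers

// Internal type: the element-wise union of A and B.
// arr is read as double[] (the wider type) and x becomes optional.
interface Merged { arr: number[]; x?: boolean }

// Runs after deserialization is complete, so late or multiple
// disambiguating properties are no longer a problem.
function disambiguate(m: Merged): A | B {
    if (m.x !== undefined) {
        // x only exists in B.
        return { kind: "B", arr: m.arr, x: m.x };
    }
    if (m.arr.every(v => Math.floor(v) === v)) {
        // No x and only integers: this is an A.
        return { kind: "A", arr: m.arr };
    }
    // No x but non-integer elements: neither class fits.
    throw new Error("value matches neither A nor B");
}
```

For example, `disambiguate({ arr: [1, 2] })` yields an A, while `disambiguate({ arr: [1.5], x: true })` yields a B.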
