Skip to content

What are Discriminated Unions?

Paul Louth edited this page Jul 8, 2021 · 8 revisions

Consider that all types in programming languages represent sets of possible values. For example, bool is a set that has two possible values: true and false; uint represents the set of natural numbers from 0 to UInt32.MaxValue; when you use a generic parameter T in a type, then T represents every possible value in all types (unless its refined by a constraint clause - in fact, constraints could be seen as a 'filter' operation on the set of all values).

And so, if types are sets, then what happens if we create a union of sets?

Let's say we have the following enum:

    public enum Colour
    {
        Red,
        Green,
        Blue
    }

And then create a union of Colour and bool. Then the set of possible values are:

    true
    false
    Red
    Green
    Blue

But what happens if we create a union between bool and bool? Well, we just get:

   true
   false

Because duplicate items become one in a union operation. However, if we tag each set (each type) with an additional unique value then we get what's called a Discriminated Union (also known as: tagged union, variant, variant record, choice-type, disjoint union, sum-type or co-product). The easiest way to show that would be to use a tuple, with the first element being the tag and the second element being the value of the type. So, if we try to create a union of Colour and bool then we get:

    (1, true)
    (1, false)
    (2, Red)
    (2, Green)
    (2, Blue)

The integers being the tag in this case.

If we try to union bool and bool, we get:

    (1, true)
    (1, false)
    (2, true)
    (2, false)

They are discriminated by their 'tag'.

In most functional-first programming languages discriminated-unions are a first-class concept. This is an example from Haskell:

data Maybe a = Just a 
             | Nothing

Here Just and Nothing are type constructing functions that construct a Maybe a (like Maybe<A> in C#), but we can think of them as the tag for each set of values the various cases represent.

data Example = First Bool 
             | Second Bool

This simple example above would look like this in the tuple form shown earlier in this article:

    (First, True)
    (First, False)
    (Second, True)
    (Second, False)

The total number of values that are represented by a discriminated union is the sum of all the possible values in the individual cases. Which is why they are often referred to as sum types.

In C# we don't have sum-types yet. And so, the easiest way to represent them is to create a base-class or interface, and then a finite set of derived types:

    public interface Maybe<A> 
    {
        public static Maybe<A> Just(A value) => new Just<A>(value);
        public static Maybe<A> Nothing() => new Nothing<A>();
    }
    public class Just<A> : Maybe<A>
    {
        public readonly A Value;

        public Just(A value) => 
            Value = value;
    }
    public class Nothing<A> : Maybe<A>
    {
    }

This is obviously less convenient than the terse declarations in Haskell, and so we can achieve something similar by using the LanguageExt.CodeGen:

    [Union]
    public interface Maybe<A>
    {
        Maybe<A> Just(A value);
        Maybe<A> Nothing();
    }

NOTE: The big caveat here is that C# can't do completeness checking when using union-types with C# pattern-matching. However, the code-gen generates a Match function that does allow completeness checking.

Another way, which is slightly less powerful than the code-gen solutionm is to use C# records (available from C# 9):

    public abstract record Maybe<A>();
    public record Just<A>(A value) : Maybe<A>();
    public record Nothing<A>() : Maybe<A>();

This is also nice and terse, but also suffers from the lack of completeness checking, but you could add your own Match function:

    public abstract record Maybe<A>()
    {
        public B Match<B>(Func<Just<A>, B> Just, Func<Nothing<A>, B> Nothing) =>
            this switch
            {
                Just<A> j    => Just(j),
                Nothing<A> j => Nothing(j),
                _            => throw new NotSupportedException()
            };
    }

So, what are they useful for? Well, the key idea is that they represent a finite set of states, and the mapping of that finite set to another (via pattern matching) is more powerful and easier to reason about than the abstract/extend everything approach of the OO-world. Most programming is actually working on more concrete concepts than we're lead to believe by the OOP purists, and so creating a data model that more closely represents the number of states your application can be in is a good thing.

The above is true most of the time, but the reader should familiarise her or himself with The Expression Problem

Wiki Home