
Language Design: Null vs. Optional

Reinier Zwitserloot edited this page Dec 16, 2020 · 3 revisions

null: Just the tip of the iceberg

You may have read or heard about this quote by Tony Hoare:

"I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years."

This usually forms the rallying cry for a particular flavour of language design aficionado that insists that null is a mistake and that java needs to get rid of it ASAP. Generally, this rallying cry coalesces around the notion that java.util.Optional is therefore the solution.

This point of view is oversimplifying the matter. It is not useful to argue a straw man like this.

However, Optional 'fandom' is alive and well. This wiki page serves to explain why 'just use Optional!' and 'null was a billion dollar mistake' oversimplify matters, and to give some context about how complex the underlying language design problems that null tries to solve are. Once this context is understood, an explanation of Lombok[1]'s stance on these issues follows, which includes an explanation for why Lombok[1] will not now, and most likely not ever, gain significant support for Optional.

[1] Specifically, those who carry the maintenance burden of lombok. Currently, @rspilker and @rzwitserloot.

What does null mean?

Let's dig into it a bit:

"I was designing the first comprehensive type system"

This part of Tony's quote is particularly important, because null, as used here (and as used in most debates on this topic), refers to two very different concepts.

null is a reference value. It's.. just a value. It is, in fact, a crucial value: After all, it is intrinsic to the very nature of computers and our universe that, sometimes, you have a need to convey that something is in the vein of 'not initialized yet', 'not found', or 'irrelevant in this situation'. Notably, null should not ever mean empty: If you want to convey empty, then use a non-null value that means that. Proper API design means you return "" and not null if somebody has no last name. Proper API design means you return an empty List<Address> instead of null if someone has no registered address, and the API allows for multiple addresses. It also does not mean 'stand in for illegal use of API' - throw exceptions for that. But 'not found', 'uninitialized' or 'irrelevant here' is fundamental to programming. There is a reason that null alternatives still have some concept to represent this. For example, Optional.empty() takes on the role of being the reference value to indicate this. Let's call this concept NUI because calling it null is presumptive (solutions to the NUI problem such as Optional don't have null at all). NUI is short for 'not found, uninitialized, or irrelevant in this context'.
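To make the difference between 'empty' and NUI concrete, here is a minimal sketch. The AddressBook class and its method names are hypothetical and exist only for this illustration: an empty list means 'known, but has none', null means 'not found', and an exception signals illegal use of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; not part of any real API.
final class AddressBook {
    private final Map<String, List<String>> addressesByPerson = new HashMap<>();

    void register(String person, String... addresses) {
        requireName(person);
        addressesByPerson.put(person, new ArrayList<>(Arrays.asList(addresses)));
    }

    // Returns null only for 'not found' (the N in NUI).
    // A registered person without addresses yields an empty list, never null.
    List<String> findAddresses(String person) {
        requireName(person);
        return addressesByPerson.get(person);
    }

    // Illegal use of the API is answered with an exception, not with null.
    private static void requireName(String person) {
        if (person == null || person.isEmpty()) {
            throw new IllegalArgumentException("person must be a non-empty name");
        }
    }
}
```

A caller can now tell three situations apart: a person with addresses, a person with none, and a person the book has never heard of.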

null is also sometimes used as a property of a type system, and that is what Tony Hoare is talking about, as the above snippet of that quote shows. In java's original language design, any type in the system is to be read as 'either an X or null'. For example, given: public void foo(String x) {}, this signature indicates that you must pass 1 argument to this method, and that this 1 argument must be 'either a String or a null reference'. Java papers over this by stating that a null reference, as a value, is type-compatible with all types, but that's just handwavery: A null reference cannot be asked about its type (no null.getClass()), and a null reference isn't instanceof anything. So let's not dwell any further on the handwavy bit and go back to the more useful view that all java types, as per java's original (v1.0) language design, carry a caveat of '... or a null reference'.
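The 'type-compatible with everything, instance of nothing' behaviour can be observed directly. The class name NullTypeDemo below is made up for this sketch; the language behaviour it shows is standard java.

```java
final class NullTypeDemo {
    // Returns true only when the reference points at an actual String.
    static boolean isString(Object value) {
        return value instanceof String;
    }

    public static void main(String[] args) {
        String text = null;      // compiles: null is assignable to any reference type
        Integer number = null;   // the same literal fits an unrelated type too

        System.out.println(isString("hello")); // true
        System.out.println(isString(text));    // false: null is not instanceof anything
        System.out.println(number == null);    // true
    }
}
```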

Where the concept of NUI is a necessary[2] thing to have as a runtime value, having a type system that is incapable of registering or reasoning about whether some expression/parameter/field/etc is allowed to hold NUI isn't.

Whether that's what Tony Hoare meant or not: Surely all are in agreement that NUI as a concept is real enough, and thus any attempt to aggressively eliminate NUI as a concept entirely is misguided. Thus, "null sucks, get rid of it!" is oversimplifying things: As a runtime value, replacing null with Optional.empty() is just replacing one strange value with another and you gain absolutely nothing. What is presumably really intended is: "The fact that I cannot convey via the type system whether null is allowed or not sucks, get rid of it!", and once this clarification has been added it is now obvious that this means there are many different ways to accomplish this goal. Optional is just one of them; there are others.
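As a rough illustration of the 'one strange value for another' point, the sketch below (the class and method names are invented for this example) shows that code which forgets about NUI fails at runtime either way: an unchecked dereference of null throws NullPointerException, and an unchecked get() on an empty Optional throws NoSuchElementException.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

final class RuntimeValueDemo {
    // Dereferences a possibly-null reference without checking first.
    static String viaNull(String value) {
        try {
            return value.toLowerCase();
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    // Unwraps a possibly-empty Optional without checking first.
    static String viaOptional(Optional<String> value) {
        try {
            return value.get().toLowerCase();
        } catch (NoSuchElementException e) {
            return "NoSuchElementException";
        }
    }
}
```

The difference between the two only shows up in the type system, which is the subject of the rest of this page.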

[2] One can imagine a language system where NUI cannot be represented at all; there is no null, and no Optional. Presumably this is not a particularly nice language to program for, unless each type is capable of providing a singleton immutable value that serves as default value as well as the value provided when NUI occurs. I think this could be an interesting language design to explore, but java isn't it and will never be (how can you possibly shift a language so much in a backwards compatible fashion?), and I doubt that's good language design; a major issue with NUI is that the situation can occur where a programmer forgets to account for it. A large part of the point of NUI language design is that this situation is detected as well as possible. "No compile-time errors or warnings, and silently do nothing at runtime" sounds like a really bad solution to the problem. A language design that tries something like this is Pony (and in fact it seems rather nice to program for, but I haven't tried it).

NUI in the type system: Subtype, or not?

There is a key question that must be answered before continuing our search for a solution to convey NUI in the type system:

Is No-NUI String a subtype of NUI-allowed String?

One would think the answer is yes. After all, the relationship seems to align: All possible valid values for an expression of type No-NUI String are also valid to assign to a NUI-allowed String. In other words:

Integer x = someExpr();
Number n = x;

is never wrong - it is not possible for something you can assign to a variable of type Integer, to not be assignable to a variable of type Number: In the java type system, Integer is a 'subtype' of Number. The same would apply to NUIness:

String[But cannot ever be NUI] x = someExpr();
String[Can be NUI] n = x;

has the same relationship: The above cannot possibly fail. Note that the reverse is illegal in both cases. You can't assign an expression of type Number to an Integer variable: It could be fine, but it could be a type violation (what if the expression of type Number resolves to a double instead?). The same goes for trying to assign a NUIable string expression to a variable of type 'String, but never NUI': What if the NUIable expression resolves to NUI?
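The Integer/Number half of this comparison can be tried out in today's java. This is a small sketch with made-up names: assigning 'up' never needs a cast and never fails, while going 'down' needs an explicit cast that can fail at runtime.

```java
final class SubtypeAssignmentDemo {
    // Widening: every Integer is a Number, so this cannot fail.
    static Number widen(Integer value) {
        return value;
    }

    // Narrowing: needs an explicit cast, and the cast throws if the
    // Number turns out to be something else, such as a Double.
    static boolean canNarrow(Number value) {
        try {
            Integer narrowed = (Integer) value;
            return narrowed != null;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```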

The question now becomes: Do you want this, or do you want to disassociate the two entirely, and consider that there is no relationship whatsoever?

Optional is the latter. This is not legal[3]:

String x = "hello";
Optional<String> y = x;

That's not necessarily 'wrong' language design. However, it is quite a drastic step. It forces you to 'unwrap' that optional almost immediately after obtaining it, and it therefore means that writing code that works on optionals is most likely just a mistake (there is a reason that you can find a ton of suggestions to not put Optional in your list of parameter types, even for languages that were designed from the ground up with optional in mind such as scala or haskell).
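For reference, this is roughly what the explicit wrapping and the 'unwrap on the spot' style look like with java.util.Optional. The helper names wrap and unwrapOr are invented for this sketch; Optional.ofNullable and orElse are the real library calls.

```java
import java.util.Optional;

final class OptionalWrappingDemo {
    // A plain String must be wrapped explicitly;
    // 'Optional<String> y = x;' does not compile.
    static Optional<String> wrap(String value) {
        return Optional.ofNullable(value);
    }

    // Callers typically unwrap right away, supplying a fallback
    // for the empty case.
    static String unwrapOr(Optional<String> value, String fallback) {
        return value.orElse(fallback);
    }
}
```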

[3] A mechanism similar to autoboxing could be used to attempt to make this legal, but it'd be quite strange to introduce autoboxing without auto-unboxing, and auto-unboxing an Optional<String> to a String is... getting us right back to making it easy to write code that has absolutely no indication that the programmer who wrote it is aware that some expression could be resolving to NUI. Which is the one and only problem that NUI language design is trying to solve! Therefore, autoboxing is a disaster here, and is not worth further consideration.

The alternative: subtyping NUI

In order to judge whether Optional is good language design, we need to list and consider each feasible alternative design. Virtually all debates on Optional in the context of java fail to do this: they compare Optional solely to the current java 1.0 design, and thus leave out rather important alternatives.

The obvious alternate strategy is to carry NUI-allowed-ness in the type system, but with subtyping relationships. The obvious way to accomplish this, is with annotations. Imagine that all types, everywhere they are used in java (a lot of places; from local variables to return types to parameter types to generics!), must carry an annotation to indicate whether NUI is allowed. Then we can make this happen:

public @NUI String someNuiExpr() { ... }
public @NonNUI String someOtherExpr() { ... }

@NUI String x = someNuiExpr(); // legal
@NonNUI String y = x; // compiler error

@NonNUI String a = someOtherExpr();
@NUI String b = a; // legal!

That's a trick Optional just cannot perform. Nevertheless, this design still solves the central problem that NUI causes: bugs where the programmer failed to take into account that some expression can resolve to NUI, presumably because the language design (the type system) is not helping the programmer out to highlight this possibility. In this hypothetical language design, if the programmer writes something like:

Map<String, String> foo = ...;
String name = foo.get("hello").toLowerCase();

then the compiler will error out or at least warn, and inform the programmer that it looks like they failed to take into account the possibility that the foo.get("hello") expression resolves to NUI, and dereferencing NUI (with the .) is an error.

Note that subtyped NUI is not a new idea; languages like ceylon have it.
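Java already has the syntax for tagging types this way: annotations declared with ElementType.TYPE_USE. The sketch below declares hypothetical @NUI and @NonNUI markers (they are not real library annotations) and shows the explicit check that a NUI-aware compiler would demand. Plain javac stores the annotations but enforces nothing, so this only illustrates the shape of the code, not the checking.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.Map;

// Hypothetical markers; plain javac records them but enforces nothing.
@Target(ElementType.TYPE_USE) @interface NUI {}
@Target(ElementType.TYPE_USE) @interface NonNUI {}

final class TypeTagDemo {
    // Returns a lower-cased value, or the fallback when the key is not found.
    static @NonNUI String lookupLowerCase(
            Map<String, String> map, String key, @NonNUI String fallback) {
        @NUI String raw = map.get(key);  // may be NUI: the key might be absent
        if (raw == null) {
            return fallback;             // the NUI case is handled explicitly
        }
        return raw.toLowerCase();        // safe: raw is known to be non-null here
    }
}
```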

Therefore, the notion that NUIable x is a supertype of NeverNUI x is more powerful than Optional. More powerful sure sounds like it's better (hey, we can write more succinct code!), and that's why Optional is bad. Unless...

Complexities of NUI typing systems: type dimensions and variance

Adding that subtyping relationship is actually quite complicated, because it introduces the concept of 'type dimensions', and generics makes typing systems a lot more complicated in general. Let's look at a type hierarchy: Integer extends Number extends Object. Let's explode this into the 6 relevant types, and use the pseudocode notation that ! indicates 'never NUI' and ? indicates may hold NUI. Then the 6 involved types are: Integer?, Integer!, Number?, Number!, Object?, and Object!.

Unfortunately there is no linear typing relationship anymore. We can say that Integer! extends Number! extends Object!. We can say that Integer! extends Integer? extends Number? extends Object?, but there's no way to make a single linear chain of subtypes for all 6. Instead you get this 2 dimensional 'graph', where the arrow head points at the subtype:

Integer!  ⟵ Number!  ⟵ Object!
  ↑            ↑            ↑
Integer?  ⟵ Number?  ⟵ Object?

Read A⟵B as: all As are also Bs.

This introduces complexity. However, note that 'type dimensions' are an aspect of our universe: The same situation occurs when considering, for example, an 'HTML-safe string' (no unverified user input that hasn't been HTML escaped is present inside it) vs. 'unsafe', or even 'instance of Person.Builder on which the name method has been invoked' (because then we can detect, at write time, calling build() on a Person.Builder on which the mandatory property name hasn't been set yet, which is awesome). The concept of adding different dimensions is extremely powerful. Updating the language to be capable of dealing with this gives us subtyped NUI, and so much more.

Ordinarily, having multiple type dimensions isn't actually a problem. For example, given this code:

Number? y = someExpr();

The compiler knows how to apply that type grid. If the signature of someExpr() indicates its type is Number?, Integer?, Number!, or Integer!, it is valid. Otherwise it is not. No problem. That's because the basic type system is inherently covariant: an expression of type T can silently be considered as an expression of type P, where P is some supertype of T.

Where the type grid gets much more complicated is generics. That's because applying covariance to generics is broken:

List<Integer> integers = new ArrayList<Integer>();
List<Number> numbers = integers; // oops!
numbers.add(Double.valueOf(5.0));
Integer i = integers.get(0); // oh dear

Here, we applied the usual covariant rules in the second line, but that was wrong - it opened the door to a type violation.

Hence, generics by default in java are invariant: That second line doesn't compile.

However, an invariant typing system is extremely annoying to work with. That's why java does still have variance in generics. But, the programmer has to explicitly pick their variance! List<? extends Number> is a list that is covariant on Number. List<? super Number> is contravariant.
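A small sketch of what picking your variance looks like in practice (the class and method names are made up for this example): the covariant parameter can be read from but not added to, and the contravariant parameter can be added to but only read as Object.

```java
import java.util.List;

final class WildcardDemo {
    // Covariant parameter: accepts List<Integer>, List<Double>, List<Number>, ...
    // Reading elements as Number is safe; adding to the list is not permitted.
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    // Contravariant parameter: accepts List<Number> or List<Object>.
    // Adding any Number is safe; reading would only yield Object.
    static void addDefaults(List<? super Number> sink) {
        sink.add(Integer.valueOf(1));
        sink.add(Double.valueOf(2.5));
    }
}
```

Passing a List<Integer> to sum compiles, and so does passing a List<Object> to addDefaults; swapping the two arguments does not.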

There is even a 4th variance: The legacy variance. Just List. This gets us to this 'table of powers':

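| Variance type  | Java code              | get() useful? | .add() allowed? | Types allowed                 | Safe? |
|----------------|------------------------|---------------|-----------------|-------------------------------|-------|
| Covariance     | List<? extends Number> | yes           | NO              | List<Integer>                 | yes   |
| Contravariance | List<? super Number>   | NO            | yes             | List<Object>                  | yes   |
| Invariance     | List<Number>           | yes           | yes             | NO; can only use List<Number> | yes   |
| Legacy/raw     | List                   | yes[4]        | yes             | Anything goes                 | NO    |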

As the table shows, every one of the 4 different variances has downsides, and that's just intrinsic to the problem domain. There is no solving this; that's why all 4 are required. Well, you don't necessarily need legacy/raw, but you do if you want to take a language that didn't have this type dimension in the first place (java, before java5), and introduce this type dimension in a backwards compatible way and without marking all APIs written before it as obsolete (obsolete as in: Cannot be transitioned to add this type dimension without that library becoming backwards incompatible with the currently existing version).
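The 'Safe? NO' entry for legacy/raw can be demonstrated with a short sketch (names invented for this example). Going through the raw type, the compiler only warns, and the type violation surfaces later, at runtime, when the element is read back as an Integer.

```java
import java.util.List;

final class RawTypeDemo {
    // Uses the raw type to slip a Double into what is really a List<Integer>.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void pollute(List<Integer> integers) {
        List raw = integers;           // compiles, with a warning
        raw.add(Double.valueOf(5.0));  // compiles, with a warning
    }

    // Returns true when reading the first element blows up at runtime.
    static boolean readFails(List<Integer> integers) {
        try {
            Integer first = integers.get(0);  // hidden cast to Integer happens here
            return first == null;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```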

The same applies to NUIness!!

Yeah, 4 nullities. Not 2! That's.. more than most proposals and languages have. Kotlin only has 2. Most annotation-based nullity systems in java only have 2 (@Nullable and @NonNull), possibly 3 (some support for legacy). Only ceylon and checkerframework's nullity annotation system have support for all 4. Nevertheless, if we're designing proper systems to deal with NUI, let's design the best one first, and all 4 is what it takes.

The power of the legacy variance should be obvious: It's what lets existing java libraries transition. They start out with legacy nullity, and any existing code will silently type-convert any and all types into this legacy nullity, thus ensuring code continues to compile. It's entirely analogous to the change between java4 and java5.

This leaves some questions about the power of co/contra/invariance.

Imagine a method that scans a list for the first element that fits some predicate. Upon finding it, that one is duplicated (the value is added to the end of the list), and then the method returns. If nothing matches, add some default value instead.

There's nothing inherent about the definition of this method that makes it NUIable or non-NUI. It can work on either type, and does not produce NUI values unless you want it to. This works fine:

public <@NonNUI T> void dupeFirst(List<T> list, Predicate<T> pred, T defaultValue) {
    for (var elem : list) {
        if (pred.test(elem)) {
            list.add(elem);
            return;
        }
    }
    list.add(defaultValue);
}

This code cannot break the typing system: It is impossible for this code to add NUI to the list, and the compiler can ascertain this. Only two values are ever .add()ed to the list. The first is elem, whose type is whatever list.iterator()'s generics return; that is the same T as the list's, which here is @NonNUI T, and thus fine. The second is defaultValue, which is likewise type-checked as never NUI.

However, you can replace @NonNUI with @NUI, and the exact same code works just as well: This time the predicate needs to be able to deal with NUIs, but it is still properly typed (as a Predicate<@NUI T>), and this time the code could end up adding NUI to the list, but that's okay, because the list itself is specified to be allowed to hold NUIs.
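Stripped of the hypothetical annotation, the method runs as-is in current java, which makes it easy to see that the same body serves both a list that never holds null and one that does. The wrapper class name below is made up for this sketch.

```java
import java.util.List;
import java.util.function.Predicate;

final class DupeFirstDemo {
    // Same logic as above, minus the hypothetical annotation.
    static <T> void dupeFirst(List<T> list, Predicate<T> pred, T defaultValue) {
        for (T elem : list) {
            if (pred.test(elem)) {
                list.add(elem);  // safe: we return before the iterator is used again
                return;
            }
        }
        list.add(defaultValue);
    }
}
```

Called with a list of non-null strings and a predicate like s -> s.startsWith("b"), it never adds null. Called with a list that contains null and a predicate like s -> s == null, it duplicates the null, which is fine because that list was allowed to hold it in the first place.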

With Optional, you cannot write this method so that it can deal with both. You can at best just write the basic T-only variant, and then as caller pass a List<Optional<String>> along with a Predicate<Optional<String>> along with a default value wrapped with Optional.of.

However, if I already have a predicate, I need to make a new predicate just to unwrap that optional. Or vice versa: if I make a predicate for this and later want to use it to filter a List<String>, I can't, even though type-wise all is well: A predicate that can test() any Optional<String> can test non-optional strings just as well, but Optional, as a typing system, cannot convey this. The fundamental problem is that the Optional-based typing system change does not support co/contravariance.
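The adapters you end up writing look something like the sketch below. The names lift and lower are invented here, and treating an empty Optional as 'does not match' is just one possible choice.

```java
import java.util.Optional;
import java.util.function.Predicate;

final class PredicateAdapterDemo {
    // Lifts a Predicate<String> so it can test Optional<String> values.
    // An empty Optional is treated as 'does not match'; that is a choice, not a rule.
    static Predicate<Optional<String>> lift(Predicate<String> pred) {
        return opt -> opt.isPresent() && pred.test(opt.get());
    }

    // Goes the other way: lets a Predicate<Optional<String>> test plain strings.
    static Predicate<String> lower(Predicate<Optional<String>> pred) {
        return s -> pred.test(Optional.ofNullable(s));
    }
}
```

None of this carries any real logic; it is boilerplate that exists only because the two types are unrelated.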

[4] Well, it returns just Object, so it seems like it is the same as the contravariance case, i.e. not that useful. But that is because the type information is missing, not because reading is disallowed.

In defense of optional

Whilst variance and type dimensions are extremely powerful, they are also extremely complicated: The sheer amount of questions about java, generics, and why you e.g. cannot pass a List<Integer> for a parameter that wants a List<Number> (just search Stack Overflow, for example) suggests that one way out is to just close our eyes, stick our fingers in our ears, and imagine that variance isn't a thing: Force programmers to 'unwrap' optionals on the spot ASAP.

But that does not make for a particularly powerful language.

This is a trade-off, and I see no further way to use logic or falsifiable facts to get any further in the debate. Your 'gut instinct' and experience will determine which of the trade-offs you prefer.

... except.. backwards compatibility!

There is one aspect of optional vs. subtyping null that should convince you that for java in particular, subtyped NUI is vastly superior, and that is: Backwards compatibility.

Given an existing API written before Optional ever existed, say, java.util.Map's get() method:

interface Map<K, V> {
    public V get(Object o);
}

It is crystal clear that if Optional is the way forward, this API is now obsolete: It's just wrong, now. It should be public Optional<V> get(Object o) because the N part of NUI is an obvious possible response here: get may have to return 'not found'.

There is no way to do this in a backwards compatible fashion. The only option is to introduce java.util.Map2 or java.util2.Map, or to just decree that we have a java2 which is not compatible with retroactively renamed java1. You can probably use tooling to automatically rewrite java1 code to java2 code and it all looks quite similar, but compatibility is gone. Whilst popular languages have tried this before (python2 to python3, for example), that transition took more than a decade and was extremely painful. It's a matter of opinion, but Team Lombok thinks that, however much you may find that Optional is a better strategy to put NUI into the type system than annotation-based type tags, breaking java in two is not worth that.
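Since the signature of Map.get itself cannot change, code that prefers Optional today can only adapt on the caller's side. A sketch of such an adapter follows; the class and method names are hypothetical, and only Optional.ofNullable and Map.get are real library calls.

```java
import java.util.Map;
import java.util.Optional;

final class OptionalMapAccess {
    // Wraps the existing null-returning get() on the caller's side.
    // Note: a key mapped to null and an absent key both come back as empty.
    static <K, V> Optional<V> find(Map<K, V> map, K key) {
        return Optional.ofNullable(map.get(key));
    }
}
```

Every existing caller of get() keeps compiling, but nothing forces anyone to use the wrapper, so the type system is none the wiser.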

Contrast to type-tagging NUI, and you can transition java in a completely backwards compatible fashion. Echoing the introduction of generics in java5: Existing libraries can just start gaining the appropriate type tags, and until then, the legacy variance covers the transitional period.

The introduction of generics truly was a marvel: It's like trying to get the UK to switch to driving on the right side of the road, and somehow managing to make it possible to say to all drivers: "Just.. start driving on the other side at some point, when you feel like it".

And yet, legacy variance is that amazing. Map can be updated[5]:

// at some later point, after introducing NUI type dimension
interface Map<K, V> {
    // may return NUI: 'not found' is a possible answer
    public @NUI V get(Object key);
}

Any code that hasn't been updated yet will automatically typeconvert this return value to LegacyNUI V and thus nothing will change for them, other than that the compiler will start warning you that it cannot guarantee NUI safety. Code that has updated will now be made aware that the result of invoking map.get may hold NUI and will thus cause errors if code fails to take this into account.

A few things would have to be different. In particular, the mechanism by which code indicates whether it's still on legacy or has upgraded needs to be different, because surely asking java coders to spam !, ? and some other symbol for that third NUI-ty would be bad. Java ain't perl, after all. Presumably, there would be something at the very top of every source file, or in the package or module file, to indicate what default you want (presumably, not null), and then just the ?, or @Nullable, or something similar to indicate an alternative position on the type dimension of allows-NUI. Various annotation based nullity systems already work like this; they have a @NonNullByDefault annotation you can put on a class or even on an entire package.

[5] This shows another complication of type dimensions: It must be possible to modify the state of a type in one particular dimension without affecting the type itself or any of the other dimensions. Here we want to convey that get returns a type variable (V), but that the variable has been modified: Whatever position that V had on the type dimension of 'NUI-holding', what get returns is locked in the position of 'can be NUI' on that type dimension. But, otherwise, don't change anything: If it is a Map<Integer, @HtmlSafe @LowerCased @NonNUI String>, then map.get(k)'s type is @HtmlSafe @LowerCased @NUI String. Complicated perhaps. But possible, and very powerful.

Lombok's position on null and Optional

Team Lombok thinks that type dimensions are inherently a better language design than Optional (invariant NUIness that must be unwrapped ASAP). In other words, Team Lombok thinks that this aspect of the language design of e.g. Scala and Haskell is bad - a better language would have embraced type dimensions instead. However, that's a somewhat weakly held opinion. Maybe the complexity inherent in type dimensions is too high a cost to pay for the power that it gives you.

However, if only java is considered, then Team Lombok thinks that type dimensions are inherently better than Optional, and this is a strongly held opinion: The significant damage that Optional does to backwards compatibility is a show-stopper. That is definitely far too high a cost to pay for the benefits, especially considering that an alternative (type dimensions) exists. We can be swayed, but this would probably involve explaining how you can introduce an Optional style system in a backwards compatible way.

Hence, any feature requests that involve optional will likely be denied. For example, a request for @Getter on a String field to generate a getter with an Optional<String> return type would be turned down.

However, any feature request that moves the type dimensions concept forward is quite welcome.

One could wish for lombok to cater to both potential futures, or one could wish for a future where everybody just uses what they are more comfortable with; nullity annotations for some, optional for others.

But this is clearly a bad idea: nullity annotations can convey everything optional can and more, and having two different ways to accomplish the same thing is by itself obviously bad: It leads to pointless style debates and complexity for anybody needing to write code that interacts with systems that made different choices. Surely a casual glance at the state of logging frameworks in java is enough to convince the reader of this. Trying to cater to both ideas is just helping the java community dig its own grave, in that it'll likely lead to a future where both systems are in use, thus leading to annoyances and style debates. Team Lombok doesn't think this is a proper choice either.