
Nullness: Generic type semantics #19

Closed
wmdietlGC opened this issue Jan 10, 2019 · 27 comments
Labels: design, nullness

@wmdietlGC (Collaborator)

Type parameter declaration

A type parameter has both an upper and a lower bound.
The top of the type hierarchy is @Nullable j.l.Object and the bottom is @NotNull j.l.Void.
The null literal has type @Nullable j.l.Void.

See #12 (comment) for a discussion of upper bounds.

An annotation on the type parameter declaration itself can be used to annotate the lower bound:

class NullonlyData<@Nullable T extends @Nullable Number> { T g; }

Note that the lower bound annotation is important to decide what assignments are possible.
Within NullonlyData, null can be assigned to field g, because NullonlyData can only be instantiated with an @Nullable type argument.
We feel that this is useful only in rare cases and propose to forbid annotating a type parameter declaration itself, even in the short form class C<@Nullable T>. Instead, the explicit class C<T extends @Nullable Object> should be used.
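
A quick sketch of the consequence (hypothetical usage, assuming a checker that enforces the lower-bound interpretation above):

    NullonlyData<@Nullable Integer> ok;    // @Nullable Integer satisfies both bounds
    // NullonlyData<@NotNull Integer> bad; // rejected: the @Nullable lower bound
                                           // requires a nullable type argument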

Type parameter use

An annotation on a type parameter use overrides the upper and lower bound annotation.

class Cache<T extends @Nullable Object> { @Nullable T cache; }

Within class Cache it is possible to assign null to field cache, because of the @Nullable annotation.
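
A usage sketch (assuming these semantics):

    Cache<@NotNull String> c = new Cache<>();
    c.cache = null;                  // ok: the use-site @Nullable overrides the type argument
    // @NotNull String s = c.cache;  // rejected: reads of cache are @Nullable String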

Wildcards

A wildcard can have an annotation on the wildcard itself and on the explicit bound. The annotation on the wildcard itself applies to the bound that is not explicitly specified.

class List<T> {  // @Nullable Object upper bound; @NotNull Void lower bound
    T f;
}

- List<?> l1; // @Nullable Object upper bound; @NotNull Void lower bound
    Read of l1.f: @Nullable Object, from upper bound of T
    Write of l1.f: not possible, as lower bound is @NotNull Void

- List<@Nullable ?> l2; // both bounds @Nullable
    Read of l2.f: @Nullable Object
    Write of l2.f: possible, but only with `null`

- List<? extends Number> l3;
    Read of l3.f: The type depends on the upper bound from the type parameter declaration
        and the bound of the wildcard, so here we get the intersection of @Nullable Object
        and @NotNull Number, so in effect we can read @NotNull Number
    Write of l3.f: @NotNull Void lower bound, no writes

- List<? extends @Nullable Number> l4;
    Read of l4.f: effectively @Nullable Number
    Write of l4.f: @NotNull Void lower bound, no writes

- List<? super Number> l5;
    Read of l5.f: Upper bound from type parameter declaration, @Nullable Object
    Write of l5.f: @NotNull Number types or their subtypes can be written.

- List<? super @Nullable Number> l6;
    Read of l6.f: @Nullable Object
    Write of l6.f: @Nullable Number types or their subtypes can be written.
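
A compilable sketch of these read/write rules (assuming a checker that enforces them; List and f are the declarations above, not java.util.List):

    void demo(List<? extends Number> l3, List<? super Number> l5) {
        @NotNull Number n = l3.f;   // read: intersection of @Nullable Object and @NotNull Number
        // l3.f = n;                // rejected: the lower bound is @NotNull Void
        @Nullable Object o = l5.f;  // read: upper bound from the declaration, @Nullable Object
        l5.f = n;                   // ok: @NotNull Number or its subtypes can be written
    }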
@kevinb9n (Collaborator)

> A type parameter has both an upper and a lower bound.
> The top of the type hierarchy is @Nullable j.l.Object and the bottom is @NotNull j.l.Void.

It seems questionable to consider Void a bottom type. It isn't really (I can't write Void v = null; String s = v;). This is also an odd thing to say since @NotNull Void is practically-speaking nonsense.

If Java reference types have any bottom type at all it's the special undenotable type of the null literal. That might be more correct but also is hardly worth talking about.

Another question I want to raise is whether it is actually fruitful to talk about augmented types as existing in a single graph at all. Is it a viable alternative instead to say that we are imputing a two-dimensional type system, where types are assignable if they are BOTH assignable according to Java's rules AND according to our rules for assignability of nullability? That seems easier to understand and also more accurate to me.
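
A hypothetical checker-internal sketch of that two-dimensional view (all names illustrative, not a real API):

    enum Nullness { NOT_NULL, NULLABLE }

    record AugmentedType(Class<?> base, Nullness nullness) {}

    // Assignable iff assignable under Java's rules AND under the nullness rules.
    static boolean isAssignable(AugmentedType from, AugmentedType to) {
        boolean javaOk = to.base().isAssignableFrom(from.base());
        boolean nullnessOk = from.nullness() == Nullness.NOT_NULL
                || to.nullness() == Nullness.NULLABLE;
        return javaOk && nullnessOk;
    }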

> The null literal has type @Nullable j.l.Void.

(Again, I don't think it does, and even if it did this would still fail to explain how I can assign it to String anyway.)

> An annotation on the type parameter declaration itself can be used to annotate the lower bound:
>
> class NullonlyData<@Nullable T extends @Nullable Number> { T g; }
>
> Note that the lower bound annotation is important to decide what assignments are possible.
> Within NullonlyData, null can be assigned to field g, because NullonlyData can only be instantiated with an @Nullable type argument.
> We feel that this is useful only in rare cases and propose to forbid annotating a type parameter declaration itself, even in the short form class C<@Nullable T>. Instead, the explicit class C<T extends @Nullable Object> should be used.

I'm confused what this is saying. Is it saying "here is what annotating the parameter itself could mean, but we propose to not support that?"

> Type parameter use
>
> An annotation on a type parameter use overrides the upper and lower bound annotation.
>
> class Cache<T extends @Nullable Object> { @Nullable T cache; }
>
> Within class Cache it is possible to assign null to field cache, because of the @Nullable annotation.

Yes, this is how I would have expected to address this scenario, and it doesn't seem to require the concept of specifying both upper and lower bounds.

> Wildcards
>
> A wildcard can have an annotation on the wildcard itself and on the explicit bound. The annotation on the wildcard itself applies to the bound that is not explicitly specified.

Can we (please) similarly decide that we don't need this either?

@dzharkov (Collaborator)

Should we have some kind of annotation, applied to types based on generic parameters, meaning that the nullability of that type depends on the actual type argument?

For example:

    interface Box<T> {
        void put(T value);
    }
    
    void foo(Box<@Nullable String> nullable, Box<@NotNull String> notnull) {
        nullable.put(null); // should not be a warning here
        notnull.put(null); // a warning is expected
    }

How should we annotate Box::put to distinguish the cases in the foo method?
It might be a major problem once we start annotating generic collection-like things.

@kevinb9n (Collaborator)

For this code snippet are we under a @DefaultNotNull or @DefaultNullable or legacy scope?

Here's how I think I would write it while not relying on any defaults:

    interface Box<T extends @Nullable Object> {
        void put(T value);
    }
    
    void foo(
            @NotNull Box<@Nullable String> nullable,
            @NotNull Box<@NotNull String> notnull) {
        nullable.put(null); // should not be a warning here
        notnull.put(null); // a warning is expected
    }

My assumption has been that this will behave as described. Werner, confirm?

@dzharkov (Collaborator) commented Jun 14, 2019

My concern is that if we're under a @DefaultNotNull scope, or the value parameter is explicitly marked with @NotNull, that would usually mean the method doesn't accept nulls even for Box<@Nullable String> nullable.

Some time ago, we discussed a similar problem related to KEEP-99 and came to the idea that we need another special annotation, called @StrictNullness, that, when applied to a type parameter or a type based on a type parameter, makes it respect the type argument's nullability.

@abreslav commented Jun 16, 2019

I second what @dzharkov is saying:

If we have effectively @NotNull T somewhere and substitute @Nullable String for T, we should get @NotNull String as a result (or use-site semantics will break). If we have a legacy T (T! in Kotlin terms) and substitute @NotNull String for it, we must get legacy String (String! in Kotlin terms). So, it seems like we need something like @ParametricNullity T (name is completely random) to represent the semantic of a proper type variable which upon substitution gives exactly the type it's substituted with.

Example showing the differences:

class Box<T extends @Nullable Object> {
    @NotNull T getNN() { ... }
    @Nullable T getN() { ... }
    T getLegacy() { return null; }
    @ParametricNullity T getT() { ... }
}

var bnn = new Box<@NotNull String>();
var bnn_nn = bnn.getNN(); // @NotNull String
var bnn_n = bnn.getN(); // @Nullable String
var bnn_l = bnn.getLegacy(); // legacy String
var bnn_t = bnn.getT(); // @NotNull String

var bn = new Box<@Nullable String>();
var bn_nn = bn.getNN(); // @NotNull String
var bn_n = bn.getN(); // @Nullable String
var bn_l = bn.getLegacy(); // @Nullable String (or legacy String)
var bn_t = bn.getT(); // @Nullable String

@kevinb9n (Collaborator) commented Jun 17, 2019

I see. I have been assuming that type variable usages always have parametric nullability by default -- even inside a @DefaultNotNull or @DefaultNullable scope. If we were to apply those defaults to type parameter bounds, it just unnecessarily restricts what parameterizations will be allowed, and if we apply them to type variable usages in signatures -- well, it just seems uncalled for. It should be enough to apply the default to parameterizations themselves instead, right?

However, if we're NOT in a defaulted scope, we could be in legacy code, and then I suppose that parametric nullability by default could create an incompatibility, is that it? Is that the only problem with the approach just described?

Also: still wanting a response to #issuecomment-500547338

@abreslav

> if we apply them to type variable usages in signatures -- well, it just seems uncalled for

Not sure about this. @DefaultNotNull for type variables looks like something uncalled for indeed, but @DefaultNullable makes perfect sense as in "I want all my Java code to be nullable by default, as it's Java after all". Then a method returning T is meant to be @Nullable T, because it can inject a null even if the parameterization for T is @NotNull.

> However, if we're NOT in a defaulted scope, we could be in legacy code, and then I suppose that parametric nullability by default could create an incompatibility, is that it?

I'm not sure what you mean by "incompatibility". The problem is: it can be simply wrong. As in:

class C<T> {
    final T t;
    C(T t) {
        this.t = t;
    }

    T get() {
        if (<<some condition>>) return null;
        return t;
    }
}

Let's consider two cases:

  • C is in legacy mode

Then as C<@NotNull V>.get() still can return null, we have to treat it as legacy V, not @NotNull V.

  • C is under @DefaultNullable

If we treat C.get() return type as parametric, the checker should complain about return null; in the body of get(), which may be confusing to the user: "I'm already in the default nullable mode, why do I need to say @Nullable T get() explicitly?"

@abreslav

> Also: still wanting a response to #issuecomment-500547338

Sorry, here it is:

> It seems questionable to consider Void a bottom type. It isn't really (I can't write Void v = null; String s = v;). This is also an odd thing to say since @NotNull Void is practically-speaking nonsense.

It being nonsense is the exact reason why it would theoretically work as a bottom type. But I don't think it's a good idea to use any Java class to signify a bottom type.

> If Java reference types have any bottom type at all it's the special undenotable type of the null literal. That might be more correct but also is hardly worth talking about.

This is not entirely correct wrt nullity because the type of null is not a subtype of any non-null type. Only an empty type is, like "the non-null version of the type of null".

> Is it a viable alternative instead to say that we are imputing a two-dimensional type system, where types are assignable if they are BOTH assignable according to Java's rules AND according to our rules for assignability of nullability?

This is something to discuss, but my gut feeling about it is rather pessimistic.

I think this conversation is somewhat similar to the discussion of the Glossary document.
The essence seems to be: on the one hand, to make reasoning about types sane we need some uniformity in the formal properties of types (every type parameter has both bounds, subtyping is defined as a standard lattice, every class has 0 or more type parameters, all type arguments are wildcards, etc.); on the other hand, people who don't care about writing down formal rules (i.e. everybody who's not implementing a checker or similar tool) don't care about these uniformities and may be more familiar with other ways of speaking about types.

> An annotation on a type parameter use overrides the upper and lower bound annotation.

I think the word "override", while describing the effect correctly, is not exactly accurate here. One can think of annotations on type variables (or any other type usages) as operators on the type as a set of values:

  • @Nullable X = {values of X} + {null}
  • @NotNull X = {values of X} - {null}

So, if we have C<T extends @Nullable Object>, within the definition of C, T is neither a nullable nor a non-null type (if used in the parametric sense):

  • on the one hand, it can not be safely dereferenced, because T may be @Nullable String and its value can be null
  • on the other hand, it can not be assigned null as a value, because T may be @NotNull String and thus reject null values.

Applying @NotNull or @Nullable indeed makes sure the type is non-null or nullable respectively, by removing/adding null from/to the set of values.
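
A minimal sketch of that reading (assuming a checker with these semantics):

    class C<T extends @Nullable Object> {
        void demo(T t, @Nullable T maybe, @NotNull T definite) {
            // t.hashCode();      // rejected: T may include null (T could be @Nullable String)
            // t = null;          // rejected: T may exclude null (T could be @NotNull String)
            maybe = null;         // ok: @Nullable T = {values of T} + {null}
            definite.hashCode();  // ok: @NotNull T = {values of T} - {null}
        }
    }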

@kevinb9n (Collaborator)

> > if we apply them to type variable usages in signatures -- well, it just seems uncalled for
>
> Not sure about this. @DefaultNotNull for type variables looks like something uncalled for indeed, but @DefaultNullable makes perfect sense as in "I want all my Java code to be nullable by default, as it's Java after all". Then a method returning T is meant to be @Nullable T, because it can inject a null even if the parameterization for T is @NotNull.

It might be better to force the explicit @Nullable anyway for this reason: the basic nature of a type variable is to defer to the code that is parameterizing the type to establish the type. Counteracting that should probably be explicit, I think.

> > However, if we're NOT in a defaulted scope, we could be in legacy code, and then I suppose that parametric nullability by default could create an incompatibility, is that it?
>
> I'm not sure what you mean by "incompatibility". The problem is: it can be simply wrong. As in:
>
> class C<T> {
>     final T t;
>     C(T t) {
>         this.t = t;
>     }
>
>     T get() {
>         if (<<some condition>>) return null;
>         return t;
>     }
> }
>
> Let's consider two cases:
>
> • C is in legacy mode
>
> Then as C<@NotNull V>.get() still can return null, we have to treat it as legacy V, not @NotNull V.

Agreed, this was the problem I was thinking of when I said "incompatibility".

> • C is under @DefaultNullable
>
> If we treat C.get() return type as parametric, the checker should complain about return null; in the body of get(), which may be confusing to the user: "I'm already in the default nullable mode, why do I need to say @Nullable T get() explicitly?"

I think the question here is "should this user need to think about / understand the concept of 'parametric nullability' at all?" I believe they will have to. The confusion you refer to would come from not understanding it yet. But once they do, I think they should see why the explicit @Nullable is necessary. A bare T defers to the parameterizer to choose the nullness.

@kevinb9n (Collaborator)

A "similar" (maybe) confusion that exists today:

class C<N extends Number> {
  N get() {
    return new Integer(1);
  }
}

"I don't get it, I said N extends Number; this extends Number..."

The user needs to learn: oh, the parameterization gets to choose, and I have to live with THAT choice, not my own.

@abreslav

Extracted the wildcards topic to #31

@kevinb9n (Collaborator)

Rough consensus on Monday seemed to be:

If the type parameter has legacy nullness (whether by being defined as <T extends @LegacyNull Foo> or by @DefaultLegacyNull being in effect or by the code being wholly unannotated), then all unannotated usages of the type variable must have legacy nullness.

If the type parameter is not-null (whether via <T extends @NotNull Foo> or by @DefaultNotNull being in effect), then unannotated usages of the type variable T can be said to be not-null.

However, if <T extends @Nullable Foo> or T is covered by @DefaultNullable, we must not say that type variable usages of T are nullable. Instead they have parametric nullability. Only not-null expressions can be assigned to T, and expressions of type T can only be assigned to nullable types. (This situation is oddly similar to that of legacy nullness, actually; however, I wouldn't recommend literally treating them as having legacy nullness, because I don't think they merit supporting lenient conversions the way legacy does.)
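
A sketch of that last rule (assuming a @Nullable-bounded parameter; Holder is illustrative):

    class Holder<T extends @Nullable Object> {
        T value;

        void set(T v) {
            value = v;       // ok: a T-typed expression flows to T
            // value = null; // rejected: T is parametric, not nullable
        }

        @Nullable T get() {
            return value;    // ok: an expression of type T is assignable to a nullable type
        }
    }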

@wmdietlGC (Collaborator, Author)

If I have a class Box<T extends @LegacyNull Foo> {}, is a usage as both Box<@Nullable C> and Box<@NotNull C> valid without warning or error?
The not-null <: legacy <: nullable type hierarchy from #33 wouldn't allow this.

@kevinb9n (Collaborator) commented Jun 21, 2019

Hmm. We've said so far that usages of T within Box always have legacy nullness and NOT parametric nullness:

class Box<T extends @LegacyNull Foo> {
  void in(T t) { ... } // T has legacy nullness
  T out() { ... } // T has legacy nullness
}

So my first observation would be that neither Box<@Nullable C> nor Box<@NotNull C> even accomplishes anything in the first place. The parameter and return types would still have legacy nullness anyway (at least that's how I've interpreted it).

In theory that's because it can't accomplish anything. @NotNull can't assure me that out won't give me a null, and @Nullable can't assure me that in will accept one. In concept @NotNull could be made to apply just to in ("I want to prevent myself from sending a null here") and @Nullable could apply just to out ("I want to make myself check for null results"), if we actually wanted to create different sets of assignability rules to be used in different contexts (!!). But these certainly seem like lesser concerns than the assurances we really wanted.

Anyway, if it is true that trying to refine a legacy-bounded parameter to be nullable or notnull has no useful effect then we will want to make it illegal. And the assignability rules themselves wouldn't explain why it was illegal.

[EDIT: I no longer believe this; while compiling Box itself T must have legacy nullness, BUT, while compiling a dependent of Box it should be allowed to promote the type argument to have known nullness. I believe doing so should issue one unchecked-nullness warning at that point, but that's better than what this comment was suggesting, which is that those warnings would spew at every usage after that.]

@donnerpeter (Collaborator)

The unannotated Box example from ^ looks suspiciously like List, and people seem to want to annotate its parameters: List<@NotNull String> or List<@Nullable String>

@wmdietlGC (Collaborator, Author)

I think it would be important to support gradually migrating code. That is, a legacy generic type can be instantiated as either Box<@NotNull String> or Box<@Nullable String>.
To make this useful, I suggest a slightly different interpretation of legacy type parameter usage.

In the earlier comment #19 (comment) we said:

"If the type parameter has legacy nullness (whether by being defined as <T extends @LegacyNull Foo> or by @DefaultLegacyNull being in effect or by the code being wholly unannotated), then all unannotated usages of the type variable must have legacy nullness."

I would want the result to be the least upper bound (assuming not-null <: legacy <: nullable) of "legacy nullness" and the type argument.
That is, if I have a legacy Box<T> and instantiate it as Box<@NonNull String>, the effective type in in and out is @Legacy String. That accounts for us not knowing what the legacy class does, and legacy behavior makes usage more convenient.

However, if we have a Box<@Nullable String> we should get @Nullable String in the method signatures. That will give more conservative behavior, which is clearly intended by the @Nullable type argument.
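
Spelled out for a legacy Box under the proposed least-upper-bound rule (effective signatures in comments):

    Box<@NonNull String> b1;   // lub(legacy, not-null) = legacy:
                               //   in(@Legacy String); out() returns @Legacy String
    Box<@Nullable String> b2;  // lub(legacy, nullable) = nullable:
                               //   in(@Nullable String); out() returns @Nullable String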

Does this make sense?

@kevinb9n (Collaborator) commented Jun 24, 2019

That seems more conservative only with regard to return types, but it seems overly lenient wrt parameter types. In a legacy class we don't know that it's promising to accept null for any particular parameter. I think we'd be reluctant to start making different rules for parameters vs. return types.

But, perhaps both Box<@Nullable String> and Box<@NotNull String> should be possible via suppressing a warning.

(Note: I continue to feel that the relationships that "unknown-nullness" has with each of "not-null" and "nullable" ought to be symmetrical, unless we decide we really want to create asymmetric assignability rules.)

@wmdietlGC (Collaborator, Author)

If we hope that most code is in either a @DefaultNotNull or @DefaultNullable scope, issuing a warning for every use of a legacy generic type seems rather noisy.

@kevinb9n (Collaborator)

I might say that using legacy code without bothering to provide the correct annotations (whether via external file or not) should have some small disincentive. But the warnings we're talking about are only the "unchecked"-type warnings that we think many users will leave turned off anyway. So I don't think we really have a noise problem?

@kevinb9n (Collaborator)

New clarity, hopefully:

> If I have a class Box<T extends @LegacyNull Foo> {}, is a usage as both Box<@Nullable C> and Box<@NotNull C> valid without warning or error?
> The not-null <: legacy <: nullable type hierarchy from #33 wouldn't allow this.

It would allow the latter, right? And though it wouldn't allow the former by virtue of assignability, my stance in #33 has been that it should still allow it by virtue of unchecked conversion, the distinction being important mostly because these two should be separately suppressable. (As we've said many times, most users will keep the unchecked-type warnings off most of the time.)
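
A sketch of that split (assuming the #33 hierarchy and that C extends Foo):

    class Box<T extends @LegacyNull Foo> {}

    Box<@NotNull C> b1;   // allowed by assignability: @NotNull C <: @LegacyNull Foo
    Box<@Nullable C> b2;  // not assignable, but permitted via unchecked conversion,
                          // with a separately suppressable unchecked-nullness warning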

@stephan-herrmann

> [...] the basic nature of a type variable is to defer to the code that is parameterizing the type to establish the type. Counteracting that should probably be explicit, I think.

I agree. Indeed Eclipse's @NonNullByDefault does not apply to any use of a type variable (nor to wildcards).

@stephan-herrmann

> [...] If the type parameter has legacy nullness [...]

I see two different kinds of unconstrainedness:

  • legacy: either this is old code or the user was lazy :)
  • unannotated type parameter: the owner of this type explicitly leaves the choice to her clients. Still any instantiation of the type should be fully typed (legacyless).

After going around several issues claiming we don't need a third annotation, would this perhaps be the counterexample: a way to distinguish the above kinds of unconstrainedness? If so, what would we gain from that?

Or could we still claim: anything under a default nullness must be developed with awareness of nullness, so any unconstrained type parameter is variant (2) above.

In Eclipse we apply heuristics to warn here:

// legacy:
class Foo<T> {
    T foo() { return null; }
}
// new code:
class Client {
    @NonNull Object test(Foo<@NonNull Object> f) {
        return f.foo(); // warning here
    }
}

This will answer at the indicated location: "Unsafe interpretation of method return type as '@NonNull' based on substitution 'T=@NonNull Object'. Declaring type 'Foo' doesn't seem to be designed with null type annotations in mind"

I admit this is quite a mouthful, but it serves its purpose so far.

If we follow the rules of unchecked conversions then perhaps we should flag the declaration Foo<@NonNull Object> because @NonNull Object is not a strictly valid substitution for @Legacy T (which we can't currently detect).

@cpovirk (Collaborator) commented Aug 2, 2019

RE: legacy:

I like the idea of permitting only Box<@LegacyNull Foo>. This feels to me little different than the case of any other unannotated class. Compare:

class Box1 {
  Object get();
  void set(Object o);
}

vs.:

class Box2<T> {
  T get();
  void set(T o);
}

I'm not sure why we should view it as a goal to let the user refine the parameter and return types of the methods on Box2 but not on Box1. True, I understand that there's some argument for doing so because we can, since the user has to put something inside the angle brackets when writing Box2<...>. But if we think our system of annotations can be good enough when it has no way to refine the types of Box1, then I think we can make the argument that it's good enough when it has no way to refine the types of Box2, too. At some point, users either have to suppress, perform a lot of null checks, or write stubs.

@wmdietlGC (Collaborator, Author)

I think issue #57 discusses some of these topics now.
I agree that for a type parameter with an @UnknownNullness bound, only instantiating with @UnknownNullness should be allowed without warning: #57 (comment)

@stephan-herrmann

> I understand that there's some argument for doing so because we can,

That is not my reasoning. The purpose of a type parameter is to pass the choice of type to the client. It would be against the nature of generics to exclude some property of the type (nullness) from this.

@cpovirk (Collaborator) commented Aug 12, 2019

> That is not my reasoning. The purpose of a type parameter is to pass the choice of type to the client. It would be against the nature of generics to exclude some property of the type (nullness) from this.

I think this is fundamentally the same discussion as on #45 and on #62.

I would be in favor of letting users specify the nullness of the type parameter if the owner had specified the nullness of its usages. However, in both the Box1 and Box2 classes above, the owner hasn't given us information about nullness of usages. So it's just as unsafe to assume that Box2.get and set can conform to user-specified nullness as it is to assume that Box1.get and set could (if users even had a way to specify in that case).

My go-to example has been Optional, but maybe Map is the better example here: Even if we were to permit users to instantiate an unannotated Map type as Map<@NotNull String, @NotNull Integer>, that doesn't mean it's sound to infer that the return type of get is @NotNull Integer.
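
Concretely, under that hypothetical instantiation:

    Map<@NotNull String, @NotNull Integer> m = new HashMap<>();
    @NotNull Integer v = m.get("absent"); // naive substitution says @NotNull Integer,
                                          // but get returns null for absent keys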

As in #45, our spec-writing eyes are laser-focused on the @UnknownNullness case. In practice, many types will be annotated (directly in the sources or in stubs) as, e.g., interface List<E extends @Nullable Object>. With such types, for which we also have nullness information about type-parameter usage sites, users can choose to instantiate as List<@NotNull String> or List<@Nullable String>, and the right thing will happen.

@kevinb9n (Collaborator)

I'm closing this out as being too broad; I think it is covered by more focused issues. If you made points in here you don't want to get lost, please try to figure out where they more properly belong.
