Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A single shorthand scheme for augmented types, for prose and other non-source-code contexts #129

Open
cpovirk opened this issue Sep 28, 2020 · 44 comments
Labels
documentation For issues related to user-facing documentation nullness For issues specific to nullness analysis.
Milestone

Comments

@cpovirk
Copy link
Collaborator

cpovirk commented Sep 28, 2020

kevin1e100 and I touched on this a bit in #32, and I think it's come up elsewhere. It's not a question that it's JSpecify job to answer, but we should keep it in mind as we discuss naming of annotations -- and maybe features that could add more information to augmented types (like having both lower and upper bounds?). Plus, we have many tool authors following these issues, and the GitHub issue UI is nice (and permits cross-linking with our other discussions). So I'm creating an issue here about the question. If nothing else, the issue may help me resist the urge to talk about this further on other issues :)

One way for tools to print augmented types (in their error messages, in tooltips in IDEs, etc.) is to make them look much like the source -- @Nullable Foo, @NullnessUnspecified T, etc. [edit: This is complicated by the existence of both declaration and type-use annotations, either or both of which may be in use in a given file.]

Another way is to use special symbols, similar to Kotlin -- Foo?, T!, etc.

We already gave some thoughts about symbols (especially !) in the linked discussion on issue 32. (And I owe kevin1e100 a reply, which I'll put below.)

Another thing to think about is that Java already uses ? for wildcards. This isn't ideal, but it doesn't have to be a fatal flaw: I don't think there's true ambiguity. (I do seem to recall that, the Checker Framework sometimes says things like @Nullable ?. Under Kotlin rules, that would become ??, which is no good. Perhaps we'd say (?)? or something.... But I expect most tools to refer to captured type variables instead thereby sidestepping the problem (by producing T1?, etc.).)

We can also consider whether we'll someday have non-nullness-related kinds of type annotations. I don't know that we will, but you can at least imagine having a type-use @CompileTimeConstant, @FormatString, @TypeCompatibleWith, something related to immutability and thread-safety?? Anyway, my point is that we might end up ouputting types using special symbols for nullness but using annotations for the other cases. That's probably tolerable, too.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Sep 28, 2020

from #32:

@NullnessUnspecified is so noisy when it is never the cause of the failure

Thanks for observing this.

To me, ! suggests "not null"

That is in fact what it meant in Spec# I believe. I've at times used !?, FWIW, but am not claiming it's a great solution.

It may not be popular, but besides String!? there does seem to be another reasonable solution: just String could be used to mean @NullnessUnspecified String / String!? in error messages as long as both other options, @NonNull String / String!! and @Nullable String / String?, are always made explicit. This would remove "unspecified" clutter as desired and is consistent with the observation that

the token "@NullnessUnspecified" never appears in the source file I'm imagining.

I understand that this would take us away from Kotlin's error messages and is incompatible with the desire to generally write String!! as String. Instead, this solution would leverage "unspecified" as the default-default (so should still be reasonably intuitive to Java users).

Thanks for the reference to Spec#. I could probably go along with !?, among other options.

As for making String itself stand for @NullnessUnspecified String: My main concern is type-variable usages. We need a way to distinguish "T in annotated code" from "T in unannotated code." Using a more Kotlin-style notation, they're T and T!?, respectively. If we instead use T for the unannotated code, then "T in annotated code" probably has to become T!. That may sound like "T but not null," when in fact T might still permit nulls.

(Maybe this is solvable by replacing ! with a different symbol in that model. Avoiding ! may be nice, anyway, as you say, to avoid the Kotlin connection.)

I also feel a little sad that annotating more code for nullness may make error messages slightly longer (by adding ! in places). But I'm not too concerned about a single character, even when multiplied by n types.

And I do acknowledge the upside that using String for unspecified matches what users see today in Java -- and somewhat matches what they will continue to see in the various non-nullness-related error messages they get from javac.

@kevinb9n
Copy link
Collaborator

kevinb9n commented Jun 17, 2021

I'm definitely in favor of this project developing a body of recommendations for tool authors in addition to our specifications.

And I think consensus guidelines about error/warning messages are a great fit for that.

And I heartily approve of concise notations for this (String? for @Nullable String). The notation must be universal regardless of whether the offending code is under @NullMarked or not, [so String must mean what it always has] [edit: nah].

@netdpb netdpb added this to the Release-independent milestone Oct 6, 2021
@kevinb9n
Copy link
Collaborator

I've realized that I want a good way to talk about nullness-augmented types in prose, and I think that need and this may be the same need.

We often do this in prose by writing out source code, but it's unwieldy. Plus technically we aren't bound by this convention, especially since what source code shows isn't really types anyway; it only shows declarations of types and usages of types, never types "themselves".

Optimistically, we can teach users that the source code @Nullable String represents a usage of a type that we write down in prose as String? and that we pronounce verbally as "string-or-null". That primes them to understand compiler messages once they start seeing them. IntelliJ probably already has an option to render @Nullable String that way in the editor as it is (?).

So then what notation for the other two kinds? I would suggest:

  • For unspecified, String*, pronounced "string-maybe-or-null" or "string-unspecified" (or "string-oblivious" to be like C#)
  • For non-null, String!, pronounced "string" because after all we should be saying everything with enthusiasm!

I think it at least makes sense to decorate all three kinds for these reasons

  • All three carry "specific knowledge", whereas an unannotated usage of a nullable-bounded type variable (UUONBTV?) has parametric nullness, which is like the lack of all specific knowledge, and that seems like the thing to show unadorned.
  • While it does seem like the fact that javac messages will show unadorned base types is an argument for letting unadorned mean unspecified, I'm not so sure. If nullness messages are always using one of String!/String*/String? then a message including just String will look more clearly like something that's just not trying to talk about nullness at all, which I think might be good.

@kevinb9n kevinb9n changed the title Discussion for tool authors: how to print augmented types? How shall we describe augmented types in prose, which tool owners may take after in their finding messages? Apr 22, 2022
@kevinb9n kevinb9n modified the milestones: Release-independent, 1.0 Apr 22, 2022
@cpovirk
Copy link
Collaborator Author

cpovirk commented Apr 22, 2022

I think I'd be good with that in prose.

In error messages, I'd feel a little sad adding even one character for the non-null case (i.e., saying String! instead of String). One character is not a big deal, but it adds up for complex types. That goes not just for printing parameterized types individually but also for printing whole signatures, which may contain a number of types.

Also, the difference from Kotlin's meaning is unfortunate.

Neither of those is a deal-breaker on its own. But put them together, and I think we'd be better served by String in error messages, especially when users are likely to be switching between Java and Kotlin error messages as they work.

I think it would be OK for prose and error messages to differ in this respect: Prose needs to express "When you write Foo, you get Foo! or Foo*, depending on...." In contrast, error messages live in a world in which we've already converted the source code to types. It's true that users will also see compile errors that are unrelated to nullness, and there is definitely value in making those look distinct. But I think the greater danger is in making nullness error messages look scarier than they need to.

@kevinb9n
Copy link
Collaborator

I would very very much like us to be able to settle on ONE scheme for how to notate these types. So I think we have some tension over whether non-null should be explicitly marked with !.

One complication is that we're supporting non-null projection of type variables (for the moment), meaning that T and T! have distinct meanings (unprojected T, and T projected to non-null).

@kevinb9n
Copy link
Collaborator

kevinb9n commented Jun 22, 2022

I do think this is important to nail down and rely on in communication/documentation/etc.

The suffix that goes naturally with each of the 4 nullness descriptors in Chris's current spec are:

  • UNION_NULL "?"
  • MINUS_NULL "!"
  • NO_CHANGE ""
  • UNSPECIFIED "*"

For type variables these are all distinct and all make sense. (Well, maybe the last doesn't, but let's move on)

For non-variable reference types in null-marked code, "" and "!" are the same in effect. I propose we promote the explicit, redundant usage of the "!" suffix to provide needed clarity, in all contexts. This scheme will then work equally well for talking about null-unmarked code, where "no change" might have seemed to mean "unspecified".

I think the only case I left uncovered is value types (i.e. the 8 primitives) and suffixes should just not apply in that case.

So the proposal:

  • ? suffix always means UNION_NULL and is always used for that
  • ! suffix always means MINUS_NULL and is always used for that even when strictly unnecessary
  • * suffix always means UNSPECIFIED and is always used for that
  • No suffix always means NO_CHANGE and is only recommended for type variable usages and value types (for nonvariable reference types, use MINUS_NULL/! instead)

Also, as I think maybe David was the first to suggest, I think Foo!<Bar?> is much nicer than Foo<Bar?>!.

I think our reference checker should follow this scheme in its error messages. But if Tool X doesn't want to, okay. We could try out letting these actually show to our internal users and see how it goes.

@netdpb
Copy link
Collaborator

netdpb commented Jun 22, 2022

I like this proposal generally. I'm a little worried about leaving out the ! for value types. Right now the primitive types are well known and spelled differently by convention (lowercase initial letter), so it's easy to remember that int doesn't include null. But in a Valhalla world there will be value types that are not well known and are spelled with an initial uppercase letter just like reference types, and I think those should use !.

What's wrong with always using int! for consistency? (Not disallowing int, but not preferring it either.)

@cpovirk
Copy link
Collaborator Author

cpovirk commented Jun 22, 2022

In error messages in particular, do you think that the lack of ! would lead to confusion?

For example, if a user saw this...

my/project/File.java:79: error: invalid return type
    method return type: Foo
    expression type: Foo?

...I think that it would be pretty clear that the method was returning a nullable value when a non-nullable value was expected.

Maybe there are cases in which ! would be more useful. But I think it's more likely to be a net negative, as laid out in #129 (comment). To expand slightly on that: It would be a nice carrot for error messages to become simpler as users specify the nullness of more types -- in particular when the alternative is for error messages to look more like Kotlin but with slightly different meaning!

@netdpb
Copy link
Collaborator

netdpb commented Jun 22, 2022

I think we should consider specifications separately from tool outputs.

I think specifications, including these design discussions and conformance tests, should always prefer the most unambiguous terms, including int!.

On the other hand, tool outputs such as error messages can be freer to prefer shorter forms when they expect users to take context into account. In any case, we don't intend to constrain tool output very harshly.

I'd like not to care very much about tool outputs. I don't expect Kotlin to change how they refer to platform types (with !), for example, even though that directly conflicts with our notation here.

@kevinb9n
Copy link
Collaborator

But in a Valhalla world there will be value types that are not well known and are spelled with an initial uppercase letter just like reference types,

News: I've been pushing hard that the value type always be spelled Foo.val (where Foo is the corresponding reference type), and at the moment I seem to be winning. But if it goes the other way, we'll have plenty of time to reconsider.

What's wrong with always using int! for consistency? (Not disallowing int, but not preferring it either.)

My feeling is that people would find it strange, distracting, and/or overzealous.

@kevinb9n
Copy link
Collaborator

kevinb9n commented Jun 22, 2022

I think we should consider specifications separately from tool outputs.

Among tools, my proposal is addressing only the reference checker's output; we won't tell tool owners how to write their error messages. As the issue title says, tools "may take after" these decisions, though.

I think specifications, including these design discussions and conformance tests, should always prefer the most unambiguous terms, including int!.

Note that I'm adding user-facing documentation to that list. We have to call the types we're creating something, and calling them e.g. @NonNull Foo in prose is bad in a few ways.

@kevinb9n
Copy link
Collaborator

kevinb9n commented Jun 22, 2022

In error messages in particular, do you think that the lack of ! would lead to confusion?

What I feel strongly about is that a lack of a single consistent scheme that we follow everywhere (user guide, javadoc, conformance tests, ref checker messages, glossary, user communications, etc.) will lead to confusion -- again, tool owners will do what they please, but we do have our sphere of control and of influence, and I think we should try to influence them in this direction.

It is already an issue of some size that our scheme must diverge from Kotlin's. It can at least be consistent with itself.

Maybe there are cases in which ! would be more useful. But I think it's more likely to be a net negative, as laid out in #129 (comment).

I acknowledge those arguments, but I'm proposing that the need for a single consistent scheme outweighs them.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Jun 22, 2022

Hmm, I am confused:

  • Are we trying to make a decision that applies only to the reference checker or also to docs, spec (if it uses this format -- currently it doesn't), conformance tests, or anything else? By "My proposal is addressing only the reference checker's output," I think you are excluding only other tools, while still including those other things?
  • You said that omitting ! would be inconsistent. What would it be inconsistent with? Were you responding to the fact that I was focusing on error messages, opening the possibility that they'd be inconsistent with the other places we might need to describe types?

@netdpb
Copy link
Collaborator

netdpb commented Jun 22, 2022

I don't think the reference checker has to be concise in its error messages.

I do think it would be good if the reference checker's output matched what we do in specifications and conformance tests.

I think we should be aggressively unambiguous in specifications and conformance tests.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Jun 22, 2022

If I were drawing a line, I'd probably have:

  • specification, conformance tests, maybe reference checker

vs.

  • user-facing docs, other checkers

In particular, I think users would be best served by agreement between user-facing docs and other checkers -- not that we're trying to force checkers to do anything or vice versa, but we can each nudge one another in either direction.

It's awkward because, for the second bucket, both of the following are true:

  • In user-facing docs, ! addresses ambiguity, which is perhaps most likely to arise in such docs.
  • In other checkers' error messages, ! makes the output longer and introduces the possibility of confusion with Kotlin.

@kevinb9n
Copy link
Collaborator

Hey tool owners @artempyanykh @wmdietl @lazaroclapp @msridhar @akozlova @elizarov and anyone else.

JSpecify wants to have a consistent shorthand for augmented types, and overall I feel good about the proposal in #129 (comment)

What we don't know is whether, assuming JSpecify tried to advance that as the official shorthand, your tools would feel motivated to match it in your error messages and documentation and whatnot. It would help to have some idea about that.

We do realize that it is slightly different from what Kotlin does, and Kotlin would not change.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Aug 12, 2022

One possible nice thing about using ! (or really using any character instead of using nothing) is that error messages for complex types may align better:

expected: Map?<SomeKey!, SomeValue!>
but was : Map!<SomeKey!, SomeValue?>

vs.

expected: Map?<SomeKey, SomeValue>
but was : Map<SomeKey, SomeValue?>

(That still doesn't help once type variables enter the picture, since E and E! can be different types. But it can help outside of those cases.)

I imagine that different people could have different opinions on different error message as to whether the advantages of alignment outweigh the disadvantages of the extra ! characters:

  • Some people might employ a strategy of "Look at each type component and see where the mismatched character is." They'd likely benefit more from alignment.
  • Other people might employ a strategy of "Look at each ? in the actual type and see if it's present in the expected type." They'd likely benefit more from fewer characters.

(A fully correct process would have to be more complex than either of those, but they seem like reasonable starting points / heuristics.)

(RE: Kotlin: If only we had Unicode everywhere and could output ...)

@artempyanykh
Copy link
Collaborator

artempyanykh commented Aug 15, 2022

I really like this proposal. I've been pondering the idea of using a special notation for a while now; we already do a bit of customization in how we render types, so switching from annotations to symbolic operators would be pretty easy.

When it comes to the notation itself:

  1. 👍 ? to mean union null.
  2. 👍 * to mean unspecified (unspecified types are pretty rare in our code base as we invested heavily in models/overlays/signatures/whatever-the-name-is).
  3. 😕 always using ! to denote not-nullable types.

Regarding p.3 – it makes sense to use String! in the spec because you need the most explicitness and precision there, but in User Guide and error messages I struggle to see the added value. IIUC String and String! will denote the same type of not-nullable strings (in contrast with String* which denotes Java string type), so why bother, potentially confuse people?

Also, we have quite a bit of Kotlin and I can predict that ! on types will really confuse devs initially.

I guess, what I want to understand is why would we prefer explicit ! on not-nullable types, what are the benefits apart from consistency?

Assuming, we always denote types from null-unmarked code as Type* there shouldn't be ambiguity even when omitting the ! on types (correct?). So, I kind of see this dichotomy as

  1. We assume that people reading the spec/using the tool will have most of their code null-unmarked. Then String means Java's String, and String! has this extra punctuation to draw people's attention to the fact that it's a not-nullable String. This is pretty convenient for those new to the explicit nullness game, but may be irritating for those who have large chunks of their code null-marked.
  2. We prefer the null-marked view of the world by default. Then String means the intended not-nullable String which is good for people late in the nullness game, but may be confusing for those only entering this path.

@kevinb9n
Copy link
Collaborator

kevinb9n commented Aug 15, 2022

Thanks Artem. I do find this absolutely debatable, a main reason the issue remains open.

META-QUESTION

I see a lot of value in "one consistent recommended scheme (for everyone who's able to adopt it)". Could we agree that there's enough value that we'd each be willing to follow it even if the String/String! decision goes the way we didn't prefer? I would agree to that, personally.

CLARIFYING THE TWO OPTIONS

First I don't think the options are fully specified yet:

  • For always-suffix-for-non-null, does that include an unprojected usage of a non-null-bounded type variable? We are saying that those usages are definitively non-null, not parametric, but if we denote them as T! is that too misleading, suggesting a redundant non-null projection is happening? (It isn't necessarily harmful to just call them parametric, but it seems to withhold useful information.)

  • For no-suffix-means-non-null, we must decide how it would distinguish T from @NonNull T for a type variable. My instinct is that non-null projection is special enough that it deserves a special symbol (presumably ! but could be whatever). But parametric T might need a suffix too (what would it be? %?), because (a) otherwise when a user sees Foo<Bar<T>> they would need to think "not null, not null, maybe null", which seems iffy to expect; (b) a suffixless T would appear only for an unprojected usage of a non-null-bounded type variable... which I think is good?

ARGUMENTS

Some arguments for no-suffix-means-non-null:

  • For fully migrated code (which our designs have to serve indefinitely, not temporarily), users may experience exclamation point fatigue. (However, this might be overstated? They are probably encountering it only when there is an actual nullness-related issue, right?)

  • We're asking them to internalize "string means string" when writing their APIs, which then doesn't carry over here. (But this is explainable since string-means-string is only for fully null-marked code whereas these messages are universal.)

  • Similarity to kotlin.

Some arguments for always-suffix-for-non-null:

  • When the user sees Foo<Bar>, they will not always know that this a context where?/* would have been used had they applied. In particular, any message coming straight from javac will not have those. Or any other tool that isn't JSpecify-aware. In fact, even JSpecify-compliant analyzers for other purposes like @MayIgnoreResult should probably not bother confronting the user with null-augmented types (and wouldn't want to go to the trouble to compute those types).

  • We maybe wouldn't strictly need to come up with a symbol meaning "parametric". (But, we might want to anyway?, so that an unprojected usage of a non-null-bounded type variable could leave it out and be minimally misleading.)

  • Alignment (see cpovirk's last comment)

@lazaroclapp
Copy link
Collaborator

lazaroclapp commented Aug 15, 2022

So, I definitely don't mind having a compact, semi-formal, non-context-dependent (e.g. @NullMarked vs @NullUnmarked code) way of expressing nullability of types. For that use, I am not sure I feel strongly about T vs T! as the default rendering of a non-null T.

That said, as for making it part of NullAway's error messages, I see two issues with it at a glance:

  • As @artempyanykh mentions, having an explicit ! contradicts the view of "T means non-null T by default", at least within @NullMarked code, because now the code reads String, but we render it as String! every time.
  • The error messages are less actionable:
    • "Passing List!<String!> argument this.foo to the first argument of void C.bar(List!<String?>)" -> The developer needs to be aware of nullness types, their syntax, and then parse what this means before translating it back to 'I am in a @NullMarked context, and that means that to change the type to be String? I need an @Nullable annotation'.
    • "Passing List<String> argument this.foo to C.bar(List<@Nullable String> list) but List<@Nullable String> expected" -> No need to think. The message tells you what to change in the code to fix it: insert @Nullable before String in List<String> foo.
      No need to even know it's about typing, a simple "ah, I need to add @Nullable there to make the linter happy" level of reasoning suffices (sure, this shallow reasoning breaks when there is some complex design tradeoff about nullability, but the happy path is that the developer doesn't need to make any space on their mental cache for JSpecify/nullability types/etc, just 'apply fix suggested by tooling, continue thinking about the problem at hand')

p.s. Note that, after years of NullAway deployment and successful usage at Uber, we still get plenty of questions that imply that some developers don't have a mental model of NullAway as a type system / type extension. This is not a judgement on them; my understanding of, say, Hadoop, whenever I have to deal with it, is equally shallow. Ideal tooling error messages are: a) meaningful to those who wish to understand them, even without a lot of area knowledge, b) helpful even to those who just wish to 'cargo cult' their way around the tool 😉

cc: @msridhar

@kevinb9n
Copy link
Collaborator

I should have said that in any scenario where a tool is able and willing to explain in readable, pastable code form, it is always welcome to do that.

The "consistency" I keep referring to was meant to only cover "when a shorthand is desired, can we use a consistent one". I think, anyway, that it will be desired in enough circumstances including just in prose.

@lazaroclapp
Copy link
Collaborator

The "consistency" I keep referring to was meant to only cover "when a shorthand is desired, can we use a consistent one". I think, anyway, that it will be desired in enough circumstances including just in prose.

Agreed here. Not arguing against the main point, just noting that I think we would be unlikely to use such notation in user-facing messages. It might be useful within comments on NullAway's codebase itself or issues in our tracker, and definitely for communicating divergences/similarities with other tools :)

@netdpb
Copy link
Collaborator

netdpb commented Aug 17, 2022

Would it be okay for our documentation to describe the two notation schemes and to say that some tools, in some contexts, may use one or the other scheme, but that JSpecify specs and documentation will consistently use one (I assume require-suffix)?

@lazaroclapp
Copy link
Collaborator

lazaroclapp commented Aug 17, 2022

So, separate from whether we will use it in error messages:

I wouldn't hate either T or T!, but I think T! might be clearer for something that is a formal notation. My thoughts being that T! clearly indicates "non-null T" independent of whether we are in @NullMarked or @NullUnmarked code. I would generally expect to see T!, T?, and T* often, but unmodified T rarely when using this notation.

This decreases ambiguity and makes the meaning explicit without people having to refer to our definition of the notation (i.e. if people see T, they need to check how we resolved this issue, but T! is non-null either way). It makes the notation relatively verbose, though, and even less likely to be used for user-facing messages or docs. So, um, I guess this point not entirely separate from whether we will use it in error messages...

@kevinb9n
Copy link
Collaborator

From Werner

I know there is an issue for this, but I just wanted to note that I find the examples much more readable with this verbose syntax: it's exactly like I would have to write it in source code. I'm particularly worried about the array examples, where the String*[]![]? is confusingly different from the actual syntax String @NonNull [] @Nullable[], where the NonNull is where the * is in the shorthand, and the Nullable is where the ! is.

@netdpb
Copy link
Collaborator

netdpb commented Aug 22, 2022

It seems like @wmdietl's concern is based on the fact that our proposed notation occurs after (to the right of) the base type, while annotations occur before (left of). I'm not so concerned about that. The alternative might be !String (or for the longer example, *String![]?[]), which I find harder to read.

@netdpb
Copy link
Collaborator

netdpb commented Aug 22, 2022

I'd also be concerned about the verbosity of spelling out annotations, as well as the fact that most code (eventually) would omit @NonNull, so the actual syntax would not match the explicit syntax anyway.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Aug 30, 2022

Rewinding a bunch:

I think Foo!<Bar?> is much nicer than Foo<Bar?>!.

I think I could go for that. It's a little weird not to have the ! at the very end, given that it comes at the end for not only "normal" types but also arrays. But if you ignore arrays (which are weird no matter what), then the rule is that it comes right after the type name.

I'm curious if the Kotlin folks (@erokhins) gave thought to this when they were choosing their syntax. I note that they didn't have the array issue, since they use Array<Foo> for that. I also note that this is not just the format they use in error messages but also the syntax that appears in their source code :)

@netdpb
Copy link
Collaborator

netdpb commented Aug 31, 2022

I think I could go for that. It's a little weird not to have the ! at the very end, given that it comes at the end for not only "normal" types but also arrays. But if you ignore arrays (which are weird no matter what), then the rule is that it comes right after the type name.

The rule is consistent: the mark comes directly after the type component. [] comes after the component type vs. before, but that's on Java.

A nullable array of non-null Foos would be Foo![]?.

@erokhins
Copy link

erokhins commented Sep 1, 2022

I'm curious if the Kotlin folks (@erokhins) gave thought to this when they were choosing their syntax. I note that they didn't have the array issue, since they use Array for that. I also note that this is not just the format they use in error messages but also the syntax that appears in their source code :)

Well... We had the nullable types forever (way before flexible types) and used ? at the end of the type, i.e. List<String>?. I don't know if was there any discussion regarding List?<String> vs List<String>?... So List<String>! is just an evolution of that quite early decision for nullable generic types: List<String>?.

(Sorry, I'm not in the context, but I hope I got the question +- right)

@kevinb9n kevinb9n changed the title How shall we describe augmented types in prose, which tool owners may take after in their finding messages? A single shorthand scheme for augmented types, for prose and other non-source-code contexts Sep 12, 2022
@kevinb9n
Copy link
Collaborator

This comment assumes #275 is closed with no change (as I've proposed there).

Rereading this comment above ... I still find myself more strongly persuaded by the pro-! arguments.

But, if we decide to always use a one-character suffix in all cases, then if anyone really is determined enough to drop the !s they could at least do so unambiguously. I'd rather they didn't, but I like always-suffix for other reasons too.

We'd need a character to represent "parametric nullness", like % ("percent" starts with the same letter as "parametric", and if you look at the glyph and ignore its usual meaning it kinda is suggestive of "this or that".)

Perhaps depending on #248 then, we might also need a character for "assumed parametric nullness". :-/

@cpovirk
Copy link
Collaborator Author

cpovirk commented Sep 13, 2022

I still don't love ! but have figured I could live with it, at the very least as a trial to see how people (like Kotlin users) react. With % in the picture, I worry again that we're trying too hard to make our types look different from normal types and/or to express everything that a user could hope to know about a type in one character¹.

? is fairly well established; ! has precedent (but unfortunately precedent is split); * (I think) "generally makes sense once you see it" (but is the same as Kotlin's !); % has justification... but put all together, it's starting to feel like we're adding a step to the process of learning JSpecify that is "memorize the symbols," and that may be intimidating. Maybe it's a necessary step, but it nudges me personally a bit back toward outputting @Nullable String-style output, as discussed above.

¹ As outlined in #275 (comment), knowing that a type "has parametric nullness" doesn't exhaustively answer the nullness-subtyping questions you could ask about it. And as for needing another glyph for #248: I just replied to a non-GitHub discussion about it with the claim that both the options for #248 produce another "kind of nullness." So that may be more motivation not to try to represent each "kind" with a distinct glyph.

@kevinb9n
Copy link
Collaborator

kevinb9n commented Sep 13, 2022

but put all together, it's starting to feel like we're adding a step to the process of learning JSpecify that is "memorize the symbols," and that may be intimidating. Maybe it's a necessary step, but it nudges me personally a bit back toward outputting @Nullable String-style output, as discussed above.

That's a fair concern. I've noticed that any time I've used the shorthand I've felt compelled to give a link to the "key", over and over, and something feels a bit dodgy about that.

Let's flesh out a "scheme" that achieves the same goals as this one except for not being into the whole, you know, brevity thing.

What are those goals?

  • Encompass the base type plus all the nullness information we have about it, as completely and unambiguously as is... reasonable.
  • ... that's it?

Without a brevity goal, it makes all the sense in the world for that to look as much like code as possible. (Note that this is really separate from the case of a tool that wishes to output literally pasteable code; I take it as given that when that's what it wants, it should just do that. It doesn't address all the goals here though.)

  • It would seem important to always write it as though in null-unmarked context, so that unannotated Foo is unambiguous and because there's no way to write @UnspecifiedNullness Foo anyway. So, we should expect plenty of @NonNull Map<@NonNull Key, @NonNull List<@NonNull Value>>.
  • With the shorthand I think it's useful to just always show the nullness, even in int! or MyValueClass.val!. After all, there is positive knowledge of non-nullness. But I expect that seeing a lot of @NonNull int would not go over well.
  • I suppose that while unannotated Foo means "unspecified (or non-null if it's a value type)", unannotated T (type variable) has to mean parametric. I vaguely feel like there's going to be a problem here, not sure.

Overall, it doesn't feel like this would make anyone happy. I think that when writing a type usage in code vs. when talking about the type in a non-code context we may just have different requirements.

@netdpb
Copy link
Collaborator

netdpb commented Sep 14, 2022

How should we refer to wildcards?

interface Foo<T extends @Nullable Object> {
  T t();
  @NonNull T nonNullT();
  @Nullable T nullableT();
}

How do we refer to the types returned by t(), nullableT(), and nonNullT() for Foo<? extends @Nullable Object>?

  • t() returns ? extends Object?
  • nonNullT() returns ? extends Object!
  • nullableT() returns what? ? extends Object??? (? extends Object?)?? ?? extends Object??

@kevinb9n
Copy link
Collaborator

kevinb9n commented Sep 16, 2022

Great question. I'm going to capture my slow thought process here even though you're already ahead of most of it.

Expression types (and expression context or 'sink' types) are post-capture, so they're sometimes anonymous. Javac error messages get away with calling them CAP#1, CAP#2, etc. because they list the details about the bounds lower down.

Here's javac reporting an in-type this way, but for some reason erasing an out-type to Object instead of CAP#2 (dunno why, sticking a pin in it):

WildcardError.java:6: error: method set in interface List<E> cannot be applied to given types;
    i.set(0, i.get(0));
     ^
  required: int,CAP#1
  found: int,Object
  reason: actual argument Object cannot be converted to CAP#1 by method invocation conversion
  where E is a type-variable:
    E extends Object declared in interface List
  where CAP#1 is a fresh type-variable:
    CAP#1 extends Object from capture of ?

I guess we are stuck defining a syntax, to mean "a 'fresh type-variable' having the following 1 or more upper bounds and the following 0 or more lower bounds".

I suppose(?) that might as well be: CAP#1 extends UpperBound1 [& UpperBound2 [...]] [super LowerBound]

The upper bounds can come from two places: 0 or more from the wildcard, 1 or more from the corresponding type parameter declaration, and the order of all these bounds should not matter. I don't know whether we should make any attempt to simplify the bound list (but really, if any upper bound is ! then they effectively all are).

The lower bound can only come from the wildcard (so I don't think there can be multiple).

Say foo is our variable of type Foo!<? extends Object?>. (Note that we do still need ? extends s in our syntax, as that is a not-captured-yet wildcard.)

Then foo.t() returns CAP#1 extends Object?.

And for foo.nonNullT() we could jam in another upper bound of Object! but all we have to do is flip any existing upper bound to !: CAP#1 extends Object!. For clarity we should probably flip all the bounds.

But now we have foo.nullableT(). No fiddling with bounds will handle it. But we would rather avoid this syntax having to be a generalized grammar with projections as operators (e.g. as your (? extends Object?)? would at least suggest).

Since we write "Bar<Baz> union null-type" as Bar?<Baz!>, perhaps this is just CAP#1? extends Object? .... ?

@kevinb9n
Copy link
Collaborator

It is unfortunate that we end up still using ? two different ways in the same syntax, but I don't see how to avoid it.

Of course, when inquiring whether something matches a specified type then which capture you labeled as CAP#1 vs CAP#9 etc. should not matter.

@kevinb9n
Copy link
Collaborator

it's starting to feel like we're adding a step to the process of learning JSpecify that is "memorize the symbols," and that may be intimidating. Maybe it's a necessary step, but it nudges me personally a bit back toward outputting @Nullable String-style output, as discussed above.

I think it should work to separate the topic of defining what the shorthand and longhand forms are from the topic of when we should use the shorthand vs the longhand. And this issue is just trying to define the shorthand.

With % in the picture, I worry again that we're trying too hard to make our types look different from normal types and/or to express everything that a user could hope to know about a type in one character.

Yes, let's agree that we can't express everything. P% might be convertible to U% and this won't be apparent from the symbols alone.

I think this gets close enough to "useful", in the spirit of comment.

We can go with "%" or "" as the suffix for parametric (I don't see a passionate debate developing over which symbol). The advantages of "%" aren't massive but:

  • It's a bit more "definite"; a bit more apparently-incorrect for a non-null bounded type variable (assuming current proposal for #275)
  • Though I don't favor someone adopting "JSpecify shorthand except with "!" assumed", it feels like someone's gonna do it anyway, and this at least keeps that well-defined
  • There is maybe a very tiny visual-alignment argument

As for needing another glyph for #248: I just replied to a non-GitHub discussion about it with the claim that both the options for #248 produce another "kind of nullness." So that may be more motivation not to try to represent each "kind" with a distinct glyph.

I am happy to not need to represent "assumed parametric nullness" as its own thing.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Sep 19, 2022

(It occurs to me that we can get visual alignment by using a trailing space, too: T? aligns with T . But we agree that visual alignment is a small concern.)

It's a bit more "definite"; a bit more apparently-incorrect for a non-null bounded type variable (assuming current proposal for #275)

I can see that, but I'm wondering how often users need that: Typically, they're going to see types because a checker is telling them that those types are incorrect. So they might see, under a javac-style message:

dereference of possibly null value of type T

  where T is a type-variable:
    T extends Object?

Or:

incompatible types: T cannot be converted to Object!

  where T is a type-variable:
    T extends Object?

When you see that, I suspect that either you already understand (in which case an added % isn't necessary) or you don't understand (in which case an added % may just make things scarier). For comparison:

incompatible types: T% cannot be converted to Object!

  where T is a type-variable:
    T extends Object?

Maybe people with more Kotlin experience have a sense of how often T cannot be converted to Object! (well, Object, in their case) confuses users? Maybe it happens more often with more complex types, in which you need to identify the specific parts that are mismatched? But then those more complex types get even noisier if we add more glyphs :\

(I want to relate this to my favorite conversation, about static imports: It's straightforward to argue for making any given piece of information prominent for the user—in that case, more information about where a method is declared; this this case, more information about the nullness of a type variable. But in the common case, it's information that you don't need, so it ever so slightly makes the information you do need harder to find.)


I think that part of the argument for % (and !, though I don't mind it as much, at least if we can find something without the Kotlin conflict) has been that it gives every augmented type at least one glyph. That makes it always clear whether we're talking about an augmented type or a base type. Another way to serve that goal is to always output augmented types inside special markers, at least when using them in prose or other locations that we worry may be ambiguous. So we could write that "The type of List<@Nullable String> in null-unmarked code is 'List*<String?>', but the type in null-marked code is 'List<String?>'."

@cpovirk
Copy link
Collaborator Author

cpovirk commented Sep 20, 2022

Maybe people with more Kotlin experience have a sense of how often T cannot be converted to Object! (well, Object, in their case) confuses users?

Perspectives from people who actually use Kotlin regularly would be welcome, but I dug up a couple threads:

These don't seem wildly common, though it's hard to know for sure from the searches I did.

(The other bit of good news, of course, is that users will typically need to explicitly write <T extends @Nullable Object> in order to even encounter this problem.)

@kevinb9n
Copy link
Collaborator

kevinb9n commented Sep 22, 2022

Wiki page for this notation exists now. Incomplete, but spells out the goals I think we're shooting for.

Another way to serve that goal is to always output augmented types inside special markers, at least when using them in prose or other locations that we worry may be ambiguous. So we could write that "The type of List<@Nullable String> in null-unmarked code is 'List*<String?>', but the type in null-marked code is 'List<String?>'."

Reminiscent of putting slashes around regular expressions when you mention them. (But, most people don't do that.)

My guess is that people would bother to do that rarely enough that no one would understand what it means when they do anyway. It also feels a wee bit like an "extra" detail to have to teach people besides just the idea of a suffix.

Issues

Issue 1: non-null character

IF a non-null character is needed, what should it be?

  • !
    • − means something else in kotlin
  • +
    • ~ regex precedent argues for no suffix, but + is at least close-ish to the right meaning

Issue 2: parametric character

IF a parametric character is needed, what should it be?

  • %
    • + starts with p; symbol could be seen as suggesting 'this or that'
  • ...?
    • (haven't heard any other suggestions)

Issue 3: Scheme

  1. Always suffix
    • + Maximally clear; no base/augmented ambiguity
  2. Omit suffix for parametric only
    • + Parametric is probably the weirdest symbol
  3. May omit... (allow both 1 and 2)
  4. Omit suffix for non-null only, except we need it for a "real" non-null projection
    • + Similar to kotlin
  5. May omit... (allow both 1 and 4)
  6. Omit suffix for "parametric if a nullable-bounded type variable, non-null otherwise"
    • ~ Baasically "no change", but this is a nullness "state", not a nullness "operator" a la the spec
  7. May omit... (allow both 1 and 6)

Comparison of options

For the table below I'll just arbitrarily assume + and % from above.

Example signature: class Foo<A extends @Nullable Object, B>

Code Description 1 2 3 4 5 6 7
A parametric A% A A, A% A% A% A A, A%
@NonNull A "real" non-null projection A+ A+ A+ A+ A+ A+ A+
B/@NonNull B non-null type var B+ B+ B+ B B, B+ B B, B+
Xy/@NonNull Xy non-null non-t.v. Xy+ Xy+ Xy+ Xy Xy, Xy+ Xy Xy, Xy+

@kevinb9n kevinb9n modified the milestones: 1.0, 0.3 Oct 14, 2022
@kevinb9n
Copy link
Collaborator

I still find myself really reluctant to give this up in the current docs, so I'm going to quasi-optimisticallly retarget for 0.3, knowing we could still bump it back out again.

@kevinb9n
Copy link
Collaborator

To get a "working decision" for now, I claim the simplest thing we can do is:

  • Always suffix
  • Just use ! for NN because that's what we've been doing

I can see a likely enough change to + or to a different scheme before 1.0.

For now we are marking issues closed when they have a working decision.

@kevinb9n
Copy link
Collaborator

There's a detailed reasoned argument against putting the nullness signifier before the type arguments. Enough work was put into it that I'm reopening this for now, but the doc I'm referring to should be published. @eamonnmcmanus

@kevinb9n kevinb9n reopened this Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation For issues related to user-facing documentation nullness For issues specific to nullness analysis.
Projects
None yet
Development

No branches or pull requests

6 participants