-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected interaction between variance and GADTs #5985
Comments
Comment author: @garrigue Well-spotted. This is actually a soundness problem, as you can see by adding the code: let x = T 3;; |
Comment author: @lpw25 Here is a similar example:
|
Comment author: @lpw25 And another:
|
Comment author: @lpw25 Here is an example that doesn't use GADTs;
In fact this example doesn't use any "language extensions", and is present in at least 3.12.1 |
Comment author: @gasche I worked with these "bivariant" type variables in the work on variance for GADTs ( http://hal.inria.fr/hal-00772993/ ) -- we called this variance "irrelevance", but "bivariance" is rather nice. I feel guilty for not spotting these problems when I did this work, as that would have been a rather natural occasion to experiment with these edge cases. I guess I have too much faith in the type-checker :-) Jacques, do you think you can fix those unification issues? (To innocent bystanders reading the bugtracker: bivariant types ('a t) have the distinct property that the equation (foo t = bar t) does not necessarily implies (foo = bar), while types of any other variance do.). I think it would be rather elegant if we could have explicit support of both bivariance/irrelevance and invariance as first-class variances in the type signature language. The syntax currently only allows '+' and '-' annotations (and '=' is implicitly available), but I think most of the type-system would extend gracefully to bivariance/irrelevance. Having it explicitly could also help us reason about those corner cases, and avoid such issues in the future. There is the delicate question of syntax; we used an unicode blurb in our paper, but I daringly suggest '0' or '*'.
(Is it time to reopen discussion on #5688? #5688 . I think we properly understand the soundness question now, but it's unclear how the extended criterion we have in our article would fare without extending the OCaml signature language for upward/backward-closed types. I think it would still be useful in most actual cases, but the limitations have made me slow at asking for work on the type-checker implementation.) |
Comment author: @lpw25 If we add annotations for bivariance and invariance I would suggest using existing prefix operator symbols, for example:
Note that some bivariant definitions are safe, for example
it is only bivariant type equations that cause problems. I wonder whether there are any uses of
that couldn't be replaced by
without too much effort. |
Comment author: @garrigue As Leo found by himself, this problem is not directly related to GADTs: it is about having type variables that are not determined by type parameters. Actually this bug is probably in OCaml from the very beginning.
In some way the introduction of variance helped, since we at least check that in the current environment at the point of definition this problem does not occur. In 2.00, the following was also accepted:
I have attached a patch that fixes those problems by assuming that all abstract types, except those that come from a separate compilation unit, may end up being bivariant. |
Comment author: @garrigue Extra notes:
|
Comment author: @gasche
I'm not sure I understand, why is it correct to deduce (foo = bar) from (foo t = bar t) when t is defined in a separate compilation unit? Couldn't you have the exact same problem if the separate compilation unit exposed, for example, a value of type ('a t, 'b t) eq ?
I have already personally adopted that style after #5853 ( #5853 ). I believe the only robust way to make sure two types are seen as incompatible is to define them as transparent incompatible definition. I have talked about this style in e.g. ( http://stackoverflow.com/questions/15721528/type-level-arithmetics-in-ocaml ), maybe that would merit a point of its own in the manual. I'm fine with type-checking changes that require that style. I'm not sure what is the best way to expose (in)equalities between the primitive types (which I believe is you main rationale for handling abstract types structure items separately from the rest). Short of other solutions, one might want to consider a new way to define generative abstract types, new type 'a foo Transparent type definitions are of two kinds, the generative ones that are guaranteed to be incompatible with all previous definitions, and the type synonyms that may be equal to previous datatypes. There is currently no way to expose the generativeness of an abstract type, and I think that's part of the problem here. (This may also help (partially) solve #5361, "Syntax to specify that a custom type is never a float" #5361 ) |
Comment author: @garrigue
This is fine, because from (int t, bool t) eq you won't be able to deduce that int = bool (at least since my fix of last saturday, which was an independent problem). For the question of "unique" types I agree this would be nice to have a syntax. type "unique" t (* different from all other nominal types ) On the other hand, we don't want to make things too complicated. |
Comment author: @garrigue The first patch had a stupid bug (didn't expand types when checking variance) so that it didn't work with the original report :-( |
Comment author: @lpw25
But you can still deduce that int t s = bool t s, which is what causes the problem. For example, your patch still allows the following two files:
which seg fault:
|
Comment author: @lpw25 The problem is that you cannot allow
and assume that give "type 'a t"
I can see two possible solutions to this problem:
|
Comment author: @yallop It'd be better to avoid prohibiting irrelevant/bivariant type synonyms, which are both useful and common -- see e.g. the various instantiations of the Symantics signature in "Finally Tagless, Partially Evaluated": R.repr (p10), L.repr (p11), R.repr (p12) etc. or Fold_map, Rw_mutex etc. in Core https://github.com/janestreet/core_extended/blob/master/lib/fold_map.ml#L123 Insisting that all irrelevant type constructs define datatypes would break lots of existing code and make various common idioms clunky to write and slow to run. |
Comment author: @garrigue Right. And the option to check the variance after substitution is not good either: module F (X : sig type 'a s val e : (int s, bool s) eq end) : sig end = struct
type _ t = T : 'a -> 'a s t
(* ... *)
end is sufficient to obtain a contradiction inside the body of F, which is not At this point I see indeed no easy solution, short of adding a "non-bivariant" Surprising we completely overlooked that, while we were aware of the problem |
Comment author: @lpw25
The run-time cost could be eliminated with support for something like: type 'a s = new int where new is similar to private but allows coercions in either direction, so that both: (3 :> string s) and let f (x: bool s) = (x :> int) would be allowed. However, it would still require existing code using these types to be rewritten in a much more verbose style. |
Comment author: @gasche There is a natural order relation between the four variances, bivariant/irrelevant <= {co,contra}variant <= invariant You should always allow (sig type v'a t end :> sig type w'a t end) when the variances (v) and (w) are in this order relation (v <= w). It might be interesting to have an "injective" modality in type signature, correspondingto the ! qualifier of Leo (that is claiming that the actual implemented variance of the constructor is not bivariance), but that would require that variance inference never over-approximates the variance, and I'm not sure that's a reasonable assumption. What breaks if we simply stop making the assumption that abstract types are injective? I think the examples of Jeremy Yallop would still be accepted (contrary to option (2) of Leo above), but without a change to the existing code you would have weaker unequalities after the type abstraction -- and a "unique" modality on abstract types would not work for type synonyms. Note that having explicit inequalities is only important for code using GADTs (so probably not this existing code Jeremy is talking about), as to my knowledge it is only used for finer dead case analysis in GADTs matching. One limitation of the current type system is that the only available way to turn a structural synonym into a nominal type (making a variant with a single constructor) has a runtime cost. It may be interesting to have a form of "nominal type synonym", and it could be naturally presented as either:
Note that any of these choice already allows to solve the injectivity problem (without introducing "injective" and/or "unique" modalities), as you can always make unicity transparent by exposing a nominal convertibility with an abstract type. Instead of writing
you would write
Edit: I wrote my post simultaenously with Leo's, so indeed his "new" idea is exactly my "equiv". I'll note that the "wrap" thing is done in Haskell (which allows 'newtype' to use the variant syntax with only one constructor, with the static guarantee that there is no runtime representation change), and that I occasionally though of "equiv"/new as useful independently of this problem, to define things like "type dollar = int;; type euro = int", without the risk for genuine mistake, without the runtime cost of a one-variant constructor. |
Comment author: @lpw25
Thinking about it some more, this might not be too bad. The only reason to use a type definition like: type 'a t = int is so that you can hide the equality through a module signature. This means that external code must preserve the type in the type variable, but internal code can simply ignore it. Any type definition of the form: type 'a t = typexpr where typexpr does not mention 'a can be rewritten as: type _t = typexpr Then the interfaces to the module can simply cast from t to _t (or vice versa). So most of the internal code can be left as it is. |
Comment author: @garrigue
Actually, it would be enough to have a "do not vanish" modality, as I think Leo points too.
This just means that you must separately under-approximate. |
Comment author: @gasche Doesn't type 'a t = Foo also vanish? |
Comment author: @lpw25
No, because it doesn't allow int t to be unified with bool t, which is the root of the problem. |
Comment author: @gasche Indeed, thanks. They are not unifiable (therefore not equal according to GADTs), but still "equivalent" in the sense of being in the antisymmetric closure of explicit subtyping.
This supports the idea that distinguishing type equality and equivalence is not a crazy idea (it's already the case in the existing system!), which improves my confidence in the idea of "equiv"/new nominal synonyms (or rather "equivalences"?). |
Comment author: @yallop I wonder if we can lift the notion of value subtyping to the module level. If so, that'd certainly ease the pain of explicit coercions between types that are equivalent but not equal (i.e. Leo's "new" / Gabriel's "equiv" types). Suppose we have a signature
that we want to instantiate with the following module:
As things stand we'd have to coerce each function individually in order for the ascription (M : S) to succeed:
However, if we extended the semantics of signature ascription to perform subtyping coercion, either directly or by adding a second form (M :> S), then this syntactic heaviness would be avoided. |
Comment author: @gasche In fact, my analogy with private types does not quite work, as private types are exactly equal to their definition inside the module boundaries, and therefore not considered distinct from it even outside the boundary (which would be unsound as you could publish an explicit equality witness). My idea was to simply write (type 'a t = int) inside the boundary, but that cannot work. So I think my idea of "equiv" type is not quite right for the purposes of having strong inequalities. Maybe Leo's idea was subtly different and is not affected. Jeremy, I think the specifics of your proposal are a bit disappoiting. If we can have the equality ('a t = int) inside the boundary, it's unsound for the same reason as above. If not, there are still a fair amount of cases where the ascription-directed coercions will not suffice (they will only help for the types that are exposed through the interface, which in practice may end up being only a little part of the actual values flowing in the implementation). |
Comment author: @lpw25 While private types are usually only used in signatures, you are allowed to create a type using private in a structure: you just won't be able to create any members of that type. Similarly, you would use "new" within the module rather than in the signature. This solves the problem because with the following definition: type 'a t = new int 'a t is never equal (in unification terms) with int, but can be cast to and from int (effectively being equal in subtyping terms). As I said above, any type defined as: type 'a t = typexpr (* typexpr does not contain 'a *) could be rewritten as: type _t = typexpr so that _t would be used within the module and t would be used outside the module. Jeremy's point is that this will involve having a lot of definitions of the form: let f: _t -> _t list = ... This would be less annoying if it could be done automatically using the signature of the module. |
Comment author: @yallop Gabriel: the idea is that the equality doesn't hold anywhere, but the equivalence holds inside the boundary, or wherever the type equation is visible. I don't think it introduces any unsoundness. The idea is just to have a module-level analogue of subtyping, which conceptually applies :> componentwise, much like the way subtyping works for (say) object types.
Module-level subtyping doesn't in itself do anything solve the soundness issue discussed above; it simply makes the proposed new/equiv types palatable, drastically reducing the number of coercions in the source. Without it I'm not sure that type equivalences are really practicable. Do you have any concerete examples where there are likely to be other values in need of coercion than those in the signature? I suspect that, as Leo says above, irrelevant type synonyms are only really useful when the equality is hidden behind a signature. |
Comment author: @gasche Re. the "new" feature itself: That would still be a departure from the way "private" is handled right now: if you define (type t = private int) in an implementation, you don't have an unequality between (t) and (int), as you can check e.g. with type priv = private int
type _ t =
| A : priv t
| B : int t
let should_be_exhaustive : int t -> unit = function
| B -> ()
(* exhaustiveness warning *) I think if "new" is defined in terms of subtyping, it should really be consistent in its behavior with "private". Maybe we can change "private" so that implementation provides unequality, but I find it quite distasteful that (* some stuff *)
type t = private foo (* or: new foo *)
(* other stuff *) would have different typing properties than module Parametrized (M : sig
(* some stuff *)
type t = private foo
end) = struct
include M
(* other stuff *)
end Re. the signature ascription: I had an example in mind of the form module M : sig
type 'a _t
type 'a t = new 'a _t
type 'a t_list = Nil | Cons of 'a t * 'a t_list
val foo : 'a t_list -> int
end = struct
type 'a _t = int
type 'a t = new int
type 'a t_list = Nil | Cons of 'a t * 'a t_list
let rec foo = function
| Nil -> 0
| Cons (x, xs) -> (x :> int) + foo xs
end In this case, the fact that I want my externally visible ('a t_list) to be defined in term of the "proper" type ('a t), the one with the good inequality guarantees, forces me to use coercions when manipulating values of type ('a t_list) internally, because I don't think it would make sense to somehow "coerce" the datatype definition. (So ascription-derived coercion seem like a good idea, but if I've understood correctly they're more of an ad-hoc help than a fundamentally game-changing feature when evaluating the idea of "new" types, so maybe they can be discussed separately.) |
Comment author: @lpw25
These exhaustiveness warnings are not done using equality (in the sense of unification). They use the more loosely defined "compatibility" as defined by Ctype.mcomp. In fact priv and int are not equivalent, as demonstrated by:
|
Comment author: @gasche Ok, but I think my point about module Parametrized (M : sig
(* some stuff *)
type t = private foo
end) = struct
include M
(* other stuff *)
end still stands: if in "some stuff" you include an equality witness (t, foo) eq -- and those are definable in a module with the stronger implementation (type t = private foo) -- you can then get a context where priv and int end up being unifiable. On the contrary, a transparent one-element variant never allows convertibility, because it is not the abstraction of a stronger definition that does. (Of course, the downside of this property is that it always require a verbose wrapping/unwrapping.) |
Comment author: @lpw25
t and foo can only ever be unifiable if somewhere there is a definition type t = foo An equality witness cannot be created based on type t = private foo. Since my proposal was to ban definitions of the form: type 'a t = int there is no way for int and 'a t to become unifiable (or more importantly for int t and bool t to become unifiable). Note that I am not proposing any change to how "private" is interpreted, and that "new" is a natural extension to "private" |
Comment author: @gasche After discussing the matter with Didier (he was the only person around able to decipher Jacques killing 2.00 example), I have a better understanding of the issue. Sorry, Jacques, for having been a bit slow to understand some of the subtleties!
|
Comment author: @garrigue Update: branches/non-vanishing is now ready for testing. type (#'a) t for injective type parameters Also, subtyping is allowed during module subtyping. |
Comment author: @garrigue As a small note, you can look at the code in branches/non-vanishing/testsuite/tests/typing-gadts/omega07.ml for how to define type indexes that work even outside of the local module. type ('a,'b) fk = FK This way they are distinct from any type with a different constructor name, and all datatypes are All the infrastructure with new and # is only intended for abstract types, i.e. when you don't want to reveal the internal representation of. In some way this is not strictly needed. One can always exports two types: type 'a t0 This is sufficient to make t injective and distinct from other types. Of course, we are still to solve the problem of providing "unique" abstract types. |
Comment author: @gasche
What about new type 'a t considered semantically equivalent to type 'a t0 ? Does this have the expected properties of being provably distinct from all other known-to-be-concrete (by transparency or exhibition of new-ness) datatypes? On the other hand, one may want (in the general case where "new" here is just understood as a marker for "uniquenes") module M : sig to type-check. It would be tempting to have new-in-signature be a valid signature for both new-in-structures and (nominal) concrete datatypes. |
Comment author: @garrigue I thought of something like that, say type 'a t = new to keep a closer syntax. First problem, you cannot allow it to wrap normal datatypes, because then you wouldn't be able to distinguish new types from datatypes: module M : sig This can be kind of fixed by adding "new" datatypes too: type 'a t = new T Here a "new" datatype is just like a normal datatype, except The next problem is how to prevent aliasing. So yes, this is the basic idea, but integrating it in the language |
Comment author: @gasche This all looks quite tricky; in particular, the export of datatype equality makes thing less generative than we would expect them to be. Maybe we could consider forgetting about "unique" abstract datatypes and keep the Do we need unique abstract types for anything other than the already suspicious phantom type tricks with GADTs? |
Comment author: @garrigue I'm not sure what you mean by "already suspicious phantom type tricks with GADTs".
Here int and array are handled by the special case for builtin types, so we now let f (type a) (x : a Queue.t t) = becomes partial, and to make it complete we must write all cases, since Queue.t Another useful point of unique abstract types is that they do not depend on But of course this is not as bad as the injectivity problem, since at least you |
Comment author: @garrigue Just for fun, I've added abstract new types and new datatypes: type t = new The idea is that an abstract new type can only abstract a new type, and since they do not allow aliasing, they are all different. At this point I have no proof of correctness whatsoever, but it might be an alternative to variance annotations (any new type is automatically injective), which additionally ensures uniqueness. |
Comment author: @garrigue Unfortunately, this is broken: module M : sig type t = new type _ is_t = I : ('a,t) comp -> 'a is_t end = module N = M;; let e = M.I Eq;; |
Comment author: @garrigue There is now a new version of abstract and concrete new types in branches/abstract-new The idea is that taking a copy of a module will turn all new types into aliases to the original definition (when there is a valid path), or just turn them into abstract types when there is no valid path. module M = struct type t = new int end;;module M : sig type t = new int end module N = M;;module N : sig type t = M.t end The same thing of course happens if you path a named module to a functor, so that you cannot preserve the newness there. For this reason, while this is a nice approach, it doesn't seem to be a full replacement for injectivity annotations: they are the only way to keep abstract types injective in functors, or when the original definition is not visible. Again, examples are in typing-private/new.ml |
Comment author: @alainfrisch Setting Target Version to 4.01: we probably need to do something before this release, at least assuming that all parametrized abstract types (except predefined ones such as lazy, array) are non injective. |
Comment author: @garrigue Fixed in trunk. (commit 13634) |
Comment author: @alainfrisch Keeping open, because of "no syntax for asserting injectivity of abstract types", but this doesn't look like a blocking issue for 4.01 anymore. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Original bug ID: 5985
Reporter: @yallop
Assigned to: @garrigue
Status: confirmed (set by @garrigue on 2013-04-13T11:46:07Z)
Resolution: open
Priority: normal
Severity: feature
Target version: later
Category: typing
Tags: patch
Related to: #6275 #7360
Child of: #5984 #5998
Monitored by: @lpw25 mcclurmc @glondu @dra27 @alainfrisch @diremy
Bug description
The following program
type _ t = T : 'a -> 'a s t
is accepted if s is covariant (e.g. type 'a s = 'a), contravariant (e.g. type 'a s = 'a -> unit) or invariant (e.g. type 'a s = 'a -> 'a), but rejected if s is "bivariant" (e.g. type 'a s = int):
However, if s is abstracted then the program is accepted, even if s is instantiated with a bivariant constructor:
File attachments
The text was updated successfully, but these errors were encountered: