Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected interaction between variance and GADTs #5985

Closed
vicuna opened this issue Apr 12, 2013 · 51 comments
Closed

Unexpected interaction between variance and GADTs #5985

vicuna opened this issue Apr 12, 2013 · 51 comments

Comments

@vicuna
Copy link

@vicuna vicuna commented Apr 12, 2013

Original bug ID: 5985
Reporter: @yallop
Assigned to: @garrigue
Status: confirmed (set by @garrigue on 2013-04-13T11:46:07Z)
Resolution: open
Priority: normal
Severity: feature
Target version: later
Category: typing
Tags: patch
Related to: #6275 #7360
Child of: #5984 #5998
Monitored by: @lpw25 mcclurmc @glondu @dra27 @alainfrisch @diremy

Bug description

The following program

type _ t = T : 'a -> 'a s t

is accepted if s is covariant (e.g. type 'a s = 'a), contravariant (e.g. type 'a s = 'a -> unit) or invariant (e.g. type 'a s = 'a -> 'a), but rejected if s is "bivariant" (e.g. type 'a s = int):

# type 'a s = int;;
type 'a s = int
# type _ t = T : 'a -> 'a s t ;;
Characters 4-27:
  type _ t = T : 'a -> 'a s t ;;
      ^^^^^^^^^^^^^^^^^^^^^^^
Error: In this definition, a type variable has a variance that
       is not reflected by its occurrence in type parameters.

However, if s is abstracted then the program is accepted, even if s is instantiated with a bivariant constructor:

# include
  (functor (S : sig type 'a s end) ->
    struct
      include S
      type _ t = T : 'a -> 'a s t
    end)
    (struct type 'a s = int end)
;;
type 'a s = int
type _ t = T : 'a -> 'a s t
#

File attachments

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 13, 2013

Comment author: @garrigue

Well-spotted.

This is actually a soundness problem, as you can see by adding the code:

let x = T 3;;
match x with T y -> y;;

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 13, 2013

Comment author: @lpw25

Here is a similar example:

          OCaml version 4.00.1

  # type ('a, 'b) eq = Eq: ('a, 'a) eq;;
  type ('a, 'b) eq = Eq : ('a, 'a) eq
  # module M : sig type 'a s val se: (int s, string s) eq end = struct
      type 'a s = int
      let se = Eq
    end;;
        module M : sig type 'a s val se : (int s, string s) eq end
  # type _ t = T : 'a -> 'a M.s t;;
  type _ t = T : 'a -> 'a M.s t
  # let castT (type a) (type b) (x : a t) (e: (a, b) eq) =
      ((match e with Eq -> (x : b t)) : b t);;
    val castT : 'a t -> ('a, 'b) eq -> 'b t = <fun>
  # let st : string M.s t = castT (T 3) M.se;;

  Process ocaml-toplevel segmentation fault

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 13, 2013

Comment author: @lpw25

And another:

          OCaml version 4.00.1

  # module rec A : sig type 'a s val cast : int s B.t -> string s B.t end = struct
     type 'a s = int
     let cast (x: int s B.t): string s B.t = x
    end and B : sig type _ t = T : 'a -> 'a A.s t end = struct
     type _ t = T : 'a -> 'a A.s t
    end;;
            module rec A : sig type 'a s val cast : int s B.t -> string s B.t end
  and B : sig type _ t = T : 'a -> 'a A.s t end
  # A.cast (B.T 3);;

  Process ocaml-toplevel segmentation fault

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 14, 2013

Comment author: @lpw25

Here is an example that doesn't use GADTs;

          OCaml version 4.00.1

  # include
        (functor (S : sig type 'a s end) ->
          struct
            include S
            type 'a t = 'b * ('a -> unit) constraint 'a = 'b s
            let x = ((3, fun _ -> ()) : int s t)
          end)
          (struct type 'a s = int end)
      ;;
                  type 'a s = int
  type 'a t = 'b * ('a -> unit) constraint 'a = 'b s
  val x : int s t = (<poly>, <fun>)
  # match x with (s, _) -> s ^ "hello";;

  Process ocaml-toplevel segmentation fault

In fact this example doesn't use any "language extensions", and is present in at least 3.12.1

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 14, 2013

Comment author: @gasche

I worked with these "bivariant" type variables in the work on variance for GADTs ( http://hal.inria.fr/hal-00772993/ ) -- we called this variance "irrelevance", but "bivariance" is rather nice. I feel guilty for not spotting these problems when I did this work, as that would have been a rather natural occasion to experiment with these edge cases. I guess I have too much faith in the type-checker :-)

Jacques, do you think you can fix those unification issues? (To innocent bystanders reading the bugtracker: bivariant types ('a t) have the distinct property that the equation (foo t = bar t) does not necessarily implies (foo = bar), while types of any other variance do.).

I think it would be rather elegant if we could have explicit support of both bivariance/irrelevance and invariance as first-class variances in the type signature language. The syntax currently only allows '+' and '-' annotations (and '=' is implicitly available), but I think most of the type-system would extend gracefully to bivariance/irrelevance. Having it explicitly could also help us reason about those corner cases, and avoid such issues in the future. There is the delicate question of syntax; we used an unicode blurb in our paper, but I daringly suggest '0' or '*'.

type *'a = ...
type 0'a = ...

(Is it time to reopen discussion on #5688? #5688 . I think we properly understand the soundness question now, but it's unclear how the extended criterion we have in our article would fare without extending the OCaml signature language for upward/backward-closed types. I think it would still be useful in most actual cases, but the limitations have made me slow at asking for work on the type-checker implementation.)

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 14, 2013

Comment author: @lpw25

If we add annotations for bivariance and invariance I would suggest using existing prefix operator symbols, for example:

type ?'a t (* Bivariant *)

type !'a t (* Invariant *)

Note that some bivariant definitions are safe, for example

type 'a s = S

it is only bivariant type equations that cause problems. I wonder whether there are any uses of

type 'a s = int

that couldn't be replaced by

type 'a s = S of int

without too much effort.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @garrigue

As Leo found by himself, this problem is not directly related to GADTs: it is about having type variables that are not determined by type parameters.
It is not related to variance either :-)

Actually this bug is probably in OCaml from the very beginning.
I verified that the following code typechecks in version 2.00.

module F(T:sig type 'a t end) = struct
  class ['a] c x = object constraint 'a = 'b T.t val x' : 'b = x method x = x' end
end;;
module M = F(struct type 'a t = unit end);;
let o = new M.c 3;;
o#x;;
- : 'a = <poly>

In some way the introduction of variance helped, since we at least check that in the current environment at the point of definition this problem does not occur. In 2.00, the following was also accepted:

type 'a t = unit;;
class ['a] c x = object constraint 'a = 'b t val x' : 'b = x method x = x' end;;

I have attached a patch that fixes those problems by assuming that all abstract types, except those that come from a separate compilation unit, may end up being bivariant.
This is fine, but it is going to break some programs, mainly (only?) those using GADTs
As you can see in the diff file, omega07.ml had to be modified, to make type indices concrete.
For instance
type 'a succ
becomes
type 'a succ = Succ of 'a
This may feel kind of heavy.
On the other, this is also necessary to ensure that one can prove inequalities outside of the current, so maybe this is fine.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @garrigue

Extra notes:

  1. As mentioned in my previous note, this is not directly related to variance, so that extra notations for bivariance and invariance do not help here. If you want to discuss that it would be better to open another PR.

  2. I wouldn't qualify this as a "child of" 5984, but indeed another way to fix it would be to recomputed variance after each signature substitution, and detect problems at this point. However, this is a rather big change, and this would mean that signature substitution can fail. I'm waiting for feedback on 5984 before doing anything in that direction.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

I have attached a patch that fixes those problems by assuming that all abstract types, except those that come from a separate compilation unit, may end up being bivariant.

I'm not sure I understand, why is it correct to deduce (foo = bar) from (foo t = bar t) when t is defined in a separate compilation unit? Couldn't you have the exact same problem if the separate compilation unit exposed, for example, a value of type

('a t, 'b t) eq

?

This is fine, but it is going to break some programs, mainly (only?) those using GADTs [...]
For instance
type 'a succ
becomes
type 'a succ = Succ of 'a

[...] this is also necessary to ensure that one can prove inequalities outside of the current module, so maybe this is fine.

I have already personally adopted that style after #5853 ( #5853 ). I believe the only robust way to make sure two types are seen as incompatible is to define them as transparent incompatible definition. I have talked about this style in e.g. ( http://stackoverflow.com/questions/15721528/type-level-arithmetics-in-ocaml ), maybe that would merit a point of its own in the manual.

I'm fine with type-checking changes that require that style.

I'm not sure what is the best way to expose (in)equalities between the primitive types (which I believe is you main rationale for handling abstract types structure items separately from the rest). Short of other solutions, one might want to consider a new way to define generative abstract types,

new type 'a foo

Transparent type definitions are of two kinds, the generative ones that are guaranteed to be incompatible with all previous definitions, and the type synonyms that may be equal to previous datatypes. There is currently no way to expose the generativeness of an abstract type, and I think that's part of the problem here.

(This may also help (partially) solve #5361, "Syntax to specify that a custom type is never a float" #5361 )

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @garrigue

I'm not sure I understand, why is it correct to deduce (foo = bar) from (foo t = bar t) when t is defined in a separate compilation unit? Couldn't you have the exact same problem if the separate compilation unit exposed, for example, a value of type

('a t, 'b t) eq

This is fine, because from (int t, bool t) eq you won't be able to deduce that int = bool (at least since my fix of last saturday, which was an independent problem).
Should still double check.

For the question of "unique" types I agree this would be nice to have a syntax.
We need a discussion somewhere.
Personally, I would prefer introducing a new syntactic category of "qualifiers", so that we
are not limited by keywords for everything. This would have many applications.

type "unique" t (* different from all other nominal types )
type "final" 'a t (
cannot be instantiated in a signature *)

On the other hand, we don't want to make things too complicated.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @garrigue

The first patch had a stupid bug (didn't expand types when checking variance) so that it didn't work with the original report :-(
Now fixed.
I suppose I'm going to commit this, but not in the 4.00 branch.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

I'm not sure I understand, why is it correct to deduce (foo = bar) from (foo t = bar t) when t is defined in a separate compilation unit? Couldn't you have the exact same problem if the separate compilation unit exposed, for example, a value of type

('a t, 'b t) eq

This is fine, because from (int t, bool t) eq you won't be able to deduce that int = bool (at least since my fix of last saturday, which was an independent problem).

But you can still deduce that int t s = bool t s, which is what causes the problem. For example, your patch still allows the following two files:

  (* s.ml *)

  type ('a, 'b) eq = Eq: ('a, 'a) eq

  module M : sig type 'a s val se: (int s, string s) eq end = struct
    type 'a s = int
    let se = Eq
  end
  (* t.ml *)

  type _ t = T : 'a -> 'a S.M.s t

  let castT (type a) (type b) (x : a t) (e: (a, b) S.eq) =
    ((match e with S.Eq -> (x : b t)) : b t)

  let st : string S.M.s t = castT (T 3) S.M.se

  let _ = match st with T s -> s ^ "hello"

which seg fault:

  $ ocamlc s.ml t.ml
  $ ./a.out 
  Segmentation fault

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

The problem is that you cannot allow

sig type 'a t = int end :> sig type 'a t end

and assume that give "type 'a t"

'a t = 'b t => 'a = 'b

I can see two possible solutions to this problem:

  1. Since "bivariance" is a conservative approximation for types like:
    type 'a t = int
    
    We could add more annotations and restrictions to variance. For example,
    if ?'a meant bivariance and !'a meant invariance, we could have:
     'a t = 'b t => 'a = 'b iff t is invariant, covariant or contravariant
    
    and
    sig type ?'a t end :> sig type 'a t end
    
    but not
    sig type ?'a t end :> sig type !'a t end
    
  2. We could ban type definitions of the form:
    type 'a t = typexpr
    
    where typexpr does not use the type variable 'a. Note that definitions like
    type 'a t = T of int
    
    and
    type 'a t
    
    would still be fine.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @yallop

It'd be better to avoid prohibiting irrelevant/bivariant type synonyms, which are both useful and common -- see e.g. the various instantiations of the Symantics signature in "Finally Tagless, Partially Evaluated":

R.repr (p10), L.repr (p11), R.repr (p12) etc.
http://www.cs.rutgers.edu/~ccshan/tagless/jfp.pdf

or Fold_map, Rw_mutex etc. in Core

https://github.com/janestreet/core_extended/blob/master/lib/fold_map.ml#L123
https://github.com/janestreet/core_extended/blob/master/lib/rw_mutex.ml#L13

Insisting that all irrelevant type constructs define datatypes would break lots of existing code and make various common idioms clunky to write and slow to run.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @garrigue

Right.
I've uploaded a correct version that only allows abstract type constructors from pervasives,
but the restriction is severe.

And the option to check the variance after substitution is not good either:

module F (X : sig type 'a s  val e : (int s, bool s) eq end) : sig end = struct
   type _ t = T : 'a -> 'a s t
   (* ... *)
end

is sufficient to obtain a contradiction inside the body of F, which is not
visible outside...

At this point I see indeed no easy solution, short of adding a "non-bivariant"
variance annotation. I.e. if a type is marked as non-bivariant in a signature,
it cannot be substituted (or subtyped) by a bivariant one. But this requires
changing existing code.

Surprising we completely overlooked that, while we were aware of the problem
with abstraction and injectivity...

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

Insisting that all irrelevant type constructs define datatypes would break lots of existing code and make various common idioms clunky to write and slow to run.

The run-time cost could be eliminated with support for something like:

type 'a s = new int

where new is similar to private but allows coercions in either direction, so that both:

(3 :> string s)

and

let f (x: bool s) = (x :> int)

would be allowed.

However, it would still require existing code using these types to be rewritten in a much more verbose style.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

There is a natural order relation between the four variances,

bivariant/irrelevant <= {co,contra}variant <= invariant

You should always allow (sig type v'a t end :> sig type w'a t end) when the variances (v) and (w) are in this order relation (v <= w).

It might be interesting to have an "injective" modality in type signature, correspondingto the ! qualifier of Leo (that is claiming that the actual implemented variance of the constructor is not bivariance), but that would require that variance inference never over-approximates the variance, and I'm not sure that's a reasonable assumption.

What breaks if we simply stop making the assumption that abstract types are injective? I think the examples of Jeremy Yallop would still be accepted (contrary to option (2) of Leo above), but without a change to the existing code you would have weaker unequalities after the type abstraction -- and a "unique" modality on abstract types would not work for type synonyms.

Note that having explicit inequalities is only important for code using GADTs (so probably not this existing code Jeremy is talking about), as to my knowledge it is only used for finer dead case analysis in GADTs matching.

One limitation of the current type system is that the only available way to turn a structural synonym into a nominal type (making a variant with a single constructor) has a runtime cost. It may be interesting to have a form of "nominal type synonym", and it could be naturally presented as either:

  • a notion of one-constructor variants whose parameter would be erased at runtime (I think it should be a separate notion from the usual variants as the implementation probably relies on variant types having a specific runtime representation)

    type "wrap" 'a foo = Bar of blah

  • or a variant of "private" types that allows subtyping in both directions, but does not produce a type equality (this has the downside of the making the subtyping relation non-antisymmetric, which I believe is possible in presence of explicit subtyping only, but a bit weird and may bite the type system in the future)

    type "equiv" 'a foo = blah

Note that any of these choice already allows to solve the injectivity problem (without introducing "injective" and/or "unique" modalities), as you can always make unicity transparent by exposing a nominal convertibility with an abstract type. Instead of writing

sig
  type 'a t (* fully abstract, injectivity/unicity status unclear *)
end

you would write

sig
  type 'a u (* fully abstract *)
  type "equiv" 'a t = 'a u (* nominal, injective, yet cost-free *)
end

Edit: I wrote my post simultaenously with Leo's, so indeed his "new" idea is exactly my "equiv". I'll note that the "wrap" thing is done in Haskell (which allows 'newtype' to use the variant syntax with only one constructor, with the static guarantee that there is no runtime representation change), and that I occasionally though of "equiv"/new as useful independently of this problem, to define things like "type dollar = int;; type euro = int", without the risk for genuine mistake, without the runtime cost of a one-variant constructor.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

However, it would still require existing code using these types to be rewritten in a much more verbose style.

Thinking about it some more, this might not be too bad. The only reason to use a type definition like:

type 'a t = int

is so that you can hide the equality through a module signature. This means that external code must preserve the type in the type variable, but internal code can simply ignore it.

Any type definition of the form:

type 'a t = typexpr

where typexpr does not mention 'a can be rewritten as:

type _t = typexpr
type 'a t = new _t

Then the interfaces to the module can simply cast from t to _t (or vice versa). So most of the internal code can be left as it is.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @garrigue

It might be interesting to have an "injective" modality in type signature, correspondingto the ! qualifier of Leo (that is claiming that the actual implemented variance of the constructor is not bivariance),

Actually, it would be enough to have a "do not vanish" modality, as I think Leo points too.
This has the advantage of including all concrete types (everything that is neither abstract nor alias).

but that would require that variance inference never over-approximates the variance, and I'm not sure that's a reasonable assumption.

This just means that you must separately under-approximate.
This is what I do in my patch . I think the patch is correct, it is just probably too restrictive alone.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

Doesn't

type 'a t = Foo

also vanish?

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

Doesn't

type 'a t = Foo

also vanish?

No, because it doesn't allow int t to be unified with bool t, which is the root of the problem.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

Indeed, thanks. They are not unifiable (therefore not equal according to GADTs), but still "equivalent" in the sense of being in the antisymmetric closure of explicit subtyping.

# type 'a t = Foo;;
# let test x = (x : int t :> bool t);;
val test : int t -> bool t = <fun>

This supports the idea that distinguishing type equality and equivalence is not a crazy idea (it's already the case in the existing system!), which improves my confidence in the idea of "equiv"/new nominal synonyms (or rather "equivalences"?).

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @yallop

I wonder if we can lift the notion of value subtyping to the module level. If so, that'd certainly ease the pain of explicit coercions between types that are equivalent but not equal (i.e. Leo's "new" / Gabriel's "equiv" types). Suppose we have a signature

  module type S =
  sig
    type 'a t
    val f : int t -> string t
    ...
  end

that we want to instantiate with the following module:

  module M =
  struct
     type 'a t = new int
     let f x = x + 1
     ...
  end

As things stand we'd have to coerce each function individually in order for the ascription (M : S) to succeed:

  let f x = x + 1
  let (f :> int t -> string t)

However, if we extended the semantics of signature ascription to perform subtyping coercion, either directly or by adding a second form (M :> S), then this syntactic heaviness would be avoided.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

In fact, my analogy with private types does not quite work, as private types are exactly equal to their definition inside the module boundaries, and therefore not considered distinct from it even outside the boundary (which would be unsound as you could publish an explicit equality witness). My idea was to simply write (type 'a t = int) inside the boundary, but that cannot work.

So I think my idea of "equiv" type is not quite right for the purposes of having strong inequalities. Maybe Leo's idea was subtly different and is not affected.
The idea of an "erasable" one-constructor variant would still stand. But it also has the problem of being a bit painful to handle manually (as Jeremy notes in his examples).

Jeremy, I think the specifics of your proposal are a bit disappoiting. If we can have the equality ('a t = int) inside the boundary, it's unsound for the same reason as above. If not, there are still a fair amount of cases where the ascription-directed coercions will not suffice (they will only help for the types that are exposed through the interface, which in practice may end up being only a little part of the actual values flowing in the implementation).

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

While private types are usually only used in signatures, you are allowed to create a type using private in a structure: you just won't be able to create any members of that type.

Similarly, you would use "new" within the module rather than in the signature. This solves the problem because with the following definition:

type 'a t = new int

'a t is never equal (in unification terms) with int, but can be cast to and from int (effectively being equal in subtyping terms).

As I said above, any type defined as:

type 'a t = typexpr (* typexpr does not contain 'a *)

could be rewritten as:

type _t = typexpr
type 'a t = new _t

so that _t would be used within the module and t would be used outside the module.

Jeremy's point is that this will involve having a lot of definitions of the form:

let f: _t -> _t list = ...
let f = (f :> 'a t -> 'a list)

This would be less annoying if it could be done automatically using the signature of the module.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @yallop

Gabriel: the idea is that the equality doesn't hold anywhere, but the equivalence holds inside the boundary, or wherever the type equation is visible. I don't think it introduces any unsoundness.

The idea is just to have a module-level analogue of subtyping, which conceptually applies :> componentwise, much like the way subtyping works for (say) object types.

   (object
     method a : [`A] = `A
     method b : [`B] = `B
   end
    :> 
     < a : [`A|`B];
       b : [`B|`C]
     >)

Module-level subtyping doesn't in itself do anything solve the soundness issue discussed above; it simply makes the proposed new/equiv types palatable, drastically reducing the number of coercions in the source. Without it I'm not sure that type equivalences are really practicable.

Do you have any concerete examples where there are likely to be other values in need of coercion than those in the signature? I suspect that, as Leo says above, irrelevant type synonyms are only really useful when the equality is hidden behind a signature.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

Re. the "new" feature itself:

That would still be a departure from the way "private" is handled right now: if you define (type t = private int) in an implementation, you don't have an unequality between (t) and (int), as you can check e.g. with

  type priv = private int

  type _ t =
  | A : priv t
  | B : int t
 
  let should_be_exhaustive : int t -> unit = function
  | B -> ()
  (* exhaustiveness warning *)

I think if "new" is defined in terms of subtyping, it should really be consistent in its behavior with "private". Maybe we can change "private" so that implementation provides unequality, but I find it quite distasteful that

  (* some stuff *)
  type t = private foo   (* or: new foo *)
  (* other stuff *)

would have different typing properties than

  module Parametrized (M : sig
    (* some stuff *)
    type t = private foo
  end) = struct
    include M
    (* other stuff *)
  end

Re. the signature ascription: I had an example in mind of the form

   module M : sig
     type 'a _t
     type 'a t = new 'a _t
     type 'a t_list = Nil | Cons of 'a t * 'a t_list
     val foo : 'a t_list -> int
   end = struct
     type 'a _t = int
     type 'a t = new int
     type 'a t_list = Nil | Cons of 'a t * 'a t_list
     let rec foo = function
       | Nil -> 0
       | Cons (x, xs) -> (x :> int) + foo xs
   end

In this case, the fact that I want my externally visible ('a t_list) to be defined in term of the "proper" type ('a t), the one with the good inequality guarantees, forces me to use coercions when manipulating values of type ('a t_list) internally, because I don't think it would make sense to somehow "coerce" the datatype definition.

(So ascription-derived coercion seem like a good idea, but if I've understood correctly they're more of an ad-hoc help than a fundamentally game-changing feature when evaluating the idea of "new" types, so maybe they can be discussed separately.)

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

That would still be a departure from the way "private" is handled right now: if you define (type t = private int) in an implementation, you don't have an unequality between (t) and (int), as you can check e.g. with

type priv = private int

type _ t =
| A : priv t
| B : int t

let should_be_exhaustive : int t -> unit = function
| B -> ()
(* exhaustiveness warning *)

These exhaustiveness warnings are not done using equality (in the sense of unification). They use the more loosely defined "compatibility" as defined by Ctype.mcomp. In fact priv and int are not equivalent, as demonstrated by:

  # let f (x: priv) = (x: int);;
  Error: This expression has type priv but an expression was expected of type
           int

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @gasche

Ok, but I think my point about

  module Parametrized (M : sig
    (* some stuff *)
    type t = private foo
  end) = struct
    include M
    (* other stuff *)
  end

still stands: if in "some stuff" you include an equality witness (t, foo) eq -- and those are definable in a module with the stronger implementation (type t = private foo) -- you can then get a context where priv and int end up being unifiable.

On the contrary, a transparent one-element variant never allows convertibility, because it is not the abstraction of a stronger definition that does. (Of course, the downside of this property is that it always require a verbose wrapping/unwrapping.)

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 15, 2013

Comment author: @lpw25

if in "some stuff" you include an equality witness (t, foo) eq -- and those are definable in a module with the stronger implementation (type t = private foo) -- you can then get a context where priv and int end up being unifiable.

t and foo can only ever be unifiable if somewhere there is a definition

type t = foo

An equality witness cannot be created based on type t = private foo.

Since my proposal was to ban definitions of the form:

type 'a t = int

there is no way for int and 'a t to become unifiable (or more importantly for int t and bool t to become unifiable).

Note that I am not proposing any change to how "private" is interpreted, and that "new" is a natural extension to "private"

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 16, 2013

Comment author: @gasche

After discussing the matter with Didier (he was the only person around able to decipher Jacques killing 2.00 example), I have a better understanding of the issue. Sorry, Jacques, for having been a bit slow to understand some of the subtleties!

  • "injectivity", as is (not generativity/nominality, etc.), is necessary to fix the "constraint" example, whose soundness is motivated solely by an injectivity argument. Adding Jacques' #'a annotation on type parameters is the right solution for this problem.

  • GADTs have a (secondary) need for type inequalities. We used to define abstract types for their phantom components, but this raises two independent issues:

    type z
    type 'a s
    type _ nat = Z : z | S : 'n nat -> ('n s) nat
    
    let two : (z s s) nat -> unit = function (S (S Z)) -> ()
    

    (1) A phantom abstract type may not be injective. This example relies on the fact that no ('n s) other than ('n = z s) is equal to (z s s), that is the injectivity of 's'. One may decide that private types introduced in implementations are always injective, or (my preference) allow the following as an abstract implementation

    type #'a s
    

    (2) Two abstract types constructors may not be distinct. This example also relies on the fact that the Z branch is dead, that is ('a s != z) for any 'a. If we really don't want to have to write

      type z = Z
      type 'a s = S of 'a
    

    we need to think about a way to express that two abstract types are distinct from each other, without giving their definition. This is a less pressing need than injectivitiy annotations (it doesn't affect soundness), but still important for GADTs (we want to know which inequalities are derivable). "new" (Jacques' version rather than Leo's) might be a solution to this problem. The problem with only assuming that abstract type definitions have this property (as I believe is, soundly, currently done) is that there is no way to preserve these inequalities through a signature ascription, which breaks modularity.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 22, 2013

Comment author: @garrigue

Update: branches/non-vanishing is now ready for testing.
It implements most of the ideas discussed here:

type (#'a) t for injective type parameters
type 'a t = new int for concrete types convertible by subtyping

Also, subtyping is allowed during module subtyping.
I have done only light testing myself, but at least the typing testsuite goes through,
and some counter-examples properly fail.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 22, 2013

Comment author: @garrigue

As a small note, you can look at the code in branches/non-vanishing/testsuite/tests/typing-gadts/omega07.ml for how to define type indexes that work even outside of the local module.
The idea is just to define them as normal datatype with one constructor.

type ('a,'b) fk = FK

This way they are distinct from any type with a different constructor name, and all datatypes are
injective, so there is no need to add arguments to FK.

All the infrastructure with new and # is only intended for abstract types, i.e. when you don't want to reveal the internal representation of. In some way this is not strictly needed. One can always exports two types:

type 'a t0
type 'a t = MyT of 'a t0

This is sufficient to make t injective and distinct from other types.
This just happens to be rather heavy.

Of course, we are still to solve the problem of providing "unique" abstract types.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 22, 2013

Comment author: @gasche

Of course, we are still to solve the problem of providing "unique" abstract types.

What about

new type 'a t

considered semantically equivalent to

type 'a t0
type 'a t = new 'a t0

?

Does this have the expected properties of being provably distinct from all other known-to-be-concrete (by transparency or exhibition of new-ness) datatypes?

On the other hand, one may want (in the general case where "new" here is just understood as a marker for "uniquenes")

module M : sig
new type ('a, 'b) fk
end = struct
type ('a, 'b) fk = FK
end

to type-check.

It would be tempting to have new-in-signature be a valid signature for both new-in-structures and (nominal) concrete datatypes.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 22, 2013

Comment author: @garrigue

I thought of something like that, say

type 'a t = new

to keep a closer syntax.
However, the problem is not the syntax.
It's just very hard to get it right.

First problem, you cannot allow it to wrap normal datatypes, because then you wouldn't be able to distinguish new types from datatypes:

module M : sig
type 'a t = new
type 'a u = T
end = struct
type 'a t = T
type 'a u = 'a t = T
end

This can be kind of fixed by adding "new" datatypes too:

type 'a t = new T

Here a "new" datatype is just like a normal datatype, except
that you cannot alias it, i.e. type 'a u = 'a t = T becomes illegal.

The next problem is how to prevent aliasing.
Part of the solution is that for a type to be exported as "new",
it must have been defined as "new".
This alone already eliminates "include" as it is interpreted in
terms of aliases.
We also have to prevent "copying" a module:
module M2 = M
which could create another type sharing the same new.
We must also be careful about functors.
And I'm not sure the list is exhaustive.
For instance, dynamic types are going to be a problem
too (actually their witnesses share very similar problems)

So yes, this is the basic idea, but integrating it in the language
is not going to be that easy.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 22, 2013

Comment author: @gasche

This all looks quite tricky; in particular, the export of datatype equality makes thing less generative than we would expect them to be.

Maybe we could consider forgetting about "unique" abstract datatypes and keep the
type ('a, 'b) fk = Fk
style for now, as long as the rest of the injectivity+new system stabilizes?

Do we need unique abstract types for anything other than the already suspicious phantom type tricks with GADTs?

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 23, 2013

Comment author: @garrigue

I'm not sure what you mean by "already suspicious phantom type tricks with GADTs".
A typical example where we need unique abstract types is type witnesses:

type _ t =
  | Tint : int t
  | Tarrow : 'a t * 'b t -> ('a -> 'b) t
  | Tarray : 'a t -> ('a array) t
  | Tqueue : 'a t -> ('a Queue.t) t

Here int and array are handled by the special case for builtin types, so we now
that they are unique. But this is not the case for Queue.t.
As a result the above type cannot be defined, even in the non-vanishing branch.
That part is easily solved by adding an injectivity annotation in queue.mli.
However, if we cannot use type annotation to discard cases in pattern matching:

let f (type a) (x : a Queue.t t) =
match x with Tqueue ta -> ...

becomes partial, and to make it complete we must write all cases, since Queue.t
could be absolutely anything.

Another useful point of unique abstract types is that they do not depend on
their names: you cannot happen to create two indistinguishable types just
because you gave them the same name.

But of course this is not as bad as the injectivity problem, since at least you
can complete your pattern-matching with some "assert false".

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 23, 2013

Comment author: @garrigue

Just for fun, I've added abstract new types and new datatypes:

type t = new
type t = new A | B
type t = new {x : int}

The idea is that an abstract new type can only abstract a new type, and since they do not allow aliasing, they are all different.
A new datatype is just like a normal datatype (no coercion needed), except that it cannot have an equation, and cannot be the target of the equation of a normal datatype.
At the module level, strengthening is disabled, so that different paths are really different types
(just copies of the original one).

At this point I have no proof of correctness whatsoever, but it might be an alternative to variance annotations (any new type is automatically injective), which additionally ensures uniqueness.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 23, 2013

Comment author: @garrigue

Unfortunately, this is broken:

module M : sig type t = new type _ is_t = I : ('a,t) comp -> 'a is_t end =
struct type t = new int type _ is_t = I : ('a,t) comp -> 'a is_t end;;

module N = M;;

let e = M.I Eq;;
let N.I e' = e;; (* (M.t, N.t) comp )
match e' with Diff -> false;; (
doesn't warn *)

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 24, 2013

Comment author: @garrigue

There is now a new version of abstract and concrete new types in branches/abstract-new

The idea is that taking a copy of a module will turn all new types into aliases to the original definition (when there is a valid path), or just turn them into abstract types when there is no valid path.

module M = struct type t = new int end;;

module M : sig type t = new int end

module N = M;;

module N : sig type t = M.t end

The same thing of course happens if you path a named module to a functor, so that you cannot preserve the newness there.

For this reason, while this is a nice approach, it doesn't seem to be a full replacement for injectivity annotations: they are the only way to keep abstract types injective in functors, or when the original definition is not visible.

Again, examples are in typing-private/new.ml

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 29, 2013

Comment author: @alainfrisch

Setting Target Version to 4.01: we probably need to do something before this release, at least assuming that all parametrized abstract types (except predefined ones such as lazy, array) are non injective.

@vicuna
Copy link
Author

@vicuna vicuna commented Apr 30, 2013

Comment author: @garrigue

Fixed in trunk. (commit 13634)
Injectivity is now properly inferred, but there is no syntax for asserting injectivity of abstract types.

@vicuna
Copy link
Author

@vicuna vicuna commented Jun 14, 2013

Comment author: @alainfrisch

Keeping open, because of "no syntax for asserting injectivity of abstract types", but this doesn't look like a blocking issue for 4.01 anymore.

@github-actions
Copy link

@github-actions github-actions bot commented Jul 21, 2021

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jul 21, 2021
@yallop
Copy link
Member

@yallop yallop commented Jul 21, 2021

Keeping open, because of "no syntax for asserting injectivity of abstract types"

Since 4.12.0 we have a way to assert injectivity of abstract types: type !'a t (#9500, #9727).

@yallop yallop closed this Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants