Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't treat definitions from the current module specially #8900

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

lpw25
Copy link
Contributor

@lpw25 lpw25 commented Aug 28, 2019

The type-checker treats abstract type definitions in the current specially:

# type (_, _) eq = Refl : ('a, 'a) eq;;
    
# module M = struct
      type a
      type b
      let absurd : 'a. (a, b) eq -> 'a = function _ -> .
    end;;
module M : sig type a type b val absurd : (a, b) eq -> 'a end

# module M = struct
      module N = struct
        type a
        type b
      end
      let absurd : 'a. (N.a, N.b) eq -> 'a = function _ -> .
    end;;
Line 6, characters 54-55:
6 |       let absurd : 'a. (N.a, N.b) eq -> 'a = function _ -> .
                                                          ^
Error: This match case could not be refuted.
       Here is an example of a value that would reach it: Refl

In my experience this special treatment confuses people a lot and is not really useful for anything.

This PR removes the special treatment. This is not a backwards compatible change so it could probably use some testing on opam.

@glondu
Copy link
Contributor

glondu commented Aug 28, 2019

In your example, the special treatment looks legit: two abstract types defined in another module (N.a and N.b) could actually be equal (so that a value of type (N.a, N.b) eq can be constructed) whereas when they are defined in the same module, they are different by construction.

@lpw25
Copy link
Contributor Author

lpw25 commented Aug 28, 2019

they are different by construction

This is debatable.

When I declare an abstract type I might mean a fresh empty type. I might also mean a non-empty type where all the values are constructed using C primitives. For example, Obj.t is declared as an abstract type in Obj but it is the opposite of empty.

Similarly, when I declare two abstract types I might mean two separate types, but I also might mean two equivalent types whose equivalence I do not intend to expose. Without a concrete definition OCaml cannot be certain that two types are genuinely incompatible.

However, this is all besides the point. The issue is more practical than that. Treating types from the same structure differently simply creates confusion for no increase in expressivity. Having explained what is happening is such cases many, many times I think it would be easier to just remove this special casing.

@Drup
Copy link
Contributor

Drup commented Aug 28, 2019

On paper I agree, but I'm fairly certain this is going to break some code, so we really need to evaluate this on opam.

@trefis
Copy link
Contributor

trefis commented Aug 28, 2019

I'm fairly certain this is going to break some code, so we really need to evaluate this on opam.

Yes… and no?
The fix is trivial, backward compatible, and even if many packages break, it's unlikely that any package will be broken in more than one or two places.

Which doesn't mean that we shouldn't run this on opam. But I think once that's done we just want to merge, and notify maintainers that they should update their package.

@garrigue
Copy link
Contributor

garrigue commented Sep 2, 2019

As much as I wish I hadn't created this difference in the first place, this needs to be debated and advertised.
This basically means that almost all code snippets using GADTs in ocaml available around will break, which may not be the best way to help beginners.

Personally I would advocate to wait until we provide a way to define abstract types with an identity, which would also avoid the workaround we have for abstract types from the initial environment.
Actually, the current rule is not that complicated: the abstract type treated specially are those which have a non-prefixed, be it because they are from the initial environment or from the current module.

@gasche
Copy link
Member

gasche commented Sep 2, 2019

This basically means that almost all code snippets using GADTs in ocaml available around will break, which may not be the best way to help beginners.

I think that this point should make us take a long, hard pause.

@lpw25
Copy link
Contributor Author

lpw25 commented Sep 2, 2019

I really hope this doesn't break many code snippets, because such snippets are very confusing for beginners. They are wrong and teach people to use GADTs incorrectly. I suspect that making such snippets be an explicit error will probably improve the situation for beginners.

@yminsky
Copy link

yminsky commented Sep 2, 2019

Maybe the way to make progress on this PR is to gather information from opam on the nature of the breakages that it creates? That should be some clue as to the nature of the incompatibilities introduced by the change.

@gasche
Copy link
Member

gasche commented Sep 2, 2019

An example that works today and would be broken by the change is:

type true_type
type false_type

type _ singleton_bool =
  | True : true_type singleton_bool
  | False : false_type singleton_bool

Where true_type and false_type are "axiomatized types" (abstract types declared in the local module, that will never be given a definition), so they are considered as distinct (today) and eg. true_type singleton_bool can only be True. (To fix the example under the change, one has to give a phony definition to those types, for example = Phony in both cases.)

However, the example I tried to write first is below, and I realized that it is in fact rejected today.

type zero
type 'n succ

# type ('a, _) vec =
  | Nil : ('a, zero) vec
  | Cons : 'a * ('a, 'n) vec -> ('a, 'n succ) vec
;;
Error: In this definition, a type variable cannot be deduced
       from the type parameters.

If I understand correclty, this fails because axiomatized types (abstract types declared in the current module) are not considered injective in their parameters: succ is not injective in its parameter, so that 'n is not "deducible" from 'n succ in the return type of Cons.
(Note: the error message could be more helpful, for example by pointing out the constructor for which the problem occurs, and maybe even the parameter position that cannot be determined.)

This suggests that only zero-ary uses of axiomatized types could be used for GADTs in the wild, and that the axiomatized-types approach would (already today) soon show its limitations to the GADT user.

@Octachron
Copy link
Member

I have run some tests on the opam repository comparing a version of the compiler with this PR integrated and without:

The handful of new failures seems completely unrelated.

@lpw25
Copy link
Contributor Author

lpw25 commented Oct 4, 2019

Thanks @Octachron. I think those results on opam are very encouraging. Are people still nervous about making the change, or can we go ahead with it? If people are nervous, are there further experiments that would help make them less nervous?

@garrigue
Copy link
Contributor

garrigue commented Oct 4, 2019

As I said above, I still advocate to wait until we have a way to give abstract types an identity, so that we could also remove the special case for predefined abstract types.
I would not oppose a warning, but this may be more work (hard to track whether this was really needed).

@gasche
Copy link
Member

gasche commented Oct 4, 2019

I find the results generally reassuring.

We had this intuition (Jacques mentioned it and I shared it) that many beginner-oriented tutorials would be affected by the change. Could we maybe look at a couple such places (starting with the nice introduction to GADTs by @Octachron in the manual) and check that there is no issue there as well?

@Octachron here are some comments from my experience looking at your opamcheck-PR results:

  1. The basic information we get is that there was a working version-choice (a choice of versions for all dependencies of the package) before, and the tool did not find a working version-choice after. But opamcheck tried hard to find a version-choice, exploring different choices that are different from the working before-choice (which may help finding a success); when they all fail, it reports one of the failure it uses a very different version-choice from before. As a result, often the failure is unrelated to the PR at hand and it makes things hard to understand. (example) Could we always see the failure corresponding to the same version-choice as the working pre-PR version-choice?

  2. There seems to be something odd going on with the bytes/string difference, or at least I'm missing something. See for example the macaroons failure: both the working before-choice and the failing after-choice are using 4.08.1, yet the first builds and the second fails with a seemingly classic bytes/string type incompatibility. What is going on? Could this be related to the present PR?

I wish we could investigate point (2), which may be related to the current PR.

@Octachron
Copy link
Member

Concerning tutorials, I would argue that any tutorial relying on the special behavior for local abstract types is dangerous.

Such tutorial teaches how to write toy examples that become mysteriously broken once you try to transform them to real code by putting the code in a separated module. This is utterly confusing for most people.

@Octachron
Copy link
Member

After a bit of investigation, the macaroon failure is not directly related, the package builds with just 4.08.0+pr8900 .

Concerning tutorials, the manual tutorial and ocamlverse tutorials would not need to be changed after this PR. Scouring the first 10 pages of google results for blog post on GADTs in OCaml, I only found two blog posts using abstract types as type level GADT. (I sent a fix for one of them). However, there was a handful of lecture slides subject to this issue. (For instance, https://www.irif.fr/~treinen/teaching/pfav/cours/cours10.handout.pdf or http://yann.regis-gianas.org/mpri/01-gadts-checking.pdf )

@gasche
Copy link
Member

gasche commented Oct 8, 2019

cc @yurug @treinen: this change to OCaml will break your teaching documents for GADTs, because the pattern of declaring fully-abstract types locally will stop working. Instead of

type empty
type non_empty
type _ gadt = Foo : empty gadt | Bar : non_empty gadt

you should use known-distinct types for the phantom types, for example

type empty = Empty
type non_empty = Non_empty
type _ gadt = Foo : empty gadt | Bar : non_empty gadt

@yurug
Copy link

yurug commented Oct 8, 2019

Hi @gasche! My mpri slides are already using this idiom, which I find more sensible.

@lpw25
Copy link
Contributor Author

lpw25 commented Oct 14, 2019

Do we want to consider this for 4.10? or would people like more time to make a decision?

@Drup
Copy link
Contributor

Drup commented Oct 15, 2019

As I said above, I still advocate to wait until we have a way to give abstract types an identity, so that we could also remove the special case for predefined abstract types.

@garrigue Do you have a concrete plan for that ? If you don't, I propose we just merge the current version, and improve the situation for predefined abstract types later.

@garrigue
Copy link
Contributor

I can write a proposal this week. Or earlier if you prefer.
Basically the idea would be to use an attribute to indicate an id for a pure abstract type (for which no implementation can be given inside ocaml). Then two types with different ids are different.

@garrigue garrigue mentioned this pull request Oct 15, 2019
@garrigue
Copy link
Contributor

Just added a proof of concept proposal of unique types: #9042

@Octachron
Copy link
Member

I am still very much in favor of this PR, and still semi-regularly explaining that the special behavior for abstract types defined in the current module is a beginner trap.

@gasche
Copy link
Member

gasche commented Feb 4, 2022

It's an old discussion, people are generally in agreement that the current behavior is not good, and the opam results are good. Sounds good, right?

Besides the question of teaching materials (which I think is important, but basically impossible to quantify), the thing that is giving me pause is that personally I consider that the idiom type foo = Foo is hack. It's using a feature that was not meant to do this, in a way that is clever and looks nice, but the intent is hard to understand for people not already familiar with the technique, and it has unintended term-level consequences (we can write terms that are not supposed to make sense).
So we would be deprecating something which made sense (which known downsides) to replace it with a hack?

@garrigue is patiently pushing for a different approach using "generative abstract types" (see #9042 and ocaml/RFCs#4). I haven't invested enough effort in the proposal to know what to think of it, but in general I admire the idea of not satisfying ourselves with a hack, and trying to find a feature that captures our intent here. (And I think that the external/generative/nominal proposals are also justified by other needs, for example @alainfrisch's interest in runtime type representations, it's not like the admittedly-small problem we are discussing here is the only motivation.)

You want more support (at least from @garrigue and myself) for changing the behavior of type <foo> in modules? Please go reach a consensus on Jacques' proposals :-)

@yallop
Copy link
Member

yallop commented Jan 9, 2023

Besides the question of teaching materials (which I think is important, but basically impossible to quantify), the thing that is giving me pause is that personally I consider that the idiom type foo = Foo is hack.

I'm not especially keen on it, either, but I consider type foo = | better, because it doesn't have this drawback:

it has unintended term-level consequences (we can write terms that are not supposed to make sense).

Copy link
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that I've been long in advancing the nominal types, I agree that it is better to get rid of this kludge.
I expected more changes in the testsuite.
This seems to show that I'm already always writing my code that doesn't depend on this.

Before merging, there should be a log message explaining the change and how to fix code in case it breaks it.

testsuite/tests/typing-gadts/test.ml Outdated Show resolved Hide resolved
typing/ctype.ml Outdated Show resolved Hide resolved
@gasche
Copy link
Member

gasche commented Jan 12, 2023

@yallop

I'm not especially keen on it, either, but I consider type foo = | better, because it doesn't have this drawback:

I am worried by the interaction between this choice and the fact that people are asking for structural subtyping specifically for empty types (see eg #9459). We can add structural subtyping without type equalities, but it feels to me like we are treading on very fragile ground with this choice.

On the other hand I actually like @lpw25's proposal of using type foo = private Foo instead. This is quite neat.

@yallop
Copy link
Member

yallop commented Jan 12, 2023

Could you say exactly what you're worried about? I don't see what problem there could be.

Copy link
Member

@yallop yallop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to add @stedolan's segfaulting example from #11492 to the test suite in this PR.

We could also include a variation on the example that uses ascription rather than functors

module M = struct type t = int let int = Int end
include (M : sig type t val int : t is_int end)

since that's also currently unsound.

@gasche
Copy link
Member

gasche commented Jan 12, 2023

I am worried about the fact that someday type t = | and type u = | could be treated as equal types, breaking the distinctness assumption of people using them as generative phantom GADT markers.

@yallop
Copy link
Member

yallop commented Jan 12, 2023

I'm not worried about that, because it doesn't seem any more likely than type s = Zero and type t = Zero being treated as equal types.

@lpw25
Copy link
Contributor Author

lpw25 commented Jan 12, 2023

I think maybe I'm missing something: whilst type t = | and type u = | are aren't known to be equal, they also aren't known to be not equal, so they don't work well as GADT indices.

@gasche
Copy link
Member

gasche commented Jan 12, 2023

At least this currently works (OCaml 5.0)

type t = |
type u = |
type _ gadt = T : t gadt | U : u gadt

let f : t gadt -> unit = function T -> ()
(* no exhaustivity warning *)

@yallop
Copy link
Member

yallop commented Jan 12, 2023

Oh, @lpw25 is right, of course. While they're currently known to be unequal, they won't be after this PR.

@gasche
Copy link
Member

gasche commented Jan 12, 2023

Hm, this also means that my old proposal to allow renaming constructors when re-exporting a datatype is definitely dead for good after this PR. (Not that I was very insistent on it or anything .)

It will be even harder if this PR is merged to reason on the fact that types are known to be incompatible. I guess that it will be a motivation to look again at #9042 or other proposals in that space.

@lpw25
Copy link
Contributor Author

lpw25 commented Jan 12, 2023

Rebased, comments addressed and Changes entry added

@gasche
Copy link
Member

gasche commented Jan 13, 2023

Note: this PR should also update the manual:

Of course, not all abstract types can be refined, as this would
contradict the exhaustiveness check. Namely, builtin types (those
defined by the compiler itself, such as "int" or "array"), and
abstract types defined by the local module, are non-instantiable, and
as such cause a type error rather than introduce an equation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants