Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use local scopes to detect univar escapes #10787

Open
wants to merge 25 commits into
base: trunk
Choose a base branch
from

Conversation

mrmr1993
Copy link
Contributor

This PR implements a 'local scope' mechanism, used to track the number of binders (currently only Tpolys) that types appear within, and used to detect scope escapes. This is pretty closely related to @trefis's extended levels proposal, with some key differences:

  • the existing 'level' concept is left as-is, and the new level is stored separately
    • this is important for modular explicits, where we want to detect path escapes both where the level of some concrete path is too high, and where the local scope of some local path appears outside of that local scope
  • the levels update mechanism detects escapes of Tunivars, rather than blindly lowering them to an inaccurate level
  • the univars_escape checks previously used by enter_poly have been removed, since they can be entirely replaced by the new univar escape checks in update_level
  • copying of types also does some adjustment of local scopes, to ensure that the expansion of e.g. 'a 'b. ('a, 'b) contains_polys still maintains a correct local scope structure when new levels of nesting are introduced.

This PR also contains some miscellaneous fixes enabled by this new approach:

  • copying in the presence of univars within non-fixed rows now works correctly
    • e.g. let f (n : < m : 'a. [< Foo of 'a & int | Bar] >) = (n : < m : 'b. [< Foo of 'b & int | Bar] >)
  • row types with shared row_mores but containing distinct univars are now printed in their entirety, instead of as an opaque variable
    • the printing should perhaps still be amended to mark the rows as associated e.g. < m : 'a. [< Foo of 'a | Bar] as 'd > -> < m : 'b. [< Foo of 'b | Bar] as 'd > -> < m : 'c. [< Foo of 'c | Bar] as 'd >
  • a known issue with the interaction between type constraints and class types has been fixed.

Technical notes:

  • type_expr and friends now have a local scope (called lscope in the code for brevity), which is tracked and updated in a similar fashion to levels
  • a new concept of more_variables has been added to Typetexp, so that non-fixed rows and objects may be associated with univars as before, but have their local scope lowered if they aren't captured as such
  • all other type variables (excl. quasi-univars, where we temporarily use a Tvar in place of a Tunivar while initially constructing a Tpoly) are placed at local scope 0, ensuring that they can't capture univars and allow them to escape.

Aside from getting the algorithms for copy and update_level correct, most of the trickiness comes from choosing the 'correct' local scope, especially in interaction with row and object types. We can't blindly use a current_level like we do for existing levels, so unfortunately this PR introduces a lot of explicit arguments for the lscope into various functions.

@trefis
Copy link
Contributor

trefis commented Nov 23, 2021

I haven't looked at the code yet¹, and there are still parts of the design that I'm unsure of (mainly the decision to have a separate field for « local scopes », which makes sense, but which could also be argued against).
Nevertheless, one issue came to mind upon reading your description:

the univars_escape checks previously used by enter_poly have been removed, since they can be entirely replaced by the new univar escape checks in update_level

I agree with this, except some parts of the typechecker call enter_poly, without calling update_level: mcomp, equal, etc.
So by just removing the checks from enter_poly, you allow univars to escape in those places.
For instance, the following is accepted on your branch:

module M : sig
  type t = < m: 'a. < m: 'b. 'a (* fresh at each cycle *) * 'x > option > as 'x
end = struct
  type t = < m: 'a. (< m: 'b. 'a (* always pointing to the first binder *) * < m: 'c. 'x option > > as 'x) option >
end

while it is rejected with the following error on 4.14:

Error: Signature mismatch:
       ...
       Type declarations do not match:
         type t = < m : 'a. (< m : 'a * < m : 'b option > > as 'b) option >
       is not included in
         type t = < m : 'a. < m : 'a * 'b > option > as 'b
       The type < m : 'a. (< m : 'a * < m : 'b option > > as 'b) option >
       is not equal to the type < m : 'a. < m : 'a * 'c > option > as 'c
       The method m has type < m : 'a * < m : 'd > > option as 'd,
       but the expected method type was
       'a0. < m : 'a0 * < m : 'a0. 'e > > option as 'e
       The universal variable 'a would escape its scope

tl;dr: local scopes are going to matter for checking equality / compatibility of types.

Although I don't think we want to be calling update_level in those places. We probably just want to check that the scopes are the same. I hope that's going to be enough, given the contexts those functions are called from, and that it's not going to give spurious failures. (Just checking the scopes are the same is probably not going to work, we will have to adjust them. But that's not unheard of, moregen already does some of this)


1: I will do so at some point, but it's probably going to wait a few weeks. (But given that nothing is going to get merged before multicore anyway, it shouldn't be a problem?)

@trefis trefis closed this Nov 23, 2021
@trefis trefis reopened this Nov 23, 2021
@garrigue
Copy link
Contributor

This looks very promising, and I should look into this in detail, but something surprises me in the presentation:

row types with shared row_mores but containing distinct univars are now printed in their entirety, instead of as an opaque variable. The printing should perhaps still be amended to mark the rows as associated e.g. < m : 'a. [< Foo of 'a | Bar] as 'd > -> < m : 'b. [< Foo of 'b | Bar] as 'd > -> < m : 'c. [< Foo of 'c | Bar] as 'd >

There is an invariant that variant types sharing the same row variable should be identical up to levels of internal nodes.
So the type you are showing here is in my view plainly invalid.
If you start to break that, I'm afraid a lot of things need to be fixed, and we need a new theory.
If the invariant was already broken, this is probably a bug.

@@ -648,7 +648,7 @@ val f :
- : (< m : 'a. 'a * < p : 'b. 'b * 'd * 'c > as 'd > as 'c) ->
('f * < p : 'b. 'b * 'e * 'c > as 'e)
= <fun>
- : < m : 'a. < p : 'a; .. > as 'b > -> 'b = <fun>
- : < m : 'a. < p : 'a; .. > > -> < p : 'b; .. > = <fun>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the type annotations should probably be rejected as ill-formed: one cannot have a free row variable on a variant type containing univars. I don't think there is any way to build such a type other than such an ill-formed type annotation.

@mrmr1993
Copy link
Contributor Author

There is an invariant that variant types sharing the same row variable should be identical up to levels of internal nodes.
...
If the invariant was already broken, this is probably a bug.

The invariant was already broken, see e.g. (tested in a 4.11.2 toplevel):

# fun (x : <a: 'a. [> `A of 'a]>) (y : <a: 'b. [> `A of 'b]>) b -> if b then x else y;;
- : < a : 'a. [> `A of 'a ] as 'c > ->
    < a : 'b. 'c > -> bool -> < a : 'a. 'c >
= <fun>

Ctype.copy was broken when handling these types, so they become 'toxic' as soon as they are committed to the environment:

# let f (x : <a: 'a. [> `A of 'a]>) (y : <a: 'b. [> `A of 'b]>) b = if b then x else y;;
Error: Values do not match:
         val f :
           < a : 'a. [> `A of 'a ] as 'c > ->
           < a : 'b. 'c > -> bool -> < a : 'a. 'c >
       is not included in
         val f :
           < a : 'a. [> `A of 'a0 ] as 'c > ->
           < a : 'b. 'c > -> bool -> < a : 'a0. 'c >

This PR fixes that broken Ctype.copy behaviour, and improves the printing (although not perfectly, since we don't have a syntax for capturing just the 'row more').

If you start to break that, I'm afraid a lot of things need to be fixed, and we need a new theory.

Do you have a sense of what we should expect to be broken? I would like to try to get this fixed as part of the modular explicits work, so that we don't get strange errors with the equivalent in local modules e.g.

module type S = sig
  type t
  val f : t -> [> `A of t]
end
let f (module M: S) x = M.f x
(* val f : (module M: S) -> M.t -> [> `A of M.t] *)
let g b = if b then f else f

Handling these cases is obviously less pressing than the more general concept, so I'm happy to separate out these changes from this PR if they're too objectionable.

@garrigue
Copy link
Contributor

Indeed, I could already see from your PR that there were examples in poly.ml that broke this invariant.
I'm not sure why this went undetected.
Basically, all the theory assumes that types that share the same row variable are identical. You can see my papers on structural polymorphism for details, under a slightly different representation (there are no row variables, rather the types themselves are assumed to be variables constrained by a kind).

So the answer is that the printer is correct, I think that Ctype.copy is correct too, what is broken is Typepetexp.transl_type which should reject types where a univar appears free in a type whose row variable is a normal variable (or a univar bound at a more external level than the one in the body). This is a bit tricky to do, as currently transl_type uses a notion of "potential univar", so whether a variable is a normal one or a univar is decided later. An extra pass on the translated type would be the solution, particularly with your new representation that should make detecting binding easier.

@garrigue
Copy link
Contributor

By the way, the relation of your explicit modular example with this problem is subtle.
A way to see it is that M.t is bound by (module M : S), while the row variable is quantified on the whole function, so ends up being more external:

val f : forall 'a. (module M: S) -> M.t -> [> `A of M.t] as 'a

However, the relaxed value restriction says that since 'a only appears strictly positively, it is not constrained by potential side-effects. Moreover Ohori has developed a theory of rank-0 polymorphism, showing that one can do inference with quantifiers deeper in types, as long as their are at strictly positive positions. So actually, we can see this type as:

val f : (module M: S) -> M.t -> forall 'a. [> `A of M.t] as 'a

which is fine.
Of course we need a proper theory for that, but at least this case seems OK.

Yet, the symmetrical case with

val f : [< `A of t] -> t

inside S seems to require real rank-0 polymorphism (the row variable is in a contravariant position). I.e., the type of the using function should really be:

val f : (module M: S) -> forall 'a. ([> `A of M.t] as 'a) -> M.t

It is not that surprising that explicit polymorphism would require a finer expression of polymorphism in other places.

@gasche
Copy link
Member

gasche commented Jan 13, 2023

cc @goldfirere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants