Use local scopes to detect univar escapes #10787

mrmr1993 · 2021-11-22T12:08:00Z

This PR implements a 'local scope' mechanism, used to track the number of binders (currently only Tpolys) that types appear within, and used to detect scope escapes. This is pretty closely related to @trefis's extended levels proposal, with some key differences:

the existing 'level' concept is left as-is, and the new level is stored separately
- this is important for modular explicits, where we want to detect path escapes both where the level of some concrete path is too high, and where the local scope of some local path appears outside of that local scope
the levels update mechanism detects escapes of Tunivars, rather than blindly lowering them to an inaccurate level
the univars_escape checks previously used by enter_poly have been removed, since they can be entirely replaced by the new univar escape checks in update_level
copying of types also does some adjustment of local scopes, to ensure that the expansion of e.g. 'a 'b. ('a, 'b) contains_polys still maintains a correct local scope structure when new levels of nesting are introduced.

This PR also contains some miscellaneous fixes enabled by this new approach:

copying in the presence of univars within non-fixed rows now works correctly
- e.g. let f (n : < m : 'a. [< Foo of 'a & int | Bar] >) = (n : < m : 'b. [< Foo of 'b & int | Bar] >)
row types with shared row_mores but containing distinct univars are now printed in their entirety, instead of as an opaque variable
- the printing should perhaps still be amended to mark the rows as associated e.g. < m : 'a. [< Foo of 'a | Bar] as 'd > -> < m : 'b. [< Foo of 'b | Bar] as 'd > -> < m : 'c. [< Foo of 'c | Bar] as 'd >
a known issue with the interaction between type constraints and class types has been fixed.

Technical notes:

type_expr and friends now have a local scope (called lscope in the code for brevity), which is tracked and updated in a similar fashion to levels
a new concept of more_variables has been added to Typetexp, so that non-fixed rows and objects may be associated with univars as before, but have their local scope lowered if they aren't captured as such
all other type variables (excl. quasi-univars, where we temporarily use a Tvar in place of a Tunivar while initially constructing a Tpoly) are placed at local scope 0, ensuring that they can't capture univars and allow them to escape.

Aside from getting the algorithms for copy and update_level correct, most of the trickiness comes from choosing the 'correct' local scope, especially in interaction with row and object types. We can't blindly use a current_level like we do for existing levels, so unfortunately this PR introduces a lot of explicit arguments for the lscope into various functions.

trefis · 2021-11-23T13:06:05Z

I haven't looked at the code yet¹, and there are still parts of the design that I'm unsure of (mainly the decision to have a separate field for « local scopes », which makes sense, but which could also be argued against).
Nevertheless, one issue came to mind upon reading your description:

the univars_escape checks previously used by enter_poly have been removed, since they can be entirely replaced by the new univar escape checks in update_level

I agree with this, except some parts of the typechecker call enter_poly, without calling update_level: mcomp, equal, etc.
So by just removing the checks from enter_poly, you allow univars to escape in those places.
For instance, the following is accepted on your branch:

module M : sig
  type t = < m: 'a. < m: 'b. 'a (* fresh at each cycle *) * 'x > option > as 'x
end = struct
  type t = < m: 'a. (< m: 'b. 'a (* always pointing to the first binder *) * < m: 'c. 'x option > > as 'x) option >
end

while it is rejected with the following error on 4.14:

Error: Signature mismatch:
       ...
       Type declarations do not match:
         type t = < m : 'a. (< m : 'a * < m : 'b option > > as 'b) option >
       is not included in
         type t = < m : 'a. < m : 'a * 'b > option > as 'b
       The type < m : 'a. (< m : 'a * < m : 'b option > > as 'b) option >
       is not equal to the type < m : 'a. < m : 'a * 'c > option > as 'c
       The method m has type < m : 'a * < m : 'd > > option as 'd,
       but the expected method type was
       'a0. < m : 'a0 * < m : 'a0. 'e > > option as 'e
       The universal variable 'a would escape its scope

tl;dr: local scopes are going to matter for checking equality / compatibility of types.

Although I don't think we want to be calling update_level in those places. ~~We probably just want to check that the scopes are the same. I hope that's going to be enough, given the contexts those functions are called from, and that it's not going to give spurious failures.~~ (Just checking the scopes are the same is probably not going to work, we will have to adjust them. But that's not unheard of, moregen already does some of this)

1: I will do so at some point, but it's probably going to wait a few weeks. (But given that nothing is going to get merged before multicore anyway, it shouldn't be a problem?)

garrigue · 2021-11-24T03:59:18Z

This looks very promising, and I should look into this in detail, but something surprises me in the presentation:

row types with shared row_mores but containing distinct univars are now printed in their entirety, instead of as an opaque variable. The printing should perhaps still be amended to mark the rows as associated e.g. < m : 'a. [< Foo of 'a | Bar] as 'd > -> < m : 'b. [< Foo of 'b | Bar] as 'd > -> < m : 'c. [< Foo of 'c | Bar] as 'd >

There is an invariant that variant types sharing the same row variable should be identical up to levels of internal nodes.
So the type you are showing here is in my view plainly invalid.
If you start to break that, I'm afraid a lot of things need to be fixed, and we need a new theory.
If the invariant was already broken, this is probably a bug.

garrigue · 2021-11-24T04:10:18Z

testsuite/tests/typing-poly/poly.ml

@@ -648,7 +648,7 @@ val f :
 - : (< m : 'a. 'a * < p : 'b. 'b * 'd * 'c > as 'd > as 'c) ->
    ('f * < p : 'b. 'b * 'e * 'c > as 'e)
 = <fun>
- : < m : 'a. < p : 'a; .. > as 'b > -> 'b = <fun>
+- : < m : 'a. < p : 'a; .. > > -> < p : 'b; .. > = <fun>


Here the type annotations should probably be rejected as ill-formed: one cannot have a free row variable on a variant type containing univars. I don't think there is any way to build such a type other than such an ill-formed type annotation.

mrmr1993 · 2021-11-24T21:41:23Z

There is an invariant that variant types sharing the same row variable should be identical up to levels of internal nodes.
...
If the invariant was already broken, this is probably a bug.

The invariant was already broken, see e.g. (tested in a 4.11.2 toplevel):

# fun (x : <a: 'a. [> `A of 'a]>) (y : <a: 'b. [> `A of 'b]>) b -> if b then x else y;;
- : < a : 'a. [> `A of 'a ] as 'c > ->
    < a : 'b. 'c > -> bool -> < a : 'a. 'c >
= <fun>

Ctype.copy was broken when handling these types, so they become 'toxic' as soon as they are committed to the environment:

# let f (x : <a: 'a. [> `A of 'a]>) (y : <a: 'b. [> `A of 'b]>) b = if b then x else y;;
Error: Values do not match:
         val f :
           < a : 'a. [> `A of 'a ] as 'c > ->
           < a : 'b. 'c > -> bool -> < a : 'a. 'c >
       is not included in
         val f :
           < a : 'a. [> `A of 'a0 ] as 'c > ->
           < a : 'b. 'c > -> bool -> < a : 'a0. 'c >

This PR fixes that broken Ctype.copy behaviour, and improves the printing (although not perfectly, since we don't have a syntax for capturing just the 'row more').

If you start to break that, I'm afraid a lot of things need to be fixed, and we need a new theory.

Do you have a sense of what we should expect to be broken? I would like to try to get this fixed as part of the modular explicits work, so that we don't get strange errors with the equivalent in local modules e.g.

module type S = sig
  type t
  val f : t -> [> `A of t]
end
let f (module M: S) x = M.f x
(* val f : (module M: S) -> M.t -> [> `A of M.t] *)
let g b = if b then f else f

Handling these cases is obviously less pressing than the more general concept, so I'm happy to separate out these changes from this PR if they're too objectionable.

garrigue · 2021-11-26T01:27:39Z

Indeed, I could already see from your PR that there were examples in poly.ml that broke this invariant.
I'm not sure why this went undetected.
Basically, all the theory assumes that types that share the same row variable are identical. You can see my papers on structural polymorphism for details, under a slightly different representation (there are no row variables, rather the types themselves are assumed to be variables constrained by a kind).

So the answer is that the printer is correct, I think that Ctype.copy is correct too, what is broken is Typepetexp.transl_type which should reject types where a univar appears free in a type whose row variable is a normal variable (or a univar bound at a more external level than the one in the body). This is a bit tricky to do, as currently transl_type uses a notion of "potential univar", so whether a variable is a normal one or a univar is decided later. An extra pass on the translated type would be the solution, particularly with your new representation that should make detecting binding easier.

garrigue · 2021-11-26T01:49:26Z

By the way, the relation of your explicit modular example with this problem is subtle.
A way to see it is that M.t is bound by (module M : S), while the row variable is quantified on the whole function, so ends up being more external:

val f : forall 'a. (module M: S) -> M.t -> [> `A of M.t] as 'a

However, the relaxed value restriction says that since 'a only appears strictly positively, it is not constrained by potential side-effects. Moreover Ohori has developed a theory of rank-0 polymorphism, showing that one can do inference with quantifiers deeper in types, as long as their are at strictly positive positions. So actually, we can see this type as:

val f : (module M: S) -> M.t -> forall 'a. [> `A of M.t] as 'a

which is fine.
Of course we need a proper theory for that, but at least this case seems OK.

Yet, the symmetrical case with

val f : [< `A of t] -> t

inside S seems to require real rank-0 polymorphism (the row variable is in a contravariant position). I.e., the type of the using function should really be:

val f : (module M: S) -> forall 'a. ([> `A of M.t] as 'a) -> M.t

It is not that surprising that explicit polymorphism would require a finer expression of polymorphism in other places.

gasche · 2023-01-13T16:30:04Z

cc @goldfirere

mrmr1993 added 25 commits November 18, 2021 02:15

Add lscope to type_expr

2d4eafc

Bootstrap

08ddf2c

Allow updating the local_scope while changing levels

3e5b7e9

Add lscope to Printtyp.raw_type

1855b07

Expose ~lscope arguments on newty constructors

5de5647

Enable local scopes for copying of types and type parsing

ad4ef0d

Fix off-by-one error in local scope updates while copying

ccbf1c4

Track visited nodes in update_level

ff11e50

Bootstrap

47637c4

Use warning scope for transl_type

cdb2065

Use the correct local scope in Typetexp.transl_fields

c4c2b93

Check for incompatible local scopes when checking univar unification

7718195

Use the correct local scope for object fields when unifying etc.

dd34fc5

Don't raise escape errors in pattern mode for univars in type params

3d8c986

Add missing local scopes in Ctype and Typetexp

cf5c95d

Use local scope for row variables in Typetexp, only lower after Tpoly

861844f

Update expected test outputs, add additional tests for edge-case

b85711a

Use a TypeSet instead of a TypeHash in update_level

47f2984

Don't ignore pattern univar escapes, catch them in unify1_var

c86254d

Update expected test output

f5d0c33

Remove univar checks from enter_poly, upate test output

7e3be62

Use the outer local scope when reducing row fields

1c9b84e

Use the outer local scope for row types in moregen

31a1ea1

Copy row types with univars as constructor arguments correctly

4c26ce8

Avoid proxying Tobjects when their local scope differs from their 'more'

e4716c3

trefis closed this Nov 23, 2021

trefis reopened this Nov 23, 2021

garrigue reviewed Nov 24, 2021

View reviewed changes

mrmr1993 mentioned this pull request Dec 31, 2021

Soundness bug: polymorphism and row types #10840

Open

mrmr1993 mentioned this pull request Feb 16, 2022

Modular explicits #9187

Open

lpw25 mentioned this pull request Nov 3, 2022

Fix #10664 by fixing Ctype.copy in erase_either mode #11600

Merged

gasche assigned lpw25 Jan 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use local scopes to detect univar escapes #10787

Use local scopes to detect univar escapes #10787

mrmr1993 commented Nov 22, 2021

trefis commented Nov 23, 2021 •

edited

Loading

garrigue commented Nov 24, 2021

garrigue Nov 24, 2021

mrmr1993 commented Nov 24, 2021

garrigue commented Nov 26, 2021

garrigue commented Nov 26, 2021

gasche commented Jan 13, 2023

Use local scopes to detect univar escapes #10787

Are you sure you want to change the base?

Use local scopes to detect univar escapes #10787

Conversation

mrmr1993 commented Nov 22, 2021

trefis commented Nov 23, 2021 • edited Loading

garrigue commented Nov 24, 2021

garrigue Nov 24, 2021

Choose a reason for hiding this comment

mrmr1993 commented Nov 24, 2021

garrigue commented Nov 26, 2021

garrigue commented Nov 26, 2021

gasche commented Jan 13, 2023

trefis commented Nov 23, 2021 •

edited

Loading