Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementations of the static check for recursive definitions #1942

Merged
merged 16 commits into from Sep 14, 2018

Conversation

Projects
None yet
6 participants
@gasche
Copy link
Member

commented Jul 30, 2018

This GPR is on top of #1937.

This work was conducted by Alban Reynaud @Cemoixerestre during an internship at INRIA
Saclay, with the help and under the supervision of Gabriel Scherer in
the Parsifal team.

A summary is included below. For more details, see

https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value

or the comments in the code itself, which has evolved a bit from
Alban's internship report above.

Summary:

Rec_check is based on the idea of assigning "access modes" to variables

m ::= Dereference (* full access and inspection of the value *)
    | Unguarded (* the value result is directly returned *)
    | Guard (* the value is stored under a data constructor *)
    | Delayed (* the value is not needed at definition time *)
    | Unused (* the value is not used at all *)

Before this patchset, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes

u ::= (x : m)*

and contexts Γ mapping variables to uses

Γ ::= (x : u)*

The check relied on a judgment of the form

Γ ⊢ e : u

where, in the typing algorithm, Γ and e are the inputs, and u is
the output — this could be written -Γ ⊢ e : +u.

After this patchset, the implementation uses a simpler notion of context
mapping variables to mods

Γ ::= (x : m)*

and the judgment has the simpler structure

 Γ ⊢ e : m

but now m is an input to the judgment, indicating in which usage
context e is checked, and Γ is the output: +Γ ⊢ e : -m.

Other fixes:

This patchset also fixes a soundness bug around { foo with ... }
which is already present in 4.06 and 4.07.

Another soundness bug found during this work was submitted and fixed
independently as GPR #1915.

This work also led to the refactoring of GPR #1922, and the refactoring
of the testsuite in GPR #1937.

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2018

(cc @yallop)

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 30, 2018

Do you think this might bring us closer to producing, as a side-effect of the check, a data-structure which would drive the compilation scheme and make the check "correct by construction"?

@gasche

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2018

I would say that this would only be marginally easier than with the previous implementation; the big conceptual jump was @yallop's idea of using a type-system-like check, this one is easier to reason about than the one it replaces, but in both cases the idea would be to look at the typed derivations as evidence.

One thing that is rather easy to do in both cases is to get a sort of dependency order on the mutual declarations. This would allow accepting more things when a topological order is just not one big cycle. But I am not sure that it helps us make the current compilation scheme "more correct".

@yallop, @Cemoixerestre and myself are thinking of writing a short paper / extended abstract on the new system, which could give me better intuitions on these things.

Thinking about compilation was on the list of things to do "if there is time" for Alban's internship, but this is a six-weeks end-of-bachelor internship, and there was no time.

@gasche gasche force-pushed the gasche:alban-rec-check branch from aa6437c to fadff5f Jul 30, 2018

(* If x is used with the mode m in e[x], and e[x] is used with mode
m' in e'[e[x]], then x is used with mode m'[m] in e'[e[x]].
Return is neutral for composition: m[Neutral] = m = Neutral[m].

This comment has been minimized.

Copy link
@Armael

Armael Jul 30, 2018

Member

Neutral ~> Return here?

This comment has been minimized.

Copy link
@gasche

gasche Jul 30, 2018

Author Member

Good catch, I'll fix this.

(inspect (expression env e2)))
(*
G1 |- e1: m[Dereference]
G2 |- e2: m[Dereference] (*TODO: why is this a dereference?*)

This comment has been minimized.

Copy link
@Armael

Armael Jul 30, 2018

Member

There are a couple TODO remaining, is it on purpose?

This comment has been minimized.

Copy link
@gasche

gasche Jul 30, 2018

Author Member

Yes, I wanted @yallop to have a chance to comment on them, because he wrote the original implementation. I believe that using Guard here would be correct, but I wanted his opinion before making the check more liberal than the one we replace. (Other opinions are also welcome, of course.)

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

It's a dereference because writing a value to a mutable field involves accessing the value. Here's a program that fails with a segmentation fault if Dereference is weakened to Return here:

    type t = { mutable f: float }
    let g = { f = 0.0 }
    let rec x = (g.f <- y; ()) and y = 2.0

(You could add the program to the test suite.)

@gasche gasche force-pushed the gasche:alban-rec-check branch 2 times, most recently from 0a955c7 to ed39a58 Jul 30, 2018

@gasche gasche force-pushed the gasche:alban-rec-check branch from ed39a58 to 055fadc Aug 9, 2018

(* We have carefully placed `u` before `t` here,
so that the copy { t with .. }, if accepted,
is evaluated before 't' is initialized -- making
the assertion below fail, typically abording

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

abording ~> aborting ?

match Use.unguarded ty, Use.dependent ty, classify_expression expr with
let ty = expression expr Return in
match Env.unguarded ty idlist, Env.dependent ty idlist,
classify_expression expr with
| _ :: _, _, _ (* The expression inspects rec-bound variables *)
| _, _ :: _, Dynamic -> (* The expression depends on rec-bound variables

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

The first underscore can be turned into a [] which saves a few brain cycles to the reader.

@Armael
Copy link
Member

left a comment

This is a first superficial review, I mostly read the comments but not the actual implementation yet.


(* Returns the most conservative mode of the two arguments.
Deref < Return < Guard < Delay < Unused

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

Deref -> Dereference

| Delay, _ -> Delay
| m', Return -> m'
| Return, m
| Guard, m -> m

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

This last case confused me for a moment, because I read it as Guard[m] = m, which is not the case.
Because of the earlier m', Return -> m' innocuous-looking case we have Guard[Return] = Guard, and Guard[m] = m otherwise.
I wonder if it would be worth adding an explicit (redundant) case for Guard[Return], to showcase more explicitly the role of the Guard mode.

(** Combine the access information of two expressions *)
val unguarded : t -> Ident.t list -> Ident.t list
(** unguarded e l: the list of all identifiers in e that are unguarded of
dereferenced in the environment e. *)

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

identifiers in e -> identifiers in l
Also, ungarded of dereferenced -> dereferenced or returned ?

val single : Ident.t -> access -> t
(** Combine the access information of two expressions *)
val dependent : t -> Ident.t list -> Ident.t list
(** unguarded e l: the list of all identifiers in e that are used in e. *)

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

unguarded e l -> dependent e l, identifiers in e -> identifiers in l

(** No variables are accessed in an expression; it might be a
constant or a global identifier *)
val join : t -> t -> t
val join_list : t list -> t

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

It would maybe nice to have a comment indicating that environments sharing bindings is supported, and that in that case the right thing happens (the most conservative mode is picked).

Deref < Return < Guard < Delay < Unused
*)
let prec m m' =

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

@gasche says a better name for this could be join

compute the environment of a subterm from its mode,
so the corresponding function has type [... -> mode -> Env.t].
We write ['a -> term_judg] in this case.

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

Shouldn't 'a be ... instead, to match the convention used in the line above?

correspond to binding constructs that have both an
exterior environment (the environment of the whole term)
and an interior environment (the environment after the binding
construct has introduced new names in scope).

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

Which one of Gamma and Gamma' is the exterior (resp interior) environment?

G |- e : m + m'
-------------------------------
G |- (let x = e : m) -| x:m', G

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

There's an issue with this rule I think (name collision on G; and the final output of the rule is likely not just one of the two Gs in the scope).
Would the correct rule be something like as follows?

       G |- e : m + m'
       -----------------------------------
       x:m', G, G' |- (let x = e : m) -| x:m', G'

This comment has been minimized.

Copy link
@Armael

Armael Aug 13, 2018

Member

(also, I assume m + m' is prec m m'? whatever it is it should be made explicit)

This comment has been minimized.

Copy link
@gasche

gasche Aug 18, 2018

Author Member

(done)

fun f inner_mode -> fun outer_mode -> f (compos outer_mode inner_mode)

(* A binding judgment [binder] expects a mode and an outer environment,
and returns an inner environment. [binder >> judg] computes

This comment has been minimized.

Copy link
@Armael

Armael Aug 9, 2018

Member

This is in contradiction with the comment above? (which says that bind_judg takes an interior environment and returns an exterior environment)

@Armael

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

In the description of the PR, you give a list of 4 modes (Dereference, Unguarded, Guard, Delay), but this does not seem to match the modes before the PR (3 modes, Dereferenced, Guarded, Unguarded) or in the PR (5 modes, Dereference, Return, Guard, Delay, Unused). Could you clarify?

@Armael
Copy link
Member

left a comment

Added a couple comments while reading the rules

G |- e : m + m'
-------------------------------
G |- (let x = e : m) -| x:m', G

This comment has been minimized.

Copy link
@Armael

Armael Aug 13, 2018

Member

(also, I assume m + m' is prec m m'? whatever it is it should be made explicit)

Show resolved Hide resolved typing/rec_check.ml
Show resolved Hide resolved typing/rec_check.ml

@gasche gasche force-pushed the gasche:alban-rec-check branch from 055fadc to 06fef59 Aug 18, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Aug 18, 2018

I just went over @Armael's comments, improved the PR as desired and rebased. Thanks @Armael, and feel free to keep reading :-)

G |- e : m + m'
-----------------------------------
G+G' |- (let x = e : m) -| x:m', G'

This comment has been minimized.

Copy link
@Armael

Armael Aug 30, 2018

Member

The closing parenthesis is off there.

@yallop
Copy link
Member

left a comment

I think this an improvement -- thanks! The setup feels a bit more complex, but the rules are simpler, which seems like a reasonable tradeoff.

the evaluation of the whole program. This is the mode of
a variable in an expression in which it does not occur. *)

(* Returns the most conservative mode of the two arguments.

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

Minor: more conservative, not most.

I think "most conservative" could be clarified here: it means something like "the more demanding context".

| Unused
(** [Unused] is for subexpressions that are not used at all during
the evaluation of the whole program. This is the mode of
a variable in an expression in which it does not occur. *)

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

Minor: I think Unused should be Discard or Ignore, since all the others are verbs. (Alternatively, change them all to adjectives: Dereferenced, Returned, etc.)

This comment has been minimized.

Copy link
@gasche

gasche Sep 1, 2018

Author Member

I went for Ignore, to me Discard includes the idea of an (active) action.

Show resolved Hide resolved typing/rec_check.ml Outdated
Return is neutral for composition: m[Return] = m = Return[m].
*)
let compos m' m = match m', m with

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

Minor: compos? Why not compose?

Aside: 'not' patterns would be useful here.

Besides noting that Return is neutral for composition, you might also note that compos is associative (I checked that it is), and that Unused is an annihilator.

returned in the environment e. *)

val dependent : t -> Ident.t list -> Ident.t list
(** unguarded e l: the list of all identifiers in e that are used in e. *)

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

Two things to fix in this comment: it should say "dependent", not "unguarded", and it should say "in l", not "in e"


(* Expression judgment:
G |- e : m
where (m) is an output of the code and (G) is an output;

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

This should say "where (m) is an input of the code"

(inspect (expression env e2)))
(*
G1 |- e1: m[Dereference]
G2 |- e2: m[Dereference] (*TODO: why is this a dereference?*)

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

It's a dereference because writing a value to a mutable field involves accessing the value. Here's a program that fails with a segmentation fault if Dereference is weakened to Return here:

    type t = { mutable f: float }
    let g = { f = 0.0 }
    let rec x = (g.f <- y; ()) and y = 2.0

(You could add the program to the test suite.)

expression e << Dereference
| Texp_setinstvar (pth,_,_,e) ->
(*
G |- e: m[Dereference] (*TODO: why is this a dereference?*)

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

As with Texp_setfield, this really should be a Dereference, so we can drop the TODO.

Here's a program for the test suite that fails with a segmentation fault if Dereference is weakened to Return:

    class c = object
      val mutable f = 0.0
      method m =
        let rec x = (f <- y; ()) and y = 2.0 in f
    end
    let _ = print_float (new c)#m
begin match Typeopt.classify_lazy_argument e with
(*
G |- e: m[Delay]
---------------- (modulo some subtle compiler optimizations)

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

I'm expecting this to break when somebody implements (e.g.) the lazy value optimization for unboxed constructors. (But perhaps having the common code in Typeopt will be enough to prevent that.)

This comment has been minimized.

Copy link
@gasche

gasche Sep 2, 2018

Author Member

Yes, I'm expecting such future changes to return one of the known-to-be-shortcut variants (for unboxed constructors, typically the classifier of the underlying argument), or to add a new variants to be handled properly here.

| Tpat_alias (pat, _, _) -> is_destructuring_pattern pat
| Tpat_var (_id, _) -> false
| Tpat_alias (pat, _, _) ->
is_destructuring_pattern pat

This comment has been minimized.

Copy link
@yallop

yallop Aug 31, 2018

Member

This looks like unnecessary churn.

@gasche gasche force-pushed the gasche:alban-rec-check branch from 06fef59 to 4fce5b5 Sep 2, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 3, 2018

@yallop: Thanks for the review! I did a pass over the patchset and took your comments into account.

There is one thing that we discussed with @Cemoixerestre but didn't have to work on: I believe that we can state (and prove easily by induction) meta-theorems of the following form: when Gamma |- e : Delay, then Gamma maps all the variables that do occur in e to Delay, and the others to Ignore. Similarly, when Gamma |- e : Dereference, then Gamma maps the occurring variables to Dereference and the other to Ignore. I plan to convince myself that these two meta-theorems do hold, and then add them in comments. The Delay theorem makes it easy to convince oneself that recursive functions work just fine, and the Dereference one gives a sort of upper bound on accepted terms.

@gasche gasche force-pushed the gasche:alban-rec-check branch from 4fce5b5 to b49a930 Sep 3, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2018

@yallop: if you believe that the PR is now in a mergeable state, it could be nice to "approve" it -- I think I could ask a maintainer to merge it then.

@yallop

This comment has been minimized.

Copy link
Member

commented Sep 10, 2018

@gasche: there are some comments that don't have replies: here, here, here, and here. Could you please either leave a comment or mark the conversations as "resolved"? (I'm assuming that you've simply missed them, which is unfortunately very easy to do with GitHub's interface, because one of the comments is about a typo, which should be uncontroversial.)

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2018

Thanks! I missed them indeed. I will go over them now.

@gasche gasche force-pushed the gasche:alban-rec-check branch from b49a930 to ca67995 Sep 10, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2018

I rebased the PR with a more substantial high-level comment at the top of the file, which I would like to be at least as clear as the summary of this PR, without the historical aspect (old and new system) which is not relevant in the versioned codebase.

@gasche gasche force-pushed the gasche:alban-rec-check branch from 7ddbc8f to cd2489c Sep 10, 2018

@yallop

This comment has been minimized.

Copy link
Member

commented Sep 10, 2018

I rebased the PR with a more substantial high-level comment at the top of the file,

Thanks. I think the new comment is a very useful addition.

@yallop

This comment has been minimized.

Copy link
Member

commented Sep 10, 2018

(I'll approve once the bug that's causing Travis to fail is fixed.)

@gasche gasche force-pushed the gasche:alban-rec-check branch from cd2489c to 84bebad Sep 10, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2018

Duh... this should be fixed now.

@gasche gasche force-pushed the gasche:alban-rec-check branch from 84bebad to 2c7a0b8 Sep 10, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2018

(just fixed the check-typo check)

@yallop

yallop approved these changes Sep 10, 2018

@damiendoligez
Copy link
Member

left a comment

Approving based on @yallop's review.

I have a few superficial remarks.

Show resolved Hide resolved typing/rec_check.ml
Show resolved Hide resolved typing/rec_check.ml Outdated
Show resolved Hide resolved typing/rec_check.ml
Show resolved Hide resolved typing/rec_check.ml
(inspect (option expression env eo)))
(*
G |- e: m[Dereference]
---------------------- (plus weird 'eo' option)

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Sep 13, 2018

Member

"weird"? That word doesn't make you look confident...

This comment has been minimized.

Copy link
@gasche

gasche Sep 13, 2018

Author Member

It doesn't, because we aren't. The code must be safe because Dereference is the most conservative possible choice for eo, but I still have mostly no idea what this eo thing does (despite having looked at the compilation scheme). It's right that people hitting this case would be less-confident and would want (if they are changing it) to check things by themselves.

This comment has been minimized.

Copy link
@yallop

yallop Sep 13, 2018

Member

but I still have mostly no idea what this eo thing does

I'm not sure how much "mostly" covers here, but the eo is for representing calls to a superclass method through an ancestor variable, like this:

class b = object
  method m = 0
end
class d = object
  inherit b as parent   (* 'parent' is an ancestor *)
  method m = parent#m  (* Call to method 'm' in class 'b' *)
end

@gasche gasche force-pushed the gasche:alban-rec-check branch from 2c7a0b8 to a89714a Sep 13, 2018

@gasche

This comment has been minimized.

Copy link
Member Author

commented Sep 13, 2018

@damiendoligez Thanks! I took the comments into account.

Cemoixerestre and others added some commits Jun 28, 2018

New implementation of rec_check.ml (WIP)
This work was conducted during an internship at INRIA Saclay, under
the supervision of Gabriel Scherer in the Parsifal team.

A summary is included below. For more details, see

  https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value

WIP:
----

This commit corresponds to the state of the work exactly at the end of
Alban's internship, rebased by Gabriel Scherer on trunk.
The testsuite passes, but there remain some language constructs that
are not supported by the new check implementation.

We believe that existing OCaml programs (which typically do not make
advanced use of recursive value declarations) would compile correctly
with this WIP version, but of course we plan to complete the work in
the following commits.

Summary:
--------

Rec_check is based on the idea of assigning "access modes" to variables

    m ::= Dereference | Guarded | Unguarded | Delayed

Before this commit, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes

    u ::= (x : m)*

and contexts Γ mapping variables to uses

    Γ ::= (x : u)*

The check relied on a judgment of the form

    Γ ⊢ e : u

where, in the typing algorithm, `Γ` and `e` are the inputs, and `u` is
the output — this could be written `-Γ ⊢ e : +u`.

After this commit, the implementation uses a simpler notion of context
mapping variables to mods

    Γ ::= (x : m)*

and the judgment has the simpler structure

     Γ ⊢ e : m

but now `m` is an input to the judgment, indicating in which usage
context `e` is checked, and `Γ` is the output: `+Γ ⊢ e : -m`.
rec_check.ml: fix a soundness bug with record override { foo with ... }
(The bug is already present in 4.06 and 4.07.)
rec_check.ml: write the term judgments in a more abstract style
We could factorize the binder judgments in a related way, but it would
require a different set of higher-order operator, and I thought that
the lower number of occurrences did not justify the extra complexity.
Changes for the let-rec patchset + general commit-message explanation
This work was conducted by Alban Reynaud during an internship at INRIA
Saclay, with the help and under the supervision of Gabriel Scherer in
the Parsifal team.

A summary is included below. For more details, see

  https://github.com/Cemoixerestre/Internship-2018-ocaml-recursive-value

or the comments in the code itself, which has evolved a bit from
Alban's internship report above.

Summary:
--------

Rec_check is based on the idea of assigning "access modes" to variables

    m ::= Dereference (* full access and inspection of the value *)
        | Unguarded (* the value result is directly returned *)
        | Guard (* the value is stored under a data constructor *)
        | Delayed (* the value is not needed at definition time *)
        | Unused (* the value is not used at all *)

Before this patchset, the implementation (contributed by Jeremy Yallop)
was formulated as a type-checker manipulating "uses", which are mappings
from free variables to modes

    u ::= (x : m)*

and contexts Γ mapping variables to uses

    Γ ::= (x : u)*

The check relied on a judgment of the form

    Γ ⊢ e : u

where, in the typing algorithm, `Γ` and `e` are the inputs, and `u` is
the output — this could be written `-Γ ⊢ e : +u`.

After this patchset, the implementation uses a simpler notion of context
mapping variables to mods

    Γ ::= (x : m)*

and the judgment has the simpler structure

     Γ ⊢ e : m

but now `m` is an input to the judgment, indicating in which usage
context `e` is checked, and `Γ` is the output: `+Γ ⊢ e : -m`.

Other fixes:
------------

This patchset also fixes a soundness bug around `{ foo with ... }`
which is already present in 4.06 and 4.07.

Another soundness bug found during this work was submitted and fixed
independently as GPR #1915.

This work also led to the refactoring of GPR #1922, and the refactoring
of the testsuite in GPR #1937.
rec_check.ml: high-level documentation and reordering
We provide an ocamldoc-style module comment to explain the general
lines of the recursive check. This complements existing comments about
the term- and binding-judgments that are more implementation-oriented.

It was natural to reorder the code so that the "static or dynamic
size" code is apart from the "usage of recursive variables" code.

Finally, I also added copyright headers to reflect the now shared
authorship of the code. (Alban has signed a CLA.)

@gasche gasche merged commit beec967 into ocaml:trunk Sep 14, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.