Propagate the classification from Rec_check #12551

lthls · 2023-09-12T13:52:27Z

This is the first step outlined in #12549. It propagates the classification computed during Rec_check to the places where recursive bindings are compiled.
It is mostly equivalent to the previous code, with two differences:

The computation of expression sizes is disabled for bindings classified as Dynamic. This is mostly irrelevant, but for expressions such as let a = ... in let rec b = (fun x -> x) (Some a) in ... the current compilation scheme would have classified b as Dynamic in Rec_check but simplifications would have turned the definition back into a block of known size. Since the compilation scheme for values of unknown size is actually better than the one for values of known size (when it is correct) this is actually an improvement.
The check for recursive values is moved to Translcore in the second commit. This changes a few test outputs as some illegal definitions now produce other warnings before the error, but the changes looked reasonable to me. If this proves controversial, I could go back to the approach I initially used and kept in the first commit, which is to use a global hash table (populated during typing and read during translation to lambda).

The vast majority of the changes are just mechanical changes to propagate the classifications properly through the middle-end.

yallop · 2023-09-12T14:57:36Z

The check for recursive values is moved to Translcore in the second commit.

Personally, I think it's nicer (so far as practicable) to finish detection of ill-formed programs before translation begins.

stedolan · 2023-09-12T17:34:32Z

Personally, I think it's nicer (so far as practicable) to finish detection of ill-formed programs before translation begins.

I agree, for two reasons:

ocamlc -i does not run Transl, so will incorrectly report success if checks are delayed past typing.
It is difficult to maintain soundness when these checks are done at Transl time, as compilation can turn ill-formed source programs into well-formed but unsound Lambda. (Bugs along these lines were one motivation for Rec_check: e.g. Unsoundness with GADTs and let rec #7215 )
(I don't think this is happening here: the change in this PR looks fine. But this is why I'm worried about how the code will evolve if the check moves back to Transl)

lthls · 2023-09-12T19:38:24Z

I tend to agree about keeping the check during typing, but the hash table hack that I used in my first commit was unpleasant so I'd rather not restore it.

One possibility would be to put the check back in its previous form, returning a boolean, but export the classification function independently so that it can be called again during translation to Lambda. I'm a bit uncomfortable with the idea of doing a computation once and hoping that doing it again later would give the same result though; this looks a bit too close to what we're doing at the moment and we know it is causing issues.

gasche · 2023-09-12T19:58:29Z

For the current discussion: for me the obvious way to transfer information from type-checking to lambda would be to add the data to the typedtree -- instead of populating a table on the side. Have you considered it?

For the bigger picture: thanks! I am interested in helping this work move forward, but right now I am obsessed with the pattern-matching compiler so I will not review it right away.

lthls · 2023-09-13T09:35:14Z

For the current discussion: for me the obvious way to transfer information from type-checking to lambda would be to add the data to the typedtree -- instead of populating a table on the side. Have you considered it?

Yes, I have. But I didn't find an obvious place to put the annotation (inside the pattern ? the value bindings ? the Texp_let constructor ? the recursive flag ?), and the current code generates bindings before checking them, so I would have to either make the corresponding field mutable or discard the old bindings and generate new, updated ones.
If someone can tell me which would be the preferred way I can implement it.

For the bigger picture: thanks! I am interested in helping this work move forward, but right now I am obsessed with the pattern-matching compiler so I will not review it right away.

No problem. I can work on some of the other items from #12549 even if this is not merged.

gasche · 2023-09-13T11:53:38Z

To compile Pexp_let(Recursive, bindings, body), we first type-check each binding independently (and the body), and then we call Rec_check on the list of typed bindings (I call a group of mutually-recursive bindings a "nest of recursive definitions", or "recursive nest"), and then we generate the Texp_let(Recursive, typed_bindings, typed_body) node.

I would find it natural to have Rec_check compute some "information about the recursive nest", and then include that information as an extra argument to Texp_let. (This can be done before or after typing the body.) You could also split the information into each (typed) binding, and so generate a new typed_bindings list annotated with extra metadata, and include that as the Texp_let argument. I haven't looked at the code enough to tell if the metadata you are storing feels more per-binding or more nest-global. Your choice.

gasche

I like what I see with this new approach, which seems much cleaner to me than your previous approach to this PR (what do you think?). See the inline comment below for a small irritating suggestion.

I find it difficult to have a good picture of the impact in terms of behavior difference caused by this patch. In theory (if the logic in rec_check and in the backends was perfectly consistent) it should not change anything. In practice it may introduce bugs when the frontend is wrong, it may remove bugs where one of the backend was wrong, and it may turn some compiler-produces-unsound-code situations into compiler-produces-statically-rejected-code situations. This seems overall positive, and I trust your intuition that the change makes the behavior better overall, but I could not gain this certainty myself by just looking at the diff.

gasche · 2023-09-22T15:14:12Z

lambda/lambda.ml

@@ -297,7 +297,8 @@ type lambda =
  | Lfunction of lfunction
  | Llet of let_kind * value_kind * Ident.t * lambda * lambda
  | Lmutlet of value_kind * Ident.t * lambda * lambda
-  | Lletrec of (Ident.t * lambda) list * lambda
+  | Lletrec of
+      (Ident.t * Typedtree.recursive_binding_kind * lambda) list * lambda


I would be in favor of making this a record: you are already touching all those places so this will not make your patch more invasive, and it is likely to help in the future.

I will try this suggestion. I think the only place where it could cause problems is is tmc.ml, but hopefully my GADT hack will prove robust enough to handle that.

lthls · 2023-09-22T17:32:35Z

I find it difficult to have a good picture of the impact in terms of behavior difference caused by this patch. In theory (if the logic in rec_check and in the backends was perfectly consistent) it should not change anything. In practice it may introduce bugs when the frontend is wrong, it may remove bugs where one of the backend was wrong, and it may turn some compiler-produces-unsound-code situations into compiler-produces-statically-rejected-code situations. This seems overall positive, and I trust your intuition that the change makes the behavior better overall, but I could not gain this certainty myself by just looking at the diff.

I could probably have been a bit more explicit. With this patch, the only actual difference is that when Rec_check classifies a binding as Dynamic, the compilers stop trying to compute a size and assume that it's correct to compile it as a non-recursive binding.
This patch is not supposed to fix any bugs (but you can have a look at the more complex #12596 for that if you want), it only enables a communication channel between Rec_check and the compilers that future work will rely upon.
Having all this mechanical, boring part out of the way before the real work starts seemed like a good idea.

typing/typecore.ml

gasche · 2023-09-23T09:58:56Z

lambda/tmc.ml

  ctx, bindings

-and traverse_binding outer_ctx inner_ctx (var, def) =
+and traverse_binding :
+  type a. a binding_kind -> context -> context -> a -> a list =


Note: instead of a GADT, you could do this with a record:

type 'a decomposed_binding = { var: Ident.t; def: lambda; recompose : Ident.t * lambda -> 'a; } val traverse_binding : context -> context -> 'a decomposed_binding -> 'a list

I am not sure that this would be better, but at least the type involved are a bit simpler. (Your choice.)

With that presentation you cannot really use your approach of reusing the same recursive kind in the no-transform case, and adding Static in the transform case -- the recompose function will do only one or the other. I think that always using the same recursive kind should be correct -- the TRMC transform of a non-recursive function should be two non-recursive functions.

gasche · 2023-09-23T17:01:06Z

Just to be clear: this is approved but I would like to give @lthls a chance to use a record type for recursive bindings before merging.

lthls · 2023-09-25T14:06:44Z

I have implemented the suggestion to use records, it does look a bit nicer. Ideally, after all the let-rec PRs are merged it should become just { id : Ident.t; def : lfunction }, but for now it should be fine.
I need to fix the Changes entry, and I was considering making a single entry for all the let-rec PRs (so the following PRs will only add to the same entry). I think it would only be problematic if we didn't manage to merge all of the PRs for 5.2, but given that I've got all the difficult code ready and we still have time before the freeze, we should be safe.

gasche

I wonder what you think about the record. I like it, but I think that you could do even better with a judicious use of if, or maybe even better with a map_rec_binding helper function.

There is always a tension between listing all the fields, which forces us to revisit the line on each modification to the record type (the product counterpart of exhaustive pattern matching, which is not in general a bad thing), and using with or other forms that silently keep non-mentioned fields unchanged. I think in many cases it is clear that we are just mapping on the code subterm, and in this case a new-field-agnostic form will be nicer for future changes.

gasche · 2023-09-25T14:08:52Z

lambda/lambda.ml

        let id', l = bind id l in
-        ((id', clas, def) :: ids' , l)
+        ({ id = id'; rkind; def } :: ids' , l)


I would prefer a { binding with id = id' } approach, which requires no changes when adding new fields in the future.

gasche · 2023-09-25T14:09:24Z

lambda/lambda.ml

@@ -840,7 +845,7 @@ let subst update_env ?(freshen_bound_variables = false) s input_lam =
        let id = try Ident.Map.find id l with Not_found -> id in
        Lifused (id, subst s l e)
  and subst_list s l li = List.map (subst s l) li
-  and subst_decl s l (id, clas, exp) = (id, clas, subst s l exp)
+  and subst_decl s l { id; rkind; def } = { id; rkind; def = subst s l def }


or

and subst_decl s l decl = { decl with def = subst s l decl.def }

gasche · 2023-09-25T14:09:47Z

lambda/lambda.ml

@@ -885,7 +890,9 @@ let shallow_map f = function
  | Lmutlet (k, v, e1, e2) ->
      Lmutlet (k, v, f e1, f e2)
  | Lletrec (idel, e2) ->
-      Lletrec (List.map (fun (v, clas, e) -> (v, clas, f e)) idel, f e2)
+      Lletrec
+        (List.map (fun { id; rkind; def } -> { id; rkind; def = f def }) idel,


Again I would use { r with def = f r.def } here.

gasche · 2023-09-25T14:10:10Z

lambda/simplif.ml

-              eliminate_ref id e2)
+      let bindings =
+        List.map (fun { id = v; rkind; def } ->
+            { id = v; rkind; def = eliminate_ref id def })


I think that this is (fun r -> { r with def = eliminate_ref id r.def }), which avoids the subtle issue of not shadowing id when opening the record.

gasche · 2023-09-25T14:10:58Z

lambda/simplif.ml

-        simplif ~try_depth body)
+      let bindings =
+        List.map (fun { id; rkind; def } ->
+            { id; rkind; def = simplif ~try_depth def })


... maybe there should be a map_rec_binding : (lambda -> lambda) -> rec_binding -> rec_binding function in Lambda?

Or maybe we could rewrite all of these Simplif functions in terms of the existing Lambda iterators.

gasche · 2023-09-25T14:15:48Z

No opinion on the Changes entry, do as you prefer. If we start running late in the season we can always reword the entry. (My approach is to have an entry that describes the work done so far, and update the entry text if necessary as further PRs come in. But again, do as you want.)

given that I've got all the difficult code ready and we still have time before the freeze, we should be safe

Good for me: if you start feeling pressure to make it to 5.2, I can convince you that reviewing my pattern-matching PRs is in your own best interest. You would be a fine reviewer with #12534 for example.

gasche · 2023-09-25T15:53:27Z

(Squashed and) merged. Thanks!

lthls added 3 commits September 12, 2023 13:58

Propagate Rec_check classifications

79c23fd

Move Rec_check calls to Translcore

7d50a87

Changes + long lines

d8043d6

lthls mentioned this pull request Sep 22, 2023

Compile recursive bindings in Lambda #12596

Merged

Store Rec_check result in the typedtree

733f69d

gasche approved these changes Sep 22, 2023

View reviewed changes

Fixes

7ec6779

gasche reviewed Sep 22, 2023

View reviewed changes

typing/typecore.ml Outdated Show resolved Hide resolved

List.map

c79884f

gasche reviewed Sep 23, 2023

View reviewed changes

gasche mentioned this pull request Sep 21, 2023

tests/weak-ephe-final/finaliser_handover is flaky #12345

Closed

lthls added 3 commits September 25, 2023 15:55

Use a record for recursive bindings

38e010f

Update dependencies

d843ce2

Long line

84edf6c

gasche reviewed Sep 25, 2023

View reviewed changes

lthls added 2 commits September 25, 2023 16:58

Suggestions from review

bc3c0c7

Changes

dd0ef01

gasche merged commit 04e0e5d into ocaml:trunk Sep 25, 2023
8 of 9 checks passed

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Oct 8, 2023

Propagate the classification from Rec_check (ocaml#12551)

993ac04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Propagate the classification from Rec_check #12551

Propagate the classification from Rec_check #12551

lthls commented Sep 12, 2023

yallop commented Sep 12, 2023

stedolan commented Sep 12, 2023

lthls commented Sep 12, 2023

gasche commented Sep 12, 2023

lthls commented Sep 13, 2023

gasche commented Sep 13, 2023 •

edited

gasche left a comment •

edited

gasche Sep 22, 2023

lthls Sep 22, 2023

lthls commented Sep 22, 2023

gasche Sep 23, 2023

gasche commented Sep 23, 2023

lthls commented Sep 25, 2023

gasche left a comment

gasche Sep 25, 2023

gasche Sep 25, 2023

gasche Sep 25, 2023

gasche Sep 25, 2023

gasche Sep 25, 2023

lthls Sep 25, 2023

gasche commented Sep 25, 2023

gasche commented Sep 25, 2023

Propagate the classification from Rec_check #12551

Propagate the classification from Rec_check #12551

Conversation

lthls commented Sep 12, 2023

yallop commented Sep 12, 2023

stedolan commented Sep 12, 2023

lthls commented Sep 12, 2023

gasche commented Sep 12, 2023

lthls commented Sep 13, 2023

gasche commented Sep 13, 2023 • edited

gasche left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lthls commented Sep 22, 2023

Choose a reason for hiding this comment

gasche commented Sep 23, 2023

lthls commented Sep 25, 2023

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gasche commented Sep 25, 2023

gasche commented Sep 25, 2023

gasche commented Sep 13, 2023 •

edited

gasche left a comment •

edited