Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate the classification from Rec_check #12551

Merged
merged 11 commits into from
Sep 25, 2023
Merged

Conversation

lthls
Copy link
Contributor

@lthls lthls commented Sep 12, 2023

This is the first step outlined in #12549. It propagates the classification computed during Rec_check to the places where recursive bindings are compiled.
It is mostly equivalent to the previous code, with two differences:

  • The computation of expression sizes is disabled for bindings classified as Dynamic. This is mostly irrelevant, but for expressions such as let a = ... in let rec b = (fun x -> x) (Some a) in ... the current compilation scheme would have classified b as Dynamic in Rec_check but simplifications would have turned the definition back into a block of known size. Since the compilation scheme for values of unknown size is actually better than the one for values of known size (when it is correct) this is actually an improvement.
  • The check for recursive values is moved to Translcore in the second commit. This changes a few test outputs as some illegal definitions now produce other warnings before the error, but the changes looked reasonable to me. If this proves controversial, I could go back to the approach I initially used and kept in the first commit, which is to use a global hash table (populated during typing and read during translation to lambda).

The vast majority of the changes are just mechanical changes to propagate the classifications properly through the middle-end.

@yallop
Copy link
Member

yallop commented Sep 12, 2023

The check for recursive values is moved to Translcore in the second commit.

Personally, I think it's nicer (so far as practicable) to finish detection of ill-formed programs before translation begins.

@stedolan
Copy link
Contributor

Personally, I think it's nicer (so far as practicable) to finish detection of ill-formed programs before translation begins.

I agree, for two reasons:

  1. ocamlc -i does not run Transl, so will incorrectly report success if checks are delayed past typing.

  2. It is difficult to maintain soundness when these checks are done at Transl time, as compilation can turn ill-formed source programs into well-formed but unsound Lambda. (Bugs along these lines were one motivation for Rec_check: e.g. Unsoundness with GADTs and let rec #7215 )
    (I don't think this is happening here: the change in this PR looks fine. But this is why I'm worried about how the code will evolve if the check moves back to Transl)

@lthls
Copy link
Contributor Author

lthls commented Sep 12, 2023

I tend to agree about keeping the check during typing, but the hash table hack that I used in my first commit was unpleasant so I'd rather not restore it.

One possibility would be to put the check back in its previous form, returning a boolean, but export the classification function independently so that it can be called again during translation to Lambda. I'm a bit uncomfortable with the idea of doing a computation once and hoping that doing it again later would give the same result though; this looks a bit too close to what we're doing at the moment and we know it is causing issues.

@gasche
Copy link
Member

gasche commented Sep 12, 2023

For the current discussion: for me the obvious way to transfer information from type-checking to lambda would be to add the data to the typedtree -- instead of populating a table on the side. Have you considered it?

For the bigger picture: thanks! I am interested in helping this work move forward, but right now I am obsessed with the pattern-matching compiler so I will not review it right away.

@lthls
Copy link
Contributor Author

lthls commented Sep 13, 2023

For the current discussion: for me the obvious way to transfer information from type-checking to lambda would be to add the data to the typedtree -- instead of populating a table on the side. Have you considered it?

Yes, I have. But I didn't find an obvious place to put the annotation (inside the pattern ? the value bindings ? the Texp_let constructor ? the recursive flag ?), and the current code generates bindings before checking them, so I would have to either make the corresponding field mutable or discard the old bindings and generate new, updated ones.
If someone can tell me which would be the preferred way I can implement it.

For the bigger picture: thanks! I am interested in helping this work move forward, but right now I am obsessed with the pattern-matching compiler so I will not review it right away.

No problem. I can work on some of the other items from #12549 even if this is not merged.

@gasche
Copy link
Member

gasche commented Sep 13, 2023

To compile Pexp_let(Recursive, bindings, body), we first type-check each binding independently (and the body), and then we call Rec_check on the list of typed bindings (I call a group of mutually-recursive bindings a "nest of recursive definitions", or "recursive nest"), and then we generate the Texp_let(Recursive, typed_bindings, typed_body) node.

I would find it natural to have Rec_check compute some "information about the recursive nest", and then include that information as an extra argument to Texp_let. (This can be done before or after typing the body.) You could also split the information into each (typed) binding, and so generate a new typed_bindings list annotated with extra metadata, and include that as the Texp_let argument. I haven't looked at the code enough to tell if the metadata you are storing feels more per-binding or more nest-global. Your choice.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what I see with this new approach, which seems much cleaner to me than your previous approach to this PR (what do you think?). See the inline comment below for a small irritating suggestion.

I find it difficult to have a good picture of the impact in terms of behavior difference caused by this patch. In theory (if the logic in rec_check and in the backends was perfectly consistent) it should not change anything. In practice it may introduce bugs when the frontend is wrong, it may remove bugs where one of the backend was wrong, and it may turn some compiler-produces-unsound-code situations into compiler-produces-statically-rejected-code situations. This seems overall positive, and I trust your intuition that the change makes the behavior better overall, but I could not gain this certainty myself by just looking at the diff.

lambda/lambda.ml Outdated
@@ -297,7 +297,8 @@ type lambda =
| Lfunction of lfunction
| Llet of let_kind * value_kind * Ident.t * lambda * lambda
| Lmutlet of value_kind * Ident.t * lambda * lambda
| Lletrec of (Ident.t * lambda) list * lambda
| Lletrec of
(Ident.t * Typedtree.recursive_binding_kind * lambda) list * lambda
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be in favor of making this a record: you are already touching all those places so this will not make your patch more invasive, and it is likely to help in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this suggestion. I think the only place where it could cause problems is is tmc.ml, but hopefully my GADT hack will prove robust enough to handle that.

@lthls
Copy link
Contributor Author

lthls commented Sep 22, 2023

I find it difficult to have a good picture of the impact in terms of behavior difference caused by this patch. In theory (if the logic in rec_check and in the backends was perfectly consistent) it should not change anything. In practice it may introduce bugs when the frontend is wrong, it may remove bugs where one of the backend was wrong, and it may turn some compiler-produces-unsound-code situations into compiler-produces-statically-rejected-code situations. This seems overall positive, and I trust your intuition that the change makes the behavior better overall, but I could not gain this certainty myself by just looking at the diff.

I could probably have been a bit more explicit. With this patch, the only actual difference is that when Rec_check classifies a binding as Dynamic, the compilers stop trying to compute a size and assume that it's correct to compile it as a non-recursive binding.
This patch is not supposed to fix any bugs (but you can have a look at the more complex #12596 for that if you want), it only enables a communication channel between Rec_check and the compilers that future work will rely upon.
Having all this mechanical, boring part out of the way before the real work starts seemed like a good idea.

typing/typecore.ml Outdated Show resolved Hide resolved
ctx, bindings

and traverse_binding outer_ctx inner_ctx (var, def) =
and traverse_binding :
type a. a binding_kind -> context -> context -> a -> a list =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: instead of a GADT, you could do this with a record:

type 'a decomposed_binding = {
  var: Ident.t;
  def: lambda;
  recompose : Ident.t * lambda -> 'a;
}

val traverse_binding : context -> context -> 'a decomposed_binding -> 'a list

I am not sure that this would be better, but at least the type involved are a bit simpler. (Your choice.)

With that presentation you cannot really use your approach of reusing the same recursive kind in the no-transform case, and adding Static in the transform case -- the recompose function will do only one or the other. I think that always using the same recursive kind should be correct -- the TRMC transform of a non-recursive function should be two non-recursive functions.

@gasche
Copy link
Member

gasche commented Sep 23, 2023

Just to be clear: this is approved but I would like to give @lthls a chance to use a record type for recursive bindings before merging.

@lthls
Copy link
Contributor Author

lthls commented Sep 25, 2023

I have implemented the suggestion to use records, it does look a bit nicer. Ideally, after all the let-rec PRs are merged it should become just { id : Ident.t; def : lfunction }, but for now it should be fine.
I need to fix the Changes entry, and I was considering making a single entry for all the let-rec PRs (so the following PRs will only add to the same entry). I think it would only be problematic if we didn't manage to merge all of the PRs for 5.2, but given that I've got all the difficult code ready and we still have time before the freeze, we should be safe.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what you think about the record. I like it, but I think that you could do even better with a judicious use of if, or maybe even better with a map_rec_binding helper function.

There is always a tension between listing all the fields, which forces us to revisit the line on each modification to the record type (the product counterpart of exhaustive pattern matching, which is not in general a bad thing), and using with or other forms that silently keep non-mentioned fields unchanged. I think in many cases it is clear that we are just mapping on the code subterm, and in this case a new-field-agnostic form will be nicer for future changes.

lambda/lambda.ml Outdated
let id', l = bind id l in
((id', clas, def) :: ids' , l)
({ id = id'; rkind; def } :: ids' , l)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a { binding with id = id' } approach, which requires no changes when adding new fields in the future.

lambda/lambda.ml Outdated
@@ -840,7 +845,7 @@ let subst update_env ?(freshen_bound_variables = false) s input_lam =
let id = try Ident.Map.find id l with Not_found -> id in
Lifused (id, subst s l e)
and subst_list s l li = List.map (subst s l) li
and subst_decl s l (id, clas, exp) = (id, clas, subst s l exp)
and subst_decl s l { id; rkind; def } = { id; rkind; def = subst s l def }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or

and subst_decl s l decl = { decl with def = subst s l decl.def }

lambda/lambda.ml Outdated
@@ -885,7 +890,9 @@ let shallow_map f = function
| Lmutlet (k, v, e1, e2) ->
Lmutlet (k, v, f e1, f e2)
| Lletrec (idel, e2) ->
Lletrec (List.map (fun (v, clas, e) -> (v, clas, f e)) idel, f e2)
Lletrec
(List.map (fun { id; rkind; def } -> { id; rkind; def = f def }) idel,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I would use { r with def = f r.def } here.

eliminate_ref id e2)
let bindings =
List.map (fun { id = v; rkind; def } ->
{ id = v; rkind; def = eliminate_ref id def })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is (fun r -> { r with def = eliminate_ref id r.def }), which avoids the subtle issue of not shadowing id when opening the record.

simplif ~try_depth body)
let bindings =
List.map (fun { id; rkind; def } ->
{ id; rkind; def = simplif ~try_depth def })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... maybe there should be a map_rec_binding : (lambda -> lambda) -> rec_binding -> rec_binding function in Lambda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we could rewrite all of these Simplif functions in terms of the existing Lambda iterators.

@gasche
Copy link
Member

gasche commented Sep 25, 2023

No opinion on the Changes entry, do as you prefer. If we start running late in the season we can always reword the entry. (My approach is to have an entry that describes the work done so far, and update the entry text if necessary as further PRs come in. But again, do as you want.)

given that I've got all the difficult code ready and we still have time before the freeze, we should be safe

Good for me: if you start feeling pressure to make it to 5.2, I can convince you that reviewing my pattern-matching PRs is in your own best interest. You would be a fine reviewer with #12534 for example.

@gasche gasche merged commit 04e0e5d into ocaml:trunk Sep 25, 2023
8 of 9 checks passed
@gasche
Copy link
Member

gasche commented Sep 25, 2023

(Squashed and) merged. Thanks!

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Oct 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants