Matching: refactor to clarify the argument-passing structure #13084

gasche · 2024-04-08T19:42:47Z

This refactoring PR is born out of a discussion with @ncik-roberts in #12555 (comment) . "Arguments" in pattern-matching compilation are expressions that access parts of the scrutinee. They can contain complex expressions, in particular field projects of the form x.f, but never x.f.g -- the projection necessarily operates on a variable, or sometimes a tuple of variables in weird corner cases of do_for_multiple_match. There is a specific moment in pattern-matching compilation that turns complex expressions into variables, and then the rest of the pipeline assumes that it gets a variable, and this is important for correctness, but not obvious to see in the code at all.

The present PR uses finer-grained types to statically encode this structure of arguments, continuing on a tradition started with @trefis a few years ago to clarify the pattern-matching compiler. There is no change to the behavior, and the diff consists almost entirely of very minor simplifications, but it becomes much easier to reason about this transition from arbitrary expressions to simple arguments in the code -- and I will rely on this property in a follow-up PR to fix The Pattern-Matching Bug.

My favorite parts of the diff are the following:

-                args =
-                  ( match args with
-                  | _ :: r -> r
-                  | _ -> assert false
-                  );
+                args = args.rest;

 and compile_match_simplified ~scopes repr partial ctx
-    (m : Simple.clause pattern_matching) =
-  match m with
-  | { cases = []; args = [] } -> comp_exit ctx m
-  | { args = ((Lvar v as arg), str) :: argl } ->
-      bind_match_arg str v arg (
-        let args = (arg, Alias) :: argl in
-        let m = { m with args } in
-        let first_match, rem = split_and_precompile_simplified m in
-        combine_handlers ~scopes repr partial ctx first_match rem
-      )
-  | _ -> assert false
+    (m : (simple_args, Simple.clause) pattern_matching) =
+  let first_match, rem = split_and_precompile_simplified m in
+  combine_handlers ~scopes repr partial ctx first_match rem

 and do_compile_matching ~scopes repr partial ctx pmh =
   match pmh with
   | Pm pm -> (
-      let arg =
-        match pm.args with
-        | (first_arg, _) :: _ -> first_arg
-        | _ ->
-            (* We arrive in do_compile_matching from:
-               - compile_matching
-               - recursive call on PmVars
-               The first one explicitly checks that [args] is nonempty, the
-               second one is only generated when the inner pm first looks at
-               a variable (i.e. there is something to look at).
-            *)
-            assert false
-      in
+      let arg = arg_of_simple_arg (fst pm.args.first) in

The worst part of the diff is in do_for_multiple_match (a small hack due to toplevel_handler not being polymorphic over the argument structure), but it's basically fine.

A second commit also simplifies some functions that would take both a dedicated ~arg argument (representing the first argument known to be present) and an args vector that is now known to be simple; there is no need to track ~arg specifically as it can now be recovered from args.first.

(cc @trefis and @lthls who are on fire with pattern-matching refactoring reviewing)

ncik-roberts

Looks great to me — simple_arg would have helped me unconfuse myself in my review of the mentioned PR.

ncik-roberts · 2024-04-09T02:04:38Z

lambda/matching.ml

+(** The argument in first position is about to be matched upon,
+    it has already been bound to a variable
+    or it is a tuple of variables in the weird do_for_multiple_match case. *)


I find that simple_args is missing documentation explaining why it exists as an alternative to args. I would suggest introducing this comment with a little more framing:

(** [simple_args] is a more specific version of [args], to be used when... (and the rest of the comment as-is). *)

I would argue that another relevant property of simple_args is that all of the lambda values it represents are pure. That's what allows us to drop the mutability on the floor in make_line_matching and make_matching (and my misunderstanding of this fact led to the conversation that you mention in the PR text).

I suggest mentioning this in the comment. Or, if you think of a good way to mention the purity in the relevant type names, even better! (I couldn't think of a good way to do this naming.)

I added more comments along the lines you suggested.

lambda/matching.ml

ncik-roberts · 2024-04-09T03:08:33Z

lambda/matching.ml

-    (m : Simple.clause pattern_matching) =
-  match m with
-  | { cases = []; args = [] } -> comp_exit ctx m
-  | { args = ((Lvar v as arg), str) :: argl } ->
-      bind_match_arg str v arg (
-        let args = (arg, Alias) :: argl in
-        let m = { m with args } in
-        let first_match, rem = split_and_precompile_simplified m in
-        combine_handlers ~scopes repr partial ctx first_match rem
-      )
-  | _ -> assert false


I'm not sure I understand why this PR allows this simplification or makes it more obviously correct. Does the new simple_arg shape tell us something more specific about m.args? Or was this simplification possible before?

(Separately, I'll note that bind_match_arg is now called only once, and so could be inlined if you found that more readable.)

Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now. The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.

One aspect of the change that is subtle is that we now allow the Tuple case to reach here, but in fact I believe that this case cannot happen. (So we could add an assert false if we wanted to, as we do here.) I think that the Tuple case can only flow through the precompilation logic, never to this actual compilation code due to the different compile_flattened code path. I tried to check this statically, but the cost in complexity is quite high (polymorphic variants, polymorphic recursion, oh my!) and I did not finish it. I prefer the new version which might implicitly generate less-good-than-we-expected code if a Tuple argument somehow found its way here.

This seems fine then.

Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now.
Agreed, this is clear.

The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.

It seems to me this was already true as of this refactor: 5bffef5. The call to bind_match_arg is guarded by a match on Lvar v. I'm saying this just to check my understanding, not to request changes.

Yes, I think that I could have simplified the compile_match_simplified part of the change by removing the bind logic at that point. I feel more comfortable with the change now that the types make it clear that the argument list has already been "split" by the time we reach this point in the code -- this is a sharp distinction with compile_match_nonempty above.

The "arguments" in pattern-matching compilation are a list of scrutinees that are being matched upon, that can most of the time be formed of arbitrary expressions. The first argument is bound to a variable when we are about to generate code to switch on the corresponding column, that is, in the `compile_match_{nonempty,simplified}` functions. When new arguments are generated by applying projections to an existing argument, it is (almost) always on this bound-variable form. In other words, most arguments can be complex expressions, except the first argument in some parts of the pattern-matching logic. The code relies on this simple-argument structure for the first argument, but would not encode the corresponding invariants in the type. This is what the current commit does; we can get rid of some runtime assertions, and simplify code that was more complex than necessary -- in `compile_match_simplified` in particular.

We don't need to pass an `~arg` parameter to functions that also take a `split_args` argument, as they respect the invariant that this `arg` argument is always the expression form of the first argument.

ncik-roberts

I think the changes look good, but a maintainer will need to approve.

trefis

Approving on behalf of Nick Roberts.

The changes look good overall. I spent a few minutes trying to make the changes around toplevel_handlers less ugly, and now understand why you did things this way. ( #sad )

I also find the Tuple of lambda a bit distasteful, but not sure what could be done about it.
Anyway, this PR is an improvement.

gasche · 2024-04-09T12:36:39Z

Thanks both! Merging.

gasche mentioned this pull request Apr 8, 2024

Pattern matching with mutable and lazy patterns is unsound #7241

Open

ncik-roberts reviewed Apr 9, 2024

View reviewed changes

gasche force-pushed the matching-structure-arguments branch 2 times, most recently from 222c026 to 7b661ba Compare April 9, 2024 07:19

gasche added 3 commits April 9, 2024 09:25

matching: simplify some ~arg-taking functions

77c70ad

We don't need to pass an `~arg` parameter to functions that also take a `split_args` argument, as they respect the invariant that this `arg` argument is always the expression form of the first argument.

Changes

6dc533f

gasche force-pushed the matching-structure-arguments branch from 7b661ba to 6dc533f Compare April 9, 2024 07:26

ncik-roberts approved these changes Apr 9, 2024

View reviewed changes

trefis approved these changes Apr 9, 2024

View reviewed changes

gasche merged commit b30d7d3 into ocaml:trunk Apr 9, 2024
14 checks passed

gasche mentioned this pull request Apr 18, 2024

The Pattern-Matching Bug: special handling of final exits #13076

Merged

gasche mentioned this pull request Apr 26, 2024

Matching minor refactorings #13129

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Matching: refactor to clarify the argument-passing structure #13084

Matching: refactor to clarify the argument-passing structure #13084

gasche commented Apr 8, 2024 •

edited

ncik-roberts left a comment

ncik-roberts Apr 9, 2024

ncik-roberts Apr 9, 2024

gasche Apr 9, 2024

ncik-roberts Apr 9, 2024

ncik-roberts Apr 9, 2024

gasche Apr 9, 2024

ncik-roberts Apr 9, 2024

gasche Apr 9, 2024

ncik-roberts left a comment

trefis left a comment

gasche commented Apr 9, 2024

Matching: refactor to clarify the argument-passing structure #13084

Matching: refactor to clarify the argument-passing structure #13084

Conversation

gasche commented Apr 8, 2024 • edited

ncik-roberts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ncik-roberts left a comment

Choose a reason for hiding this comment

trefis left a comment

Choose a reason for hiding this comment

gasche commented Apr 9, 2024

gasche commented Apr 8, 2024 •

edited