Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matching: refactor to clarify the argument-passing structure #13084

Merged
merged 3 commits into from Apr 9, 2024

Conversation

gasche
Copy link
Member

@gasche gasche commented Apr 8, 2024

This refactoring PR is born out of a discussion with @ncik-roberts in #12555 (comment) . "Arguments" in pattern-matching compilation are expressions that access parts of the scrutinee. They can contain complex expressions, in particular field projects of the form x.f, but never x.f.g -- the projection necessarily operates on a variable, or sometimes a tuple of variables in weird corner cases of do_for_multiple_match. There is a specific moment in pattern-matching compilation that turns complex expressions into variables, and then the rest of the pipeline assumes that it gets a variable, and this is important for correctness, but not obvious to see in the code at all.

The present PR uses finer-grained types to statically encode this structure of arguments, continuing on a tradition started with @trefis a few years ago to clarify the pattern-matching compiler. There is no change to the behavior, and the diff consists almost entirely of very minor simplifications, but it becomes much easier to reason about this transition from arbitrary expressions to simple arguments in the code -- and I will rely on this property in a follow-up PR to fix The Pattern-Matching Bug.

My favorite parts of the diff are the following:

-                args =
-                  ( match args with
-                  | _ :: r -> r
-                  | _ -> assert false
-                  );
+                args = args.rest;
 and compile_match_simplified ~scopes repr partial ctx
-    (m : Simple.clause pattern_matching) =
-  match m with
-  | { cases = []; args = [] } -> comp_exit ctx m
-  | { args = ((Lvar v as arg), str) :: argl } ->
-      bind_match_arg str v arg (
-        let args = (arg, Alias) :: argl in
-        let m = { m with args } in
-        let first_match, rem = split_and_precompile_simplified m in
-        combine_handlers ~scopes repr partial ctx first_match rem
-      )
-  | _ -> assert false
+    (m : (simple_args, Simple.clause) pattern_matching) =
+  let first_match, rem = split_and_precompile_simplified m in
+  combine_handlers ~scopes repr partial ctx first_match rem
 and do_compile_matching ~scopes repr partial ctx pmh =
   match pmh with
   | Pm pm -> (
-      let arg =
-        match pm.args with
-        | (first_arg, _) :: _ -> first_arg
-        | _ ->
-            (* We arrive in do_compile_matching from:
-               - compile_matching
-               - recursive call on PmVars
-               The first one explicitly checks that [args] is nonempty, the
-               second one is only generated when the inner pm first looks at
-               a variable (i.e. there is something to look at).
-            *)
-            assert false
-      in
+      let arg = arg_of_simple_arg (fst pm.args.first) in

The worst part of the diff is in do_for_multiple_match (a small hack due to toplevel_handler not being polymorphic over the argument structure), but it's basically fine.

A second commit also simplifies some functions that would take both a dedicated ~arg argument (representing the first argument known to be present) and an args vector that is now known to be simple; there is no need to track ~arg specifically as it can now be recovered from args.first.

(cc @trefis and @lthls who are on fire with pattern-matching refactoring reviewing)

Copy link
Contributor

@ncik-roberts ncik-roberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me — simple_arg would have helped me unconfuse myself in my review of the mentioned PR.

Comment on lines 996 to 998
(** The argument in first position is about to be matched upon,
it has already been bound to a variable
or it is a tuple of variables in the weird do_for_multiple_match case. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that simple_args is missing documentation explaining why it exists as an alternative to args. I would suggest introducing this comment with a little more framing:

(** [simple_args] is a more specific version of [args], to be used when...
    (and the rest of the comment as-is).
*)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that another relevant property of simple_args is that all of the lambda values it represents are pure. That's what allows us to drop the mutability on the floor in make_line_matching and make_matching (and my misunderstanding of this fact led to the conversation that you mention in the PR text).

I suggest mentioning this in the comment. Or, if you think of a good way to mention the purity in the relevant type names, even better! (I couldn't think of a good way to do this naming.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more comments along the lines you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

lambda/matching.ml Outdated Show resolved Hide resolved
Comment on lines -3479 to -3489
(m : Simple.clause pattern_matching) =
match m with
| { cases = []; args = [] } -> comp_exit ctx m
| { args = ((Lvar v as arg), str) :: argl } ->
bind_match_arg str v arg (
let args = (arg, Alias) :: argl in
let m = { m with args } in
let first_match, rem = split_and_precompile_simplified m in
combine_handlers ~scopes repr partial ctx first_match rem
)
| _ -> assert false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why this PR allows this simplification or makes it more obviously correct. Does the new simple_arg shape tell us something more specific about m.args? Or was this simplification possible before?

(Separately, I'll note that bind_match_arg is now called only once, and so could be inlined if you found that more readable.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now. The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.

One aspect of the change that is subtle is that we now allow the Tuple case to reach here, but in fact I believe that this case cannot happen. (So we could add an assert false if we wanted to, as we do here.) I think that the Tuple case can only flow through the precompilation logic, never to this actual compilation code due to the different compile_flattened code path. I tried to check this statically, but the cost in complexity is quite high (polymorphic variants, polymorphic recursion, oh my!) and I did not finish it. I prefer the new version which might implicitly generate less-good-than-we-expected code if a Tuple argument somehow found its way here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine then.

Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now.
Agreed, this is clear.

The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.

It seems to me this was already true as of this refactor: 5bffef5. The call to bind_match_arg is guarded by a match on Lvar v. I'm saying this just to check my understanding, not to request changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that I could have simplified the compile_match_simplified part of the change by removing the bind logic at that point. I feel more comfortable with the change now that the types make it clear that the argument list has already been "split" by the time we reach this point in the code -- this is a sharp distinction with compile_match_nonempty above.

@gasche gasche force-pushed the matching-structure-arguments branch 2 times, most recently from 222c026 to 7b661ba Compare April 9, 2024 07:19
The "arguments" in pattern-matching compilation are a list of
scrutinees that are being matched upon, that can most of the time be
formed of arbitrary expressions. The first argument is bound to a
variable when we are about to generate code to switch on the
corresponding column, that is, in the
`compile_match_{nonempty,simplified}` functions. When new arguments
are generated by applying projections to an existing argument, it
is (almost) always on this bound-variable form. In other words, most
arguments can be complex expressions, except the first argument in
some parts of the pattern-matching logic.

The code relies on this simple-argument structure for the first
argument, but would not encode the corresponding invariants in the
type. This is what the current commit does; we can get rid of some
runtime assertions, and simplify code that was more complex than
necessary -- in `compile_match_simplified` in particular.
We don't need to pass an `~arg` parameter to functions that also take a
`split_args` argument, as they respect the invariant that this `arg`
argument is always the expression form of the first argument.
@gasche gasche force-pushed the matching-structure-arguments branch from 7b661ba to 6dc533f Compare April 9, 2024 07:26
Copy link
Contributor

@ncik-roberts ncik-roberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes look good, but a maintainer will need to approve.

Copy link
Contributor

@trefis trefis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on behalf of Nick Roberts.

The changes look good overall. I spent a few minutes trying to make the changes around toplevel_handlers less ugly, and now understand why you did things this way. ( #sad )

I also find the Tuple of lambda a bit distasteful, but not sure what could be done about it.
Anyway, this PR is an improvement.

@gasche
Copy link
Member Author

gasche commented Apr 9, 2024

Thanks both! Merging.

@gasche gasche merged commit b30d7d3 into ocaml:trunk Apr 9, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants