New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matching: refactor to clarify the argument-passing structure #13084
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me — simple_arg
would have helped me unconfuse myself in my review of the mentioned PR.
lambda/matching.ml
Outdated
(** The argument in first position is about to be matched upon, | ||
it has already been bound to a variable | ||
or it is a tuple of variables in the weird do_for_multiple_match case. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that simple_args
is missing documentation explaining why it exists as an alternative to args
. I would suggest introducing this comment with a little more framing:
(** [simple_args] is a more specific version of [args], to be used when...
(and the rest of the comment as-is).
*)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that another relevant property of simple_args
is that all of the lambda values it represents are pure. That's what allows us to drop the mutability on the floor in make_line_matching
and make_matching
(and my misunderstanding of this fact led to the conversation that you mention in the PR text).
I suggest mentioning this in the comment. Or, if you think of a good way to mention the purity in the relevant type names, even better! (I couldn't think of a good way to do this naming.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more comments along the lines you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
(m : Simple.clause pattern_matching) = | ||
match m with | ||
| { cases = []; args = [] } -> comp_exit ctx m | ||
| { args = ((Lvar v as arg), str) :: argl } -> | ||
bind_match_arg str v arg ( | ||
let args = (arg, Alias) :: argl in | ||
let m = { m with args } in | ||
let first_match, rem = split_and_precompile_simplified m in | ||
combine_handlers ~scopes repr partial ctx first_match rem | ||
) | ||
| _ -> assert false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why this PR allows this simplification or makes it more obviously correct. Does the new simple_arg
shape tell us something more specific about m.args
? Or was this simplification possible before?
(Separately, I'll note that bind_match_arg
is now called only once, and so could be inlined if you found that more readable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now. The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind
does not matter.
One aspect of the change that is subtle is that we now allow the Tuple
case to reach here, but in fact I believe that this case cannot happen. (So we could add an assert false
if we wanted to, as we do here.) I think that the Tuple case can only flow through the precompilation logic, never to this actual compilation code due to the different compile_flattened
code path. I tried to check this statically, but the cost in complexity is quite high (polymorphic variants, polymorphic recursion, oh my!) and I did not finish it. I prefer the new version which might implicitly generate less-good-than-we-expected code if a Tuple
argument somehow found its way here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine then.
Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now.
Agreed, this is clear.
The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.
It seems to me this was already true as of this refactor: 5bffef5. The call to bind_match_arg
is guarded by a match on Lvar v
. I'm saying this just to check my understanding, not to request changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that I could have simplified the compile_match_simplified
part of the change by removing the bind
logic at that point. I feel more comfortable with the change now that the types make it clear that the argument list has already been "split" by the time we reach this point in the code -- this is a sharp distinction with compile_match_nonempty
above.
222c026
to
7b661ba
Compare
The "arguments" in pattern-matching compilation are a list of scrutinees that are being matched upon, that can most of the time be formed of arbitrary expressions. The first argument is bound to a variable when we are about to generate code to switch on the corresponding column, that is, in the `compile_match_{nonempty,simplified}` functions. When new arguments are generated by applying projections to an existing argument, it is (almost) always on this bound-variable form. In other words, most arguments can be complex expressions, except the first argument in some parts of the pattern-matching logic. The code relies on this simple-argument structure for the first argument, but would not encode the corresponding invariants in the type. This is what the current commit does; we can get rid of some runtime assertions, and simplify code that was more complex than necessary -- in `compile_match_simplified` in particular.
We don't need to pass an `~arg` parameter to functions that also take a `split_args` argument, as they respect the invariant that this `arg` argument is always the expression form of the first argument.
7b661ba
to
6dc533f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes look good, but a maintainer will need to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving on behalf of Nick Roberts.
The changes look good overall. I spent a few minutes trying to make the changes around toplevel_handlers
less ugly, and now understand why you did things this way. ( #sad )
I also find the Tuple of lambda
a bit distasteful, but not sure what could be done about it.
Anyway, this PR is an improvement.
Thanks both! Merging. |
This refactoring PR is born out of a discussion with @ncik-roberts in #12555 (comment) . "Arguments" in pattern-matching compilation are expressions that access parts of the scrutinee. They can contain complex expressions, in particular field projects of the form
x.f
, but neverx.f.g
-- the projection necessarily operates on a variable, or sometimes a tuple of variables in weird corner cases ofdo_for_multiple_match
. There is a specific moment in pattern-matching compilation that turns complex expressions into variables, and then the rest of the pipeline assumes that it gets a variable, and this is important for correctness, but not obvious to see in the code at all.The present PR uses finer-grained types to statically encode this structure of arguments, continuing on a tradition started with @trefis a few years ago to clarify the pattern-matching compiler. There is no change to the behavior, and the diff consists almost entirely of very minor simplifications, but it becomes much easier to reason about this transition from arbitrary expressions to simple arguments in the code -- and I will rely on this property in a follow-up PR to fix The Pattern-Matching Bug.
My favorite parts of the diff are the following:
The worst part of the diff is in
do_for_multiple_match
(a small hack due totoplevel_handler
not being polymorphic over the argument structure), but it's basically fine.A second commit also simplifies some functions that would take both a dedicated
~arg
argument (representing the first argument known to be present) and anargs
vector that is now known to be simple; there is no need to track~arg
specifically as it can now be recovered fromargs.first
.(cc @trefis and @lthls who are on fire with pattern-matching refactoring reviewing)