Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parmatch.exhaust: single-row optimization #9514

Merged
merged 1 commit into from
Jun 8, 2020

Conversation

gasche
Copy link
Member

@gasche gasche commented Apr 29, 2020

This PR implements an optimization to pattern-matching exhaustivity checking that compresses exhaustivity counter-examples on certain "single-row" patterns which may occur naturally in user code written to avoid fragile pattern matching. For example:

let map4 f = function
  | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
  | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ | None) -> None

The first line matches "all Some", the second line matches "at least one None" in a non-fragile way. (This is a bit silly for option which is not going to gain a new constructor overnight, but you get the idea for user-defined datatypes.)

Now let's consider that you make a mistake and use Some _ instead of (Some _ | None) in the last element of the tuple:

let wrong_map4 f = function
  | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
  | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ (* None missing*)) -> None

In the current trunk the counter-example generator will generate 2^(N-1) counter-examples, corresponding to all possible explosions of the or-patterns in the second row.

let wrong_map4 f = function
    | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
    | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ (* None missing*)) -> None
  ;;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _, Some _, Some _, None)|(Some _, Some _, None, None)|
(Some _, None, Some _, None)|(Some _, None, None, None)|
(None, Some _, Some _, None)|(None, Some _, None, None)|
(None, None, Some _, None)|(None, None, None, None))

In particular there is a risk of exponential blowup of the exhaustiveness checker or fragility checker, which is the cause of #9498 (a Stack Overflow in user-written code that exhibits exactly this pattern). The mistake (forgetting the None in the last element of the tuple) is not necessary to experience the blowup, it also occurs when warning "+4" is activated or in related examples involving GADTs (see #9511 (comment)).

With the present PR, a new optimization is added to the exhaustivity checker that matches a "single-row" scenario (we are asking for exhaustivity counter-examples to a matrix with a single row) that in particular captures this class of user examples. There is now a single counter-example generated, because the or-patterns are not exploded:

let wrong_map4 f = function
    | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
    | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ (* None missing*)) -> None
  ;;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _|None), (Some _|None), (Some _|None), None)

In particular, this PR fixes #9498.

You may be wondering how this PR compares to #9511 which I also submitted this morning and also fixes #9498 ?

  • The two changes are orthogonal: Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511 is good at avoiding blowups when many counter-examples are generated (by being more lazy in how we traverse them), this PR removes the blowup in a particular case. Other cases will still blow up, so I think we still want to have Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511, but in this particular case the counter-example generated are much nicer with this PR. Presently I think that the best would be to have both PRs merged.

  • Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511 is the result of a discussion with @maranget, @trefis and @Octachron, and in particular I am sure that the technical approach is sound. I am confident about the present PR but @maranget may still find a reason why it is in fact completely wrong (that's what he does). @trefis also wanted to keep the exhaustivity-checker (critical for soundness) as simple as possible, this would also be a reason to not merge the present PR.

  • On the other hand, Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511 makes the choice to drop counter-examples in some cases, which may be controversial with some users or maintainers. This PR only acts on a particular case, and then it clearly improves the counter-example provided.

  • The present PR is less invasive than the previous one. Assuming that they are both approved and merged, this one is easier to backport. (However, I think that from the point of view of Bucklescript, the other PR gives the nicer guarantee that many other blowup cases will be handled better, so you probably still want both.)

@gasche gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch from 66f323e to d196a96 Compare May 1, 2020 06:54
typing/parmatch.ml Outdated Show resolved Hide resolved
@stedolan
Copy link
Contributor

This looks very nice, and the implementation is straightforward.

However, I'm a bit disappointed by what happens in the motivating example. This patch improves

Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _, Some _, Some _, None)|(Some _, Some _, None, None)|
(Some _, None, Some _, None)|(Some _, None, None, None)|
(None, Some _, Some _, None)|(None, Some _, None, None)|
(None, None, Some _, None)|(None, None, None, None))

to

((Some _|None), (Some _|None), (Some _|None), None)

but what I'd really like to see here is:

(_, _, _, None)

Would you consider replacing:

  let with_p =
    exhaust ext [ps] (n - 1)
    |> rmap (fun row -> p :: row) in

with the following?

  let with_p =
    exhaust ext [ps] (n - 1)
    |> rmap (fun row -> omega :: row) in

If there is a counterexample to ps, then _ , ps is a counterexample to p, ps, regardless of what p is. (However, if the first column is a GADT match then _, ps may not be typeable, which is annoying. Is it easy to detect whether a column contains a GADT match?)

@gasche
Copy link
Member Author

gasche commented May 19, 2020

Thanks @stedolan for the feedback.

I started by writing a long explanation for why I'm not fond of your suggestion, but in the end I decided that I like it: it gives more compact counter-examples, and it is symmetric (we also have omegas at the end, after the first difference).

Note that in #9511 I proposed ordering the default-case counterexamples before the constructor-cases counterexamples, which corresponds to reordering with_p and without_p in this optimized version, which means that the with_p case we are discussing would show up less often. For example, if the input is (None, None, None), then with the reordering the first counter-example would be (Some _, _, _) rather than (None, None, Some _) (my version) or (_, _, Some _) (your suggestion).

@gasche gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch from d196a96 to 631b960 Compare May 23, 2020 14:31
@gasche
Copy link
Member Author

gasche commented May 23, 2020

I decided to go ahead and change the output order from

p :: counter-example(ps)
counter-example(p) :: omegas

to

counter-example(p) :: omegas
omega :: counter-example(ps)

The change of p into omega is @stedolan's suggestion. I realize now that in the common case it does not, in fact, lose any precision: if the user only looks at the first counter-example, then they will not see this case (and the difference) unless counter-example(p) is empty, in which case p is in fact equivalent to omega. This is exactly what happens in the (None | Some _) example above.

@gasche
Copy link
Member Author

gasche commented May 24, 2020

See more discussion of counter-example ordering at #9511 (comment) . It turns out that in the non-single-row case, you do not want to show the default/wildcard group first. I don't know what this means about @stedolan's suggestion here.

@gasche
Copy link
Member Author

gasche commented May 31, 2020

Now that #9511 is merged in trunk, I rebased this PR on top of it and looked more at the impact on the testsuite examples.

In the current state, the PR follows @stedolan's suggestion (using _ instead of p) and orders the wildcard-group counter-examples before the p/_ counter-examples. Notice that #9511 does not start with the wildcard-group counter-examples, so there is a difference of ordering behavior between the single-row case and the standard case. Before looking at the testsuite examples, my opinion was that the difference was justified as starting with the wildcard group, and then using _ instead of p, would lead to better single-row counter-examples.

My conclusion is that @stedolan's suggestion and the different ordering do not work well in general, even though they work well on the example we discussed and in the general class of fragile-avoidance examples we discussed. I will thus rebase this PR to remove the two changes: ensure we provide the same counter-example order as the standard path, and show p instead of _.

Here are some relevant examples, presented as diffs to the compiler behavior provoked by the current PR following @stedolan suggestion and my counter-example ordering (these are diffs from running promote on the current state of the PR):

 File "robustmatch.ml", lines 74-76, characters 4-20:
 74 | ....match r1, r2, a with
 75 |     | _, R2, "coucou" -> ()
 76 |     | R1, _, _ -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R2, R2, "")
+(R2, _, "")

In this example, both counter-examples are equivalent, but I find the previous counter-example easier to read. Indeed, we can use it to follow through the pattern-matching, and reach a given pattern that is not general enough (namely "coucou"), which is harder when _ is displayed. (Note: it's not obvious when there is a single row here, it occurs only in a submatrix after splitting the first column in the R2 case.)

There are many other examples like this one, for example:

 File "robustmatch.ml", lines 156-159, characters 4-20:
 156 | ....match r1, r2, a with
 157 |     | R1, _, A -> ()
 158 |     | _, R2, X -> ()
 159 |     | R1, _, _ -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R2, R2, Y)
+(R2, _, Y)

My intuition is that each counter-example corresponds to a particular sub-pattern that would have to be generalized to cover said counter-example, and that precisely pointing this sub-pattern is useful. (Maybe we should consider being explicit about this, and showing the subpattern / location in the error ?) With the current output, users need to follow the counter-example to find this subpattern, and having the example be precise is helpful in general.

Then there are other examples where the problem is not _, but the ordering with the default submatrix first. Consider:

 File "robustmatch.ml", lines 228-230, characters 4-37:
 228 | ....match r1, r2, a with
 229 |     | R1, _, (3, "") -> ()
 230 |     | _, R2, (1, "coucou", 'a') -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R1, R1, (3, "*"))
+(R1, R1, (0, _))

Here I believe that the previous counter-example was better (even though the new one shows the "leftmost error" as I argued previously); here this is exactly the same reasoning as in the discussion of #9511: to fix the code we probably want to start tweaking the (R1, (3, ...)) case (by generalizing the pattern or adding new clauses) before handling the non-3 cases.

Finally, there is an example in the testsuite where the use of _ instead of p affects GADT behavior in a bad way: we get a non-exhaustivity warning that is technically incorrect, because it shows a counter-example that is ill-typed.

 let rec rule : type a b.
   (pval, closed, (a,b) tarr) lam -> (pval, closed, a) lam -> b rlam =
   fun v1 v2 ->
   match v1, v2 with
   | Lam(x,body), v ->
       begin
         match subst body (Bind (x, v, Id)) with Ex term ->
         match mode term with
         | Pexp -> Inl term
         | Pval -> Inr term
       end
   | Const (IntTo b, f), Const (IntR, x) ->
       Inr (Const (b, f x))
 ;;
 [%%expect{|
 type closed = rnil
 type 'a rlam = ((pexp, closed, 'a) lam, (pval, closed, 'a) lam) sum
+Lines 8-17, characters 2-26:
+ 8 | ..match v1, v2 with
+ 9 |   | Lam(x,body), v ->
+10 |       begin
+11 |         match subst body (Bind (x, v, Id)) with Ex term ->
+12 |         match mode term with
+13 |         | Pexp -> Inl term
+14 |         | Pval -> Inr term
+15 |       end
+16 |   | Const (IntTo b, f), Const (IntR, x) ->
+17 |       Inr (Const (b, f x))
+Warning 8: this pattern-matching is not exhaustive.
+Here is an example of a case that is not matched:
+(Const (_, _), Const (IntTo _, _))
 val rule :
   (pval, closed, ('a, 'b) tarr) lam -> (pval, closed, 'a) lam -> 'b rlam =
   <fun>
 |}];;

The example is a mouthful and lacks context, but the point is that putting the proposed counter-example in the code with a refutation works (and giving it a right-hand-side results in an "unreachable pattern" warning, at least in Merlin). I believe that this happens because the type-checker does not explode the wildcard patterns enough to reject the counter-example (it has fuel and gives up at some point). If using p instead of _, we preserve the amount of explosions that already existed in the user code, so those do not consume any fuel, and the counter-example is discarded by the type-checker.

@gasche gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch 2 times, most recently from fd74e42 to b82930f Compare May 31, 2020 08:37
@gasche
Copy link
Member Author

gasche commented May 31, 2020

I just updated the PR to revert to the original behavior ((p before not-p), (p rather than omega)), rebased on top of trunk. This is ready to review and discuss again.

(cc @stedolan and possibly @trefis, @maranget)

best way to make a non-fragile distinction between "all As" and
"at least one B".
*)
List.to_seq [Some p; None] |> Seq.flat_map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that this roundabout code is the easiest way of concatenating two lazy sequences. (It looks fine for now, but the existence of such code suggests that Seq is missing some features)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is Seq.append : 'a t -> 'a t -> 'a t since 4.11, but I wish to delay the computation of the second sequence. We would need a Seq.append_lazy : 'a t -> (unit -> 'a t) -> 'a t. The present code structure is slightly less clear, but (1) it mirrors the (equally unclear) structure of the non-single-row case, and (2) it made it very easy during my experiment to change iteration order. so there is something to it.

@gasche
Copy link
Member Author

gasche commented Jun 1, 2020

@trefis in our discussion, you mentioned that you were worried about any complexity increase in this part (which is critical for soundness). Would you like to have more time to review this aspect of the PR?

@gasche
Copy link
Member Author

gasche commented Jun 1, 2020

P.S.: Thanks @stedolan for your review!

@stedolan
Copy link
Contributor

stedolan commented Jun 1, 2020

Code looks good, it's a clear improvement over what's there, so merge away.

For the record, though, I think the most recent changes made the output worse, and I think this is due to an interesting difference in how we read exhaustivity warnings. Here's one of your examples:

 match r1, r2, a with
 | R1, _, A -> ()
 | _, R2, X -> ()
 | R1, _, _ -> ()

If I understand correctly, your intuition is that an exhaustivity issue is probably caused by one of the cases being insufficiently general. You would like a counterexample that looks like one of the cases, to help you pinpoint which case should be generalised. So, you prefer R2, R2, Y over R2, _, Y as the former more strongly resembles case 2, the one that you would generalise.

My intuition is that an exhaustivity issue is probably caused by there not being enough cases. I would like a counterexample that covers as much as possible of the unmatched space, so that I don't have to add very many new cases in order to achieve exhaustivity. So, I'd prefer R2, _, Y over R2, R2, Y, as the former gets me closer to an exhaustive match.

(I am not saying that one of these intuitions is more valid than the other, but it's curious that they point opposite ways for how to report the issue)

@gasche
Copy link
Member Author

gasche commented Jun 1, 2020

I agree with the difference, modulo the fact that in most cases I do not expect to make a given case more general, but I expect to add a missing case "right after this case" (inserting a new clause right below), because the two are logically related. Consider for example (outside the single-row case):

type t = A | B | C

function
| A, _ -> ...
| B, A -> ...
| B, B -> ...
| C, _ -> ...

for me it is clear that we need to add a new case B, C right after B, B. This is immediately obvious (to me) if the counter-example is printed as B, C, but it is not if it is printed as _, C, although the two are (in presence of the already existing clauses) equivalent as patterns.

(See testsuite and code comments for an explanation.)
@gasche gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch from b82930f to a3b66f1 Compare June 7, 2020 06:04
@gasche
Copy link
Member Author

gasche commented Jun 7, 2020

Rebased, change entry added. I will merge if the CI comes back green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

compiler stackoverflow on pattern match against record
2 participants