Parmatch.exhaust: single-row optimization #9514

gasche · 2020-04-29T14:31:39Z

This PR implements an optimization to pattern-matching exhaustivity checking that compresses exhaustivity counter-examples on certain "single-row" patterns which may occur naturally in user code written to avoid fragile pattern matching. For example:

let map4 f = function
  | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
  | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ | None) -> None

The first line matches "all Some", the second line matches "at least one None" in a non-fragile way. (This is a bit silly for option which is not going to gain a new constructor overnight, but you get the idea for user-defined datatypes.)

Now let's consider that you make a mistake and use Some _ instead of (Some _ | None) in the last element of the tuple:

let wrong_map4 f = function
  | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
  | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ (* None missing*)) -> None

In the current trunk the counter-example generator will generate 2^(N-1) counter-examples, corresponding to all possible explosions of the or-patterns in the second row.

let wrong_map4 f = function
    | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
    | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ (* None missing*)) -> None
  ;;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _, Some _, Some _, None)|(Some _, Some _, None, None)|
(Some _, None, Some _, None)|(Some _, None, None, None)|
(None, Some _, Some _, None)|(None, Some _, None, None)|
(None, None, Some _, None)|(None, None, None, None))

In particular there is a risk of exponential blowup of the exhaustiveness checker or fragility checker, which is the cause of #9498 (a Stack Overflow in user-written code that exhibits exactly this pattern). The mistake (forgetting the None in the last element of the tuple) is not necessary to experience the blowup, it also occurs when warning "+4" is activated or in related examples involving GADTs (see #9511 (comment)).

With the present PR, a new optimization is added to the exhaustivity checker that matches a "single-row" scenario (we are asking for exhaustivity counter-examples to a matrix with a single row) that in particular captures this class of user examples. There is now a single counter-example generated, because the or-patterns are not exploded:

let wrong_map4 f = function
    | (Some x1, Some x2, Some x3, Some x4) -> Some (f x1 x2 x3 x4)
    | (Some _ | None), (Some _ | None), (Some _ | None), (Some _ (* None missing*)) -> None
  ;;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _|None), (Some _|None), (Some _|None), None)

In particular, this PR fixes #9498.

You may be wondering how this PR compares to #9511 which I also submitted this morning and also fixes #9498 ?

The two changes are orthogonal: Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511 is good at avoiding blowups when many counter-examples are generated (by being more lazy in how we traverse them), this PR removes the blowup in a particular case. Other cases will still blow up, so I think we still want to have Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511, but in this particular case the counter-example generated are much nicer with this PR. Presently I think that the best would be to have both PRs merged.
Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511 is the result of a discussion with @maranget, @trefis and @Octachron, and in particular I am sure that the technical approach is sound. I am confident about the present PR but @maranget may still find a reason why it is in fact completely wrong (that's what he does). @trefis also wanted to keep the exhaustivity-checker (critical for soundness) as simple as possible, this would also be a reason to not merge the present PR.
On the other hand, Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511 makes the choice to drop counter-examples in some cases, which may be controversial with some users or maintainers. This PR only acts on a particular case, and then it clearly improves the counter-example provided.
The present PR is less invasive than the previous one. Assuming that they are both approved and merged, this one is easier to backport. (However, I think that from the point of view of Bucklescript, the other PR gives the nicer guarantee that many other blowup cases will be handled better, so you probably still want both.)

typing/parmatch.ml

stedolan · 2020-05-19T09:37:14Z

This looks very nice, and the implementation is straightforward.

However, I'm a bit disappointed by what happens in the motivating example. This patch improves

Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _, Some _, Some _, None)|(Some _, Some _, None, None)|
(Some _, None, Some _, None)|(Some _, None, None, None)|
(None, Some _, Some _, None)|(None, Some _, None, None)|
(None, None, Some _, None)|(None, None, None, None))

to

((Some _|None), (Some _|None), (Some _|None), None)

but what I'd really like to see here is:

(_, _, _, None)

Would you consider replacing:

  let with_p =
    exhaust ext [ps] (n - 1)
    |> rmap (fun row -> p :: row) in

with the following?

  let with_p =
    exhaust ext [ps] (n - 1)
    |> rmap (fun row -> omega :: row) in

If there is a counterexample to ps, then _ , ps is a counterexample to p, ps, regardless of what p is. (However, if the first column is a GADT match then _, ps may not be typeable, which is annoying. Is it easy to detect whether a column contains a GADT match?)

gasche · 2020-05-19T15:24:46Z

Thanks @stedolan for the feedback.

I started by writing a long explanation for why I'm not fond of your suggestion, but in the end I decided that I like it: it gives more compact counter-examples, and it is symmetric (we also have omegas at the end, after the first difference).

Note that in #9511 I proposed ordering the default-case counterexamples before the constructor-cases counterexamples, which corresponds to reordering with_p and without_p in this optimized version, which means that the with_p case we are discussing would show up less often. For example, if the input is (None, None, None), then with the reordering the first counter-example would be (Some _, _, _) rather than (None, None, Some _) (my version) or (_, _, Some _) (your suggestion).

gasche · 2020-05-23T14:47:25Z

I decided to go ahead and change the output order from

p :: counter-example(ps)
counter-example(p) :: omegas

to

counter-example(p) :: omegas
omega :: counter-example(ps)

The change of p into omega is @stedolan's suggestion. I realize now that in the common case it does not, in fact, lose any precision: if the user only looks at the first counter-example, then they will not see this case (and the difference) unless counter-example(p) is empty, in which case p is in fact equivalent to omega. This is exactly what happens in the (None | Some _) example above.

gasche · 2020-05-24T10:02:58Z

See more discussion of counter-example ordering at #9511 (comment) . It turns out that in the non-single-row case, you do not want to show the default/wildcard group first. I don't know what this means about @stedolan's suggestion here.

gasche · 2020-05-31T08:16:47Z

Now that #9511 is merged in trunk, I rebased this PR on top of it and looked more at the impact on the testsuite examples.

In the current state, the PR follows @stedolan's suggestion (using _ instead of p) and orders the wildcard-group counter-examples before the p/_ counter-examples. Notice that #9511 does not start with the wildcard-group counter-examples, so there is a difference of ordering behavior between the single-row case and the standard case. Before looking at the testsuite examples, my opinion was that the difference was justified as starting with the wildcard group, and then using _ instead of p, would lead to better single-row counter-examples.

My conclusion is that @stedolan's suggestion and the different ordering do not work well in general, even though they work well on the example we discussed and in the general class of fragile-avoidance examples we discussed. I will thus rebase this PR to remove the two changes: ensure we provide the same counter-example order as the standard path, and show p instead of _.

Here are some relevant examples, presented as diffs to the compiler behavior provoked by the current PR following @stedolan suggestion and my counter-example ordering (these are diffs from running promote on the current state of the PR):

 File "robustmatch.ml", lines 74-76, characters 4-20:
 74 | ....match r1, r2, a with
 75 |     | _, R2, "coucou" -> ()
 76 |     | R1, _, _ -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R2, R2, "")
+(R2, _, "")

In this example, both counter-examples are equivalent, but I find the previous counter-example easier to read. Indeed, we can use it to follow through the pattern-matching, and reach a given pattern that is not general enough (namely "coucou"), which is harder when _ is displayed. (Note: it's not obvious when there is a single row here, it occurs only in a submatrix after splitting the first column in the R2 case.)

There are many other examples like this one, for example:

 File "robustmatch.ml", lines 156-159, characters 4-20:
 156 | ....match r1, r2, a with
 157 |     | R1, _, A -> ()
 158 |     | _, R2, X -> ()
 159 |     | R1, _, _ -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R2, R2, Y)
+(R2, _, Y)

My intuition is that each counter-example corresponds to a particular sub-pattern that would have to be generalized to cover said counter-example, and that precisely pointing this sub-pattern is useful. (Maybe we should consider being explicit about this, and showing the subpattern / location in the error ?) With the current output, users need to follow the counter-example to find this subpattern, and having the example be precise is helpful in general.

Then there are other examples where the problem is not _, but the ordering with the default submatrix first. Consider:

 File "robustmatch.ml", lines 228-230, characters 4-37:
 228 | ....match r1, r2, a with
 229 |     | R1, _, (3, "") -> ()
 230 |     | _, R2, (1, "coucou", 'a') -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R1, R1, (3, "*"))
+(R1, R1, (0, _))

Here I believe that the previous counter-example was better (even though the new one shows the "leftmost error" as I argued previously); here this is exactly the same reasoning as in the discussion of #9511: to fix the code we probably want to start tweaking the (R1, (3, ...)) case (by generalizing the pattern or adding new clauses) before handling the non-3 cases.

Finally, there is an example in the testsuite where the use of _ instead of p affects GADT behavior in a bad way: we get a non-exhaustivity warning that is technically incorrect, because it shows a counter-example that is ill-typed.

 let rec rule : type a b.
   (pval, closed, (a,b) tarr) lam -> (pval, closed, a) lam -> b rlam =
   fun v1 v2 ->
   match v1, v2 with
   | Lam(x,body), v ->
       begin
         match subst body (Bind (x, v, Id)) with Ex term ->
         match mode term with
         | Pexp -> Inl term
         | Pval -> Inr term
       end
   | Const (IntTo b, f), Const (IntR, x) ->
       Inr (Const (b, f x))
 ;;
 [%%expect{|
 type closed = rnil
 type 'a rlam = ((pexp, closed, 'a) lam, (pval, closed, 'a) lam) sum
+Lines 8-17, characters 2-26:
+ 8 | ..match v1, v2 with
+ 9 |   | Lam(x,body), v ->
+10 |       begin
+11 |         match subst body (Bind (x, v, Id)) with Ex term ->
+12 |         match mode term with
+13 |         | Pexp -> Inl term
+14 |         | Pval -> Inr term
+15 |       end
+16 |   | Const (IntTo b, f), Const (IntR, x) ->
+17 |       Inr (Const (b, f x))
+Warning 8: this pattern-matching is not exhaustive.
+Here is an example of a case that is not matched:
+(Const (_, _), Const (IntTo _, _))
 val rule :
   (pval, closed, ('a, 'b) tarr) lam -> (pval, closed, 'a) lam -> 'b rlam =
   <fun>
 |}];;

The example is a mouthful and lacks context, but the point is that putting the proposed counter-example in the code with a refutation works (and giving it a right-hand-side results in an "unreachable pattern" warning, at least in Merlin). I believe that this happens because the type-checker does not explode the wildcard patterns enough to reject the counter-example (it has fuel and gives up at some point). If using p instead of _, we preserve the amount of explosions that already existed in the user code, so those do not consume any fuel, and the counter-example is discarded by the type-checker.

gasche · 2020-05-31T08:39:02Z

I just updated the PR to revert to the original behavior ((p before not-p), (p rather than omega)), rebased on top of trunk. This is ready to review and discuss again.

(cc @stedolan and possibly @trefis, @maranget)

stedolan · 2020-06-01T14:05:25Z

typing/parmatch.ml

+     best way to make a non-fragile distinction between "all As" and
+     "at least one B".
+  *)
+  List.to_seq [Some p; None] |> Seq.flat_map


It's unfortunate that this roundabout code is the easiest way of concatenating two lazy sequences. (It looks fine for now, but the existence of such code suggests that Seq is missing some features)

There is Seq.append : 'a t -> 'a t -> 'a t since 4.11, but I wish to delay the computation of the second sequence. We would need a Seq.append_lazy : 'a t -> (unit -> 'a t) -> 'a t. The present code structure is slightly less clear, but (1) it mirrors the (equally unclear) structure of the non-single-row case, and (2) it made it very easy during my experiment to change iteration order. so there is something to it.

gasche · 2020-06-01T14:15:08Z

@trefis in our discussion, you mentioned that you were worried about any complexity increase in this part (which is critical for soundness). Would you like to have more time to review this aspect of the PR?

gasche · 2020-06-01T14:18:27Z

P.S.: Thanks @stedolan for your review!

stedolan · 2020-06-01T14:23:32Z

Code looks good, it's a clear improvement over what's there, so merge away.

For the record, though, I think the most recent changes made the output worse, and I think this is due to an interesting difference in how we read exhaustivity warnings. Here's one of your examples:

 match r1, r2, a with
 | R1, _, A -> ()
 | _, R2, X -> ()
 | R1, _, _ -> ()

If I understand correctly, your intuition is that an exhaustivity issue is probably caused by one of the cases being insufficiently general. You would like a counterexample that looks like one of the cases, to help you pinpoint which case should be generalised. So, you prefer R2, R2, Y over R2, _, Y as the former more strongly resembles case 2, the one that you would generalise.

My intuition is that an exhaustivity issue is probably caused by there not being enough cases. I would like a counterexample that covers as much as possible of the unmatched space, so that I don't have to add very many new cases in order to achieve exhaustivity. So, I'd prefer R2, _, Y over R2, R2, Y, as the former gets me closer to an exhaustive match.

(I am not saying that one of these intuitions is more valid than the other, but it's curious that they point opposite ways for how to report the issue)

gasche · 2020-06-01T14:36:24Z

I agree with the difference, modulo the fact that in most cases I do not expect to make a given case more general, but I expect to add a missing case "right after this case" (inserting a new clause right below), because the two are logically related. Consider for example (outside the single-row case):

type t = A | B | C

function
| A, _ -> ...
| B, A -> ...
| B, B -> ...
| C, _ -> ...

for me it is clear that we need to add a new case B, C right after B, B. This is immediately obvious (to me) if the counter-example is printed as B, C, but it is not if it is printed as _, C, although the two are (in presence of the already existing clauses) equivalent as patterns.

(See testsuite and code comments for an explanation.)

gasche · 2020-06-07T06:04:28Z

Rebased, change entry added. I will merge if the CI comes back green.

gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch from 66f323e to d196a96 Compare May 1, 2020 06:54

stedolan requested changes May 19, 2020

View reviewed changes

typing/parmatch.ml Outdated Show resolved Hide resolved

gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch from d196a96 to 631b960 Compare May 23, 2020 14:31

gasche mentioned this pull request May 30, 2020

Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511

Merged

gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch 2 times, most recently from fd74e42 to b82930f Compare May 31, 2020 08:37

stedolan reviewed Jun 1, 2020

View reviewed changes

stedolan approved these changes Jun 1, 2020

View reviewed changes

Parmatch.exhaust: single-row optimization

a3b66f1

(See testsuite and code comments for an explanation.)

gasche force-pushed the parmatch-exhaust-singlecol-optim-trunk branch from b82930f to a3b66f1 Compare June 7, 2020 06:04

gasche merged commit e717512 into ocaml:trunk Jun 8, 2020

gasche mentioned this pull request Sep 27, 2023

Type-checking slow for certain nested GADT patterns #12612

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parmatch.exhaust: single-row optimization #9514

Parmatch.exhaust: single-row optimization #9514

gasche commented Apr 29, 2020

stedolan commented May 19, 2020

gasche commented May 19, 2020

gasche commented May 23, 2020 •

edited

gasche commented May 24, 2020

gasche commented May 31, 2020

gasche commented May 31, 2020

stedolan Jun 1, 2020

gasche Jun 1, 2020

gasche commented Jun 1, 2020

gasche commented Jun 1, 2020

stedolan commented Jun 1, 2020

gasche commented Jun 1, 2020 •

edited

gasche commented Jun 7, 2020

Parmatch.exhaust: single-row optimization #9514

Parmatch.exhaust: single-row optimization #9514

Conversation

gasche commented Apr 29, 2020

stedolan commented May 19, 2020

gasche commented May 19, 2020

gasche commented May 23, 2020 • edited

gasche commented May 24, 2020

gasche commented May 31, 2020

gasche commented May 31, 2020

stedolan Jun 1, 2020

Choose a reason for hiding this comment

gasche Jun 1, 2020

Choose a reason for hiding this comment

gasche commented Jun 1, 2020

gasche commented Jun 1, 2020

stedolan commented Jun 1, 2020

gasche commented Jun 1, 2020 • edited

gasche commented Jun 7, 2020

gasche commented May 23, 2020 •

edited

gasche commented Jun 1, 2020 •

edited