Parmatch: make `exhaust` (exhaustivity and fragility checking) lazy #9511

gasche · 2020-04-28T20:51:21Z

This PR fixes #9498 (a Stack Overflow when checking for fragile pattern matching in a hand-written user program) and subsumes #9499 and #9502, which were earlier approaches that made witness/counter-example generation partially lazy.

(cc @maranget, @trefis and @Octachron, with which I discussed the design space; we decided on this approach together, see #9502 (comment) )

Design

With this PR, the exhaust function is completely lazy (it returns a Seq.t of non-matched patterns), and we only keep the first output that passes the type-checker (the pred function provided by Typecore). Before, the type-checke would decide (through the type_pat splitting mode) whether to keep only the first example (Backtrack_or) or to keep more examples (Refine_or); in particular, in some cases we show simpler, shorter reports to users, with only one counter-example to exhaustivity instead of an or-pattern of many examples.

Before:

# function (true,true,true,true) -> ();;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((true, true, true, false)|(true, true, false, _)|(true, false, _, _)|
(false, _, _, _))

After:

Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
(true, true, true, false)

We could decide to show more than one counter-example, but there isn't a clear choice of bound/cutoff (if we show all of them, we are back to an exponential behavior); I don't think this would be a good idea.

On the other hand, we could decide to show the wildcard-group counter-example before the constructor-group counter-examples, which would show (false, _, _, _) in this case. This seems nicer to me (I would assume that wildcard-group counter-examples are typically broader: "the difference more to the left") and I would be interested in @maranget's opinion, it sounds like something he would have intuitions about.

Performance impact

Besides fixing a real-world bug (a stack overflow on a human-written program, #9498), basically the hope is that this is the end-game in terms of avoiding or-pattern blowups resulting from the Parmatch.exhaust function. On the family of exponential examples provided by @maranget in #9499 (comment), all examples now run in linear time (instantaneous).

For previous iterations of this PR, I suspected that they could increase compilation time in common cases (it used an lazy sequence but would often force it fully in the end). This new version should be just as fast or faster than trunk in the common case, because we stop the traversal after the first valid counter-example is found. (If there is only one example, then the cost is very small anyway; if there are many, we save time by giving less work to the type-checker.) I expect that the difference will not be noticeable on most programs.

Edit: I realized later that this reasoning does not hold in the case of exhaustiveness checking. In the common case, a function definitive exhaustive, which means that we have to traverse all counter-example candidates to check that none of them are valid (so in particular we fully force the sequence). In this common use-case we do expect a constant-factor slowdown due to the use of a lazy rather than strict sequence; but in practice I was unable to observe a noticeable performance difference, even on hand-crafted worst-case examples.

Code style

I rewrote the PR with non-invasiveness in mind, and also the goal of not using any additional function missing from the Seq module. In particular I inlined my pop : 'a Seq.t -> ('a * 'a Seq.t) option function, and I rewrote one part in a slightly less natural form to use just Seq.flat_map instead of Seq.append. This avoids the question of where to store the Seq additions (locally in the function, or at the beginning of Parmatch, or in utils/misc.ml?), and should make it easier for our Bucklescript friends to rebase to old OCaml versions if they want to.

gasche · 2020-04-28T21:04:30Z

The "end-game" comment was maybe a bit tacky. There are situations involving GADTs where an exponential number of counter-example candidates are considered and all discarded (so looking for the first valid counter-example), and the type-checking time remains exponential in this case.

For example (I will add it to the testsuite):

type _ t =
  | A : int t
  | B : int t
  | C : bool t

type force = Int of int t

let test = function
  Int (A|B), Int (A|B), Int (A|B), Int (A|B), Int (A|B),
  Int (A|B), Int (A|B), Int (A|B), Int (A|B), Int (A|B),
  Int (A|B), Int (A|B)
-> ()

On this family of programs, exhaustivity checking is still exponential. On this example (N=12), this PR is twice faster than trunk (which is a good sign that it is safe for compile-time in the general case). On larger N values the PR terminates (slowly) while trunk does a Stack Overflow.

gasche · 2020-04-28T21:43:33Z

In typing-warnings/exhaustiveness.ml, @garrigue wrote a test that explicitly mentions that several counter-examples (or-ed together) are shown to the user:

(* Warn about all relevant cases when possible *)
let f = function
  | None, None -> 1
  | Some _, Some _ -> 2;;

(*
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
((Some _, None)|(None, Some _))
*)

With the present PR, we only have one counter-example shown. Is this a problem?

Note that, already in trunk, you only see one example if a GADT is involved:

type _ option_gadt =
| None : 'a option_gadt
| Some : 'a -> 'a option_gadt;;

let f = function
  | None, None -> 1
  | Some _, Some _ -> 2;;
(*
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
(Some _, None)
*)

bobzhang · 2020-04-29T01:45:17Z

@gasche thanks for the quick fix.
With regard to rebase, Seq is not a big issue, feel free to use it as you like. (we plan to upgrade in 2021, we did once in 2019, so we probably will rebase it).
I would prefer wildcard-group counter-examples first (if it does not complicate the patch), maybe showing 4 or 5 would make refactoring a bit easier, since you typically add more cases during refactoring

exhaustiveness.ml contains a mix of tricky examples (checking correctness of the typer) and "slow" examples (that have exponential checking time due to a search explosion). Everything is run using expect_test, which uses the bytecode type-checker, so the "slow" examples are very slow. This PR separates out the "slow" example in their own file exhaustiveness_slow.ml which is not a reference test but checked using ocamlc.opt. On my machine make one DIR=tests/typing-warnings goes from 14s to 2.3s. This would be important in the future if we want to add more "slow" tests (in particular this would be useful for ocaml#9511). (No change entry needed.)

exhaustiveness.ml contains a mix of tricky examples (checking correctness of the typer) and "slow" examples (that have exponential checking time due to a search explosion). Everything is run using expect_test, which uses the bytecode type-checker, so the "slow" examples are very slow. This PR separates out the "slow" example in their own file exhaustiveness_slow.ml which is not an expect-style test but checked using ocamlc.opt. On my machine make one DIR=tests/typing-warnings goes from 14s to 2.3s. This would be important in the future if we want to add more "slow" tests (in particular this would be useful for ocaml#9511). (No change entry needed.)

trefis · 2020-04-29T13:11:28Z

@gasche: this might be a naïve question (I haven't dived back into the actual code yet) but I would have thought that since parmatch doesn't generate or-patterns anymore, Typecore.type_pat could be simplified (i.e. all the logic around Refining_or vs Splitting_or could disappear).

Is that not the case, or have you just not got round to it yet?

gasche · 2020-04-29T13:28:44Z

Parmatch does generate or-patterns, but not at the toplevel anymore. In particular, exploding a wildcard results in an or-pattern there. I think that we could review the type_pat mode logic to see if there are things that can be simplified, but I have not done it yet, and I am not that hopeful. Two additional reasons not to hurry:

If we decide that we want to keep showing more counter-examples when it is easy (and not expensive) to do so (for example to keep @garrigue's testcase unchanged), one way to do it is to orify-many candidates by group of N, and send those to the type-checker together. (We could also keep sending each candidate to the type-checker independently and keeping the first N valid ones, this is what I initially implemented, it has a higher risk of exploding, and the pre-filter batching has the nice property of coinciding with the (fairly ad-hoc) trunk behavior in many cases.)
There are other improvements to exhaust we could consider that keep emitting or-patterns in candidates. I mentioned the single-row optimization in our meeting yesterday, I have a branch almost-done that does it, on (A|B),(A|B),A it would generate a single counter-example (A|B),(A|B),B. Slick!

gasche · 2020-04-29T13:30:23Z

(Clarification: I think that currently the only or-patterns generated in depth are those that come from wildcard explosion, they are generated in Parmatch by ppat_of_type and cie (look for remaining callers of orify_many). They are not initially part of the candidate, but appear during type_pat transformations.)

trefis · 2020-04-29T14:01:40Z

Thanks for the explanation :)

gasche · 2020-05-13T07:42:37Z

@garrigue this PR ensures that only a single counter-example is shown to the user. This is good for performance and it follows the recommendation of @maranget. If you object to this change, could you make it clear? Otherwise we should probably assume that it is fine -- and go ahead if the PR is approved.

garrigue · 2020-05-13T14:30:53Z

I have no strict opinion about the number of counter-examples shown to the user.
It is nicer to show more counter-examples, as this gives a better idea of how to fix the code, but if the cost is to high...
This said, in the case of GADTs it is not clear to me how stopping on the first counter-example helps: if there is no valid counter example, one will have to enumerate all potential counter-examples anyway, so there will be no performance improvement.

gasche · 2020-05-13T14:58:21Z

Indeed, we only save checking counter-examples after the first valid one.

For exhaustiveness checking, most programs are exhaustive and will not have any counter-example, so there is no gain ... except we avoid stack overflows by (1) being tail-recursive in generation and (2) sending smaller candidates to the type-checker.

On the other hand, for fragility checking, most programs are not fragile and thus return many counter-examples; there will be a gain, possibly a large one.

gasche · 2020-05-13T15:21:26Z

It is nicer to show more counter-examples, as this gives a better idea of how to fix the code, but if the cost is to high...

We could probably decide to show a few more counter-examples, but the question then is how many we show. 3 ? 5 ? All of them until we exhaust a fixed computation time ? The nice thing with just 1 is that it seems to be a canonical answer. It is also what the type-checker does when it is in backtracking mode -- so currently the number of counter-examples depends on the use of GADTs or not.

garrigue · 2020-05-13T15:35:22Z

Note that redundancy checking also expects that a counter-example will be found, and one should suffice. So I agree that in general there is no need to generate all counter-examples.

A question could still be whether the cost of doing so when we are checking specifically exhaustiveness is really problematic, compared to the benefit of having a comprehensive list of cases. I suppose this is completely independent of the generation part.

gasche · 2020-05-13T15:40:22Z

For any fixed number N of counter-examples we decide to show, we can build an artificial example that will blowup but would not blowup with less counter-examples. But we have no reason to believe that, except for maybe N=2, there are non-artificial situations like this in the wild. The question is whether we have a good reason to choose any specific N other than 1. If we don't know how many we want, then maybe 1 is enough. If you can make an argument that 3 is better than 1, but 4 is not better than 3, I'm happy with 3. Or 2. Right now the only choice I know how to justify is 1.

gasche · 2020-05-13T15:41:23Z

I suppose this is completely independent of the generation part.

Yes. It's trivial to tweak this PR to show N counter-examples instead of 1.

gasche · 2020-05-14T08:34:59Z

(With the current state of my understanding, the only justified choice is N=1 (and Luc supports this specific choice), so I think it would be reasonable to merge this PR if it gets properly reviewed/approved. We can of course change the number of counter-example later if we get new information.)

garrigue · 2020-05-14T11:40:03Z

My answer may have been unclear, but I never suggested using a fixed N other than 1.
I was just wondering whether, in some situations it wouldn't be nice to have all the counter-examples, but the criterion is not clear, so this can wait for sometime later.

gasche · 2020-05-14T11:49:29Z

I see, thanks for the clarification.

I am still wondering about whether we could/should change the order of counter-example to use the wildcard/default-group first (whether we want to show the counter-example with the leftmost or the rightmost difference).

One argument in favor of showing more counter-examples: in exhaustivity checking, if we assume that the user is going to fix the counter-example given (and maybe generalize a bit to related ones), they will re-run the type-checker again and the following counter-examples will be forced. For example, if we show one counter-example at a time and they iterate (recompile and fix) three times, we have the same explosions that we would have had with N=3 anyway. (If the example can explode because of GADTs, then this explosion is inevitable once the pattern becomes exhaustive.)

gasche · 2020-05-24T10:00:16Z

I tried to change the counter-example order to include the wildcard/default-group counter-examples first, and then the constructor group. My intuition was that it would be better because it shows the counter-example with "leftmost" difference first.

In fact I don't like the result, at least the impact on the testsuite. Many cases look like this:

 151 | ....match r1, r2, a with
 152 |     | R1, _, A -> ()
 153 |     | _, R2, X -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R1, R1, B)
+(R2, R2, Y)

When I see the match above, obviously both cases are missing, but if I was to fix the code by adding new clauses for them I would put the (R1, R1, B) clause first, after the other (R1, _, A) clause (actually I would use (R1, _, B)), and add (R2, R2, Y) as the last clause. Intuitively I prefer if the compiler shows me example in the order in which I want to add them in the source, and in some sense this corresponds to showing example from the "more specific" to the "more general" -- contrary to my initial intuition.

Another example of (arguably) regression:

 2 | ..match x, y with
 3 |   | _, A z -> z
 4 |   | _, B z -> if z then 1 else 2
 5 |   | _, C z -> truncate z
 6 |   | TE TC, D [|1.0|] -> 14
 7 |   | TA, D 0 -> -1
 8 |   | TA, D z -> z
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(TE TC, D [| 0. |])
+(TB, D _)

This does not mean that the current strategy is always the right one, of course. Sometimes the new order would result in a better result, or at least a non-worse result:

 176 | ....match r1, r2, a with
 177 |     | _, R1, 0 -> ()
 178 |     | R2, _, [||] -> ()
 179 |     | _, R1, 1 -> ()
 Warning 8: this pattern-matching is not exhaustive.
 Here is an example of a case that is not matched:
-(R2, R2, [| _ |])
+(R1, R1, 2)

Here I would in fact rather add the (R1, R1, 2) (or maybe (_, R1, 2)) clause first, so the new counter-example is better. This is a consequence of the fact that the first clause starts by matching on the second value; in this case the leftmost difference is in fact the more specific, rather than the more general. This is symmetric to the previous examples, so in theory neither situation is going to occur more often than the other. But in practice I think that people typically write their pattern-matching with a left-to-right order in mind (they expect to match on the "first" (leftmost) value first), and of course for GADTs this order is favored by the type-checker. So showing the rightmost difference first is going to be "more specific" more often, and thus it is the better default strategy.

The ideal ordering of counter-example would be the order that follows the source program: we could order the submatrices by the order of first appearance in the clauses. The output would be very nice, but the implementation could be a bit unpleasant; and it is important to keep this soundness-critical part of the codebase nice. I will think more about it.

P.S.: @trefis warned me wisely when I opened the PR (26 days ago) that we don't want to get caught deeply in the details of counter-example ordering, and that is a risk of the path this PR was taking with the cut-to-1 strategy. Spot on! But then this PR has remained open on Github and in my head, and no one else did the work of helping ourselves move on, so here we are.

gasche · 2020-05-24T17:46:20Z

Of course I got obsessed with the question of the perfect ordering of counter-examples. (@trefis: I believe this is relevant to your interest in Merlin clause completion.)

I just pushed a new commit that ensures that, during the computation of specialized submatrices, we preserve source order: the sub-rows of the specialized submatrices are in source order, and the submatrices themselves appear in the order of their first row in the source.

This gives very nice ordering properties for the generated counter-examples. For example, consider

function
| true, true -> true
| false, false -> false

The type-checker would previously first suggest (false, true) and then (true, false). But it is more natural to use the opposite order, as the new commit does : true first because it comes first in the source in the first column. This corresponds to the following "natural" completion of the program.

function
| true, true -> true
| true, false -> ?
| false, false -> false
| false, true -> ?

Note: I first tried something more complex and more delicate, which is to also respect the ordering of the default submatrix relative to the specialized submatrices. This actually gives bad results, in the cases I looked at it is not what we want. (There are more explanations in the commit message if for some strange reason you are curious about this.) I now believe that the default submatrix should always go last.

trefis

This all looks correct and like a nice improvement.

I really liked the effect of the last commit.

trefis · 2020-05-28T08:36:44Z

typing/parmatch.ml

@@ -2211,7 +2220,11 @@ let check_unused pred casel =
          let qs = [q] in
            begin try
              let pss =
-                  get_mins le_pats (List.filter (compats qs) pref) in
+                (* prev was accumulated in reverse order;
+                   restore source order to get ordered counter-examples *)


I don't think the order matters here, we're just trying to decide whether q is useful or not.

If you comment the List.rev here, the following happens:

type ('env, 'a) var = | Zero : ('a * 'env, 'a) var | Succ : ('env, 'a) var -> ('b * 'env, 'a) var ;; type ('env, 'a) typ = | Tint : ('env, int) typ | Tbool : ('env, bool) typ | Tvar : ('env, 'a) var -> ('env, 'a) typ ;; let f : type env a. (env, a) typ -> (env, a) typ -> int = fun ta tb -> match ta, tb with | Tint, Tint -> 0 | Tbool, Tbool -> 1 | Tvar var, tb -> 2 | _ -> . (* error *) ;; [%%expect{| Line 15, characters 5-6: 15 | | _ -> . (* error *) ^ Error: This match case could not be refuted. - Here is an example of a value that would reach it: (Tint, Tvar Zero) + Here is an example of a value that would reach it: (Tbool, Tvar Zero) |}];;

trefis · 2020-05-28T08:54:06Z

typing/parmatch.ml

+          List.map (fun constr_mat -> Some constr_mat) constrs @ [None]
+          |> List.to_seq
+          |> Seq.flat_map
+            (function
+              | Some constr_mat -> try_non_omega constr_mat
+              | None -> try_omega ())


I found this a bit hard to get my head around, harder than I'd like it to be at least.
The following seems simpler to me

Seq.append (Seq.flat_map try_non_omega (List.to_seq constrs)) (try_omega ())

Though I understand it implies more computations early on (but does it really matter?)

Note: I'm fine with leaving the code as is.

I care about delaying computations as much as we (easily) can. Should I use [ `Specialized of foo | `Default ] instead of options?

This solves exponential-blowup issue with the strict traversal and/or strict witness computation in cases where an exponential number of counter-examples is generated. This fixes Stack Overflow and exponential-time issues on examples using or-patterns heavily, including one that naturally found its way in real-world user code (see the following testsuite commit). We now systematically keep only one counter-example, instead of letting the type-checker decide whether to discard counter-examples (in Backtrack_or mode) or to preserve them (in Refine_or mode). Note: in the exhaustivity warning, there are sub-messages printed to indicate that: - the exhaustivity counter-example is related to an extensible type, or - that a when-guarded clause does match the counter-example In both cases the warning is there to explain the counter-example(s) shown (not a property of all counter-examples); keeping at most one valid counter-example means that they will be printed less often, but it is the correct/intended behavior in that case.

We produce exhaustivity counter-example in the order of the specialized submatrices. Having submatrices in source order gives the nice behavior that the clause that would naturally been inserted first in the source is given first as a counter-example. Consider for example: function | true, true -> true | false, false -> false The two counter-examples are (true, false) and (false, true). Before this patch, (false, true) would be shown first. After this patch, (true, false) is shown first. This corresponds to the following natural completion order: function | true, true -> true | true, false -> ? | false, false -> false | false, true -> ? On the other hand, the ordering of the default submatrix, with respect to the specialized submatrices, is not preserved -- it is always ordered last. One could intuitively expect the default submatrix to appear in the position of the first omega row in the source. We tried this, and it is not a good idea: - the change is much more invasive as the interface of `build_specialized_submatrices` has to change - the behavior of the result is in fact unpleasant; it is not intuitive to order counter-examples in this way. For example, consider: function | _, None -> false | true, Some true -> false The two exhaustivity counter-examples are (true, Some false) and (false, Some _). The second comes from the default submatrix: morally it is (_, Some _), with "some other constructor missing from the column" instead of the first _. There is no reason to suppose that the user would want to place this (_, Some _) or (false, Some _) counter-example first in the completed code; indeed, this intuition would suggest writing an exhaustive covering of patterns of the form (_, foo), inserted after the first clause, but then the other clauses below become unnecessary! When an omega patterns appears high in the column like this, it is usually because there is a very specific matching condition to the rest of its row, that justifies some shortcutting behavior. The program is typically *not* completed by adding more specific matching conditions.

gasche · 2020-05-30T17:03:40Z

I am eager to close this particular branch of work (although we still need to decide the fate of #9514 for this), so I decided to go ahead and merge the branch as approved. (In addition to @maranget's approval of principle in a discussion along with @trefis and @Octachron when the PR was written.)

This was referenced Apr 28, 2020

parmatch: use a Seq.t for exhaustivity witnesses #9499

Closed

parmatch: cut the list of valid exhaustivity witnesses #9502

Closed

gasche force-pushed the parmatch-exhaust-lazy branch 3 times, most recently from 6636f0c to a7065aa Compare April 28, 2020 20:57

gasche force-pushed the parmatch-exhaust-lazy branch from a7065aa to 40d366b Compare April 28, 2020 22:15

gasche mentioned this pull request Apr 29, 2020

testsuite: in typing-warnings, use .opt binaries for the "slow" tests #9512

Closed

gasche mentioned this pull request Apr 29, 2020

Parmatch.exhaust: single-row optimization #9514

Merged

gasche force-pushed the parmatch-exhaust-lazy branch from 40d366b to 2d34dc2 Compare May 24, 2020 17:40

trefis approved these changes May 28, 2020

View reviewed changes

gasche added 4 commits May 30, 2020 17:10

add regression tests for fragile-matching stack overflows

a8adec1

Changes entry

f8e24bd

gasche force-pushed the parmatch-exhaust-lazy branch from 2d34dc2 to d8acfa9 Compare May 30, 2020 15:12

gasche merged commit 5c9d5db into ocaml:trunk May 30, 2020

johnhaley81 mentioned this pull request Jul 10, 2020

Fix stack overflow shakacode/re-formality#87

Merged

gasche mentioned this pull request Sep 27, 2023

Type-checking slow for certain nested GADT patterns #12612

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parmatch: make `exhaust` (exhaustivity and fragility checking) lazy #9511

Parmatch: make `exhaust` (exhaustivity and fragility checking) lazy #9511

gasche commented Apr 28, 2020 •

edited

gasche commented Apr 28, 2020

gasche commented Apr 28, 2020

bobzhang commented Apr 29, 2020

trefis commented Apr 29, 2020

gasche commented Apr 29, 2020

gasche commented Apr 29, 2020

trefis commented Apr 29, 2020

gasche commented May 13, 2020

garrigue commented May 13, 2020

gasche commented May 13, 2020

gasche commented May 13, 2020 •

edited

garrigue commented May 13, 2020

gasche commented May 13, 2020 •

edited

gasche commented May 13, 2020

gasche commented May 14, 2020

garrigue commented May 14, 2020

gasche commented May 14, 2020

gasche commented May 24, 2020 •

edited

gasche commented May 24, 2020 •

edited

trefis left a comment

trefis May 28, 2020

gasche May 28, 2020

trefis May 28, 2020

gasche May 28, 2020

gasche commented May 30, 2020 •

edited

Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511

Parmatch: make exhaust (exhaustivity and fragility checking) lazy #9511

Conversation

gasche commented Apr 28, 2020 • edited

Design

Performance impact

Code style

gasche commented Apr 28, 2020

gasche commented Apr 28, 2020

bobzhang commented Apr 29, 2020

trefis commented Apr 29, 2020

gasche commented Apr 29, 2020

gasche commented Apr 29, 2020

trefis commented Apr 29, 2020

gasche commented May 13, 2020

garrigue commented May 13, 2020

gasche commented May 13, 2020

gasche commented May 13, 2020 • edited

garrigue commented May 13, 2020

gasche commented May 13, 2020 • edited

gasche commented May 13, 2020

gasche commented May 14, 2020

garrigue commented May 14, 2020

gasche commented May 14, 2020

gasche commented May 24, 2020 • edited

gasche commented May 24, 2020 • edited

trefis left a comment

Choose a reason for hiding this comment

trefis May 28, 2020

Choose a reason for hiding this comment

gasche May 28, 2020

Choose a reason for hiding this comment

trefis May 28, 2020

Choose a reason for hiding this comment

gasche May 28, 2020

Choose a reason for hiding this comment

gasche commented May 30, 2020 • edited

Parmatch: make `exhaust` (exhaustivity and fragility checking) lazy #9511

Parmatch: make `exhaust` (exhaustivity and fragility checking) lazy #9511

gasche commented Apr 28, 2020 •

edited

gasche commented May 13, 2020 •

edited

gasche commented May 13, 2020 •

edited

gasche commented May 24, 2020 •

edited

gasche commented May 24, 2020 •

edited

gasche commented May 30, 2020 •

edited