-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parmatch.exhaust: single-row optimization #9514
Parmatch.exhaust: single-row optimization #9514
Conversation
66f323e
to
d196a96
Compare
This looks very nice, and the implementation is straightforward. However, I'm a bit disappointed by what happens in the motivating example. This patch improves
to
but what I'd really like to see here is:
Would you consider replacing:
with the following?
If there is a counterexample to ps, then _ , ps is a counterexample to p, ps, regardless of what p is. (However, if the first column is a GADT match then _, ps may not be typeable, which is annoying. Is it easy to detect whether a column contains a GADT match?) |
Thanks @stedolan for the feedback. I started by writing a long explanation for why I'm not fond of your suggestion, but in the end I decided that I like it: it gives more compact counter-examples, and it is symmetric (we also have omegas at the end, after the first difference). Note that in #9511 I proposed ordering the default-case counterexamples before the constructor-cases counterexamples, which corresponds to reordering |
d196a96
to
631b960
Compare
I decided to go ahead and change the output order from
to
The change of |
See more discussion of counter-example ordering at #9511 (comment) . It turns out that in the non-single-row case, you do not want to show the default/wildcard group first. I don't know what this means about @stedolan's suggestion here. |
Now that #9511 is merged in trunk, I rebased this PR on top of it and looked more at the impact on the testsuite examples. In the current state, the PR follows @stedolan's suggestion (using My conclusion is that @stedolan's suggestion and the different ordering do not work well in general, even though they work well on the example we discussed and in the general class of fragile-avoidance examples we discussed. I will thus rebase this PR to remove the two changes: ensure we provide the same counter-example order as the standard path, and show Here are some relevant examples, presented as diffs to the compiler behavior provoked by the current PR following @stedolan suggestion and my counter-example ordering (these are diffs from running File "robustmatch.ml", lines 74-76, characters 4-20:
74 | ....match r1, r2, a with
75 | | _, R2, "coucou" -> ()
76 | | R1, _, _ -> ()
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
-(R2, R2, "")
+(R2, _, "") In this example, both counter-examples are equivalent, but I find the previous counter-example easier to read. Indeed, we can use it to follow through the pattern-matching, and reach a given pattern that is not general enough (namely There are many other examples like this one, for example: File "robustmatch.ml", lines 156-159, characters 4-20:
156 | ....match r1, r2, a with
157 | | R1, _, A -> ()
158 | | _, R2, X -> ()
159 | | R1, _, _ -> ()
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
-(R2, R2, Y)
+(R2, _, Y) My intuition is that each counter-example corresponds to a particular sub-pattern that would have to be generalized to cover said counter-example, and that precisely pointing this sub-pattern is useful. (Maybe we should consider being explicit about this, and showing the subpattern / location in the error ?) With the current output, users need to follow the counter-example to find this subpattern, and having the example be precise is helpful in general. Then there are other examples where the problem is not File "robustmatch.ml", lines 228-230, characters 4-37:
228 | ....match r1, r2, a with
229 | | R1, _, (3, "") -> ()
230 | | _, R2, (1, "coucou", 'a') -> ()
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
-(R1, R1, (3, "*"))
+(R1, R1, (0, _)) Here I believe that the previous counter-example was better (even though the new one shows the "leftmost error" as I argued previously); here this is exactly the same reasoning as in the discussion of #9511: to fix the code we probably want to start tweaking the Finally, there is an example in the testsuite where the use of let rec rule : type a b.
(pval, closed, (a,b) tarr) lam -> (pval, closed, a) lam -> b rlam =
fun v1 v2 ->
match v1, v2 with
| Lam(x,body), v ->
begin
match subst body (Bind (x, v, Id)) with Ex term ->
match mode term with
| Pexp -> Inl term
| Pval -> Inr term
end
| Const (IntTo b, f), Const (IntR, x) ->
Inr (Const (b, f x))
;;
[%%expect{|
type closed = rnil
type 'a rlam = ((pexp, closed, 'a) lam, (pval, closed, 'a) lam) sum
+Lines 8-17, characters 2-26:
+ 8 | ..match v1, v2 with
+ 9 | | Lam(x,body), v ->
+10 | begin
+11 | match subst body (Bind (x, v, Id)) with Ex term ->
+12 | match mode term with
+13 | | Pexp -> Inl term
+14 | | Pval -> Inr term
+15 | end
+16 | | Const (IntTo b, f), Const (IntR, x) ->
+17 | Inr (Const (b, f x))
+Warning 8: this pattern-matching is not exhaustive.
+Here is an example of a case that is not matched:
+(Const (_, _), Const (IntTo _, _))
val rule :
(pval, closed, ('a, 'b) tarr) lam -> (pval, closed, 'a) lam -> 'b rlam =
<fun>
|}];; The example is a mouthful and lacks context, but the point is that putting the proposed counter-example in the code with a refutation works (and giving it a right-hand-side results in an "unreachable pattern" warning, at least in Merlin). I believe that this happens because the type-checker does not explode the wildcard patterns enough to reject the counter-example (it has fuel and gives up at some point). If using |
fd74e42
to
b82930f
Compare
best way to make a non-fragile distinction between "all As" and | ||
"at least one B". | ||
*) | ||
List.to_seq [Some p; None] |> Seq.flat_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate that this roundabout code is the easiest way of concatenating two lazy sequences. (It looks fine for now, but the existence of such code suggests that Seq is missing some features)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is Seq.append : 'a t -> 'a t -> 'a t
since 4.11, but I wish to delay the computation of the second sequence. We would need a Seq.append_lazy : 'a t -> (unit -> 'a t) -> 'a t
. The present code structure is slightly less clear, but (1) it mirrors the (equally unclear) structure of the non-single-row case, and (2) it made it very easy during my experiment to change iteration order. so there is something to it.
@trefis in our discussion, you mentioned that you were worried about any complexity increase in this part (which is critical for soundness). Would you like to have more time to review this aspect of the PR? |
P.S.: Thanks @stedolan for your review! |
Code looks good, it's a clear improvement over what's there, so merge away. For the record, though, I think the most recent changes made the output worse, and I think this is due to an interesting difference in how we read exhaustivity warnings. Here's one of your examples:
If I understand correctly, your intuition is that an exhaustivity issue is probably caused by one of the cases being insufficiently general. You would like a counterexample that looks like one of the cases, to help you pinpoint which case should be generalised. So, you prefer My intuition is that an exhaustivity issue is probably caused by there not being enough cases. I would like a counterexample that covers as much as possible of the unmatched space, so that I don't have to add very many new cases in order to achieve exhaustivity. So, I'd prefer (I am not saying that one of these intuitions is more valid than the other, but it's curious that they point opposite ways for how to report the issue) |
I agree with the difference, modulo the fact that in most cases I do not expect to make a given case more general, but I expect to add a missing case "right after this case" (inserting a new clause right below), because the two are logically related. Consider for example (outside the single-row case):
for me it is clear that we need to add a new case |
(See testsuite and code comments for an explanation.)
b82930f
to
a3b66f1
Compare
Rebased, change entry added. I will merge if the CI comes back green. |
This PR implements an optimization to pattern-matching exhaustivity checking that compresses exhaustivity counter-examples on certain "single-row" patterns which may occur naturally in user code written to avoid fragile pattern matching. For example:
The first line matches "all
Some
", the second line matches "at least oneNone
" in a non-fragile way. (This is a bit silly for option which is not going to gain a new constructor overnight, but you get the idea for user-defined datatypes.)Now let's consider that you make a mistake and use
Some _
instead of(Some _ | None)
in the last element of the tuple:In the current trunk the counter-example generator will generate 2^(N-1) counter-examples, corresponding to all possible explosions of the or-patterns in the second row.
In particular there is a risk of exponential blowup of the exhaustiveness checker or fragility checker, which is the cause of #9498 (a Stack Overflow in user-written code that exhibits exactly this pattern). The mistake (forgetting the
None
in the last element of the tuple) is not necessary to experience the blowup, it also occurs whenwarning "+4"
is activated or in related examples involving GADTs (see #9511 (comment)).With the present PR, a new optimization is added to the exhaustivity checker that matches a "single-row" scenario (we are asking for exhaustivity counter-examples to a matrix with a single row) that in particular captures this class of user examples. There is now a single counter-example generated, because the or-patterns are not exploded:
In particular, this PR fixes #9498.
You may be wondering how this PR compares to #9511 which I also submitted this morning and also fixes #9498 ?
The two changes are orthogonal: Parmatch: make
exhaust
(exhaustivity and fragility checking) lazy #9511 is good at avoiding blowups when many counter-examples are generated (by being more lazy in how we traverse them), this PR removes the blowup in a particular case. Other cases will still blow up, so I think we still want to have Parmatch: makeexhaust
(exhaustivity and fragility checking) lazy #9511, but in this particular case the counter-example generated are much nicer with this PR. Presently I think that the best would be to have both PRs merged.Parmatch: make
exhaust
(exhaustivity and fragility checking) lazy #9511 is the result of a discussion with @maranget, @trefis and @Octachron, and in particular I am sure that the technical approach is sound. I am confident about the present PR but @maranget may still find a reason why it is in fact completely wrong (that's what he does). @trefis also wanted to keep the exhaustivity-checker (critical for soundness) as simple as possible, this would also be a reason to not merge the present PR.On the other hand, Parmatch: make
exhaust
(exhaustivity and fragility checking) lazy #9511 makes the choice to drop counter-examples in some cases, which may be controversial with some users or maintainers. This PR only acts on a particular case, and then it clearly improves the counter-example provided.The present PR is less invasive than the previous one. Assuming that they are both approved and merged, this one is easier to backport. (However, I think that from the point of view of Bucklescript, the other PR gives the nicer guarantee that many other blowup cases will be handled better, so you probably still want both.)