Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parmatch column coherence #1550

Merged
merged 5 commits into from Jan 3, 2018

Conversation

Projects
None yet
2 participants
@trefis
Copy link
Contributor

trefis commented Dec 29, 2017

Spin off of #1518 following discussions with Gabriel and Luc.

This version of the patch introduce an explicit coherence check at various points in parmatch.
There is a somewhat lengthy description of the check in the code itself, so I won't go into the details here (improvements to that documentation are very welcome though!).

A perhaps more readable version of the diff (as it ignores whitespaces changes) on parmatch.ml can be seen here.

A note on pattern typing

Currently patterns are typechecked from "left to right", that is

type _ t = I : int t | C : char t

let f (type a) (t : a t) (a : a) =
  match t, a with
  | I, 3 -> ()
  | C, 'a' -> ()

is well typed, but the following is not:

let f (type a) (t : a t) (a : a) =
  match a, t with
  | 3, I -> ()
  | 'a', C -> ()

From the point of view of typing alone, one could imagine lifting that restriction at some point in the future and allowing both versions to typecheck.

In pratice however, in their current state, neither parmatch nor matching would survive such a change.

Incidentally Parmatch.every_satisfiable was reversing the order of columns between being called and calling satisfiable.
You'll notice the List.rev, hidden in the commit introducing the coherence check, fixing that problem.

Limitations of the current coherence check

In various places I give a brief explanation of what can happen when we accept a column as coherent while it is in fact ill-typed.
The potential outcomes include false positives and negatives.

However:

  • for the checks which have an impact on the soundness (i.e. exhaustivity and usefulness) we always error on the safe side: we never say something is exhaustive or unused when it in fact isn't.
  • the proposed coherence check catches all the cases (and some more) which resulted in assertion failures/fatal errors in parmatch. The issues that are not detected by the current check were already silently accepted before (see for example the test added by the first commit whose result is left unchanged by the other patches).
@gasche

gasche approved these changes Jan 1, 2018

when an "incoherence" is not detected by this check.
*)

let first_column_is_coherent matrix =

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

This may be a detail, but I think it would be slightly nicer to have a coherent_heads h1 h2 binary function that checks whether two discriminant heads are coherent with each other, and expose this function in the module -- I think it may be of use later. Then you can implement the coherence check with a fold_left, or with this fairly amusing trick

let coherent_column = function
| [] -> true
| r::rs ->
  List.for_all2 (fun (h1, _) (h2, _) -> coherent_heads h1 h2)
    (r::rs) (rs @ [r])

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

Your trick is cute, but:

  • I think mine is slightly more efficient
  • I don't think yours works: what happens if the column is (1 _ 'a' _)?

As for the coherent_heads function, you just want my local same_kind lifted out (making its "discr_pat" a parameter)?
There is no need for it yet, I'd rather keep things as is and make the function globally available only when (if ever) we happen to need it.

This comment has been minimized.

@gasche

gasche Jan 2, 2018

Member

Indeed, the trick is wrong for the reason you give, the relation is not transitive. This raises a question: is it actually the case that only checking coherence of the first non-any pattern with all others is enough? Should we test coherence between all couples of patterns? Or is the relation transitive on all non-any patterns?

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

is it actually the case that only checking coherence of the first non-any pattern with all others is enough?

Yes.
(the proof is left as an exercise for the careful reviewer)

| Const_nativeint _, Const_nativeint _
| Const_float _, Const_float _
| Const_string _, Const_string _ -> true
| _, _ -> false

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

Could this be made non-fragile, by listing the alternatives at least in the first one?

incompatibility: clearly this is fine.
- if we end up returning [true] then we're saying that [qs] is useful while
it is not. This is sad but not the end of the world, we're just allowing dead
code to survive.

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

I think that the correct check is not of whether pss is coherent, but whether pss followed by qs is coherent. See remarks below.

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

That seems right. I changed all such places.

satisfiable pss (simple_match_args p omega @ qs))
constrs
else
if not (first_column_is_coherent pss) then (

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

Here q is omega, so coherent pss is the same as coherent (q :: pss), and this test is correct -- but a comment could point that out.

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

I'm starting to feel like we have enough comments that really simple ones (like this one would be) can actually be omitted (I believe that, as is, one can fairly quickly recover the reason for not checking q :: pss)

This comment has been minimized.

@gasche

gasche Jan 2, 2018

Member

Your code, your choice.

| {pat_desc=Tpat_variant (l,_,r)}::_ when is_absent l r -> false
| q::qs ->
let pss = simplify_first_col pss in
let q0 = discr_pat q pss in
satisfiable (build_specialized_submatrix ~extend_row:(@) q0 pss)
(simple_match_args q0 q @ qs)
if not (first_column_is_coherent pss) then (

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

Here I think that we should check first_column_is_coherent (q0 :: pss) -- that's the idea anyway, I suspect that the types don't match and that the API would have to change slightly.

(list_satisfying_vectors
(build_specialized_submatrix ~extend_row:(@) q0 pss)
(simple_match_args q0 q @ qs))
if not (first_column_is_coherent pss) then (

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

again we should check also if q0 is coherent with the rest.

matrix are stable. *)
List.fold_left (fun acc (_, { varsets; _ }) ->
List.fold_left IdSet.union acc varsets
) IdSet.empty rs

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

I have worked on a better implementation of this on my own (a distinguished All value), as I thought that this would be outside the scope of your PR. I hope to send a PR for this soon.

This comment has been minimized.

@gasche

gasche Jan 1, 2018

Member

This is now pushed as 63ed2cc, please take a look, feel free to ask for changes, and eventually to integrate into the present PR if you wish to -- I can rebase my own on top of the final version of this commit.

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

The idea of your implementation seems nice indeed (I haven't actually reviewed the actual code just yet), but IMO they are independent (which is a good thing) and should be reviewed and merged separately (that is, I don't want to "merge" the GPRs together before actually merging into trunk).

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 1, 2018

Approved in general, but see comments.

trefis added a commit to trefis/ocaml that referenced this pull request Jan 2, 2018

@@ -1656,7 +1656,7 @@ let rec every_satisfiables pss qs = match qs.active with
let pss = simplify_first_col pss in
(* The handling of incoherent matrices is kept in line with
[satisfiable] *)
if not (first_column_is_coherent pss) then
if not (first_column_is_coherent ((uq, make_row rem) :: pss)) then

This comment has been minimized.

@gasche

gasche Jan 2, 2018

Member

maybe we could write (all_coherent (first_column pss)) and (all_coherent (uq :: first_column pss)) to avoid having to come up with the rest of the row that we don't actually care about?

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

Ah, that's what you meant by "the API might need some adjusting"!

Yes, I too find my current implementation a bit distasteful and yours seems prettier. But I'm also worried that yours means iterating over the rows once more, and allocating a fresh list with as many elements as there are rows, whereas with the current implementation we don't need to iterate and don't allocate much.

I'm tempted to keep my implementation because of these (probably unwarranted) performance concerns. But I'll welcome your input on that issue and am ready to change my mind if you feel differently.

This comment has been minimized.

@gasche

gasche Jan 2, 2018

Member

I do think that the performance concern is unwarranted (the compiler already allocates, this case of the function already allocates, and we have no latency constraints), but from your answers regarding matrix_stable_vars it looks like I should send a cleanup PR after this one is merged anyway, so I don't mind proposing some API changes at the same time.

This comment has been minimized.

@trefis

trefis Jan 2, 2018

Author Contributor

Alright, sounds good to me :)
I will leave things as is for now then.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 2, 2018

Btw @gasche , can you suggest/push a Changes entry?

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 2, 2018

Changes, in section "Working version > Internal/compiler-libs changes"

- GPR#1550: make pattern-matching warnings more robust to ill-typed columns
  (Thomas Refis, with help from Gabriel Scherer and Luc Maranget)
@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 2, 2018

Thanks, I added the Changes entry (but I put it in the bugfixes section).

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 2, 2018

If you're not interested in (1) being perfectionist over the coherence API in this PR or (2) removing the matrix_stable_vars part, then I think there is nothing left to change/discuss for the PR. My approval still stand, please rebase as you feel is appropriate and merge (or ask me to if you prefer).

@trefis trefis force-pushed the trefis:parmatch-column-coherence branch from ea42840 to 83a4701 Jan 3, 2018

trefis added a commit to trefis/ocaml that referenced this pull request Jan 3, 2018

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 3, 2018

I rebased, squashed a few of the commits together, and added a "style commit" on top doing the API changes proposed (i.e. the split of first_column_coherence into first_column and all_coherent).
I also renamed my local same_kind (in all_coherent) to coherent_heads which you seemed to like better.

As for the matrix_stable_vars part, I didn't change it, I still think it's better left as part of your GPR.

N.B. I left the style commit on its own so it's easier to review, but I think it should eventually be squash into the "explicitly check coherence of first column" one. I'll be happy to squash it and merge the PR in trunk once you've read it and confirmed one last time.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 3, 2018

Yes, I think the API change makes for nicer code, thanks!

One minor question: all_coherent (q :: first_column pss) raises the question of what is the domain of all_coherent (and thus of the coherent_heads function): does it expected discriminators, as always returned by simplify_first_col, or arbitrary "simplified patterns" (without Tpar_var and Tpat_or)? If I read the code correctly, q above may be a general simple pattern rather than discriminators (its sub-arguments are non-omegas), so I think the natural answer is "simplified patterns" -- which suggests that the name suggestions I made is not the best, but oh well.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 3, 2018

does it expected discriminators, as always returned by simplify_first_col, or arbitrary "simplified patterns" (without Tpar_var and Tpat_or)? If I read the code correctly, q above may be a general simple pattern rather than discriminators (its sub-arguments are non-omegas), so I think the natural answer is "simplified patterns" -- which suggests that the name suggestions I made is not the best, but oh well.

Indeed just "simplified" patterns, as documented.
Note: I didn't know about the name "discriminators", we usually use "normalized" in parmatch (and I believe matching) to mean "where the arguments have been replaced by omega".
In any case, I think the name of the helper function is fine: it only looks at the head of the patterns.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 3, 2018

Feel free to merge.

trefis added some commits Dec 7, 2017

@trefis trefis force-pushed the trefis:parmatch-column-coherence branch from 83a4701 to 9185e77 Jan 3, 2018

@trefis trefis merged commit 64fb4f5 into ocaml:trunk Jan 3, 2018

0 of 2 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 3, 2018

Thanks, I squashed the style commit and merged.

@trefis trefis deleted the trefis:parmatch-column-coherence branch Jan 3, 2018

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 21, 2018

We may need to get this in 4.06 anyway, see MPR#7713.

@@ -249,9 +249,9 @@ let simple_match p1 p2 =
| Tpat_variant(l1, _, _), Tpat_variant(l2, _, _) ->
l1 = l2
| Tpat_constant(c1), Tpat_constant(c2) -> const_compare c1 c2 = 0
| Tpat_tuple _, Tpat_tuple _ -> true
| Tpat_lazy _, Tpat_lazy _ -> true
| Tpat_record _ , Tpat_record _ -> true

This comment has been minimized.

@gasche

gasche Jan 26, 2018

Member

Note: we could check Array.length lbl1.lbl_all = Array.length lbl2.lbl_all here as you do for the coherence check (or some richer check in general). (Currently the tuples are checked but not the records.) If one of the two pattern is empty (no label), they may match.

end
| Tpat_tuple l1, Tpat_tuple l2 -> List.length l1 = List.length l2
| Tpat_record ((_, lbl1, _) :: _, _), Tpat_record ((_, lbl2, _) :: _, _) ->
Array.length lbl1.lbl_all = Array.length lbl2.lbl_all

This comment has been minimized.

@gasche

gasche Jan 26, 2018

Member

By the way, why are we checking the length only and not the lbl_all value itself?

Array.length lbl1.lbl_all = Array.length lbl2.lbl_all
| Tpat_any, _
| _, Tpat_any
| Tpat_record ([], _), Tpat_record ([], _)

This comment has been minimized.

@gasche

gasche Jan 26, 2018

Member

Actually I think that this line is buggy: an empty record pattern is coherent with any other record pattern, not just empty ones.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 26, 2018

The check could be made stricter in various places, and it would probably be better. It wasn't done because it wasn't "necessary". There is a lot of room for improvement.

For the empty record pattern, I disagree with you.
Currently an empty record pattern won't typecheck anyway, so the question is hypothetical. But assuming we allow them, I would expect an empty record pattern to match values of an empty record type, and that's it.
Perhaps that's not how things will happen, perhaps it will indeed mean "I know I'm getting a record, but I don't care about its content". But saying "this line is buggy" seems a bit strong given the current situation.
If you want, you could map Tpat_record ([], _), _ | _, Tpat_record ([], _) to assert false and we'll revisit in due time. But I think the current situation is probably fine/better.

(* only omegas on the column: the column is coherent. *)
true
| discr_pat ->
List.for_all (coherent_heads discr_pat) column

This comment has been minimized.

@gasche

gasche Jan 26, 2018

Member

Note: this line is wrong if we accept the empty record pattern at any record type (because then the relation is not transitive anymore).

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 26, 2018

We did another review of the PR with @maranget, with the merge to the 4.06 branch in mind. We have carefully reviewed and believe that the PR is correct. The review turned out two things:

  • Matching and coherence checks on (non-empty) records could be refined. This does not require any change for 4.06, but it is a change we could consider for trunk
  • The handling of empty record patterns in the coherence check is wrong. This is of no consequence for the compiler given that empty record patterns are rejected by the type-checker anyway, so the PR is safe, but it would still be nicer to remove the faulty line or fix it ((Tpat_record ([], _), Tpat_record _) | (Tpat_record _, Tpat_record ([], _))). It probably wasn't a very good idea to introduce empty records in the code in the first place.
@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 26, 2018

@trefis, would you consider submitting a rebased version of this PR against the 4.06 branch?

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Jan 26, 2018

@trefis, would you consider submitting a rebased version of this PR against the 4.06 branch?

Will do.
Of course this GPR is based on top of the big rewrite of parmatch, which I assume we're not going to backport, so it'll have to be adapted. You'll have to carefully review one more time.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Jan 26, 2018

I'm not sure which way is best, but it may be necessary to try to rebase to see how well it goes before making a decision...

@trefis trefis referenced this pull request Jan 29, 2018

Closed

Backport GPR#1550 on 4.06 #1587

gasche added a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

gasche added a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

gasche added a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

gasche added a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

parmatch: explicitely check coherence of first column
Also extends the testsuite.

originally cherry-picked from fa5a4bb
(ocaml#1550), with a careful rebase by Thomas Refis to account for the fact
that the refactoring GPR#1448, which the original patch relied on, is
not present in the 4.06 branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.