Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refreshing parmatch #1488

Merged
merged 31 commits into from Dec 4, 2017

Conversation

Projects
None yet
5 participants
@trefis
Copy link
Contributor

trefis commented Nov 23, 2017

tl;dr: adding comments, renaming things and deduplicating code

A previous version of this work has already been reviewed by @gasche and @maranget, the comments that follow (apart from some obvious exceptions) are meant for them and might not make sense without a log of the previous discussions.
That being said the commits should all be somewhat atomic and a majority of them contain no meaningful code change (so they should be fairly easy to read).

External review required

Some commits in the list would be better reviewed by @garrigue than by Gabriel or Luc.
Jacques could you have a look at the following commits:

  • "update comment": the previous comment was stale, but perhaps I misread the condition and wrote a wrong comment. Or perhaps it could be written/explained better.
  • "remove stale adhoc fix for printf.ml": I'm betting that this was needed before the switch to GADTs, although I cannot easily verify that.
    The stdlib, and the testsuite compile fine without that line. So I just removed it.
    @gasche would like to keep the line "for safety", but I still would like to see it removed. We have currently no example of code requiring that line (but perhaps you can produce one), and I don't think it's good to keep code that has no visible impact and that no one (at least in Europe, although I haven't asked @yallop) understands. The "good" thing is that the worst that can happen if we remove this line is that we might encounter some code that needs it, at which point the assert false in orify_many will trigger and we can then reintroduce the line and update the testsuite!
    What do you think / what can you tell us about this?

Updates w.r.t. the previous version

(this section is targeted at Gabriel and Luc)

  • I implemented the discussed change to [build_specialized_submatrices], I did that in a separate commit (build_specialized_submatrices returns the default matrix) to make it easier to review (but it should be squashed eventually).
    I also removed build_default_matrix as (as we guessed) it became unused.
    @gasche: you said you didn't remember why we didn't keep the "omega group" even when we had constructors in the first column, as we would need it if the signature is not complete. And we do indeed need it but were recomputing it (i.e. we were calling build_default_matrix).

  • @gasche you had suggested that do_match could be factorized. Were you talking about the changes that are done in "remove some code duplication"?

  • I unmerged satisfiable and satisfiables as Luc requested. But I did keep the new name (list_satisfying_vectors) for satisfiables.

I believe I addressed every other comment present in the notes you sent, but you should double check.

PS: my initial plan was to merge all of these commits, but @gasche seemed to be against it. So I delegate the decision to him.

@trefis trefis force-pushed the trefis:cleanup-parmatch-revisited branch from cee345a to 91f05d9 Nov 23, 2017

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 23, 2017

@gasche would like to keep the line "for safety", but I still would like to see it removed. [...] the worst that can happen if we remove this line is that we might encounter some code that needs it, at which point the assert false in orify_many will trigger and we can then reintroduce the line and update the testsuite!

Yes, I think that risking breaking our users' code as a way to recover information about our codebase is not a good idea. I would rather keep the line unless someone can prove (by reasoning on the code statically) that is never useful -- or @garrigue remembers why it's there and reassures us that it can safely be removed.

Maybe it's not too hard to use git blame or git log to understand when the line was added, and look at what printf.ml was like at the time to get a reproduction case. (It was probably full of Obj.magic, but even code full of Obj.magic (for example the code produced by Coq) should type-check correctly.) @trefis, have you considered this?

let rec remove_first_column = function
| (_::ps)::rem -> ps::remove_first_column rem
| _ -> []
in

This comment has been minimized.

@gasche

gasche Nov 23, 2017

Member

Is this the same thing as List.map List.tl?

This comment has been minimized.

@trefis

trefis Nov 28, 2017

Author Contributor

Not exactly: what about matrices which contains some rows but no columns?
I don't know off-hand whether we can have some in that places and the current implementation was meant to handle them, or if the code was just written this way because it was written this way.

Not sure it's worth changing.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 23, 2017

PS: my initial plan was to merge all of these commits, but @gasche seemed to be against it. So I delegate the decision to him.

To clarify, you are talking of squashing all the commits. Yes, I think it's a bad idea: countless time people thought "oh it's painful to prepare a proper patchset and it won't matter anyway", and we find the giant commit while bisecting something or blaming to understand the reason for a change, and it's a pain.

Here all your changes are reasonable as self-contained commits, there is nothing that obviously needs to be squashed into previous commits. Would you have a semi-automated way to check that the compiler remains buildable and passes the testsuite after each commit? If you do, and if it passes, then I would support merging the PR with the separate commits.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 23, 2017

(One way to do the check is to write a script that does clean; world.opt || exit 3; tests || exit 4, and then use the x/exec feature of git rebase -i trunk -- see documentation -- to run it after each commit)

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 23, 2017

The build_specialized_submatrices change is very nice, thanks!

My understanding of the discussion with @maranget is that this is good to merge modulo the two points that @garrigue should comment on. (I still think that 9cc2261 should not be in the PR by default, and its presence is what discouraged me from "approving" already.)

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Nov 24, 2017

Maybe it's not too hard to use git blame or git log to understand when the line was added, and look at what printf.ml was like at the time to get a reproduction case. (It was probably full of Obj.magic, but even code full of Obj.magic (for example the code produced by Coq) should type-check correctly.)
@trefis, have you considered this?

I had actually, blaming gave me no insights. There was no Obj.magic, I tried building the files of then with my branch, but failed to. What I didn't do though (and I don't know why now) is to remove the line in the compiler of the time, try compiling the stdlib and see what breaks. Which I just did this morning and though I do not understand why it seems the matching on abstract types was the cause of the problem (so nothing printf specific).

Here is a minimal reproduction case:

type abstract

type opt =
  | N
  | S of abstract


let name_of_input = function
  | N -> ()
  | S _ -> ()

OCaml 4.00

With the following patch applied:

diff --git a/typing/parmatch.ml b/typing/parmatch.ml
index 0873f2b6d..36f011331 100644
--- a/typing/parmatch.ml
+++ b/typing/parmatch.ml
@@ -1187,7 +1187,18 @@ let exhaust_gadt ext pss n =
     Rnone -> Rnone
   | Rsome lst ->
       (* The following line is needed to compile stdlib/printf.ml *)
-      if lst = [] then Rsome (omegas n) else
+    if lst = [] then (
+      let () =
+        let loc =
+          try let pat = List.hd (List.hd pss) in pat.pat_loc
+          with _ -> Location.none
+        in
+        Format.fprintf Format.err_formatter
+          "%a: empty witness list (will return _×%d)\n%!"
+          Location.print_error loc n
+      in
+      Rsome (omegas n)
+    ) else
       let singletons =
         List.map
           (function

Here is what I get:

$ ./boot/ocamlrun ./ocamlc -w A -nostdlib -nopervasives ./test.ml
File "./test.ml", line 9, characters 4-5:
Error: : empty witness list (will return _×1)
$

This branch

With the following diff:

diff --git a/typing/parmatch.ml b/typing/parmatch.ml
index e723c2c32..c7754a75f 100644
--- a/typing/parmatch.ml
+++ b/typing/parmatch.ml
@@ -1234,6 +1234,18 @@ let exhaust ext pss n =
   match ret with
     No_matching_value -> No_matching_value
   | Witnesses lst ->
+    if lst = [] then (
+      let () =
+        let loc =
+          try let pat = List.hd (List.hd pss) in pat.pat_loc
+          with _ -> Location.none
+        in
+        Format.fprintf Format.err_formatter
+          "%a: empty witness list (will return _×%d)\n%!"
+          Location.print_error loc n
+      in
+      Witnesses (omegas n)
+    ) else
       let singletons =
         List.map
           (function

Here is what I get:

$ ./boot/ocamlrun ./ocamlc -w A -nostdlib -nopervasives ./test.ml
$

Conclusion

I do not understand what was going wrong, but I'm still in favor of removing the line since we know what it was trying to catch. I'd be happy to update the patch to add this small example to the testsuite to make sure it is not broken in the future.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Nov 24, 2017

Yes, I think that risking breaking our users' code as a way to recover information about our codebase is not a good idea. I would rather keep the line unless someone can prove (by reasoning on the code statically) that is never useful -- or @garrigue remembers why it's there and reassures us that it can safely be removed.

Keeping cruft that nobody understand is not a good solution either. I agree we should track the original reason, but if one cannot find it and if removing the irregularity does not break any known code, I'm in favor of doing the clean up.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 24, 2017

I don't disagree. You have my encouragement to remove it if someone checks that the whole of opam builds without it. Unless this is done, please keep it in.

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Nov 28, 2017

About the controversial line | Rsome lst -> if lst = [] etc.). I have checked all occurrence of Rsome e expressions: (1) there is one base case Rsome [omegas n]; and several inductive case, with either preserve argument length (map) or increase it (append). Hence I think tat we face leftover code and that the risk of deleting this line is minimal.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 28, 2017

But Thomas above had a reproduction case where the conditional fires in OCaml 4.00. Does his repro case not get into this path anymore on trunk?

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Nov 28, 2017

@garrigue, Now let us look at function exhaust (about line 1200 in file parmatch.ml). The comment (which I wrote) does not apply to the code (which @garrigue wrote, I guess). The comment refers to the original code that relied on the described property not to performed inductive calls over specialized matrices when the constructors collected from the argument matrix do not form a complete signature. The code has probably been changed for the purpose of GADT or whatever.

If Jacques confirms that the code should stay as it is, I suggest deleting the comment.

* D non-exhaustive => we have a non-filtered value
*)
let r = exhaust_gadt ext (filter_extra pss) (n-1) in
(* As the default matrix is included in [pss] one can avoid recursive

This comment has been minimized.

@maranget

maranget Nov 28, 2017

Contributor

Here is one contradiction: the recursive calls are not avoided, Even more, they are performed in all situations, the result being bound to before.

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Nov 28, 2017

As far as I know, @gasche Thomas repro case is for OCaml 4.0, not for trunk.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Nov 28, 2017

Here all your changes are reasonable as self-contained commits, there is nothing that obviously needs to be squashed into previous commits.

All the commits do compile on their own, except for the one preceding "make depend". I should probably just squash the "make depend" commit into it.

I have checked all occurrence of Rsome e expressions: (1) there is one base case Rsome [omegas n]; and several inductive case, with either preserve argument length (map) or increase it (append).

Note that that wasn't the case in 4.04 (search for "dug").

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Nov 28, 2017

Note that that wasn't the case in 4.04 (search for "dug").

Sorry I do not see your point. Looking for dug in parmatch.ml for 4.04, I see a map over list r extracted by pattern Rsome r. Thus, a case of length conservation.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Nov 28, 2017

That’s because I should have written 4.00...
Apologies.

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Nov 28, 2017

Ok I see, Rsome [] was possible in 4.00, not now. So I'd feel relatively confident to assume this is no more possible now. At worst, I suggest an assert.

My other remark on the discrepancy between code and comment is still valid.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Nov 28, 2017

If both of you are convinced through static reasoning that this code/case cannot be executed with the current compiler codebase, then of course it is reasonable to remove it.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Nov 29, 2017

My other remark on the discrepancy between code and comment is still valid.

Indeed.

To sum up: @garrigue could you please comment on the following two points:

  • Luc pointed out that this comment doesn't match the code as recursion on specialized submatrices already happened (the result being bound to before).
    Should the comment be removed or can the code be updated?
  • I updated that comment. Does it seem correct to you?

Btw, I'm not fond of the wording of that last comment, I'd appreciate it if someone (@gasche?) could make improvement suggestions. Perhaps the comment should simply be removed and the conditional made easier to read, again, suggestions would be appreciated.

@gasche

gasche approved these changes Nov 30, 2017

Copy link
Member

gasche left a comment

I think that we shouldn't wait for Jacques feedback (which may come at an arbitrary later time) or my ability to review the comment and suggest improvements (same absence of timeline), and merge now. The risk of merging is that two marginally wrong/improvable comments stay around, but the overall effect of the patch is positive (we fix more such comments). The cost of delaying the merge is that this has to be rebased over and over again, and that everyone willing to understand the codebase today is reading the harder-to-understand version.

Here is a formal approval. I think we could merge now, but I'm waiting for at least @trefis opinion on the merge timing/strategy.

(I ignored the issue of buildability and make depend in my approval decision. If you rebase to restore the property, great, otherwise people will just have to do git bisect skip on this one.)

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Nov 30, 2017

Ok for merging, at least mark the comments as being obsolete.

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Nov 30, 2017

I'll do a proper last rebase in the next few days, squashing the few things that needs to be squashed and marking as obsolete the comment pointed out by Luc.
I'll update this PR once I'm done.

@trefis trefis force-pushed the trefis:cleanup-parmatch-revisited branch from 9c5b502 to 8096b22 Dec 1, 2017

trefis added a commit to trefis/ocaml that referenced this pull request Dec 1, 2017

parmatch: remove stale adhoc fix for printf.ml
This line was here to handle the case where "Rsome []" was being
returned, which seemed to happen (at least on OCaml 4.00) when matching
on values of abstract types (or with abstract parts).

This cannot happen anymore as all the sites returning "Rsome _" return a
non-empty list.
See ocaml#1488 (comment) for
a lengthier discussion on the matter.
@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Dec 1, 2017

This is now done. I've also update the description of the commit removing the line talking about printf to summarize the situation and link to this discussion.

@gasche: good to merge whenever.

@garrigue

This comment has been minimized.

Copy link
Contributor

garrigue commented Dec 4, 2017

@trefis
Concerning point 1 (the comment in exhaust), it should indeed be removed.
For GADTs we need a complete list of missing cases, so we cannot dispense with recursing.

Concerning point 2 (in check_unused), indeed the condition is now much more complex than just "there are no other lines". Your updated comment seems accurate.

For the discussion about the relation with a past bug inPrintf, I'm afraid I cannot comment. You should ask the other Jacques (Le Normand) @rathereasy , if he remembers...

@maranget

This comment has been minimized.

Copy link
Contributor

maranget commented Dec 4, 2017

Concerning point 1 (the comment in exhaust), it should indeed be removed. For GADTs we need a complete list of missing cases, so we cannot dispense with recursing.

Fine. Thanks Jacques.

For the discussion about the relation with a past bug inPrintf, I'm afraid I cannot comment. You should ask the other Jacques (Le Normand) @rathereasy , if he remembers...

We can now consider that the Printf comment and the if lst = [] then ... are leftover comment and code, and forget about them.

trefis added some commits Jul 11, 2017

parmatch: only guarded clauses needs to be tested after calling exhaust.
Previously, after exhaust had produced a pattern not matched by the matrix
formed of the unguarded clauses of the matching, we would try to match the
pattern with the matrix formed of the full matching.
This implies more work than necessary: we already know the unguarded clauses
don't match the pattern, so we only need to test it on the guarded ones.
parmatch: [build_specialized_submatrices] returns the default matrix …
…as well

This is a post merge cleanup. [build_specialized_submatrices] always builds the
default bmatrix as part of its work, but was previously only returning it when
there were no specialized matrices and when the [return_omega_group] parameter
was [true] (yes, this was ugly).
As it turns out, all the places calling this function also ended up building
the default matrix as a follow up.
The function now always returns the default matrix along with the (possibly
empty) list of specialized matrices.

As a consequence [build_default_matrix] became redundant and was removed.

Thanks to Gabriel for noticing this.
parmatch: [mark_partial] now works on "simplified" matrices
We avoid doing some redundant work in the process.
parmatch: more meaningful result type
The other option was to simply use [option].
parmatch: rename [satisfiables] to [list_satisfying_vectors]
[satisfiable] could be expressed in terms of [list_satisfying_vectors] as it
only cares about the existance of such a vector.
However
- it might have a non negligeable impact on performances
- satisfiable is a simple function that can be seen as a reference. There is
  value in keeping it.

@trefis trefis force-pushed the trefis:cleanup-parmatch-revisited branch from 8096b22 to 875e73b Dec 4, 2017

@trefis

This comment has been minimized.

Copy link
Contributor Author

trefis commented Dec 4, 2017

Thanks!

I rebased one last time to take into account the lasts comments. I'll merge once travis/appveyor confirm I haven't broken anything.

@trefis trefis merged commit 69ca9e2 into ocaml:trunk Dec 4, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

trefis added a commit that referenced this pull request Dec 4, 2017

parmatch: remove stale adhoc fix for printf.ml
This line was here to handle the case where "Rsome []" was being
returned, which seemed to happen (at least on OCaml 4.00) when matching
on values of abstract types (or with abstract parts).

This cannot happen anymore as all the sites returning "Rsome _" return a
non-empty list.
See #1488 (comment) for
a lengthier discussion on the matter.

@gasche gasche referenced this pull request Jul 24, 2018

Closed

Cleanup parmatch #3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.