New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MPR#7822 by checking for existential overflow #1951

Closed
wants to merge 1 commit into
base: trunk
from

Conversation

Projects
None yet
3 participants
@garrigue
Contributor

garrigue commented Aug 1, 2018

@Octachron found a very nice bug, where he fools the exhaustiveness checker by going over the existential limit MPR#7822.

This PR fixes this by:

  • checking for existential overflow, both in Ctype.instance_constructor and Ctype.reify; it cause an explicit error, even when this appears during exhaustiveness/redundancy check, to avoid confusion.
  • resetting Ident.current_time on backtracking in type_pat.
  • limiting the total number of splits done during exhaustiveness/redundancy check.

It also fixes some logical bugs in type_cases. For instance, since exhaustiveness check is done after typing the right hand sides of pattern-matches, the Ident.current_time may already have gone over current_level, so one has to do the initialization again. To avoid advancing too much the counter, we also backtrack this initialization.

All seems correct, and the minichess example (pr7822.ml) now produces correct warnings, but of course it is pretty slow to compile (it reaches the splitting limit).
There could be some discussion as whether this limit should be uniform. Wouldn't it be more intuitive to stop searching after finding the first counter-example?

@trefis trefis self-assigned this Aug 1, 2018

@trefis

I believe the code is overall correct, that is: I believe the resetting of Ident's current time (which is the scary part of this PR) is indeed safe in the places where it is done.

However, I'm sad that this scary change is mixed with a bunch of other safe changes (checking for "existential overflow", limiting the number of or-pattern splitting).
I'd rather have in it a separate commit.
I'm also sad that we're doing that resetting at all, but I guess I can live with it for the moment.

Some of the changes are also not explained: why was total_pat_split introduced, and what is its relation to explode?

@@ -1131,6 +1133,9 @@ let instance_constructor ?in_pattern cstr =
let decl = new_declaration (Some expansion_scope) None in
let name = existential_name cstr existential in
let path = Path.Pident (Ident.create (get_new_abstract_name name)) in
let binding_time = Ident.current_time () in
(* be on the safe side for the begin_def's in type_cases *)
if binding_time >= !current_level - 5 then raise Too_many_existentials;

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor
  1. Why 5? I'm not sure adding more magic constants is really something we want to do. Especially not without documenting them.
  2. Not that it really matters, but it seems a weird to me that we'd check the current time after having created the ident.

This comment has been minimized.

@garrigue

garrigue Aug 2, 2018

Contributor

A cleaner approach would be to have the upper limit available at this point.
At first, I had written expansion_scope + 1000, but immediately saw that this would be wrong if we were to change the offset in type_cases (as you did minutes later).
5 is an upper bound of the number begin_def we may have in type_cases (hence the comment). The comment is here to make sure that this is checked if we change something in type_cases or type_pat. But maybe we need a comment in typecore.

This comment has been minimized.

@trefis

trefis Aug 2, 2018

Contributor

Understood, though that seems brittle.

I think the version expansion_scope + number_of_existential_per_match makes more sense to me.
That is: let's just define number_of_existential_per_match in Ctype and reuse it here and in type_cases.

I guess the remaining question is: do we want to check for >= before creating the ident, or for > after having created it.
I'd go for the former, what do you think?

@@ -1925,6 +1930,8 @@ let reify env t =
let name = match name with Some s -> "$'"^s | _ -> "$" in
let path = Path.Pident (Ident.create (get_new_abstract_name name)) in
let binding_time = Ident.current_time () in
(* be on the safe side for the begin_def's in type_cases *)
if binding_time > !current_level - 5 then raise Too_many_existentials;

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

Same comment as above, plus: if we're going to reuse magic constants, let's at least give them a name to make sure we use the same one everywhere it's needed?

@@ -1113,6 +1118,7 @@ and type_pat_aux ~exception_allowed ~constrs ~labels ~no_existentials ~mode
Parsetree.Ppat_or _ -> explode - 5

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

Isn't there some overlap between explode and total_pat_split here?

This comment has been minimized.

@garrigue

garrigue Aug 2, 2018

Contributor

explode limits the depth of the search tree, total_pat_split limits its overall size.
So total_pat_split is more directly connected to the time required, and works more like a timeout counter. Maybe we should have a warning when it is used.

This comment has been minimized.

@trefis

trefis Aug 2, 2018

Contributor

Thanks for the explanation.
I think it'd be nice to include it in the comment above type_pat.

@@ -1537,6 +1543,7 @@ let partial_pred ~lev ?mode ?explode env expected_ty constrs labels p =
let state = save_state env in
try
reset_pattern None true;
total_pat_split := 1000;

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

Again: why 1000? Is there any relation between this number and the number of existentials that are allowed to be introduced by a match?

This comment has been minimized.

@garrigue

garrigue Aug 2, 2018

Contributor

This is an arbitrary constant to make sure that search does not go on forever. There is no relation between this and existentials.
It was chosen based on the minichess example, but I'm not sure it's a good idea. In practice, it should not come into play that often.

This comment has been minimized.

@garrigue

garrigue Aug 2, 2018

Contributor

Concerning existentials, the backtracking of current_time means that we shall never overflow in practice, so the error message there is just an extra security.

This comment has been minimized.

@trefis

trefis Aug 2, 2018

Contributor

There is no relation between this and existentials.
It was chosen based on the minichess example, but I'm not sure it's a good idea.

Ack.
Do you think we could find a value that would allow to revert the diff in tests/typing-warnings/exhaustiveness.ml?

Concerning existentials, the backtracking of current_time means that we shall never overflow in practice, so the error message there is just an extra security.

I agree with that but I think it's a good idea too keep it (you never know what crazy code people might generate).

@@ -1062,6 +1066,7 @@ let all_idents_cases half_typed_cases =
exception Need_backtrack
let total_pat_split = ref 0

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

I'm wondering why this is a global ref instead of just another parameter of type_pat like we do with constrs and explode (which I believe are related to this ref).

Also, these two parameters are "documented", while total_pat_split isn't.

@@ -19,7 +19,7 @@ let f : type a b c d e f g.
a t * b t * c t * d t * e t * f t * g t * v
* (a,b,c,d) u * (e,f,g,g) u -> int =
function A, A, A, A, A, A, A, _, U, U -> 1
| _, _, _, _, _, _, _, G, _, _ -> 1
| A, A, A, _, _, _, _, G, _, _ -> 1

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

Why was that changed?

This comment has been minimized.

@garrigue

garrigue Aug 2, 2018

Contributor

Because with total_pat_split the original pattern can no longer be deemed redundant.

@@ -4196,20 +4205,23 @@ and type_cases ?exception_allowed ?in_function env ty_arg ty_res partial_flag
raise (Error (loc, env, No_value_clauses));
let partial =
if partial_flag then
check_partial ~lev env ty_arg_check loc val_cases
let state = save_state (ref env) in
let lev = if do_init then init_env () else get_current_level () in

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

I think it's worth adding a comment here explaining why we're calling init_env again. The explanation you gave in the PR description is fine, I'd just like to have it near the code.

@@ -53,6 +53,7 @@ val is_predef_exn: t -> bool
val binding_time: t -> int
val current_time: unit -> int
val set_current_time: int -> unit
val reset_current_time: int -> unit

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

Adding a comment here describing why this function exists and why it shouldn't be used (except in some very particular situations) seems needed.

This comment has been minimized.

@Octachron

Octachron Aug 1, 2018

Contributor

Maybe a more exotic and scary name is even warranted here: backtrack_time, rewind_time, time_travel ?

This comment has been minimized.

@trefis

trefis Aug 1, 2018

Contributor

Unconvinced by your particular propositions.
But if we want to do some renaming (which might be good indeed), I'd propose renaming set_current_time to advance_current_time (and reset_ to set_).

This comment has been minimized.

@garrigue

garrigue Aug 2, 2018

Contributor

@trefis: This was indeed my idea too (with the exact same names).
However it feels a bit dangerous to give the name of an existing function to a more dangerous one.

This comment has been minimized.

@trefis

trefis Aug 2, 2018

Contributor

However it feels a bit dangerous to give the name of an existing function to a more dangerous one.

Point taken.

@trefis

This comment has been minimized.

Contributor

trefis commented Aug 1, 2018

Wouldn't it be more intuitive to stop searching after finding the first counter-example?

I'm not sure what this comments refers to. Are you talking about what happens in Parmatch.check_unused?
That is: are you proposing to change the current list_satisfying_vectors |> orify_many |> pred (this is of course schematized) to find_satisfying_vector |> pred?

@garrigue

This comment has been minimized.

Contributor

garrigue commented Aug 2, 2018

Not changing the flow, but ensuring that we give a counter-example if we find one, rather than getting some kind of overflow. Note however that total_pat_split does that too.

@garrigue

This comment has been minimized.

Contributor

garrigue commented Aug 2, 2018

The inline comments not giving a coherent image, let me try to re-explain what I'm trying to do.
By the way, after more thought, this is not ready for inclusion yet. At least, the role of total_pat_split must be thought in more detail.

Concerning the various contents, let me explain why they are all needed. I could do separate commits (I'm not use to that, but I agree that here it would make sense), but I'd prefer not to have separate PRs.

  • The existential overflow check is the first natural step, we don't want that to happen, but it is also a dumb one: I choose to fail if it happens, because the behavior here is hard to explain. I checked that the test case pr7822.ml fails here. Actual both goals fail: without or with hint.
  • So to avoid the problem with existentials, we backtrack current_time. It is OK as long as no identifier leaks from the backtracked computation, which is the case for exhaustiveness/redundancy checks, which just return a syntactical pattern. However, this is hard to observe: compilation of pr7822.ml takes forever. Since this is a decidable example, it should eventually terminate, but I didn't have the patience to wait for that, and we cannot commit that to trunk.
  • Hence the introduction of total_pat_split, an artificial way to make the test case terminate. It is sound (we're just stopping the splitting earlier), but of course not complete.

Introducing a commit for the overflow check makes sense. For the backtracking, it is possible, but hard to test mechanically.

Now, what I'm planning to do for total_pat_split: currently it counts the number of wildcards exploded, which is not an accurate measure of the size of the search tree. It should actually count the number of branches in or-patterns, i.e. the total number of branches in the search tree, which should be in direct connection with the time required to check. Also, there could be two different limits, one if a solution has already been found, and we just abandon looking for other solutions, and another higher one if we're still searching for the first solution.

Last, for the overflow code in ctype.ml. Actually the two occurrences are so similar that I think they should be turned into a shared function. The "magic" 5 is not supposed to be an observable value, but just a way to be sure that we always detect the overflow.

@trefis

This comment has been minimized.

Contributor

trefis commented Aug 3, 2018

Thanks for summarizing the discussion Jacques!

Leaving aside the discussion about total_pat_split, which as said is fairly independent from the other two changes proposed in this GPR, I'd like to retract my earlier comment:

I believe the code is overall correct

For reasons that should be clear when reading MPR#7835 I'm actually not sure anymore that the backtracking of the current time is safe. I'm not a 100% confident, but it feels like I could create an example where to produce a counter example parmatch actually forces a signature substitution. In which case, ...

Secondly, the "too many existential check" doesn't actually check the number of existentials. As shown by this example.

As I said on the MPR, I think a proper solution (which I am working on) is to stop using stamps to track scopes. In which case the limit on the number of existential disappears (and so that part of the PR does as well), the raising by N thousands doesn't need to happen, so the resetting doesn't need to happen either, and there is no safety concern.

@garrigue

This comment has been minimized.

Contributor

garrigue commented Aug 3, 2018

Well, MPR#7835 contains a few overstatements, but thank you for pointing something I missed: lazy behavior in Env may cause the generation of identifiers that we shouldn't backtrack. And indeed it could probably be triggered by the exhaustiveness check (just leave a case that refers to a not yet loaded module for instance).
Of course this doesn't remove the need to check for overflow, but there seems to be no easy way out this time.
As for the fact we do not count only existentials, this is indeed true: not only identifiers could be used for other things, but reification also generates fresh identifiers, which need to be under the limit.
For the way to avoid the use of fresh identifiers there, could you please state what you have on your mind: I cannot discuss an idea without more details!
An idea I had myself since this problem appeared is that we could use the local constraints in the environment to create just a module anchor, and generate fresh types inside this module. Still need to find a clean way to use it for reification too (again a global reference...).

@garrigue

This comment has been minimized.

Contributor

garrigue commented Aug 4, 2018

By the way, this also means that we should be extremely careful of not having code in Env that uses unification, since it would invalidate Btype.backtrack too.

@garrigue

This comment has been minimized.

Contributor

garrigue commented Sep 8, 2018

Backtracking current_time is wrong, and tracking new types again was actually sufficient to solve the problem (#1997, merged in 4.07 branch).

@garrigue garrigue closed this Sep 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment