Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn on ambiguous variables used in a when clause #7031

Closed
vicuna opened this issue Oct 27, 2015 · 7 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Oct 27, 2015

Original bug ID: 7031
Reporter: @gasche
Assigned to: @gasche
Status: resolved (set by @gasche on 2017-02-28T16:20:26Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.02.3
Fixed in version: 4.03.0
Category: middle end (typedtree to clambda)
Monitored by: @diml @yallop @hcarty @alainfrisch

Bug description

Martin Clochard, through the intermediary of Arthur Charguéraud,
reports a confusion arising from the following pattern-matching
clause:

( (Var x, t) | (t, Var x) ) when x = my_var -> ...

The problem is that the code author (and many readers of this code,
I checked) expect the value (Var not_my_var, Var my_var) to be matched
by this clause, while it is not.

Not matching is the expected semantics of "when" guards: they are
specified as first we match the pattern and then we check the
guard, and go to the next clause if the guard fails. On this example,
the pattern succeeds (in the first or-branch) and then the guard
fails. (The fact that or-pattern match left-to-right is explicitly
specified.)

This is however confusing, so we would like to warn in this situation
so that users are forced to think about this potential problem. In
a more general setting, we want to warn whenever the semantics of

| C[p | q] when test -> foo

is not equivalent to the distributive semantics, that is

| C[p] when test -> foo
| C[q] when test -> foo

(I'm using a pattern-context C[..] to account for the general case
where the or-pattern (p | q) may be deep inside the clause pattern.)

After discussing it with Luc and Damien, I propose the following
approximation criterion: fire the warning where both conditions below
are met:

[overlap]: there is a value that can match both sides of the
or-pattern (C[p] and C[q]). For example, if p and q start with
distinct head constructors, no confusion is possible.

[ambiguous variables]: the variables of (p | q) used by the guard
appear under different paths in p and q. For example,
((p1, Var x) | (Var x, q1)) when test x
should warn but
((p1, Var x, p2) | (q1, Var x, q2)) when test x
should not: in all values matching both p and q, the captured value
of x will be the same.

Additional information

Failed proposals:

  1. We initially considered just warning whenever an or-pattern with
    ambiguous variables (as defined above) is used. This is much too
    strong, as there is plenty of code that uses clauses like

| (Var x, t) | (t, Var x) -> subst x t prog

which is perfectly fine if an overlapping value (Var x1, Var x2) is
passed: the semantics is correct whichever branch is chosen.

Similarly, warning in all cases where or-pattern and when-guards are
combined was perceived as too strong.

  1. We also proposed removing the [ambiguous variables] condition. At
    the time or working example for why this criterion should be added was
    that the following code should not warn:

((Var x, p) | (Var x, q)) when test x

and we noted that it would be fine to warn in this case, as this would
encourage the user to make this path-sharing explicit by rewriting the
code as:

(Var x, (p | q)) when test x

Unfortunately, this transformation cannot be applied in general, see
the above example

((p1, Var x, p2) | (q1, Var x, q2)) when test x

which cannot be factorized in a semantics-preserving way.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 12, 2015

Comment author: @gasche

I was just thinking of this bug yesterday, and had a small idea for a neat implementation of the [ambiguous variable] criterion -- because it is actually not trivial to decide it.

We are trying to know whether the shared variables of (p | q) appear in the same place in both patterns. The "natural" idea is to recurse on p and q, collect the path from the root along the way, and return a map from variables to a list of reaching paths. But that seems a bit costly, and defining the type of paths will be a pain.

The idea is as follows: implement a function (intersection : pattern -> pattern -> pattern), such as (intersection p q) is the most precise pattern matching all values of both p and q. This is rather easy to define:

intersection x x = x
intersection x y = _
intersection (K p1 ... pn) (K q1 .. qn) = K (inter p1 q1) ... (inter pn qn)
intersection (K p1 ... pn) (L q1 ... qm) = _
intersection (p | q) = intersection p q

Then it suffices to check that all variables of (p | q) appear in (intersection p q). Those that do not appear there are ambiguous in the sense of [ambiguous variables].

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 12, 2015

Comment author: @gasche

In fact (intersection) is also useful to define the [overlap] criterion: we need only warn if (intersection p q) is not _ (if it is _, then [overlap] does not hold) yet misses some shared variables.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 12, 2015

Comment author: @alainfrisch

intersection (p | q) = intersection p q

I guess you meant something else here.

(intersection p q) is the most precise pattern matching all values of both p and q

I'm not sure to understand the definition. Cannot we define a function that satisfy this property as:

intersection p q = p | q

?
(which is quite useless)

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 12, 2015

Comment author: @gasche

Indeed, I meant a disjunction-free pattern. It returns the shared constructor tree prefix. For our original example,

intersection (Var x, t) (t, Var x) = (_, _)

(note that if we knew that Var is the only constructor, then an equally valid result would be (Var _, Var _), but I don't think that affects the ambiguity criterion derived from computing the intersection.)

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 28, 2015

Comment author: @gasche

I implemented the suggested implementation in

#317

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 28, 2017

Comment author: @damiendoligez

See also:
#317 - #317
#338 - #338

Since #317 was closed and #338 was merged, can we close this PR? @gasche

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 28, 2017

Comment author: @damiendoligez

Faut-il fermer cette PR ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.