Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern Matching Clause Order and Forwarding #387

Closed
RandomActsOfGrammar opened this issue Nov 18, 2020 · 5 comments · Fixed by #600
Closed

Pattern Matching Clause Order and Forwarding #387

RandomActsOfGrammar opened this issue Nov 18, 2020 · 5 comments · Fixed by #600
Assignees
Labels
simplify-semantics Issues whose resolution would make Silver's semantics simpler.
Projects

Comments

@RandomActsOfGrammar
Copy link
Contributor

Consider the following Silver grammar:

nonterminal B;

abstract production b1
top::B ::=
{ }

abstract production b2
top::B ::=
{ forwards to b1(); }

abstract production b3
top::B ::= x::B y::B
{ }

function matchB
String ::= b::B
{
  return case b of
         | b3(b1(), b2()) -> "b3 of b1, b2\n"
         | b3(b2(), b1()) -> "b3 of b2, b1\n"
         | _ -> "other\n"
         end;
}

The result of matchB(b3(b2(), b2())) is "b3 of b2, b1\n", which comes from matching the pattern in the second rule. We can see why we get this result by looking at (an expression equivalent to) the compiled version of this match:

case b of
| b3(a, b) -> case a of
              | b1() -> case b of
                        | b2() -> "b3 of b1, b2\n"
                        | _ -> "other\n"
                        end
              | b2() -> case b of
                        | b1() -> "b3 of b2, b1\n"
                        | _ -> "other\n"
                        end
              | _ -> "other\n"
              end
| _ -> "other\n"
end

We first match b3, then we match b2() against b1() and b2(). Because it matches b2() better, even though that is not the first pattern, we take that match. We then try to match b2() to b1(), which succeeds by forwarding.

In a way, it seems that we should take the first match rule in the original match above. While the first child of b3 matches better in the second match rule, the second child doesn't match perfectly. This is the opposite of the first match rule, where the first child doesn't match perfectly but the second child does. In some sense, both are equally good matches, overlapping in matching behavior even though they don't overlap syntactically, and therefore the first one should be taken. If we had a rule with the pattern b3(b2(), b2()) later in the patterns, we would want to take that because it matches the structure better than either of the other two.

So it seems we want to determine when two patterns match "equally well" and take the first one. However, this gets really complicated with more things being matched. Suppose we are matching a, b, c against a couple of patterns p1a, p1b, p1c and p2a, p2b, p2c. Suppose we forward zero times to match p1a, zero times to match p1b, and two times to match p1c (forwarding vector, the number of times it forwards to match each successive match, (0, 0, 2)) and we forward one time to match p2a, one time to match p2b, and zero times to match p2c (forwarding vector (1, 1, 0). Which pattern set matches the input better?

  • p1a, p1b, p1c matches better, because only one thing forwards
  • p2a, p2b, p2c matches better, because we only forward at most once in any single match

The current method of left precedence (whatever matches best furthest left) gives us a method of determining the best match without needing to consider the complexities of matching "equally well", but it can give us unintuitive results. I'm not sure if this should be considered as a bug or just a strange bit of Silver matching behavior that should be documented.


That's where I was going to end this. However, I tried a related match and found something that is at least a bit related which I'm more certain should be considered a bug. Perhaps this should go in its own issue, but I'll put it here to start.

Suppose we change our matchB function but leave the productions alone:

function matchB
String ::= b::B
{
  return case b of
         | b3(b1(), b2()) -> "b3 of b1, b2\n"
         | b3(b2(), b3(_, _)) -> "b3 of b2, b3\n"
         end;
}

What should be our result for matchB(b3(b2(), b2())) now? We have two match rules. The second one doesn't match, but the first one does match if we forward the first child, so our result should be "b3 of b1, b2\n". Silver begs to differ:

Error: pattern match failed at temp Temp.sv:118:9

An error occured.  Silver stack trace follows. (To see full traces including java elements, SILVERTRACE=1)

(DN.315): While evaling syn 'core:io' in 'core:ioval' (dd3b207)
(core.Pprint in Pprint.java:102): Error while evaluating function core:print
(temp.PmatchB in PmatchB.java:105): Error while evaluating function temp:matchB
(core.Perror in Perror.java:93): Error while evaluating function core:error
(U.70): Error: pattern match failed at temp Temp.sv:118:9

Why does this fail? Let's look at how it compiles:

case b of
| b3(a, b) -> case a of
              | b1() -> case b of
                        | b2() -> "b3 of b1, b2\n"
                        end
              | b2() -> case b of
                        | b3(_, _) -> "b3 of b2, b3\n"
                        end
              end
end

Once we have matched the b2() for the first child, we are committed to the first child being a b2(), and we don't backtrack and consider that it might match another pattern as well. Thus we get a case expression containing a matching pattern which gives a match failure error.

The issue which causes this unintuitive behavior is that we implemented pattern matching based on Phil Wadler's chapter in The Implementation of Functional Programming Languages, where if an input matches a certain constructor, it cannot match a different constructor. Since we will match through forwarding, this assumption does not hold in our case.


So @tedinski and @ericvanwyk, have I found at least one bug? Is this expected behavior and I just need to read the functional-programming part of Ted's thesis to understand what is going on here? Even if there are no bugs here, we need to work on the documentation so it describes these cases.

@tedinski
Copy link
Member

It's been too long, and I've forgotten a fair bit. I seem to recall that we had clear and desirable semantics for matching against a single value, and then I was just like "and then we can apply the standard compilation method for matching against multiple values" and figured that whatever semantics that resulted in was fine.

As for

         | b3(b1(), b2()) -> "b3 of b1, b2\n"
         | b3(b2(), b3(_, _)) -> "b3 of b2, b3\n"

failing... I think this is actually a choice in semantics, not necessarily a bug. We could argue that if b2 fails, we should go back and try b1, but I maaayyybbee vaguely recall deciding this was a losing battle? e.g. If match on that fails, you'd have to rematch on all patterns EXCEPT that... which might also lead to a combinatoric explosion in exclusions? Because b2 might forward to b4 which might forward to b5... so how you repeat the matches after failing is not a simple thing. You're generating code for "b2 failed so now lets try without" and "b2 and b4 failed, so not let's trying without" and "b4 failed, so now let's trying without" and so on for... a lot. I think you'd have to end up with every combination of including/excluding a forwarding production.

IIRC, I thought about worrying about this, but basically came back after I had an idea of what non-interference was all about, and that was basically "hey, pattern matching on forwarding productions is probably interfering anyway, so maybe it just doesn't matter that much." I don't know if that thought appeared in my thesis anywhere.

My suggestion: the most important thing here is whether we have a consistent notion of completeness with our semantics. (I... did we ever implement a pattern matching completeness check? I can't recall!) Given such a thing, I think it's less that we're surprised into finding this pattern failing to match (it would report incomplete), and more that programmers might be surprised when it insists they need to handle another case in their b3(b2(), _) branch. (Then add a little note in the documentation explaining that matching on a forwarding production does not "backtrack" in case of failure...)

I think being surprised they need to add some additional matching cases is much less of a concern compared to surprising runtime behavior. So maybe this surprising semantics isn't such a problem.

Does that help, or have I missed something?

I think it helps answer your earlier question? We probably don't want to have to deal with a notion of "more specific match" beyond our current "constructor/variable" distinction. And if we don't "backtrack" then we don't have to, I think? Hopefully?

@ericvanwyk
Copy link
Contributor

It seems to me that we should be more precise in what "matches better" means. But I think there will be cases in which two patterns match "equally well" - so how do we break such a tie? It is hard, in my mind, to argue against keeping the standard notion of "take the first pattern that matches."

@RandomActsOfGrammar
Copy link
Contributor Author

Compiling by dropping each successive constructor would result in a combinatorial explosion of matching levels, since we would need to successively drop each constructor. I'm not sure this is the wrong thing to do, though. It would make it more obvious for the user how their code should work. There are two things that might be considered problems with this combinatorial explosion:

  • It generates a large body of code: I'm not sure this is really bad, since no one needs to look at the generated code.
  • It could lead to a slowdown in execution: I'm not sure "obvious and slow" is worse than "obscure and fast".

We don't have a problem with this if we have a single match on a single level of constructors, which is probably the most common use case. The other large use is probably matching on two or more things at once (e.g. matching on a, b), which would lead to this combinatorial explosion if we are matching on two productions at a time instead of a production and a variable.

Some of the explosion could be reduced by identifying non-forwarding productions and not putting backtracking cases for other productions when matching them, since matching a non-forwarding production would mean there was no forwarding production pattern which the tree could match (e.g. if t match just, there is no other pattern in the match that t could have matched, otherwise it wouldn't have been/forwarded to just). Their bodies would still need to include any backtracking from matches before that match.

We have an issue for checking incomplete matching (#317), but it hasn't been done yet. The obvious way to implement it would catch this case. However, I'm not sure a completeness check is going to fix the problem that pattern matching works in non-obvious ways. For example, if my second matching example has a default case _ -> "other\n" as the first one does, the match is complete and we don't get an error when we run it, but we also don't get the actual pattern which matches.

I agree that we probably don't want to deal with a notion of better/worse/equal matches. I don't think there is a better way to do this than left precedence, as the current compilation does, even if that is slightly unintuitive, since I don't think there is an obvious definition of comparing matches in the general case. This behavior should be thoroughly documented.

I think comparing matches is separate from backtracking, though. The idea with backtracking would be to find a match in the non-variable patterns, rather than to find a better match.

Something we could do, which I don't think we want to do but would fit Eric's comment to "take the first pattern that matches", would be to check each pattern in order to determine if it is possible for it to match, taking the first one that is possible to match. Under this, the expression

case b3(b2(), b2()) of
| b3(b1(), b2()) -> "first"
| b3(b2(), b2()) -> "second"
end

would give the result "first" because b3(b2(), b2()) can be forwarded in a subtree to match the first pattern. This is probably at least as unintuitive as anything else we have discussed, if not more so. If we had a good static analysis for determining when patterns were out-of-order (I'm not sure we can do that with extensibility), we could take this definition and give an error for unreachable match rules when exact patterns would be caught by earlier rules, as we see in this example.

@krame505
Copy link
Member

My two cents on this - until now I thought the behavior was to just take the first pattern that could match, without any notion of a more specific match. So I generally agree with @ericvanwyk here - I don't find that behavior to be at all unintuitive.

By analogy with attributes, one might argue that the case of two patterns that match at different levels of forwarding is like having an equation both on a forwarding production and on what it forwards to - in this case we would get the value of the equation on the forwarding production. But the fact that we care about the order of rules at all sort of invalidates this line of thinking - for example

case b1() of
| _ -> "first"
| b1() -> "second"
end

should print first even though this is like having a default equation and an equation on b1 for an attribute, in which case would get the non-default equation value.

Instead I think the appropriate analogy is that each pattern is like a seperate attribute of type Maybe that has a just equation value corresponding to the specified pattern and nothing() as a default. The top-level case is like a big orElse of all the Maybe values that takes the first one to succeed. I think that this interpretation basically corresponds to the semantics that @ericvanwyk has in mind.

In fact is there a good reason to not just implement pattern matching by translation into attributes in this way? This would let us ditch the whole "primitive match" extension. I suspect the performance wouldn't be too bad, however good error messages would be tricky and the MWDA may struggle to cope with such a large number of generated attributes.

@RandomActsOfGrammar
Copy link
Contributor Author

In our meeting today we decided that the correct behavior of Silver's pattern matching is to take the first pattern which can match, whether directly or by forwarding somewhere in the structure. This makes the match

case e of
| p1 -> e1
| p2 -> e2
| p3 -> e3
end

be equivalent to

let x = e in
   case x of
   | p1 -> e1
   | _ -> case x of
          | p2 -> e2
          | _ -> case x of
                 | p3 -> e3
                 end
          end
   end
end

This makes the semantics be more in-line with what we see in functional languages like OCaml. Each pattern is compiled as if it is its own match under the current system.

This requires changing Silver's compilation of case expressions to work in order of the patterns. Primitive matching can remain as it is; the compilation of case expressions is the only thing that needs to be changed. We didn't decide what the most efficient implementation of this would be; a standard compilation would result in an explosion in generated code. We could instead turn the return type of the primitive matches into which it compiles into Maybe<a> and orelse them together, taking out of a just at the end.

We might also consider whether we can somehow warn about having forwarding patterns occur after non-forwarding patterns, since these would be useless. Some difficulty might arise with nested patterns.

Completeness checking is fine as it is, since it only checks for non-forwarding productions and ignores forwarding productions.

@krame505 krame505 added this to To do - uncategorized in Hackathon via automation Sep 8, 2021
@remexre remexre added good-first-issue simplify-semantics Issues whose resolution would make Silver's semantics simpler. and removed good-first-issue labels Sep 16, 2021
@krame505 krame505 moved this from To do - uncategorized to To do - standalone in Hackathon Sep 22, 2021
@krame505 krame505 moved this from To do - standalone to To do - larger projects in Hackathon Sep 22, 2021
@krame505 krame505 moved this from To do - larger projects to In progress in Hackathon Oct 8, 2021
Hackathon automation moved this from In progress to Done - January 2022 Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
simplify-semantics Issues whose resolution would make Silver's semantics simpler.
Projects
Hackathon
Done - January 2022
Development

Successfully merging a pull request may close this issue.

5 participants