New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pattern Matching Clause Order and Forwarding #387
Comments
It's been too long, and I've forgotten a fair bit. I seem to recall that we had clear and desirable semantics for matching against a single value, and then I was just like "and then we can apply the standard compilation method for matching against multiple values" and figured that whatever semantics that resulted in was fine. As for
failing... I think this is actually a choice in semantics, not necessarily a bug. We could argue that if IIRC, I thought about worrying about this, but basically came back after I had an idea of what non-interference was all about, and that was basically "hey, pattern matching on forwarding productions is probably interfering anyway, so maybe it just doesn't matter that much." I don't know if that thought appeared in my thesis anywhere. My suggestion: the most important thing here is whether we have a consistent notion of completeness with our semantics. (I... did we ever implement a pattern matching completeness check? I can't recall!) Given such a thing, I think it's less that we're surprised into finding this pattern failing to match (it would report incomplete), and more that programmers might be surprised when it insists they need to handle another case in their I think being surprised they need to add some additional matching cases is much less of a concern compared to surprising runtime behavior. So maybe this surprising semantics isn't such a problem. Does that help, or have I missed something? I think it helps answer your earlier question? We probably don't want to have to deal with a notion of "more specific match" beyond our current "constructor/variable" distinction. And if we don't "backtrack" then we don't have to, I think? Hopefully? |
It seems to me that we should be more precise in what "matches better" means. But I think there will be cases in which two patterns match "equally well" - so how do we break such a tie? It is hard, in my mind, to argue against keeping the standard notion of "take the first pattern that matches." |
Compiling by dropping each successive constructor would result in a combinatorial explosion of matching levels, since we would need to successively drop each constructor. I'm not sure this is the wrong thing to do, though. It would make it more obvious for the user how their code should work. There are two things that might be considered problems with this combinatorial explosion:
We don't have a problem with this if we have a single match on a single level of constructors, which is probably the most common use case. The other large use is probably matching on two or more things at once (e.g. matching on Some of the explosion could be reduced by identifying non-forwarding productions and not putting backtracking cases for other productions when matching them, since matching a non-forwarding production would mean there was no forwarding production pattern which the tree could match (e.g. if We have an issue for checking incomplete matching (#317), but it hasn't been done yet. The obvious way to implement it would catch this case. However, I'm not sure a completeness check is going to fix the problem that pattern matching works in non-obvious ways. For example, if my second matching example has a default case I agree that we probably don't want to deal with a notion of better/worse/equal matches. I don't think there is a better way to do this than left precedence, as the current compilation does, even if that is slightly unintuitive, since I don't think there is an obvious definition of comparing matches in the general case. This behavior should be thoroughly documented. I think comparing matches is separate from backtracking, though. The idea with backtracking would be to find a match in the non-variable patterns, rather than to find a better match. Something we could do, which I don't think we want to do but would fit Eric's comment to "take the first pattern that matches", would be to check each pattern in order to determine if it is possible for it to match, taking the first one that is possible to match. Under this, the expression
would give the result |
My two cents on this - until now I thought the behavior was to just take the first pattern that could match, without any notion of a more specific match. So I generally agree with @ericvanwyk here - I don't find that behavior to be at all unintuitive. By analogy with attributes, one might argue that the case of two patterns that match at different levels of forwarding is like having an equation both on a forwarding production and on what it forwards to - in this case we would get the value of the equation on the forwarding production. But the fact that we care about the order of rules at all sort of invalidates this line of thinking - for example
should print Instead I think the appropriate analogy is that each pattern is like a seperate attribute of type In fact is there a good reason to not just implement pattern matching by translation into attributes in this way? This would let us ditch the whole "primitive match" extension. I suspect the performance wouldn't be too bad, however good error messages would be tricky and the MWDA may struggle to cope with such a large number of generated attributes. |
In our meeting today we decided that the correct behavior of Silver's pattern matching is to take the first pattern which can match, whether directly or by forwarding somewhere in the structure. This makes the match
be equivalent to
This makes the semantics be more in-line with what we see in functional languages like OCaml. Each pattern is compiled as if it is its own match under the current system. This requires changing Silver's compilation of We might also consider whether we can somehow warn about having forwarding patterns occur after non-forwarding patterns, since these would be useless. Some difficulty might arise with nested patterns. Completeness checking is fine as it is, since it only checks for non-forwarding productions and ignores forwarding productions. |
Consider the following Silver grammar:
The result of
matchB(b3(b2(), b2()))
is"b3 of b2, b1\n"
, which comes from matching the pattern in the second rule. We can see why we get this result by looking at (an expression equivalent to) the compiled version of this match:We first match
b3
, then we matchb2()
againstb1()
andb2()
. Because it matchesb2()
better, even though that is not the first pattern, we take that match. We then try to matchb2()
tob1()
, which succeeds by forwarding.In a way, it seems that we should take the first match rule in the original match above. While the first child of
b3
matches better in the second match rule, the second child doesn't match perfectly. This is the opposite of the first match rule, where the first child doesn't match perfectly but the second child does. In some sense, both are equally good matches, overlapping in matching behavior even though they don't overlap syntactically, and therefore the first one should be taken. If we had a rule with the patternb3(b2(), b2())
later in the patterns, we would want to take that because it matches the structure better than either of the other two.So it seems we want to determine when two patterns match "equally well" and take the first one. However, this gets really complicated with more things being matched. Suppose we are matching
a, b, c
against a couple of patternsp1a, p1b, p1c
andp2a, p2b, p2c
. Suppose we forward zero times to matchp1a
, zero times to matchp1b
, and two times to matchp1c
(forwarding vector, the number of times it forwards to match each successive match,(0, 0, 2)
) and we forward one time to matchp2a
, one time to matchp2b
, and zero times to matchp2c
(forwarding vector(1, 1, 0)
. Which pattern set matches the input better?p1a, p1b, p1c
matches better, because only one thing forwardsp2a, p2b, p2c
matches better, because we only forward at most once in any single matchThe current method of left precedence (whatever matches best furthest left) gives us a method of determining the best match without needing to consider the complexities of matching "equally well", but it can give us unintuitive results. I'm not sure if this should be considered as a bug or just a strange bit of Silver matching behavior that should be documented.
That's where I was going to end this. However, I tried a related match and found something that is at least a bit related which I'm more certain should be considered a bug. Perhaps this should go in its own issue, but I'll put it here to start.
Suppose we change our
matchB
function but leave the productions alone:What should be our result for
matchB(b3(b2(), b2()))
now? We have two match rules. The second one doesn't match, but the first one does match if we forward the first child, so our result should be"b3 of b1, b2\n"
. Silver begs to differ:Why does this fail? Let's look at how it compiles:
Once we have matched the
b2()
for the first child, we are committed to the first child being ab2()
, and we don't backtrack and consider that it might match another pattern as well. Thus we get acase
expression containing a matching pattern which gives a match failure error.The issue which causes this unintuitive behavior is that we implemented pattern matching based on Phil Wadler's chapter in The Implementation of Functional Programming Languages, where if an input matches a certain constructor, it cannot match a different constructor. Since we will match through forwarding, this assumption does not hold in our case.
So @tedinski and @ericvanwyk, have I found at least one bug? Is this expected behavior and I just need to read the functional-programming part of Ted's thesis to understand what is going on here? Even if there are no bugs here, we need to work on the documentation so it describes these cases.
The text was updated successfully, but these errors were encountered: