Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ML-style or-patterns #371

Open
DanielFabian opened this issue Mar 27, 2021 · 5 comments
Open

Add support for ML-style or-patterns #371

DanielFabian opened this issue Mar 27, 2021 · 5 comments
Labels
enhancement New feature or request low priority

Comments

@DanielFabian
Copy link
Contributor

Currently Lean's pattern matching doesn't have ML-style or-patterns leading to somewhat repetitive of this kind:

def balanceLB {h c} : almostNode h → Int → rbnode h c → hiddenTree (h + 1) 
| LR (R a x b) y c, z, d => HR (R (B a x b) y (B c z d))
| RR a x (R b y c), z, d => HR (R (B a x b) y (B c z d))
| V axb, z, d => HB (B axb z d)

You'll notice that the RHS of the first two cases are exactly identical. I propose, we support ML-style or-patterns to allow more compact code like this:

def balanceLB {h c} : almostNode h → Int → rbnode h c → hiddenTree (h + 1) 
| LR (R a x b) y c, z, d 
| RR a x (R b y c), z, d => HR (R (B a x b) y (B c z d))
| V axb, z, d => HB (B axb z d)

The following is a proposed spec for the feature:

Currently, the Lean parser always expects a => after the LHS of a pattern match. As such there would be no ambiguity adding or-patterns by leaving out the RHS for certain patterns.

I propose, we allow either the => or a further pattern after a LHS has been completed, i.e. the correct number of arguments has been matched.

Several patterns are compatible if the following rules hold:

  • The set of matched variables is the same. In the sense that the same names are bound, in arbitrary order.
  • Bound constants do not count as variables and can be different between patterns
  • wildcard patterns do not count as variables and can be different between patterns
  • Each variable that occurs in shared patterns has to have the same type. Here "same" means definitionally equal.

This will allow us to write the following:

def fib : Nat → Nat
| 0 => 1
| 1 => 1
| (n+2) => fib n + fib (n+1)

def fib2 (n : Nat) : Nat :=
match n with
| 0 => 1
| 1 => 1
| (n+2) => fib2 n + fib2 (n+1)

on the other hand the following is forbidden:

inductive strOrNum
| S (s : String)
| I (i : Int)
open strOrNum

def ex1 := 
  match S "test" with
  | I a | S a => 3

It is forbidden, because the types of the variables a are not definitionally equal. On the other hand the following is allowed:

def ex1 := 
  match S "test" with
  | I _ | S _ => 3

Semantically, the left out pattern takes on exactly the term that's the RHS of the last merged pattern:

def fib : Nat → Nat
| 0
| 1 => 1
| (n+2) => fib n + fib (n+1)

is the same as:

def fib : Nat → Nat
| 0 => 1
| 1 => 1
| (n+2) => fib n + fib (n+1)

For pretty-printing purposes, I propose sticking to whatever was written in the source code, i.e. merge the patterns that are merged in source code and keep the ones separate that were separate.


For proof and computational purposes, the two syntaxes should be interchangeable:

(match n with
| 0 | 1 => 1
| (n+2) => fib n + fib (n+1)) = 
(match n with
| 0 => 1
| 1 => 1
| (n+2) => fib n + fib (n+1))

as such the aforementioned equality should ideally hold definitionally. This could be achieved by the former reducing to the latter.


For equation lemmas that we are going to have anyway for the match expressions, the or-patterns can have additional lemmas:

Whilst the normal match expression has lemmas for each case asserting the previous cases weren't taken. The or-patterns have theses premises combined in a disjunction.

This new equation lemma is potentially useful in proofs where one might want to treat the RHS uniformly.

I feel fairly strongly about this one, since it may well allow us to do a bunch of operations on an RHS without having to do a cases followed by applying the same exact chain of tactics many times. Consider this example:

def balance : color → node → Nat → node → node
| color.black, node.tree color.red (node.tree color.red a x b) y c, z, d
| color.black, node.tree color.red a x (node.tree color.red b y c), z, d
| color.black, a, x, node.tree color.red (node.tree color.red b y c) z d
| color.black, a, x, node.tree color.red b y (node.tree color.red c z d) => 
  node.tree color.red (node.tree color.black a x b) y (node.tree color.black c z d)
| color, a, x, b => node.tree color a x b

if we have some kind of lemma that is useful for massaging the RHS, we can uniformly rewrite the RHS for all the cases in one go and only use the premises later in the proof. At which point, we can possibly weaken the premises and unify them into fewer cases that we have to do actual work about.


Feel free to poke holes at the initial spec, I'm happy to do more homework and extend it as needed.

@leodemoura
Copy link
Member

Basic support can be implemented as a macro, i.e., a Syntax to Syntax transformation.
That is, we would expand

def fib : Nat → Nat
| 0 | 1 => 1
| (n+2) => fib n + fib (n+1)

into

def fib : Nat → Nat
| 0 => 1
| 1 => 1
| (n+2) => fib n + fib (n+1)

If we use macros, then we can also support

inductive strOrNum
| S (s : String)
| I (i : Int)
open strOrNum

def ex1 := 
  match S "test" with
  | I a | S a => toString a

The RHS would be elaborated twice. Once in a context where a : String, and another a : Int.
One issue here is the IDE integration, when we hover over a, I think the server would return information for the first alternative.
Error messages may also be confusing. We may get errors for both alternatives. For example, suppose we write

def ex1 := 
  match S "test" with
  | I a | S a => not a

We would get the type mismatch message twice.

Regarding pretty-printing, we can accomplish it by adding annotations as we do for inaccessible terms and x@ patterns.
This is compatible with the feature as a macro approach.

Several patterns are compatible if the following rules hold:

If we decide to enforce these rules, then the macro approach is not sufficient, and we would have to modify the match elaborator which is already very complicated. I don't think it would be feasible to pursue this option in the near future.

Whilst the normal match expression has lemmas for each case asserting the previous cases weren't taken. The or-patterns have these premises combined in a disjunction.

To be able to support this feature, we would have to enforce the rules you sketched above.

Feel free to poke holes at the initial spec, I'm happy to do more homework and extend it as needed.

I am wondering whether ML languages put additional "bells and whistles" on top of this feature.

@DanielFabian
Copy link
Contributor Author

DanielFabian commented Mar 28, 2021

ML languages typically do enforce the same types across variants, double checked with F# and Haskell prior to making a proposal and it's also consistent with my memory. And when I was thinking about it as a macro with double elaboration, and different types, then the IDE problem immediately shows up.

Whilst it's technically strictly more expressive as a pure macro expansion, I'm fairly confident it's less useful. Because your proof state would get jumbled or you would have to pick one.

At that point it's a more succinct syntax, yes, but you can't use it for proofs naturally.

Additionally, I conjecture that case where an RHS is syntactically the same term but with different types doesn't occur much if at all in practise. The only times where the RHS terms would be the same is applications of generic functions and compositions thereof, like your toString example above. That said, I don't remember seeing code of the kind that would collapse under this scenario ever written. I'm sure it's possible to come up with examples, but they'd be pretty rare.


If we decide to enforce these rules, then the macro approach is not sufficient, and we would have to modify the match elaborator which is already very complicated. I don't think it would be feasible to pursue this option in the near future.

I wasn't thinking about this feature in terms of a macro precisely because of enforcing some additional rules as well as the IDE. And also I'm thinking of it as a means to make proofs more powerful.

One big problem we have if we go down the macro-only route is that we cannot easily fix it with a more elaborate strategy further down the line. That is to say, as we all know, removing features from a language is a very unpopular move. And pattern matching in particular is such a central component of the language that we should take it slow and get it right before releasing.

We can add the macro to the documentation as an example for something that can be done using macros but I don't think it's a great candidate to build the language feature.

I was thinking of the feature more in terms of downcompilation of the following kind:

given the generated matcher with major premise S and minor premises A a b, B a b, C c (hopefully I'm using your terminology correctly). Here matcherFunc stands for the current construction of a match expression.

let matcherFunc : S -> T
| A a b => f a b
| B a b => f a b
| C c => g c

And given a match expression with or patterns

match x with
| A a b | B a b => f a b
| C c => g c

translates to:

match x with
| A a b => matcherFunc (A a b)
| B a b => matcherFunc (B a b)
| C c => matcherFunc (C c)

this way, we get type checking due to the type checking already done in the matcherFunc, the term size doesn't explode exponentially. We get the helper lemmas from matcherFunc and just need to do the disjunction and call of whatever the premise is in matcherFunc.

I am not perfectly sure, if we get the type compatibility (def eq) between the various a and b variables for free, but we can enforce that at generation time if need be.


I am wondering whether ML languages put additional "bells and whistles" on top of this feature.

Extensions to basic pattern matching include:

  • and patterns: they are a dual to or patterns and require the input to match multiple patterns at the same time. E.g. (var1, var2) & (0, _)
  • as patterns: they allow you to name sub expressions. E.g. (var1, var2) as tuple1
  • guard or variable patterns: they allow a pattern to match only if a predicate is true. Would in our case have to be a decidable predicate. E.g. a, b when a < b.
  • view or active patterns: they allow you to execute some computation and pattern match the result. E.g. RGB(r, g, b) where RGB is a function color -> (nat x nat x nat).

this list is taken from F#, similar extensions exist for haskell.
https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching

Of the aforementioned my guess as for the rough order of usefulness / prevalence is something like:

  • or patterns: really quite common and fairly clear semantics
  • guards: would be useful, but there's an interesting interaction between guards and exhaustiveness checking that would need some careful thinking.
  • as patterns: really quite simple. They just make things a tiny bit more compact.
  • views: very powerful since they make pattern matching effectively a plug-in thing. But also quite niche
  • and patterns: I haven't come across that much at all.

And finally, if doing an extension like this is something that there's at least interest in, I'd be very happy to spend (even considerable) time on it. I'd probably need a bit of support getting started, though.

What I don't want to do, is spend time on it, make some choices that are not compatible with your goals, make a pr and have it be rejected. That's why I'm reaching out ahead-of-time.

@leodemoura
Copy link
Member

Thanks! Your message is very useful. I agree with the points you made. I have only one technical issue that is not clear to me. I will describe it below.

What I don't want to do, is spend time on it, make some choices that are not compatible with your goals, make a pr and have it be rejected. That's why I'm reaching out ahead-of-time.

It makes sense. I am going to create a "roadmap" topic at the lean4-dev channel. It will be useful to decide when it is a good time to add new features.

I'd be very happy to spend (even considerable) time on it.

This is great. My main concern is that I think implementing this feature (as an elaborator) is a tough project for new contributors. It would require a considerable amount of synchronization with us. The current implementation is already very complicated, and as you know, we have recently fixed a few bugs there. It is also incomplete. We are not generating equations nor the elimination principles for writing proofs.

I wasn't thinking about this feature in terms of a macro precisely because of enforcing some additional rules as well as the IDE.

Makes sense.

And also I'm thinking of it as a means to make proofs more powerful.

I think this one is tricky. Lean currently generates an auxiliary function for a sequence of match left-hand-sides. The function doesn't depend on the right-hand-sides. The function depends only on the patterns, and it supports dependent elimination. For example, suppose we have the patterns

| [a, b, c] => ...
| a::as => ...
| _ => ...

we generate a function with type

  {α : Type u_1} →   
  (motive : List α → Sort u_2) →
  (x : List α) →
  ((a b c : α) → motive [a, b, c]) →
  ((a : α) → (as : List α) → motive (a :: as)) → 
  ((x : List α) → motive x) 
  → motive x :=

The function always has 4 different sections.
1- Auxiliary parameters {α : Type u_1}
2- A "motive" (motive : List α → Sort u_2) . When using the function, we set the result type we want to generate. The type may depend on the discriminant.
3- Discriminants (x : List α). In this example, we have only one. The arity of the motive is equal to the number of discriminants.
4- One argument for each alternative. Note that, each alternative produces the motive.

We are using this design in Lean4 because we can isolate the dependent pattern matching compilation from the RHS code. It prevents accidental code explosion, allows us to pretty print match-expressions back to the user, and allows us to "update" the motive. We currently update the motive in the module responsible for showing functions terminate. In Lean 4, we elaborate functions as other functional programming languages, and we have a separate module for showing whether the function terminates or not. This module traverses the function definition, and often "updates" the motive argument in the auxiliary functions created for each match expression.

For example, if we use the command

set_option pp.all true
#print fib

we obtain

   ...
   fib.match_1.{1} (fun (x : Nat) => @Nat.below.{1} (fun (x : Nat) => Nat) x → Nat) x
   ...

The Nat.below argument was introduced by the module that shows termination using structural recursion. My point here is that even for non-dependent functions such as fib, our frontend will explore the fact that the auxiliary generated function fib.match_1 performs dependent elimination. Note that Nat.below depends on the discriminant x here.

I think it would be quite hard to reflect the or patterns in these auxiliary functions because the motive depends on the discriminants. Even if we managed to do it, we would also have to update other modules such as the one that shows termination by structural recursion.

I agree with you that the macro approach has shortcomings (e.g., IDE integration) and that enforcing the rules you have described is the ideal long-term solution.

Regarding the match feature list, I think the main missing features for Lean are: guards and or patterns. We already support as patterns in Lean. They are written as xs@(a::b::as). We also have basic support for views. We can mark functions with the attribute @[matchPattern]. We use it for +. This is why we can write x+1.

Feasibility/Timing

  • The macro-based approach could be implemented now. However, I agree with you it has shortcomings and it is not worth using it as a permanent solution. As you pointed out, it is not ideal even as a temporary solution.

  • Elaboration-based approach, but duplicating RHS for or patterns. It addresses the IDE issues and allows us to enforce the rules you described above. It is compatible with the current design, but will not help us reduce the number of cases in proofs. Right now, it would be hard to develop this feature. It affects a module that is complicated and has missing features (e.g., equations and induction principles).

  • Elaboration-based approach without RHS for or patterns. I think this is infeasible to implement.

@DanielFabian
Copy link
Contributor Author

DanielFabian commented Mar 28, 2021

Thanks for the detailed reply. And I agree also regarding the feasibility and timing. In light of that, I think it best to park the issue until the equations and induction principles are complete.

Just one question of understanding. When I said to make it useful for proofs, I meant this:

let matchFunc
| A a b => f a b
| B a b => f a b
| C c => g c

dependent or not gives rise to several lemmas:

def lemma1 : ∀ x a b, x = A a b → matchFunc x = f a b 
def lemma2 : ∀ x a b, (∀ a b, x ≠ A a b) → x = B a b → matchFunc x = f a b 
def lemma3 : ∀ x c, (∀ a b, x ≠ A a b) → (∀ a b, x ≠ B a b) → x = C c → matchFunc x = g c

this should be provable by a bunch of distinction by cases on the structure of the RHS term. There isn't much clever going on, just raw unfolding.

What I was merely proposing was adding a further lemma for the or pattern:

def lemma4 : ∀ x a b, (x = A a b) ∨ ((∀ a b, x ≠ A a b) ∧ x = B a b) → matchFunc x = f a b

this last one holds purely by propositional reasoning without caring about the RHS at all.

Of course this lemma is useless in general, but it allows us to massage the RHS of this lemma, i.e. all RHS's uniformly at once.

I was not suggesting that it lowers the number of cases in general, but it still might occasionally be useful and it's cheap to prove if we already have the lemmas 1-3 proven. I hope that makes sense.

The kind of case where I see this as useful is when doing an apply where you might apply this lemma and get the disjunction into the context.

@JasonGross
Copy link

I am wondering whether ML languages put additional "bells and whistles" on top of this feature.

There's another feature (perhaps two) present in Coq and OCaml not mentioned here: the ability to put alternatives as subpatterns. For example, I can write

match foo with
| bar (baz a | qux a) => ...
end

This can be combined with as clauses in interesting ways to deal with the "same set of variables" requirements

match foo with
| bar (baz a | _ as a) => ...
| qux (succ (succ n | zero as n)) => ...
end

@leodemoura leodemoura added enhancement New feature or request low priority labels Apr 14, 2021
leodemoura added a commit that referenced this issue Mar 20, 2022
see #371

This commit does not implement all features discussed in this issue.
It has implemented it as a macro expansion. Thus, the following is
accepted
```lean
inductive StrOrNum where
  | S (s : String)
  | I (i : Int)

def StrOrNum.asString (x : StrOrNum) :=
  match x with
  | I a | S a => toString a
```
It may confuse the Lean LSP server. The `a` on `toString` shows the
information for the first alternative after expansion (i.e., `a` is an `Int`).
After expansion, we have
```
def StrOrNum.asString (x : StrOrNum) :=
  match x with
  | I a => toString a
  | S a => toString a
```
ChrisHughes24 pushed a commit to ChrisHughes24/lean4 that referenced this issue Dec 2, 2022
After leanprover#137, `length_tail` and `length_drop` live in `Init/Data/List/Lemmas`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request low priority
Projects
None yet
Development

No branches or pull requests

3 participants