Or patterns #43

Closed
wants to merge 61 commits into from

Conversation

osa1
Contributor

@osa1 osa1 commented Jan 28, 2017

@cocreature

cocreature commented Jan 28, 2017

I really like this idea and often wanted to have something like this.

I assume an or-pattern where different parts reference a different set of variables or the variables have different types will just result in an error message?

@osa1
Contributor Author

osa1 commented Jan 28, 2017

I assume an or-pattern where different parts reference a different set of variables or the variables have different types will just result in an error message?

That's what I thought, but I think if we do something like:

for pattern in or_pattern:
  type check RHS against pattern

then we can allow different types for same variables in different patterns. Example:

data T a = C1 Int | C2 Bool

f :: T a -> String
f (C1 x | C2 x) = show x

If we type check show x against each pattern, and accept the definition only when it typechecks for all of them, I think we can accept this. Then during desugaring we can duplicate right-hand sides to get this:

f a = case a of
        C1 x -> show x
        C2 x -> show x

To keep things simple (and within the parts of the compiler internals that I understand), I said "same set of variables of same types". We can either generalize this later or right now if type checker experts can chime in here.

One thing to note here is that if we restrict patterns to bind the same set of variables of the same types, I think desugaring gets a lot simpler: we just generate a local function for the RHS, and call it with the same set of variables in each case (similar to how fail_ functions are generated by the desugarer). It gets trickier as we generalize because we may have to pass typeclass dictionaries (and maybe some other things that I can't imagine right now).
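
A minimal hand-written sketch of that local-function desugaring, in Haskell as it exists today. The Shape type and the names sizeSquared and rhs are hypothetical, chosen only for illustration; this is not the desugarer's actual output.

```haskell
-- Hypothetical type for illustration; not taken from the proposal.
data Shape = Circle Int | Square Int | Point

-- What one might like to write with or-patterns:
--   sizeSquared (Circle n | Square n) = n * n
--   sizeSquared Point                 = 0

-- Hand-desugared version: the shared right-hand side becomes a local
-- function, and every alternative of the or-pattern calls it with the
-- same set of bound variables.
sizeSquared :: Shape -> Int
sizeSquared s = case s of
  Circle n -> rhs n
  Square n -> rhs n
  Point    -> 0
  where
    rhs n = n * n
```

Because both alternatives bind n at the same type, rhs needs no extra dictionary arguments, which is the simplification described above.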

@mitchellwrosen

mitchellwrosen commented Jan 28, 2017

One alternative to this feature is simply a warning for the existence of a wildcard pattern match.

@osa1
Contributor Author

osa1 commented Jan 28, 2017

One alternative to this feature is simply a warning for the existence of a wildcard pattern match.

What would be the fix for that warning? Without or patterns you just replace one problem with another (namely, wildcard patterns with repetitive and/or duplicated code).

@mitchellwrosen

mitchellwrosen commented Jan 28, 2017

Without or patterns you just replace one problem with another

That's true, but (IMO) it's replacing a big problem (wildcard patterns) with a small(er) one (duplicating the RHS on all the patterns you want to treat uniformly).

@rwbarton

rwbarton commented Jan 28, 2017

I would be very happy to have these. OCaml has or-patterns already (https://caml.inria.fr/pub/docs/manual-ocaml/patterns.html#sec108), so we could borrow ideas from its implementation.

A real-world use case where or-patterns would be helpful that I mentioned just the other day to someone working on improving iOS support in GHC:

picCCOpts :: DynFlags -> [String]
picCCOpts dflags
    = case platformOS (targetPlatform dflags) of
      OSDarwin
          -- Apple prefers to do things the other way round.
          -- PIC is on by default.
          -- -mdynamic-no-pic:
          --     Turn off PIC code generation.
          -- -fno-common:
          --     Don't generate "common" symbols - these are unwanted
          --     in dynamic libraries.

       | gopt Opt_PIC dflags -> ["-fno-common", "-U__PIC__", "-D__PIC__"]
       | otherwise           -> ["-mdynamic-no-pic"]
      OSMinGW32 -- no -fPIC for Windows
       | -- ...

The OSiOS case needs to be the same as the OSDarwin case, and an or-pattern (OSDarwin | OSiOS) would be the most direct way to express that. (Otherwise you need to copy the whole OSDarwin case, or refactor the guards and bodies into a new binding and use it in both the OSDarwin and OSiOS cases. If the guards weren't "complete", so that there was a chance that no guard succeeds, then you'd need to duplicate the guards and bodies individually.)
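
The "refactor into a new binding" workaround mentioned in parentheses can be sketched as follows. The OS type, picOpts, and appleOpts are simplified, hypothetical stand-ins for the GHC definitions, and the OSMinGW32 and OSLinux results are placeholders.

```haskell
-- Hypothetical stand-in for GHC's platform OS type.
data OS = OSDarwin | OSiOS | OSMinGW32 | OSLinux
  deriving (Eq, Show)

-- The Bool stands in for `gopt Opt_PIC dflags`.
picOpts :: Bool -> OS -> [String]
picOpts pic os = case os of
  OSDarwin  -> appleOpts   -- with or-patterns: (OSDarwin | OSiOS) | pic -> ...
  OSiOS     -> appleOpts
  OSMinGW32 -> []          -- placeholder result
  OSLinux   -> if pic then ["-fPIC"] else []  -- placeholder result
  where
    -- Guards and bodies shared by both Apple platforms.
    appleOpts
      | pic       = ["-fno-common", "-U__PIC__", "-D__PIC__"]
      | otherwise = ["-mdynamic-no-pic"]
```

This only works cleanly because the guards are complete; with incomplete guards the fall-through to later alternatives would be lost.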

As for syntax, I'm +1 on using | (as opposed to alternatives discussed in the ticket) because it is short, and already a reserved operator. There is a parse conflict with guards (exemplified in the above code snippet) but using parentheses to disambiguate is easy enough.

As for semantics when binding variables of different types, matching on existential constructors, etc., I'd be content with an incremental approach of supporting the simple cases first; that's where most of the value is anyways.

@Ericson2314
Contributor

Ericson2314 commented Jan 28, 2017

We should have done this a long time ago. I use it all the time in Rust too, and miss it all the time in Haskell.

One interesting thing to note is that exhaustive lazy or patterns make sense.

@vagarenko

vagarenko commented Jan 29, 2017

Can I use it in case or lambda case:

stringOfT :: T -> Maybe String
stringOfT x = case x of
    T1 s -> Just s
    T2{} | T3{} -> Nothing

like this?

@rwbarton

rwbarton commented Jan 29, 2017

Can I use it in case or lambda case:

Yes, although in this situation you would need parentheses around T2{} | T3{} (otherwise it parses as a guard).

An or-pattern is again a pattern so it can appear wherever a pattern can, including for example nested within another pattern (Just ('x' | 'y')).
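
For comparison, a small sketch of how the nested case is written without or-patterns, using a guard over the nested value. The function name isXorY is hypothetical.

```haskell
-- With the proposed syntax one could write:
--   isXorY (Just ('x' | 'y')) = True
--   isXorY _                  = False

-- Without or-patterns, a guard over the nested value does the same job.
isXorY :: Maybe Char -> Bool
isXorY (Just c) | c `elem` "xy" = True
isXorY _                        = False
```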

@osa1
Contributor Author

osa1 commented Jan 29, 2017

@vagarenko, like @rwbarton said, an or pattern can appear anywhere that a pattern can appear.

OCaml already supports or patterns in full generality (i.e. they can appear anywhere that patterns can appear). This is from Real World OCaml:

let is_ocaml_source s =
  match String.rsplit2 s ~on:'.' with
  | Some (_,("ml"|"mli")) -> true
  | _ -> false

In Rust this is not the case; the reference says "Multiple match patterns may be joined with the | operator.".


I started thinking about the implementation. As the first thing I think we may have to make significant changes in the parser. Currently patterns are subsets of expressions, so we have productions like this:

pat     :  exp          {% checkPattern empty $1 }

checkPattern transforms an expression to a pattern:

checkPattern :: SDoc -> LHsExpr RdrName -> P (LPat RdrName)
checkPattern msg e = ...

With this change patterns won't be a subset of expressions, so we may want to first parse for a pattern, and then try to transform it into an expression. Does anyone have any other ideas on this?

@rwbarton

rwbarton commented Jan 29, 2017

Pattern syntax was never really a subset of expression syntax (especially before TypeApplications):

Prelude> f @ x

<interactive>:2:1: Pattern syntax in expression context: f@x
Prelude> ~a

<interactive>:3:1: Pattern syntax in expression context: ~a

I'm not sure whether the pattern parser reuses the expression parser out of technical necessity (e.g., we don't know up front whether we are parsing a pattern or an expression) or out of convenience. If it's the latter it might be time to create a separate pattern parser. IIRC the reuse of the expression parser already causes some oddities around the precedence of @ and/or ~.

@goldfirere
Contributor

goldfirere commented Jan 31, 2017

It's out of necessity. If a line begins f x y, the parser doesn't know if it's a naked top-level Template Haskell splice or the beginning of a function definition.

Naked top-level splices are a misfeature, in my opinion.

@saurabhnanda

saurabhnanda commented Jan 31, 2017

Is this proposal to force the pattern-match to match every possible ADT value explicitly? There are times where just having a wildcard _ match is a bug waiting to happen.

Actual use-case -- we start with the following ADT definition and call-sites:

data BookingStatus = Confirmed | Cancelled | Abandoned
computeRemainingSeats :: BookingStatus -> (...)
computeBilling :: BookingStatus -> (...)

Now, assume that within computeRemainingSeats only Confirmed is explicitly matched to result in a reduction of available seats; everything else is matched by _ and results in a no-op. What happens when BookingStatus evolves to have a new value of ManualReview? We want to reduce the number of seats until the manual review of the booking is complete. However, the compiler will not force us to look at this particular call-site to evaluate this impact.

Another one: assume that within computeBilling only Confirmed is explicitly matched to trigger an invoicing action, others are matched via _ to a no-op. What happens when BookingStatus evolves to have a new value of Refunded, which should trigger a credit-note action? Again, the compiler is not going to help us here.

Therefore, my question: can we add a pragma to ADTs to disallow wildcards? E.g.

data BookingStatus = Confirmed | Cancelled | Abandoned
{-# NoWildCardMatch BookingStatus #-}

Once this is done, having the or pattern will make explicit pattern matches easier to write.
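
A sketch of the explicit style this comment is asking for, without or-patterns. The name seatsTaken and the returned numbers are hypothetical. When every constructor is matched explicitly, GHC's existing -Wincomplete-patterns warning (part of -Wall) flags this function as soon as a constructor such as ManualReview is added.

```haskell
data BookingStatus = Confirmed | Cancelled | Abandoned

-- Every constructor is listed, so adding a new one makes this match
-- incomplete and triggers -Wincomplete-patterns.
-- With or-patterns the last two equations could be merged into
--   seatsTaken (Cancelled | Abandoned) = 0
seatsTaken :: BookingStatus -> Int
seatsTaken Confirmed = 1
seatsTaken Cancelled = 0
seatsTaken Abandoned = 0
```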

@osa1
Contributor Author

osa1 commented Jan 31, 2017

Thanks for the examples @saurabhnanda. Your examples are exactly the same as the second example I gave in the proposal.

About the pragma: it sounds like a good idea, but it's orthogonal to this proposal and can be done separately. Even without or patterns it may be useful, so I suggest creating a new proposal for that.

So no, this is not a proposal to force pattern matching on every possible ADT constructor.

@saurabhnanda

saurabhnanda commented Jan 31, 2017

@phadej
Contributor

phadej commented Jan 31, 2017

I don't really see the value of a NoWildCardMatch pragma. A lazy programmer can work around it with

isConfirmed :: BookingStatus -> Bool
isConfirmed Confirmed = True
isConfirmed Cancelled = False
isConfirmed Abandoned = False

andHereICanBeLazy :: BookingStatus -> IO ()
andHereICanBeLazy bs | isConfirmed bs = putStrLn "confirmed"
                     | otherwise      = putStrLn "not confirmed"

or, in the case of non-enums, with a function YourType -> Maybe (Arg1, Arg2) or a Prism.

I.e., in the end it comes down to discipline in the team, so I don't see it as a good candidate for inclusion in GHC.

OTOH, feel free to experiment with writing hlint rules. Maybe hlint doesn't support finding wildcard matches, but then experiment with haskell-src-exts; it shouldn't be too difficult to find wildcard matches and do quick'n'dirty "type inference" (either from the type signature, or from other pattern match cases).

@peti

peti commented Jan 31, 2017

A lazy programmer can work around it with ...

I wouldn't call a programmer doing that "lazy". It's more like a hostile attacker actively trying to break the rules. Now, it's unfortunate that this issue exists, but it doesn't take away from the usefulness of NoWildCardMatch for those who want to be constructive about writing code that doesn't rely on catch-all default cases.

@simonpj

simonpj commented Jan 31, 2017

I'm not against or-patterns, even mildly in favour.

But

we can allow different types for same variables in different patterns

Let's NOT do this. It would add a huge amount of complexity to a basically-simple feature. Each pattern in an or-group should bind the same term variables, existential type variables, and constraints.

Don't forget there is work to do to fix up the pattern-match overlap checker.

@qnikst

qnikst commented Jan 31, 2017

Beyond some level of complexity, a legitimate complex rule that is a catch-all (but may change in the future) and a workaround for bypassing the wildcard restriction are indistinguishable.
Also, the example provided by @phadej is completely OK with NoWildCardMatch, because the isConfirmed function uses it.
The only stable way of disallowing wildcard matches without introducing bad side effects is to not expose the internals but to provide a deconstructor:

data BookingStatus = Confirmed | NotConfirmed

withBookingStatus onConfirmed onNotConfirmed

Then with the change of the data type - the type of deconstructor will also change and all users will be notified.
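
A minimal sketch of that deconstructor approach. The argument order, the result type variable r, and the helper describe are assumptions; in a real module the constructors would be kept out of the export list so that withBookingStatus is the only way to inspect a value.

```haskell
data BookingStatus = Confirmed | NotConfirmed

-- One continuation argument per constructor. Adding a constructor changes
-- this type, so every call site stops compiling until it is updated.
withBookingStatus :: r -> r -> BookingStatus -> r
withBookingStatus onConfirmed _ Confirmed       = onConfirmed
withBookingStatus _ onNotConfirmed NotConfirmed = onNotConfirmed

-- Hypothetical call site.
describe :: BookingStatus -> String
describe = withBookingStatus "confirmed" "not confirmed"
```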

@simonmar

simonmar commented Jan 31, 2017

It's out of necessity. If a line begins f x y, the parser doesn't know if it's a naked top-level Template Haskell splice or the beginning of a function definition.

It was a necessity even before top-level naked splices, e.g. in the statements of do or a list comprehension there's a clash between p <- e (a bind) and just e (a guard).
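
A small illustration of that clash, using a hypothetical function evens. Both qualifiers below start with something that looks like an expression (Just n and even n); only on reaching <- does the parser learn that the first one was a pattern.

```haskell
evens :: [Maybe Int] -> [Int]
evens ms =
  [ n
  | Just n <- ms  -- a bind: "Just n" turns out to be a pattern
  , even n        -- a guard: "even n" is an expression
  ]
```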

@nomeata
Contributor

nomeata commented Jul 6, 2018

Delaying this realization until the user tries to use a variable is not very nice.

Why? An error message “a_b is out of scope here. It is bound in one branch of the or-pattern at position x:y, but not the others” will quickly point the user to the problem – I don’t see a problem here.

@nomeata
Contributor

nomeata commented Jul 6, 2018

The section “Interaction with other extensions” should discuss the interaction with ScopedTypeVariables, and with type variables bound in patterns. For example, would these be accepted?

foo :: Either a a -> …
foo (Left (x :: b) ; Right (x :: b)) = …

bar :: Either a a -> …
bar (Left (x :: b) ; Right x) = … -- with no mention of b on the RHS

It may be that a desugaring with view patterns will not be able to answer or express these questions, as we cannot return a type variable in the Just of that desugaring. (Unless the rule is “no type variables bound in or-patterns, never”. Which would be sad and probably be revised at some point.)

@nomeata
Contributor

nomeata commented Jul 7, 2018

Another language extension interaction worth discussing: RecordWildCards. Consider this:

data T
  = C1 { a :: Int }
  | C2 { a :: Int }
  | C3 { a :: Int, b :: Bool}
  | C4 { a :: Int, b :: Char}

foo1 (C1 {..}; C2 {..}) = a
foo2 (C1 {..}; C3 {..}) = a
foo3 (C3 {..}; C4 {..}) = a
foo4 (C3 {..}; C4 {..}) = if b then a else 0

Currently, the proposal wording accepts foo1 but rejects all the others.

I am currently inclined to think that this is unnecessarily strict, and am wondering if it would not be nicer if it accepted foo1, foo2 and foo3, and rejected foo4. A possible error message might be

You cannot use b here, as its type differs in different branches of the or-pattern in …:
b is bound with type Bool in …, but
b is bound with type Char in …

Of course the proposal ought not specify the wording of error messages, this is just for illustration.

This is related to my earlier point about

maybeConst1 (Just x  ; Nothing) = 42
maybeConst2 (Just _x ; Nothing) = 42
maybeConst3 (Just _  ; Nothing) = 42

One might argue that the first line is bad style (and I agree). But note that even if the proposal allows the former, the usual “unused bindings warning mechanism” will ensure that -Wall-warning-free code will not contain the first form, while allowing the second and the third, just as desired.

@nomeata
Contributor

nomeata commented Jul 8, 2018

I had a long nice walk with Richard and we talked about possible ambitious ways of approaching or patterns (what if types are not equal, but subtypes? what about existential variables? what about constraints). We had many intriguing ideas, but nothing that immediately clicked or solved all the problems…

I hope that one day we figure out how to make or-patterns so strong that for all patterns p, p;p is equivalent to p (even if it involves GADTs etc.), but I don’t require it for the first iteration of this feature, and will not veto a variant that only brings term variables into scope, and only if their types match.

I would still like to discuss the (not very technical, and more bikesheddingly) question of whether it is a compile error if some unused variables don’t meet this requirement.

@simonpj

simonpj commented Jul 9, 2018

It's interesting how much is hidden inside such an apparently small innovation as or-patterns!

I'm still liking the spec "if the desugaring to view patterns typechecks, so does the or-pattern", because it is so simple and explicable. I'd love to see a direct typing rule for this; I don't think it should be too hard.
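
A rough sketch of what such a view-pattern desugaring could look like when written by hand. The Msg type and the names pingOrPong and payload are hypothetical, and the proposal's exact desugaring rule may differ in detail.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Hypothetical type for illustration.
data Msg = Ping Int | Pong Int | Quit

-- Proposed syntax:
--   payload (Ping n ; Pong n) = n
--   payload _                 = 0

-- The or-pattern becomes a function returning the bound variables in a
-- Maybe (a tuple if there were several), matched through a view pattern.
pingOrPong :: Msg -> Maybe Int
pingOrPong (Ping n) = Just n
pingOrPong (Pong n) = Just n
pingOrPong _        = Nothing

payload :: Msg -> Int
payload (pingOrPong -> Just n) = n
payload _                      = 0
```

Under this reading the or-pattern typechecks exactly when pingOrPong does, which forces n to have one type across both alternatives.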

Unused variables. A variant of the design, which Joachim is implicitly suggesting, is to say that (p1 ; p2) brings into scope all the variables that are bound by both p1 and p2. That is, the two sets do not have to be identical; but only variables bound by both are brought into scope by the or-pattern.

I quite like that idea:

  • It is still readily explicable by the desugaring rule. It really does not make the spec more complicated.

  • It allows a variable to be bound in p1 and used in a view pattern in p1 without messing up the overall or-pattern (Joachim's point above).

  • It would allow dot-dot notation (which implicitly binds all the variables of the record).

So that change seems like a modest win to me. The only downside is that given

f (Just x ; Nothing) = x

you might just get "x is not in scope" (as indeed it isn't) and be puzzled. But of course extra work on the renamer could produce a more informative error message.

Scoped type variables

I can see that the specify-via-view-patterns story does not allow any scoped type variables to be brought into scope, even if they don't involve existentials. That's a shortcoming.

To me that's the strongest argument for a more direct typing rule. But unless someone wants to do that work soon, and it happens to work out really smoothly, I don't think we should let it stand in the way of doing something simpler for now.

@Centril

Centril commented Aug 31, 2018

For reference, I have written the moral equivalent of this proposal for Rust, rust-lang/rfcs#2535.

@klapaucius

klapaucius commented Sep 12, 2018

@mchakravarty

https://mail.haskell.org/pipermail/ghc-steering-committee/2018-June/000646.html

Fri Jun 29 03:16:30 UTC 2018
I will mark the proposal as accepted unless I hear a dissenting opinion by the end of next week.

@osa1
Contributor Author

osa1 commented Sep 12, 2018

As said above this is not ready for merging yet. There are still a lot of things to specify.

@Centril

Centril commented Oct 7, 2018

The equivalent Rust proposal, rust-lang/rfcs#2535, has now been accepted.

If you wish, you can make a note of this in the subsection on Rust.

I'd love to see this happen in Haskell as well. :)

@osa1
Contributor Author

osa1 commented Oct 7, 2018

Thanks for the update. Updated the proposal.

@bravit bravit removed the Proposal label Dec 3, 2018
@osa1
Contributor Author

osa1 commented Jan 9, 2019

I don't have time and motivation to work on this project anymore, so closing.

The hard part is defining the semantics of the current syntax before adding new syntax to it. Once the semantics is there, this should be a simple addition with a few tricky cases for scoped type variables and view patterns. Until then, the author of a proposal like this needs to define the entire language, which is just too much work.

@osa1 osa1 closed this Jan 9, 2019
@osa1
Contributor Author

osa1 commented Jan 11, 2019

Perhaps it makes sense to turn this work into a SoC project. I submitted haskell-org/summer-of-haskell#84 for this. As usual feedback is welcome.

@Pitometsu

Pitometsu commented Apr 14, 2021

Shouldn't it be re-opened because of haskell-org/summer-of-haskell#84 (comment) ?

@simonpj

simonpj commented Apr 14, 2021

Shouldn't it be re-opened because of haskell-org/summer-of-haskell#84 (comment) ?

Well, it needs an active champion. Anyone who wants to play that role can of course reopen it.

@LeventErkok
Contributor

LeventErkok commented May 16, 2022

I'd very much love to see this supported in GHC. It'd make the life of the "working industry programmer" significantly better.

If it helps simplify design/implementation, you can outlaw binding any parameters in the "or" case. I.e., you can pattern-match multiple constructors, so long as no variables are bound in any of those matches. This is a simplification, and I doubt it would take away much from usability in practice, yet deliver a nice solution to practical problems. (Of course, if it doesn't add extra complication, do allow them; but it's an option perhaps for the first version of an implementation?)

@sgraf812
Contributor

sgraf812 commented May 16, 2022

Yes; perhaps we can deliver on nested (field) matches in a future proposal.

That is, I propose the following change:

-p1 and p2 bind same set of variables.
+p1 and p2 bind no variables, constraints or dictionaries.

In particular, this change entails not having to worry about GADTs and whatnot for now. (No type refinement of the pattern variable is possible either.) I believe that if there is a semantics for or-patterns that can cover GADTs and bound variables, then this semantics can be expressed as a backwards-compatible extension to this simpler semantics.

I think we are rather late to the party:

Let's ship the MVP that catches 90% and think about the hard 10% case afterwards.

@LeventErkok (or someone who reads this!) would you be up for recycling this proposal into a new one that delivers the MVP?

@nomeata
Contributor

nomeata commented May 16, 2022

When you say

bind no variables, constraints or dictionaries.

Do you mean that the patterns must not bind any of these, or that any bindings are ignored? For variables probably the former (i.e. requiring the programmer to write _ or {}), but for constraints there is no syntax to explicitly not bind them.

Here is an example. Allowed or not?

data T a where
  MkTInt :: T Int
  MkTBool :: T Bool

foo (MkTInt | MkTBool) = ...

@sgraf812
Contributor

sgraf812 commented May 16, 2022

Do you mean that the patterns must not bind any of these, or that any bindings are ignored?

The latter, at least in Haskell. The former in the desugaring to Core (where it's all just variables, I guess). I should have been more clear. So your foo is allowed, but this bar would be rejected:

data U a where
  MkInt :: U Int
  MkInt2 :: U Int

bar :: U a -> a
bar (MkInt | MkInt2) = 42 :: Int

That is, an or pattern will never provide new Givens such as a ~ Int here.
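
For contrast, a sketch of the version that is accepted today, with one equation per constructor. Each equation matches a single GADT constructor, so the Given a ~ Int is available on its right-hand side.

```haskell
{-# LANGUAGE GADTs #-}

data U a where
  MkInt  :: U Int
  MkInt2 :: U Int

-- Matching on MkInt (or MkInt2) brings a ~ Int into scope for that
-- equation only, which is what lets 42 :: Int be returned at type a.
bar :: U a -> a
bar MkInt  = 42 :: Int
bar MkInt2 = 42 :: Int
```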

Perhaps it should just be

-p1 and p2 bind same set of variables.
+p1 and p2 bind no variables.

because the proposal already states, in 1.4.1

The desugaring rule implies that none of the above [Equality constraints, dictionaries, existential types, referred to as "GADTy stuff" below] can be bound by an or pattern.

Maybe that section should be reworded in terms of "Givens"? Not sure that helps. Anyway, my changed proposal would allow GADTs in or patterns (including existentials), but the desugared Core would not bind any of the aforementioned GADTy stuff in the view pattern.

NB: A future proposal could relax this requirement and thereby accept more (syntactically valid) programs that the MVP would reject. I believe that is what the current proposal aimed at, and it is also what caused it to come to a halt, even after it narrowed its focus to simple field binders without GADTy stuff. The details are non-trivial, especially if we want to keep the desugaring to view patterns. A result of type Maybe vars, where vars is a tuple of the variables bound in the pattern, is not enough to carry the Givens or existentials, so why not desugar to a view pattern over Bool?
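
A hand-written sketch of that Bool-based desugaring, reusing the T type from the question above. The extra constructor MkTChar, the helper isIntOrBool, and the result strings are hypothetical additions for illustration.

```haskell
{-# LANGUAGE GADTs, ViewPatterns #-}

data T a where
  MkTInt  :: T Int
  MkTBool :: T Bool
  MkTChar :: T Char  -- hypothetical extra constructor, so the match can fail

-- Proposed syntax:
--   foo (MkTInt | MkTBool) = "int or bool"

-- The or-pattern becomes a predicate. It returns only a Bool, so no
-- variables and no Givens (such as a ~ Int) reach the right-hand side.
isIntOrBool :: T a -> Bool
isIntOrBool MkTInt  = True
isIntOrBool MkTBool = True
isIntOrBool _       = False

foo :: T a -> String
foo (isIntOrBool -> True) = "int or bool"
foo _                     = "something else"
```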

@nomeata
Contributor

nomeata commented May 16, 2022

Thanks for clarifying!

@AntC2
Contributor

AntC2 commented May 16, 2022

I think this proposal is in the same design space as the \cases proposal, which is about to land in 9.4. The advantage of \cases is that, AFAICT, each alternative can bind different variables, get its own type improvements under a GADT, and have different or multiple guards.

As of today, I think you could write a pattern synonym or view pattern to give a uniform view over diverse constructors, again with more flexibility of typing. Admittedly that comes at the cost of a couple of extra declarations and some rather clunky code.
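
One way the pattern-synonym route might look, reusing the stringOfT example from earlier in the thread. The constructor fields of T, the helper isT2orT3, and the synonym name T2orT3 are assumptions made for this sketch.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

-- Hypothetical fields; the thread only shows T1 carrying a String.
data T = T1 String | T2 Int | T3 Bool

isT2orT3 :: T -> Bool
isT2orT3 T2{} = True
isT2orT3 T3{} = True
isT2orT3 _    = False

-- Unidirectional synonym standing in for the or-pattern (T2{} | T3{}).
pattern T2orT3 :: T
pattern T2orT3 <- (isT2orT3 -> True)

-- Tells the coverage checker that T1 plus T2orT3 cover every value of T.
{-# COMPLETE T1, T2orT3 #-}

stringOfT :: T -> Maybe String
stringOfT (T1 s) = Just s
stringOfT T2orT3 = Nothing
```

The helper, the synonym, and the COMPLETE pragma are the extra declarations referred to above.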

@JakobBruenker
Contributor

JakobBruenker commented May 17, 2022

@AntC2 the interaction of Or-patterns with \cases proposals should in principle be exactly the same as the interaction of Or-patterns with function definition equations with multiple patterns, like

f p1 q1 r1 = ...
f p2 q2 r2 = ...

I think in both cases (i.e. here and with \cases) you should be able to replace any of these patterns with an Or-pattern without issues.

@sgraf812
Contributor

sgraf812 commented Jul 19, 2022

Note that #522 is an evolution of this proposal that eschews variable bindings altogether (in a forward compatible way), as motivated by #43 (comment). Feel free to offer your opinion there, since this proposal is dead as far as I can tell.
