Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional delimiters around match/try/function cases. #722

Closed
wants to merge 1 commit into from

Conversation

@yallop
Copy link
Member

commented Jul 27, 2016

[Since there was some support for this suggestion (thanks @alainfrisch, @gasche, @dbuenzli, @bobzhang!) under #715, I've opened a PR for further discussion.]

This PR adds optional delimiters around the cases in try, match, and function expressions. As an alternative to the current syntax:

match e with
  C₁ -> e₁
| C₂ -> e₂

this change gives you the option of writing your code like this:

match e with
[ C₁ -> e₁
| C₂ -> e₂ ]

What problem does this solve?

There is currently a kind of ambiguity in the syntax for nested matches. In the following example, the case for C₂ actually belongs to the inner match, despite the indentation:

match e with
  C₁ -> match e₁ with
            C₃ -> e₃
          | C₄ -> e₄
| C₂ -> e₂

Of course, this is not strictly an ambiguity in the grammar. But it is a potential pitfall. Fortunately, there is a straightforward workaround --- you can surround inner matches with begin and end or with parentheses:

match e with
  C₁ -> begin match e₁ with
            C₃ -> e₃
          | C₄ -> e₄
        end
| C₂ -> e₂

But this is a little heavy, and suggests a shortcoming in the current syntax, since the delimiters go around the match expression; they are not part of the expression itself.

With the new syntax the delimiters are part of the syntax of the expression, and the nested match may be written like this:

match e with
[ C₁ -> match e₁ with
        [ C₃ -> e₃
        | C₄ -> e₄ ]
| C₂ -> e₂ ]

A warning along the lines of @alainfrisch's proposal in #716 for nested match expressions without delimiters would work well with this change as an additional help for avoiding the problem.

What about the optional initial |?

Following @gasche's suggestion, this change supports an optional | before the first case:

match e with [
 | C₁ -> e₁
 | C₂ -> e₂
]

This has the advantage of making the cases easier to edit, since there is no special initial case.

What should the delimiters be?

This change currently uses brackets ([ ... ]) to delimit cases. But the syntax would equally well support parentheses (( ... )) or even braces ({ ... }). On balance I think the arguments are in favour of brackets.

Advantages of brackets

  • Experience suggests that having a variety of delimiters (rather than using parentheses everywhere) makes code easier to read. Some Lisp-family languages, such as Racket, support brackets as an alternative to parentheses for this reason.
  • Parentheses are already used to enclose sub-expressions and sub-patterns in OCaml. But a set of cases is not very much like either of these, since it can only be used as part of a larger construct, not in isolation. So using different delimiters seems appropriate.
  • There is already some experience with using brackets in the revised syntax.
  • As @dbuenzli points out, brackets are reminiscent of the syntax for polymorphic variant types.

Advantages of alternative delimiters

  • Parentheses have the advantage that (| is not a token, so there is no need for a space between ( and the optional opening |.
  • Braces are familiar from the switch statement in C-family languages, and from Reason.
@yallop yallop referenced this pull request Jul 27, 2016
| TRY ext_attributes seq_expr WITH opt_bar match_cases
| TRY ext_attributes seq_expr WITH match_cases
{ mkexp_attrs (Pexp_try($3, List.rev $5)) $2 }
| TRY ext_attributes seq_expr WITH LBRACKET match_cases RBRACKET

This comment has been minimized.

Copy link
@gasche

gasche Jul 27, 2016

Member

This duplicate-rule style adds some amount of redundancy (for example in attributes handling). I suppose you tried defining a rule match_cases_with_optional_brackets or whatever and it introduced conflicts?

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 27, 2016

I... It feels weird to admit it, given that purely syntactic changes and they tend to generally be unconvincing in some way (except maybe match .. with exception), but I like this proposal.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

I could try myself, but intuitively I would have thought that this introduced a conflict in the grammar since

  match e with [ A | B | ....

could be the beginning of either your new construction or a standard list pattern.

@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 27, 2016

@alainfrisch: your example is parsed correctly:

# type t = A | B | C;;
type t = A | B | C
# function [ A | B ] -> true | _ -> false;;
- : t list -> bool = <fun>
# function [ A | B -> true | _ -> false ];;
- : t -> bool = <fun>

The change doesn't introduce any grammar conflicts (which cause build failures nowadays).

@bluddy

This comment has been minimized.

Copy link

commented Jul 27, 2016

@yallop: It's parsed correctly, but it is confusing.

Not saying I'm against it, but there's definitely confusion with list syntax.

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

Moving my question from #715:

Is an (admittedly contrived) case like:

[match e with
 [ c1 ] -> e1]

simpler from a parsing perspective than the begin match ... end example you gave in #715 (comment)?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

I'm much more in favor of this than #715 , but I think I'd personally still prefer begin match ... end; for rather personal reasons such as (i) it's easier to type on AZERTY keyboards, or (ii) it keeps the opening and closing delimiters vertically aligned and thus easier to match.

@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 27, 2016

@hcarty: indeed, that example is simpler, assuming that the opening [ is non-optional. The difficulty arises only with optional delimiters.

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

Yeah, I prefer begin and end as @alainfrisch suggests...

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

@yallop In this proposal the opening [ in a match is still optional though, correct? So maybe this is a better example:

[match e with
 [ c1 -> e1
]
@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 27, 2016

@hcarty: there's no problem with that example. The closing ] belongs to the most recent opening [.

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

@yallop In that case I don't see the issue with the example you gave in #715 (comment). match and end, begin and end - both would be valid delimiter pairs, like [ and ].

Is the issue that match and end would only be matching delimiters when the -safe-syntax flag is passed?

@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 27, 2016

@hcarty: there's no ambiguity with match ... end in the -safe-syntax mode in #715, because -safe-syntax creates a new mode in which every match must have an end.

The ambiguity arises when end is optional for some constructs.

So making end optional is ambiguous. And making it non-optional breaks backwards compatibility.

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

@yallop: Thank you, I think I properly understand your original concerns now. Apologies for the long list of questions - I appreciate your time on this.

@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 27, 2016

@alainfrisch, @mshinwell: there is, of course, nothing stopping you from continuing to use begin match ... end. (And the warning in #716 could be refined to check for either internal or external delimiters.)

@braibant

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2016

What about using match ... with begin ... end, using begin and end instead of parenthesis or braces? I actually kind of like this proposal (but prefer the already accepted style of begin match ... with ... end, as @alainfrisch and @mshinwell).

@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 28, 2016

@braibant: yes, begin and end in place of [ and ] would also work.

I'm curious as to why you say you prefer begin match ... with ... end.

Internal delimiters are quite harmonious with the rest of the language, e.g. with records, which we write like this:

type t = {
  x₁: t₁;
  x₂: t₂;
}

not like this:

begin type t =
  x₁: t₁;
  x₂: t₂;
end

and with modules, which we write like this:

module M = struct
  let v₁ = e₁
  let v₂ = e₂
end

not like this:

begin module M =
  let v₁ = e₁
  let v₂ = e₂
end

and with a number of other constructs. Half-open sequences like match, with an opening delimiter but no closing delimiter, are more unusual, especially in constructs that support nesting.

It is possible that you prefer begin match ... with ... end mostly because it's more familiar?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 28, 2016

The point is the vertical alignment of opening and closing delimiters.

For records, I tend to prefer:

type t =
  {
    ...
    ...
  }

and for structures:

module M =
  struct
    ...
    ...
  end

For structures, it is true that I often write:

module M = struct
end

because it uses less indentation level, and vertical alignment is
less critical since the struct and end are rarely on the same
screen anyway.

It is possible that you prefer begin match ... with ... end mostly because it's more familiar?

For me, yes probably. But we are talking about adding a new form; the question is not such much whether it would have been better to have from day 1, but rather whether it is worth introducing it now. I find the existing form works well and I see no advantage in match ... with begin ... end.

For me the problem is really not about begin match ... with ... end itself but about the fact that people might forget the begin...end (and they could equally well forget about them with this proposal) -- hence my warning proposal.

@yallop

This comment has been minimized.

Copy link
Member Author

commented Jul 28, 2016

The point is the vertical alignment of opening and closing delimiters.

I don't understand this point at all. This proposal precisely supports the style that you prefer:

For records you write

type t =
  {
    ...
    ...
  }

For matches you can write

match e with
  [ 
    ...
    ...
  ]

Of course, this style is not enforced, just as for records, modules, etc.

For me the problem is really not about begin match ... with ... end itself but about the fact that people might forget the begin...end (and they could equally well forget about them with this proposal) -- hence my warning proposal.

I'm in favour of your warning proposal, which solves a related but different problem.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2016

match e with
  [ 
    ...
    ...
  ]

wastes one more line than:

begin match e with
    ...
    ...
end

(and potentially one identation level as well, except if you align | with [). Record definitions are much less frequent (and nested) than pattern matching, so I can live with the extra line.

I could switch to a different syntax, even a bit heavier, if I could see any advantage. But I don't.

a related but different problem.

Your PR description states that the problem to be solved is a "kind of ambiguity in the syntax for nested matches". I don't see how the proposed solution improves on the current situation in this respect: people will have to know and think about the problem (and the warning will help), and if they do, they could as well use the existing solution.

In the PR description, you also say that the existing solution "is a little heavy". This is a matter of taste, but I find:

begin match e with
    ...
    ...
 end

less heavy than:

match e with
  [ 
    ...
    ...
  ]

(or the equivalent with begin...end). The following could be a bit less heavy:

match e with [
  ...
  ...
]

but it is actually more difficult to type on an AZERTY keyboard and it lacks the vertical alignment of the "straightforward" solution; moreover it's not a lot lighter than the current solution. Nothing to justify for me to use another syntax. (And the small advantage over the current solution in term of lightness disappear if you use begin...end instead of [...].)

But again, I'm not against the proposal. I don't see any advantage over the existing solution but several other expert people do; syntax is rather subjective and I can live with the fact that I don't understand the benefits of the proposal.

@gerdstolpmann

This comment has been minimized.

Copy link

commented Aug 8, 2016

Frankly, I really don't get this. I'm not a fan of begin/end (typographic nightmare), and hence I'm always using round parentheses. This leads to:

match e with
  C₁ -> ( match e₁ with
            C₃ -> e₃
          | C₄ -> e₄
        )
| C₂ -> e₂

Which is valid code in current OCaml, and solves already 90% of the mentioned stylistic problems. Please, please, introduce new types of parentheses only if they really improve something. (Off topic: I have a suggestion, somewhat borrowed from Scala, namely {{ expr }} as a shortcut for (fun () -> expr), and to use that for any imperative blocks, e.g. you'd write while expr {{ expr }} .)

@paurkedal

This comment has been minimized.

Copy link

commented Aug 13, 2016

I also don't see why match e with [ ... ] is an improvement over ( match e with ... ), and in fact the latter avoids the issue of special-casing the first line (edit: ok, that was not an issue). The reason I often use begin is to avoid a dangling end-parenthesis or special-casing the last case, but this proposal does not change that.

But, I do think the begin match often looks a bit cluttered, esp. as it breaks symmetry with the surrounding match if it's not begin-prefixed as well.

Edit: Sorry, only read #715 after making this comment. That PR is more or less the same idea as the following.

A better syntax in my mind would be a consistent revision to use end to terminate construct which are typically multi-line:

begin ... end
try ... with ... end    (* revised *)
match ... with ... end  (* revised *)
function ... end        (* revised *)
while ... do ... end    (* revised *)
for ... do ... end      (* revised *)
object ... end
struct ... end
sig ... end

and eventually drop the done keyword. This makes it mandatory to spend an extra line on multi-line constructs, but it could make the language easier to learn, and reduce the time spent on code layout optimisation.

This seems tricky to change without breaking existing code, though.

@yallop

This comment has been minimized.

Copy link
Member Author

commented Aug 13, 2016

This seems tricky to change without breaking existing code, though.

It's more than tricky; it's (probably) impossible. Without a concrete, backwards-compatible proposal, what you describe isn't a "better syntax" in any practically useful sense.

@paurkedal

This comment has been minimized.

Copy link

commented Aug 13, 2016

This seems tricky to change without breaking existing code, though.

It's not tricky; it's impossible. So what you propose isn't a "better syntax" in any practically useful sense.

Of course, you're right. It's a different syntax. I was merely considering if the same parser could handle current syntax as well for backwards compatibility. That may be impossible too, if the ambiguities cannot be resolved, but probably would require too much backtracking anyway. "Better syntax" OTOH is a matter of taste.

@paurkedal

This comment has been minimized.

Copy link

commented Aug 29, 2016

A reason I wasn't thrilled about this PR, was that I could not find a fully satisfactory way to indent the code, as the snippets I converted just seemed noisy. I played around a bit more this weekend, and I've put up one of the functions formatted in different ways comparing this PR compared to the closest parenthesised analogue: https://gist.github.com/paurkedal/3526cdf7cf65baea271d40f42fd50392

My own favourites are the ones labelled OLB and OCP (and well, maybe the esoteric OXX). OLB uses brackets and and OCP uses parentheses, both indenting patterns on odd columns, which I haven't even considered before. So, at least I'm assured that there are reasonably tidy ways of formatting code using the syntax suggested by this PR. On the other hand, I think I'll stop writing begin match anyway now, since the OCP and OLP styles are already supported and seem tidy enough.

On a different note, would this be allowed:?

List.exists
  function [
   | None -> false
   | Some _ -> true
  ]
  xs

A future possibility would be if fun followed by [ was re-interpreted as a multi-pattern form of fun. Currently this is parsed as a function from a list, but that is always non-exhaustive. Along with the above precedence rule, this gives a terser abstractions for one-liners like

List.exists fun[None -> false | Some _ -> true] xs
@garrigue

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2016

@paurkedal If we're in the esoteric proposals, what about the following variant of OXX, which does not require any extension to the syntax, and is 99.99% compatible.

let consider_pattern tc1 tc2 pc fp' dc =
  Log.debug_f "Considering directory entry %s" dc >>
  match String.cuts ~sep:"%" dc with
  | [] -> assert false
  | [_] -> Lwt.return_none
  | [dc1; dc2] ->
      match cut_prefix tc1 dc1, cut_suffix tc2 dc2 with
      | None, _ | _, None -> Lwt.return_none
      | Some dc1', Some dc2' ->
          Log.debug_f "Matching %s against %s%%%s" pc dc1' dc2' >>
          match cut_circumfix dc1' dc2' pc with
          | None -> Lwt.return_none
          | Some pc' ->
              consider_fixed (Filename.concat tp dc) fp' (Filename.concat rp pc')
          | _ -> .
      | _ -> .
  | _ -> Lwt.return_none

The idea is that a wild card pattern closes the pattern matching syntactically. This is of course already the case semantically, as all cases after it are unreachable by definition. The interesting thing is that the introduction of refutation cases allows to add a wild card to any pattern-matching, while preserving the exhaustiveness check. It is not 100% compatible, at it could change the extent of some pattern-matchings, but these are clearly meaningless ones, and the user would have been warned. IIRC, automatic indentation shouldn't be too tricky either. The parsing might be the hardest part.

@yallop

This comment has been minimized.

Copy link
Member Author

commented Aug 30, 2016

@garrigue: one tricky situation is where exception cases are involved, since exception cases can legitimately follow a wildcard:

    match h () with
    | g ->
      match g () with
        None -> ()
      | _ -> .
      | exception _ -> ()

@paurkedal: indeed, easing future changes to the syntax is a major motivation for this proposal.

@paurkedal

This comment has been minimized.

Copy link

commented Aug 30, 2016

@garrigue Yes, though isn't match-all patterns less common than proper exhaustive patterns? At least I would hope so, since that's one thing which makes refactoring safe in OCaml. I think OXX would be 100 % compatible as long as pointing out the last case is optional. In fact, introducing new keyword to optionally terminate patters would be backwards compatible except for the introduction of the keyword.

@paurkedal

This comment has been minimized.

Copy link

commented Aug 30, 2016

A case for this PR which was not included in the above gist is where the keywords of the construct is spread over multiple lines, as in

match
  a_long_expression
with [
 | Ok () -> ()
 | Error -> exit 69
]

In the parenthesised version, either with gets misaligned with match, or if one adds a space, either the relative alignment from with to the patters become inconsistent with other code, or one adds more indentation to the pattern which means the indentation dependents on the presence of parentheses. The issue can be hidden with larger indentation steps, but that costs horizontal space.

This issue arise even as the common case with try blocks, but never with function.

So, I'm convinced now that this PR is better than the status quo. PR #715 seems more beginner friendly, though with a bit more sophisticated indentation rules it seems we can write at least as readable code with this.

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2016

@paurkedal Why don't you just let-bind the scrutinee of the match?

I'm not sure that support for this proposal is at the level it would need to be for this kind of change. (I basically agree with @alainfrisch , FWIW.)

What do people think of @garrigue 's alternative?
(And @garrigue , do you want to comment on what @yallop pointed out about your suggestion?)

@xavierleroy Do you have an opinion?

@dra27

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2016

Are there ideas for @yallop's concern about exception clauses in @garrigue's idea? I guess it boils down to how much code 'in the wild' this may break?

There's something about forcing the wildcard pattern (with refutation, as appropriate) that I find appealing, since it forces a 'future warning' (e.g. because of a new constructor added to a variant) to become an error. Presumably it also means that extra clauses after the wildcard clause also become errors rather than unreachable warnings?

@yallop

This comment has been minimized.

Copy link
Member Author

commented Dec 27, 2016

I guess it boils down to how much code 'in the wild' this may break?

My view is that it's more a question of design. A change that doesn't break any existing code at all can still break conceptual coherence.

@garrigue's suggestion works well if we ignore exception patterns. But with exception patterns it results in quite surprising behaviour.

For example, in the following code the exception clause belongs to the inner match, both in current OCaml and with the wildcards-as-terminators suggestion:

    match d with
    | x -> match e with 
            | A, A -> x
            | _, _ -> .
            | exception (E y) -> y

At present the following code is entirely equivalent, but with wildcards-as-terminators the meaning would change so that the exception clause would belong to the outer match:

    match d with
    | x -> match e with 
            | A, A -> x
            | _    -> .
            | exception (E y) -> y

So reusing top-level wildcard patterns as terminators introduces a discontinuity in the design that could be pretty surprising in practice.

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2016

@mshinwell One potential issue with pulling the scrutinee into a let binding is exceptions. An exception clause couldn't be used unless the expression were wrapped in a thunk or made lazy.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2016

My idea was only a hack to start with, @yallop 's comment basically killed it, so I didn't bother to answer at the time. Sorry, there does not seem to be any miracle solution.

@paurkedal

This comment has been minimized.

Copy link

commented Dec 28, 2016

@paurkedal Why don't you just let-bind the scrutinee of the match?

@mshinwell Yes, and I think try-catch isn't that big a deal as it's a less common construct and often does not interfere. So this was just luke warm support for []-delimited patterns from me. I think current syntax with parentheses and 1-space indentation of patterns works just as well most of the time.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Dec 29, 2016

@yallop - fair enough, I was wondering how damaging it would be to require that exception clauses must appear before wildcard clauses (the "in the wild" part) and personally found the potential confusion were that forgotten to be no worse than the if .. then ..; .. trap - however, having looked at 715, I realise that that thinking is not at all in line with where you were coming from with the proposal!

FWIW, I like your original idea too - although I sort of agree with Gerd's comment and would actually prefer parentheses (or even begin and end) to brackets (even given your pros and cons list)

@gasche

This comment has been minimized.

Copy link
Member

commented Dec 29, 2016

Parentheses do have the following advantage. What @yallop is doing is giving a first-class syntactical status to clause sets, which is a family of pattern -> expr associations. There is no slippery slope, but this does open the door to some features that use clause sets in other places than immediately after a with keyword, such as view patterns or some more innocuous nesting constructs: (Ok clauseset | Error clauseset) might be construed as a valid clause set. In those I'm-not-asking-for-them use cases, where clause sets may happen where patterns could be expected and clause sets are less likely (whereas they are very likely right after a with), [ may be a bit too suggestive of list patterns for its own good.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Feb 24, 2017

We discussed this PR at the developer meeting two days ago and the consensus was to work on #716. When that is merged, then this PR is too small of an improvement because you can simply add begin/end around your pattern-matching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.