Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an imperative if construction #278

Closed
wants to merge 8 commits into from

Conversation

@trefis
Copy link
Contributor

commented Nov 3, 2015

A classic source of bugs in ocaml programs is related to the precedence of if statements relative the the precedence of semicolons.
By that I mean that:

if E then
  E;
  E

is parsed as (if E then E); E and not as if E then (E; E) in spite of what the indentation seem to imply.

It is an error we too often encounter at Jane Street, consider for example the situation where one refactors

if condition then
  begin match x with
  | stuff -> blah (); Deferred.unit
  end
  >>= fun () ->
  other_stuff ();
  foo ()

into

if condition then
  begin match x with
  | stuff -> blah ()
  end;
  other_stuff ();
  foo ()

This kind of error happens often enough (among newcomers to the language, as well as veterans -- see mshinwell@843ea4d#diff-f51c83f68b1c1139bb40787fec5d5ec2L425 ) that we would like compiler support to avoid them.

Proposition

The problem arising when using the open-ended if construction in presence of semicolons, we propose introducing a "terminated" version of if.

Single branch if statements would look like this :

if CONDITION do EXPRESSION done

and the ones with several alternatives look like this

if CONDITION do
  EXPRESSION
done else if CONDITION do
  EXPRESION
done else do
  EXPRESSION
done

In addition to this construction a warning (off by default) is added which triggers when the "traditional" open-ended if .. then .. else .. construction is present in an imperative context, by which we mean: on the left hand side of a semicolon or inside a do .. done block.

The choice of do .. done as delimiters for this new if construction comes from the similarity of this form with the other constructions of the language traditionally associated with imperative programming (for and while loops).
Indeed as is the case with loops, this construction insists that the content of the branches be of type unit.

Implementation summary

  • a new Pexp_ifdo construction is added to the parsetree.
  • the grammar is updated to recognize the new if .. do .. done form
  • a new warning (57) is introduced, it is disabled by default
  • the stdlib is updated to use this new construction wherever the warning was raised

I am not particularly attached to the current name and message of the new warning; any proposition to improve it is welcomed.

@yallop

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

I agree this would be a useful feature.

Some Lisps call the single-branch if "when"; I think we could have the same in OCaml, like this:

when condition do
  expression
done

I think I'd prefer to have a single distinct keyword rather than a distinct set of keywords to distinguish the one-branch and two-branch cases.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

Two questions:

  • Why add a new element in the parsetree ? It's completely equivalent to Pexp_ifthenelse.
  • Why change the syntax of if ? You could keep the normal if syntax; but optionally terminated by an end. I don't really care about which keyword you choose, but I don't see the point of having ... done else .... It's more keywords for no reason.
@gasche

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

Well supporting both if .. then .. else .. and if .. then .. else .. end would add a difficult conflict to the grammar on begin if .. then .. else .. end.

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

One less ambitious proposal would be to add warning 57, and just rewrite offending conditionals into begin if .. then .. else .. end. I would be in favor merging of a pull-request doing only this, as a first step that does not require taking Big Decisions -- except whether 57 should be activated by default.

(Also "imperative if..then..else" is a really ugly name. You could use "delimited" or "terminated" instead of "imperative". Really, would you hope to get a change accepted that adds more "imperative" stuff to the language ?)

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

I think I'd prefer to have a single distinct keyword rather than a distinct set of keywords to distinguish the one-branch and two-branch cases.

Could you elaborate, I don't quite follow what you mean?

Also, whilst the single-branch if is probably the most common place where this issue arises, if ... then ... else ... suffers from the same issue when followed by a ;.

Why add a new element in the parsetree ? It's completely equivalent to Pexp_ifthenelse.

The typing is different: if .. do ... done always has type unit.

Why change the syntax of if ? You could keep the normal if syntax; but optionally terminated by an end.

As @gasche says an optional terminator causes grammar conflicts, consider:

if ... then
  foo ();
  bar ()
end

This is not LALR since you do not know when you see the ; after foo () whether to reduce the if case or shift and carry on.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

The typing is different: if .. do ... done always has type unit.

Why the restriction ?

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

Why the restriction ?

It keeps the do .. done consistent with the do ... done in while and for, and it helps to separate if ... do ... done and if ... then ... else ... in terms of their intended uses.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

But there are lot's of cases where you want a terminated if. Not only in an imperative setting.

@bluddy

This comment has been minimized.

Copy link

commented Nov 3, 2015

We're paying the price for having tried to be too terse. What we really needed is ruby's approach, where you have if ... then ... [else ...] end. Relying on a semicolon to terminate the if expression was a bad idea. Same applies to our match expressions, where ocaml tried to get away without terminating, but we then have to wrap (some) nested match expressions with begin...end, so we end up paying more.

@kit-ty-kate

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

I agree with drup on this one. It's consistent with the usual syntax but there is a lot of usecases in which it's useful to allow the new syntax to return something else than unit.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

But there are lot's of cases where you want a terminated if. Not only in an imperative settings.

Could you give an example?

According to the manual ; is the only binary construct with lower precendence than if.

@c-cube

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

I don't think the new syntax is a good idea. The example should already been enclosed between () or begin end before being refactored; in my opinion, it is already very fragile.

if condition then
  begin match x with
  | stuff -> blah (); Deferred.unit
  end
  >>= fun () ->
  other_stuff ();
  foo ()

Relying on if test then foo >>= fun () -> a;b being if test then (foo >>= fun () -> (a;b)) is almost pure luck. In other situations, you can write begin end or ( ) to enclose the conditional branch, almost like {} in a C-like language.

@bluddy

This comment has been minimized.

Copy link

commented Nov 3, 2015

@gasche, what does warning 57 do?

@Drup

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

@lpw25 f if ... end x y You need additional enclosing begin/end at the moment.

Also, it's just plain clearer in almost every cases.

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

"Warning 57" is the best name I found, in the context of this discussion, for the new warning added by @trefis patch that warns whenever an if .. then .. else .. (as a single AST node) is followed by a ;.

Have a look at the pull request. Thomas not only introduced a new construction, he also implemented a warning to detect a (subset of) fragile use-cases for the old construction -- and converted them to the new one.

I propose that (regardless of whether the new syntax is deemed useful or not, and frankly I expect the discussion to be extremely boring) we integrate the new warning as soon as possible. If the construction is added, it can be used to silence it, otherwise I personally suspect that begin if .. end could do just fine.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

Relying on if test then foo >>= fun () -> a;b being if test then (foo >>= fun () -> (a;b)) is almost pure luck.

On the contrary, it's pretty common monadic style.

Are you honestly going to claim that this issue doesn't case bugs? There is plenty of evidence that it does.

f if ... end x y You need additional enclosing begin/end at the moment.

Sure, but that is not the issue being addressed. Issues with constructs of higher precedence can always be safely solved by just adding a ( ... ) around the whole if. The issue with ; is that there is no convenient way to specify precisely what the body of the if is. Even using begin ... end in the body does not work, because (as illustrated by @trefis's example) other things outside of the begin ... end can still be included in the body.

Basically the f if ... end x y case is an issue of convenience, whereas the ; case is one of safety.

@bluddy

This comment has been minimized.

Copy link

commented Nov 3, 2015

@gasche I agree. I think that's a good first step.

@c-cube

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

@lpw25 I agree it causes bugs, but I don't think the solution is a good one. Now, purely hypothetically, what if we introduced a breaking syntax change (feature-gated by a flag similar to -safe-string for several versions) that requires a end after if, match and try? It should be possible to provide a small tool that rewrites old syntax to new syntax.

(ok, I know, not a chance…)

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Nov 3, 2015

@gasche said:

I propose that (regardless of whether the new syntax is deemed useful or not, and frankly I expect the discussion to be extremely boring) we integrate the new warning as soon as possible. If the construction is added, it can be used to silence it, otherwise I personally suspect that begin if .. end could do just fine.

The actual implementation of the warning is dead simple: "trigger when an if expression is followed by a semicolon", we allow ourselves such a simple check because we know that a delimited form exists which could be used instead of the non-delimited one.
Despite Drup's remark the Pexp_ifdo construction is not "completely equivalent to Pexp_ifthenelse": it carries information about the presence of delimiters.
If we do not find a way to carry this information to the parsetree one way or another (it is done here with a new constructor, it could have been done with a new parameter to the Pexp_ifthenelse one) then the only other place where we have information about delimiters is in the parser itself; my gut feeling is that it would make the code significantly more ugly to do the check and raise the warning from that place.

As for using begin if .. end instead of if .. do .. done, from the warning point of view it of course makes no difference.
From the user point of you it means (if I understood your proposition correctly) that instead of writing

if <condition> do
  <expr>;
  <expr>
done;
<expr>

as proposed by this PR, one would have to write

begin if <condition> then begin
  <expr>;
  <expr>
end end;
<expr>

which is marginally more verbose, though probably not unbearable.

From an implementation point of view, I would expect the changes required to the parser to be significantly larger.

As for the ambitiousness of this new construction: it doesn't introduce any new keyword, nor does it break any existing code.

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

From an implementation point of view, I would expect the changes required to the parser to be significantly larger.

I get your point, although I don't think it would be very large. Also, the cases that would require an annotation in the grammar also correspond to cases where, with your implementation, the warning is raised where there is in fact no danger -- and the information added to the grammar would also help if one day we want to have more faithful -dsource printing.

@Chris00

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

What about the new warning being triggered if the “then” (or “else”) clause is terminated by a semicolon or possesses a semicolon in its body, and it is not braced?
So, instead of

if condition then
  begin match x with
  | stuff -> blah ()
  end;
  other_stuff ();
  foo ()

you would be required to write

if condition then (
  begin match x with
  | stuff -> blah ()
  end;
  other_stuff ();
  foo ()
);

In the name of safety, I do not see why we wouldn't then “impose” (under the new warning) that the following be braced too (as displayed):

if condition then
  do_someting()
else (
  begin match x with
  | stuff -> blah ()
  end;
  other_stuff ();
  foo ()
);

Following the same trend, why not also apply the same warning to:

if condition then
  let x = 1 in
  y;
z

(badly indented on purpose!).
Finally, one may wonder whether the following should require braces:

if cond then raise X;

used, in particular, as

if cond then invalid_arg "...";

(if it is turned on by default, the new warning will then affect quite a few codes — although I do not mind personally).

@ygrek

This comment has been minimized.

Copy link
Contributor

commented Nov 3, 2015

I dont understand how the compiler will know that begin end in first example is not what user intended. Also the last example is deal-breaker IMHO.
I think warning for every if then followed by semicolon is an overkill, OTOH if .. do ... done looks like a useful addition

@Chris00

This comment has been minimized.

Copy link
Member

commented Nov 3, 2015

I dont understand how the compiler will know that begin end in first example is not what user intended.

@ygrek It will not. It will notice that the then clause is terminated by a semicolon and thus should be surrounded by braces (or begin ... end) as in

if condition then (
  begin match x with
  | stuff -> blah ()
  end;
);
other_stuff ();
foo ()
@yallop

This comment has been minimized.

Copy link
Member

commented Nov 4, 2015

Could you elaborate, I don't quite follow what you mean?

Sure. The stated goal is to have a form of if whose syntax is such that the body includes both the versions of code above (i.e. the one with >>= fun () -> and the one with ;).

The proposal is to introduce a second form of if, like this:

if condition do
  expression
done

I have a slight preference for using when instead of if, like this:

when condition do
  expression
done

One reason I prefer when is that I think it's better to have different keywords for different forms of expression. Once you see when it's immediately clear what's going on, whereas the difference between if condition begin ... end and if condition do ... done is comparatively subtle, both syntactically and semantically. Another reason I prefer when is that it's familar from elsewhere.

if ... then ... else ... suffers from the same issue when followed by a ;.

That's true, but the single-case if is naturally imperative, since it has to return () when the condition is false. The typing in the proposed two-branch if do ... done else etc. is slightly artificial, in that there's no semantic reason that it should have unit type, unlike while and for.

@yallop

This comment has been minimized.

Copy link
Member

commented Nov 4, 2015

I also wonder whether the problem could be solved with better tools. Warnings are one possible area to improve, as discussed above. Additionally, you could avoid the problem altogether by automatically indenting your code. ocp-indent indents the first example above like this:

if condition then
  begin match x with
    | stuff -> blah (); Deferred.unit
  end
  >>= fun () ->
  other_stuff ();
  foo ()

and the second example like this:

if condition then
  begin match x with
    | stuff -> blah ()
  end;
other_stuff ();
foo ()

which makes it pretty clear what's going on. With an editor configured to indent code automatically the problem should show up during refactoring.

@yminsky

This comment has been minimized.

Copy link

commented Nov 4, 2015

I feel pretty strongly that ocp-indent is not a sufficient answer to this issue. For one thing, lots of people (ourselves included) use whitespace insensitive diffs when reading code, which means that whitespace only changes that should be significant may be missed. But more than that, OCaml is not a whitespace-sensitive language, and one shouldn't need whitespace to have a decent shot at interpreting OCaml code. It's worth noting that these cases can show up in single line contexts as well. The semantic difference between these two cases:

if cond then let x = v in expr1; expr2
if cond then expr1; expr2

will not be made less confusing by fixing indentation.

@yminsky

This comment has been minimized.

Copy link

commented Nov 4, 2015

@yallop: do you also think when is the right choice for the "when else" case, e.g.:

when cond do
   ...
done else when cond do
    ...
done else do
    ...
done

My intuition is that if is clearer in this case, but I'm not certain.

@yminsky

This comment has been minimized.

Copy link

commented Nov 4, 2015

@lpw25: I'm unsure about the must-return-unit rule. How about this kind of example:

let some_value =
  if cond do
      log_something;
      value
  done else do
      log_something_else;
      other_value
  done
in
expr

This seems like a natural case where you'd want your terminated if to actually return a value.

@ygrek

This comment has been minimized.

Copy link
Contributor

commented Nov 4, 2015

@yminsky, but in this case regular if then works and code will not compile if one omits begin end

@ygrek

This comment has been minimized.

Copy link
Contributor

commented Nov 4, 2015

@Chris00 in your example there is already begin end after then so

thus should be surrounded by braces (or begin ... end)

already holds

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Nov 5, 2015

I like it and Xavier is neutral. The best proposal I've seen so far is @yminsky's: warn when an if doesn't have parentheses (or begin end) either around it or around its last subexpression. We probably only want to warn when the if is followed by a semicolon.

We may also want a warning for the dangling else (@trefis's latest example). That should be easy to do by pattern-matching on the AST.

Well, the only warning that works without a new syntax (and without forcing parentheses on all ifs) must warn on the following cases:

  • A single case if whose body is not wrapped in parentheses and is followed by a ;.
  • An else whose body is not wrapped in parentheses and is followed by a ;.
  • A single case if whose body has a sub-expression wrapped in parentheses but those parentheses are not the whole body. (e.g. if ... then (foo; bar) >>= fun _ -> ... ).
  • An else whose has a sub-expression wrapped in parentheses but those parentheses are not the whole body. (e.g. if ... then ... else (foo; bar) >>= fun _ -> ... ).

In order to catch dangling else it should also catch:

  • A single case if whose body is not wrapped in parentheses.

These restrictions essentially force if ... then ( ... ) else ( ... ) to have the same behaviour as if .. do ... done.

Personally, I think that having context-dependent syntactic restrictions like this (i.e. ( ... ) behaves differently just after then and else) is uglier and harder to learn than an alternative syntax, but there is no accounting for taste.

@Chris00

This comment has been minimized.

Copy link
Member

commented Nov 5, 2015

A single case if whose body has a sub-expression wrapped in parentheses but those parentheses are not the whole body.

That is good to trigger the warning if the original code was:

let f =
  if condition then
    begin match x with
    | stuff -> blah (); Deferred.unit
    end
    >>= fun () ->
    other_stuff ();
    foo ()

(no final semicolon).
I suppose the warning would then also be triggered by

if cond then print(f x)

For practical purposes, isn't it sufficient to trigger the warning only if the parenthesized expression is the first one of the then clause?

Also, while we are at it, what would be the rules for match with and try with?

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2015

@damiendoligez: I'm now more or less agreeing with @lpw25.

If we want the warning to really remove ambiguities, the only way to disable it will be to put parentheses over the whole if expression (putting them only on the then or else branches is definitely not sufficient).

This means that "correct" code would now look like

 (if cond then print "a");
 (if cond then print "a" else (print "b"; incr r));

etc...
Technically, this is ok, but I'm afraid users will then be claiming for a better syntax...
Weakening the warning just to avoid ugly code seems to be an attempt at guessing the user's intent, and is not much different from looking at indentation.

Looking at code like ctype.ml could give a good idea of what it would look like, since the code is a mix of functional and imperative.
The are indeed some occurrences of "if ... then ...; ...", maybe less than I thought first.
And I come now to realize that many of them are "if cond then raise Ex; ...". In your perspective of intent-based warning, this would probably not require a warning, as doing something after raise is meaningless.

After this small experiment my conclusion is that, by heart, I would prefer a full solution, and would happily convert my code to do ... done (of course, not force others to do that); but intent-based warnings which would not trigger for raise (use the typing to detect that again?) would require less changes.

Btw, anyone knows of an easy way to prove that we indeed remove all ambiguity over if?
A condition before doing anything it that this should be correct.

Another concern for warnings is frequently used idioms.
For instance, I often write

if cond then ... else
let x = ... in
...

This may look ambiguous to beginners, but once you're used to it this is very natural indeed.
My understanding is that none of the warnings proposed here would fire on this idiom, but there may be other idioms around.

@lefessan

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2015

I agree with Jacques that this problem would need a full solution. Yet, I am not happy with adding an alternative if ... do ... done to if ... then... else instead of just extending the current syntax, however heavy it might look at first glance.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2015

For practical purposes, isn't it sufficient to trigger the warning only if the parenthesized expression is the first one of the then clause?

Yes. I wasn't clear but this is the condition I meant.

@camlspotter

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2015

Sorry, arriving too late but I do not see any big benefit to introduce a new syntax for this problem.
Is it enough to use PPX to ban all if e then e' and force users to write if e then e' else ()?

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2015

On 2015/11/07 11:00, camlspotter wrote:

Sorry, arriving too late but I do not see any big benefit to introduce a new syntax for this problem.
Is it enough to use PPX to ban all if e then e' and force to write if e then e' else ()?

The problem occurs with else too, so this is not sufficient

if cond then … else expr1; expr2

Here whether expr2 should be included in the else clause or not is ambiguous.
For me, the point of having a new syntax would be to make it completely clear, always.

The other problem is verbosity:

if cond then begin …; … end else ()

is longer (and arguably less clear) than

if cond do …; … done

By the way, I’m not convinced yet that if-do-done solves all problems.

let wait () = print_string “Press enter\n”; read_line ()

let g () =
if !silent then "direct” else wait (); "waited”

This code returns a string, so it cannot be replaced by do done, but
it is well-typed and still suffers from the semicolon ambiguity problem.

@yminsky

This comment has been minimized.

Copy link

commented Nov 7, 2015

@garrigue I believe in the case you describe, one would have to write:

let wait () = print_string “Press enter\n”; read_line ()

let g () =
  if !silent then "direct” else begin wait (); "waited” end

I agree that there's something odd about the fact that the "do" notation doesn't work in this case, but there is an escape hatch.

@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2015

@garrigue said:

let wait () = print_string “Press enter\n”; read_line ()

let g () =
    if !silent then "direct” else wait (); "waited”

This code returns a string, so it cannot be replaced by do done, but it is well-typed and still suffers from the semicolon ambiguity problem.

That is true, however in this case warning 10 will trigger (or an error if you're using -strict-sequence). This doesn't point exactly to the error we are considering in this PR but still indicates that the code is dubious which is what we want to hear.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2015

@yminsky:
Yes, you're right, I was confusing issues.
If we assume that the if-then-else is not followed by another semicolon (which is reasonable since it intentionally returns a string), then we should just write it as now, using begin end.
And if we want the other interpretation, if-do-done works.
An extra advantage is that since if-do-else-done does not try to unify the types of the two branches, it avoids giving a confusing error message for imperative uses.

The important conclusion is still that, if we add if-do-done, we can just warn on any if-then or if-then-else followed by a semicolon, and this without restricting the language.

@yminsky

This comment has been minimized.

Copy link

commented Nov 7, 2015

@garrigue:
Precisely right.

As to the question of the inelegance of adding a second if statement, It's worth noting that lots of languages have a separation between an imperative if statement and an expression-oriented one. We're really just mimicking that structure here.

It's also worth asking the question of, if you were starting from scratch, what better solution would one pick? Would you require an endif? That puts an extra syntactic tax on expression oriented code, which the imperative-if proposal doesn't. It might be that separating out the imperative case and requiring extra syntactic markers only then is actually the right solution, even not taking history into account.

Another question is concision of the if/do syntax. On some level, it would be nice to dispense with the multiple dones. e.g., rather than writing:

if <cond> do
    <body>
done else do
   <body>
done

one might prefer

if <cond> do
   <body>
else
   <body>
done

But then the question arises of how to handle the "else-if" pattern. It works fine in @lpw25's proposal:

if <cond> do
   <body>
done else if <cond> do
   <body>
done else if <cond> do
   <body>
done else do
   <body>
done

But it's not clear how to support this with a more lightweight syntax. You could do something like this, I suppose, but it then has the odd property that do's and dones aren't matched up.

if <cond> do
   <body>
else if <cond> do
  <body>
else if <cond> do
  <body>
else do
  <body>
done

I have mixed feelings about which choice is preferable, but the above does seem harder to think about, and so is probably a bad idea.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Nov 8, 2015

What's the problem with the following?

if <cond> do
  <body>
else if <cond> do
  <body>
else if <cond> do
  <body>
else
  <body>
done done done

It just reflects the structure of the code. And it's very hard to forget one.

@jordwalke

This comment has been minimized.

Copy link

commented Nov 8, 2015

Please excuse my interjection, despite the fact that I have contributed zero value to the OCaml community otherwise. I hope you will give my thoughts serious consideration anyway.

  1. I currently appreciate that the OCaml AST maintains some nice properties that I think become compromised when it introduces states that exist merely to resolve parsing conflicts and/or parsing pitfalls. For example, I greatly appreciate that the AST doesn't store additional nodes when you group expressions in unnecessary parenthesis ("ghost expressions" are preferred). I would hope that this PR could be modified to provide parser related warnings in some kind of earlier stage warning system instead of introducing new AST node types. Much like how a build system might implement warnings messages that concern common build/packaging issues, a parser might implement its own warning messages that concern precedence or common pitfalls. I think this warning would be helpful, but it feels like it shouldn't be added in this particular way.
  2. I could list several issues with OCaml's current syntax, each of which could possibly be resolved by adding a new AST node type, while keeping the old node type (and then warning for error-proned use of the old form) - but for each addition, all of the tooling built on top of the OCaml AST must be upgraded, and becomes more complex (and I see it would even necessitate modifications to depend.ml and typecore.ml) - all just to make up for the deficiencies of something as superficial as the parser. I think any one Pull Request that solves one of these problems looks tempting, but if you were to take the set of 20 similar Pull Requests that I could send, it would be clear that a certain elegance of the AST becomes lost.
  3. I never run into this particular issue because I always guard my sequence operations in parenthesis. When I teach people the OCaml syntax, I tell them to do the same. If I were to build a new parser, I would warn any time the ; is left unguarded by containing parenthesis (at parse time of course). (And for cases where it isn't clear, I might use indentation as a heuristic for intent). How much of the original objective would be satisfied by that approach? Ever since I started using ocp-indent, I would catch more accidental precedence issues because the formatting would act as an alarm.

Finally, I feel that the core of many of OCaml's syntax pitfalls are related to the precedence of the unguarded semicolon. In this case, the poor if statement is merely having to pick up the check. I performed an experiment where I removed the ability to have unguarded semicolons, and 75% of the legitimate syntax gripes for OCaml vanished - including the necessity of the notoriously confusing double semicolon - it just vanished. What about all the other places where the semicolon ruins the party? Should we make special syntax nodes for them as well?

In this example, the semicolon binds tighter than we'd like it to (as opposed to more loosely as in the case of the if), but my observation is that they have the same root issue - the ability for semicolons to be not contained in parens.

let myAB = {
  a=fun i -> print_string "calling a";
  b=b;
}

The above example is parsed as:

let myAB = {
  a=fun i -> (print_string "calling a";
                   b=b);
}

While I wish semicolon didn't have this issue in the first place, it would be unfortunate to introduce a new AST node type (beyond a ghost expression) for the notion of "record field value that doesn't conflict with semicolon":

let myAB = {
  a=do_field fun i -> print_string "calling a" done_field;
  b=b;
}

Especially if it required changes to coretype.ml, and especially if using parenthesis will suffice:

let myAB = {
  a=(fun i -> print_string "calling a");
  b=b;
}

Thank you all for listening.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Nov 12, 2015

I think there is an ambiguity in the refactoring example: do we want to warn on the original code, on the wrongly-refactored code, or on both? I've been assuming the second, but I'm under the impression that some people here are assuming both.

@garrigue :

If we want the warning to really remove ambiguities, the only way to disable it will be to put parentheses over the whole if expression (putting them only on the then or else branches is definitely not sufficient).

That's right, assuming we only want to warn on the wrong refactoring.

Technically, this is ok, but I'm afraid users will then be claiming for a better syntax...

I'm not so sure. And even if we add a new syntax, we'll need this warning.

So we should add the warning, then see what happens and revisit the question of adding a new syntax in one year or so.

About the implementation, I agree with @jordwalke : since this new proposal is a purely syntactic warning, it should be implemented in the parser.

@gasche

This comment has been minimized.

Copy link
Member

commented Nov 22, 2015

It seems clear that there is no consensus on the proposed solution and little chance of getting one. It is nicely executed work, and it does highlight a deficiency of the OCaml syntax, but there is no chance of satisfying users with the proposal in its current form. I'm closing the pull request.

@gasche gasche closed this Nov 22, 2015
@trefis

This comment has been minimized.

Copy link
Contributor Author

commented Nov 22, 2015

I disagree. I planed to work on adding the warning (or some variation of
it) alone as per Damien's request.

Le dim. 22 nov. 2015 17:40, Gabriel Scherer notifications@github.com a
écrit :

Closed #278 #278.


Reply to this email directly or view it on GitHub
#278 (comment).

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Nov 24, 2015

@trefis Yes please, do add the warning. But this is not going into 4.03. My understanding is that Jane Street may want to give it a try internally.

@gasche please reopen this PR.

@gasche gasche reopened this Nov 24, 2015
@fpottier

This comment has been minimized.

Copy link
Contributor

commented Nov 24, 2015

I am coming late. My two cents:

  • Please DO NOT add a new form of "if" construct with a subtly different syntax. It will be impossible/shameful to explain to a beginner why we have two "if" constructs and why the recommended one is the one with the non-standard syntax. Teachers will hate you.
  • Warning about incorrect indentation sounds like a good idea (although, I suppose, not easy).

Francois.

@correnson

This comment has been minimized.

Copy link

commented Nov 25, 2015

Hi everybody ! My two cents (this PR will become rich).

Going back to the origin of the problem, where indentation is erased on purpose:

if E then A ; B

In most programming idioms (C,Java,...), this strictly reads as (if E then A) ; B. Otherwise, you should
add block-parentheses around A ; B, following the common rule that a statement is a single instruction or a block.

The troubling point in OCaml is that, now:

if E then let ... in A ; B

is parsed as if E then (let ... in A ; B), and this form is really disrupting.

If a new warning is created, I would expect to be warned on the second form, not the first one.
To warn against the first form, I think indentation should be taken into account.

Of course, depending on the desired coding style, both warnings might be meaningful.

@damiendoligez damiendoligez added this to the 4.04-or-later milestone Dec 17, 2015
@bluddy

This comment has been minimized.

Copy link

commented May 19, 2016

So it looks like it took a company to solve the problem that we weren't willing to do ourselves.

I guess at this point we could say that a syntax is just a modular front-end to OCaml. So maybe we can take a hint from Reason and fix the things that were broken in the way that functional language enthusiasts would prefer? If people are going to switch syntax anyway, why not give them a choice of a more modern OCaml syntax to select instead?

@murmour

This comment has been minimized.

Copy link
Contributor

commented Mar 26, 2017

This PR has been a fantastic read! It inspired me to propose a related feature: #1122.

@bcc32 bcc32 referenced this pull request May 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.