-safe-syntax option #715

Closed
wants to merge 3 commits into from


bluddy
Contributor

@bluddy bluddy commented Jul 25, 2016

Proposal

Two problems presently exist and have been discussed regarding OCaml syntax (see #278 and this reddit thread):

Problem with If...else

Because the if expression has no terminator, a common issue is that refactoring of code containing an if expression can result in incorrect behavior. The scenario generally involves something like

if x = 3 then 12

Later refactoring may add to the then clause:

if x = 3 then print_string "done"; 12

This silently changes the behavior: the line parses as (if x = 3 then print_string "done"); 12, so the 12 is no longer guarded by the condition. Many variations of this problem exist, all of which stem from the lack of a terminator. To prevent this issue, many OCaml programmers proactively insert begin and end into every if expression.
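To see the failure mode in today's OCaml, here is a minimal runnable sketch. It uses a hypothetical Buffer-based log rather than the PR's print_string example. The ; after the first append closes the if, so the second append runs regardless of debug:

```ocaml
(* The refactored version: the [;] ends the [if], so only the first
   append is guarded by [debug]; the second one always runs. *)
let refactored debug =
  let buf = Buffer.create 16 in
  if debug then Buffer.add_string buf "first;"; Buffer.add_string buf "second;";
  Buffer.contents buf

(* What was meant: both appends guarded, made explicit with begin...end. *)
let intended debug =
  let buf = Buffer.create 16 in
  if debug then begin
    Buffer.add_string buf "first;";
    Buffer.add_string buf "second;"
  end;
  Buffer.contents buf

let () =
  print_endline (refactored false); (* prints "second;" *)
  print_endline (intended false)    (* prints an empty line *)
```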

This PR changes the if syntax to be

if x = 3 then 12 end

The small, amortized addition of end means that the if expression becomes completely safe in this regard, and that there is no need to ever use begin...end in this context.

Problem with deep matching

Deep matching also suffers from the lack of a terminator. Specifically, for the example

    type t = A | B
    type u = C of t | D

    let f = function
        | C _ -> 1
        | D -> 2

can later be augmented with a nested (deep) match, which must be protected with begin...end or parentheses:

    let f = function
        | C x -> begin match x with
            | A -> 3
            | B -> 4
            end
        | D -> 2

This is an annoyance when refactoring, and can sometimes even cause incorrect behavior that isn't picked up by the type checker. As a result, some people proactively couch every match in begin...end.
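A minimal sketch of such a silent misparse, using a hypothetical pair of integer matches rather than the t/u example. Because both scrutinees are ints, the outer clause swallowed by the inner match still type-checks:

```ocaml
(* What was meant: the last case belongs to the outer match on [n]. *)
let intended n m =
  match n with
  | 0 -> (match m with 0 -> "both zero" | _ -> "only n is zero")
  | _ -> "n is nonzero"

(* Without delimiters, the final case is parsed as a third case of the inner
   match on [m]. The code still type-checks; the compiler only emits warnings
   8 (non-exhaustive outer match) and 11 (unused inner case), which are
   disabled here so the sketch also builds under strict warning settings. *)
[@@@warning "-8-11"]
let unintended n m =
  match n with
  | 0 -> match m with
         | 0 -> "both zero"
         | _ -> "only n is zero"
  | _ -> "n is nonzero"

let () =
  assert (intended 1 0 = "n is nonzero");
  (* The outer match now has a single case, so any non-zero [n] fails. *)
  assert (try ignore (unintended 1 0); false with Match_failure _ -> true)
```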

This PR adds a mandatory end terminator to match, function, and try...with, giving match...end, function...end, and try...with...end. This change makes it easy to refactor matches and to nest them.

Examples

    let f l = function
        | C x -> match x with
            | A -> try List.find (fun i -> i = 1) l with Not_found -> 3 end
            | B -> 4
            end
        | D -> 2
        end
    let g x = if x = 1 then print_string "yes"; 1 else print_string "no"; 0 end
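For comparison, here is a sketch of how f and g might be written in the current syntax, with begin...end wherever the proposal would require end. It repeats the types t and u from the deep-matching section and assumes l is the list that List.find is meant to search:

```ocaml
type t = A | B
type u = C of t | D

(* The inner match and the try are closed with begin...end so that the B case
   stays in the inner match and the D case stays in the outer function. *)
let f l = function
  | C x ->
    begin match x with
    | A -> begin try List.find (fun i -> i = 1) l with Not_found -> 3 end
    | B -> 4
    end
  | D -> 2

(* Each branch is a sequence, so each one needs its own begin...end. *)
let g x =
  if x = 1 then begin print_string "yes"; 1 end
  else begin print_string "no"; 0 end
```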

Command line argument

The change to the syntax is only in effect when passing the -safe-syntax option. This means that no code is affected by default except code which the user specifically chooses to convert to safe syntax. For example, a company could decide that safe syntax is useful for reducing potential bugs and simplifying code, but restrict its use to specific modules.

Goals

My goals in designing the solution presented here were:

  1. Minimize changes to the language. I could have tried to tackle the issue of function vs Caml-light's fun, or reversing type application. I believe this would have prompted endless bike-shedding, as it should have -- these changes are far more subjective. Instead, I targeted what I consider to be design issues in the syntax that justify breaking backwards-compatibility, and I think these changes should be popular enough that within a few versions, -safe-syntax will be the default.
  2. Uniformity. end is the default terminator for OCaml (see struct, object, sig). While I could see the argument for making for and while also use end, there is also value in differentiation, and doing so would also increase the surface area of the proposed changes, affecting goal 1.
  3. Simplicity. Fixing these 2 problems in the syntax is most easily done at the syntax level, and this change is IMO superior to any option that just makes things more complicated. This patch is inspired by the -safe-string option. Within a few versions, I envision all OCaml code being -safe-syntax compatible. Additionally, since all forward-looking, refactor-proof OCaml code already wraps if and match with begin...end, the changes to existing codebases are trivial and the changes to the mindset of programmers are trivial as well.

Some have suggested that backwards-compatibility should trump here, and I disagree. #278 adds to the complexity of the language. Appropriate tools are brittle and hard to design, and not everyone can rely on them.

Backwards Compatibility

Of course, changing the syntax in a fundamental way breaks backwards compatibility. I believe this is still the correct approach. For old compilers (due to LTS releases/other reasons), a conversion tool exists, placing the onus on build systems (ocamlbuild/Jenga/omake/ocp-build) to support -safe-syntax in old compilers.

With this tool, build systems (ocamlbuild, Jenga, ocp-build, omake) could convert every safe-syntax file to unsafe-syntax on the fly and compile it with an older OCaml compiler. In the first stage (when -safe-syntax is opt-in in version X), they would do so for specific files that ask to be compiled in -safe-syntax mode if the system compiler is earlier than version X. In the future (when -unsafe-syntax is opt-in at version Y), they will do so for all files if the system compiler is less than X, or just supply -safe-syntax for most files if the version is >= X and < Y.
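The staging above amounts to a per-file decision that a build system would make. A hypothetical sketch of that decision follows. The names plan and action and the version pairs are invented for illustration and do not correspond to any existing build tool's API:

```ocaml
(* Compiler versions as (major, minor); polymorphic [compare] orders the
   pairs lexicographically. *)
type version = int * int

type action =
  | Convert_to_old_syntax  (* run the conversion tool, then compile *)
  | Pass_flag of string    (* compile the file with this extra flag *)
  | Compile_as_is

(* [x]: first compiler with opt-in -safe-syntax (hypothetical).
   [y]: first compiler where safe syntax is the default (hypothetical).
   [file_is_safe]: the file is written in safe syntax. *)
let plan ~(x : version) ~(y : version) ~(compiler : version) ~file_is_safe =
  if not file_is_safe then
    (* Legacy file: only compilers that default to safe syntax need telling. *)
    (if compare compiler y >= 0 then Pass_flag "-unsafe-syntax" else Compile_as_is)
  else if compare compiler x < 0 then Convert_to_old_syntax
  else if compare compiler y < 0 then Pass_flag "-safe-syntax"
  else Compile_as_is
```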

Note

This PR would benefit from #292, which would allow code sharing between parsers and therefore reduced duplication.

@const-rs
Contributor

const-rs commented Jul 26, 2016

Maybe bash-like "fi" would be better than end? First, it does not conflict with the begin...end block; second, it tells you that the opening keyword is "if".

More generally, I've seen the idea that constructs like for, while, if, match, etc. should each have a different ending.

@garrigue
Contributor

You should have a look at #278. This was an attempt at adding such a safe construct without breaking backward compatibility. But the requirements for any change to syntax are very high.

@alainfrisch
Contributor

alainfrisch commented Jul 26, 2016

I don't think that fragmenting the syntax is the way to go. Every time you look at a code snippet on Stack Overflow or read some training material, you'd need to be aware of which syntax was used and to do the translation if your local style doesn't match. Add to that the fragmentation caused by "alternative standard libraries" and "concurrency libraries", plus Reason of course, and everyone will end up with their own dialect. Please avoid this Tower of Babel.

I think the problem is not extremely acute: automatic indentation by the editor and the OCaml type system would detect most cases. This can still be misleading to beginners, and I agree it might be worth doing something about it.

For me, a more realistic direction is to encourage people to wrap their match ... expressions in begin...end (or parentheses) when used in "tail position" under another match/function.

What about adding a warning that would fire e.g. on:

    let f = function
        | C x -> match x with
            | A -> 3
            | B -> 4
        | D -> 2

In addition to the type error (which might be hard to understand), the compiler would report that the inner matching (in tail position under the unique clause of the outermost one) should be delimited by begin...end.

(Technically, the parser could record whether a pattern matching is protected by parentheses or begin...end as an attribute in the Parsetree, and a simple traversal before type-checking would report cases such as the one above.)

Alternatively (or in addition to that), the warning could be triggered by non-aligned clauses (I'm less fond of this approach, though).
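The parenthetical traversal idea can be sketched on a toy AST (not the real Parsetree). The rule below is a simplification assumed for illustration: flag any undelimited match that is the direct body of the last clause of an enclosing match, which is where a misparsed inner match ends up. The actual check in #716 may differ:

```ocaml
(* Toy AST: [delimited] records whether the parser saw the match wrapped in
   parentheses or begin...end, the information suggested for keeping as an
   attribute. Each element of [cases] is the body of one clause. *)
type expr =
  | Leaf of int
  | Match of { delimited : bool; cases : expr list }

(* Report every undelimited match that is the body of the last clause of an
   enclosing match, i.e. one that may have swallowed the outer clauses. *)
let rec warnings = function
  | Leaf _ -> []
  | Match { cases; _ } ->
    let here =
      match List.rev cases with
      | Match { delimited = false; _ } :: _ ->
        ["inner match should be delimited by begin...end"]
      | _ -> []
    in
    here @ List.concat (List.map warnings cases)

(* The example as parsed: the inner match absorbed the D clause, so the outer
   function has a single clause whose body is an undelimited match. *)
let as_parsed =
  Match { delimited = false;
          cases = [ Match { delimited = false; cases = [Leaf 3; Leaf 4; Leaf 2] } ] }

(* The intended program, with the inner match wrapped in begin...end. *)
let as_intended =
  Match { delimited = false;
          cases = [ Match { delimited = true; cases = [Leaf 3; Leaf 4] }; Leaf 2 ] }
```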

@yminsky

yminsky commented Jul 26, 2016

For what it's worth, we're currently trying out a warning version of this, but it's quite tricky. The case we've tackled thus far is the if-then-else case, and the solution has been to require begin/end or parens in nearly every case, and, importantly, to require the parens to actually demarcate the block structure. This is a real subtlety that defeated many of our earlier attempts.

Consider this code snippet:

if some condition then 
   some thing
else (
   do something;
   do something_else)
>>= fun () ->
some other_thing

The confusing bit is that the bind is actually part of the else, rather than being bound on the full if/then/else expression. So the presence of parens isn't enough to make things OK. To make it explicit, we have the rule that the pair of parens starting after then then or else is required to demarcate the full block.

With our check in place, the snippet above does not compile, and one has to write something like:

if some condition then 
   some thing
else (
   do something;
   do something_else
   >>= fun () ->
   some other_thing
)

Whitespace-only solutions don't seem particularly good to me. Forcing indentation helps, but good diff tools sometimes obscure whitespace-only changes (which is a feature, not a bug), which makes it easier to miss whitespace-only cues. That said, requiring indentation definitely helps.
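A runnable sketch of how the first snippet parses, assuming a hypothetical >>= defined as plain reverse application (a stand-in for a real monadic bind) and a step helper that records which code ran:

```ocaml
(* Stand-in for a monadic bind: just feed [x] to [f]. *)
let ( >>= ) x f = f x

(* As written in the snippet: [>>=] binds tighter than if/then/else,
   so the continuation is part of the else branch only. *)
let as_written cond =
  let log = ref [] in
  let step name = log := name :: !log in
  (if cond then step "then"
   else (
     step "else-1";
     step "else-2")
   >>= fun () -> step "after");
  List.rev !log

(* What a reader may expect: the continuation runs after either branch. *)
let as_expected cond =
  let log = ref [] in
  let step name = log := name :: !log in
  (if cond then step "then"
   else (
     step "else-1";
     step "else-2"))
  >>= (fun () -> step "after");
  List.rev !log
```

With cond = true, as_written never runs the continuation, while as_expected does.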

@alainfrisch
Contributor

See #716 for an experiment with a warning for nested pattern matching.

@bluddy
Contributor Author

bluddy commented Jul 26, 2016

@const-rs: fi not only makes me shudder since it reminds me of one of the worst languages ever devised (bash) but it goes against the goals of a) minimizing the surface area of changes to the language and b) uniformity. end is the de-facto terminator for OCaml (given struct, object etc). There's a case to be made for changing for and while to use end as well, but I believe that may be pushing uniformity too far (there is value in differentiation as well), and increasing the change's surface area, which goes against a design goal.

@garrigue: I'm aware of #278, and I believe that trying to add to the language to fix basic design issues is cumbersome and a mistake. It's much cleaner to make this small change than it is to emphasize backwards-compatibility in this instance IMO. The change is minuscule given the fact that forward-looking programmers wrap their if expressions and matches in begin...end anyway (and it doesn't even help for if, as detailed in #278).

@alainfrisch: It's not currently clear from the description, but this will (eventually) not be fragmentation. I see this as eventually being normative OCaml syntax, just as all strings will be immutable. For example, companies like Jane Street will have a massive incentive to convert all of their internal code to -safe-syntax. Within a few versions, all OCaml code (except for really old legacy code) will be converted, and the defaults can be switched. The conversion process is also incredibly simple due to how small the suggested changes are.

This isn't something that should be done by warning, or relying on a tool. It was a syntax design oversight, and it needs to be addressed at the syntax layer. One cannot mix unterminated if expressions with a language that possesses statements, and matches require termination in the general case. Given how easy the fix is, I don't see a strong argument against it.

@alainfrisch
Contributor

I think you're overly optimistic that everyone will jump at it and adapt their code base quickly. This would take several years at best (even if automated tools are provided). The thing is that experienced users won't see the benefits of the new syntax and will actually suffer from it: they are usually not bitten by the problem, and the proposed syntax adds weight for simple cases where the closing delimiter is not required. What would be the "massive incentive" to switch?

And even if everyone switched, beginners will still be confused by materials referring to the "old" syntax.

@Drup
Contributor

Drup commented Jul 26, 2016

The thing is that experienced users won't see the benefits of the new syntax and will actually suffer from it: they are usually not bitten by the problem, and the proposed syntax adds weight for simple cases where the closing delimiter is not required. What would be the "massive incentive" to switch?

I consider myself a mildly experienced user: I disagree with everything this paragraph contains. I would switch in a heartbeat if backward-compatibility was not a concern.

@yallop
Member

yallop commented Jul 26, 2016

It would be useful to have explicit delimiters for the ends of matches.

But I don't think forking the syntax is a good idea.

Wrapping begin ... end or parentheses around a match does the trick, but it always feels to me like working around a shortcoming in the syntax.

I'd prefer to add optional delimiters around cases to the language, like this:

match e with
 ( A -> e₁
 | B -> e₂ )

or perhaps, following the revised syntax, like this:

match e with
[ A -> e₁
| B -> e₂ ]

Then the nested case would look like this, which seems like a small but genuine improvement over the current way of working around the problem:

match e with
[ A -> match e₁ with
       [ A -> e₂
       | B -> e₃ ]
| B -> e₄ ]

This change would be backwards-compatible, and wouldn't require any new compiler flags. And it would be easy to add a warning as in #716 for undelimited matches in tail position.

@const-rs
Contributor

const-rs commented Jul 26, 2016

This would take several years at best (even if automated tools are provided).

Even that is too optimistic. Most Linux distros will just stick with OCaml 4 for a decade.

Inner-match warning would be very nice - usually this place triggers confusing error anyway.

Btw, issuing new warnings is not very safe. There are many projects that are compiled with something like -Wall. And when you add a new warning, it sometimes breaks the build. Everything is fine if you are a developer of the program. But what if you are just a packager? You have to go deep inside the build system or the program to eliminate the warning.

So maybe it would be possible to add a couple of rules to a static analyzer instead? Unlike the compiler, it can afford false positives, so it can warn about inconsistent code and formatting, identical branches in if/else, etc.

@dbuenzli
Contributor

Btw, issuing new warnings is not very safe. There are many projects that are compiled with something like -Wall. And when you add a new warning, it sometimes breaks the build.

This is not a very good argument. These projects are doing it wrong, you should not distribute code with -warn-error enabled. It is the duty of packagers to check this and report it upstream if that is the case.

@const-rs
Contributor

const-rs commented Jul 26, 2016

There is another, easier option: just stop following the compiler upstream until all the libraries get fixed.

@alainfrisch
Contributor

I would switch in a heartbeat if backward-compatibility was not a concern.

People are even reluctant to use a new stdlib function because they want their libraries to work with OCaml version foobar. So just bringing some extra protection will not be a sufficient argument to have them switched.

@yallop's proposal is more realistic and could easily be combined with #716 (simply consider the new form to be delimited, and in the warning message, suggest using that form instead).

@gasche
Member

gasche commented Jul 26, 2016

@yallop: I'm fine with adding delimiters around clauses, but I think that it is important for | to still be allowed optionally after the opening bracket, which was not done in the revised syntax (this was a defect, I think). I would then recommend the following indentation that makes it easier to reorder cases or add new ones, and is thus better style:

match foo with (
  | ...
  | ...
)

@dbuenzli
Copy link
Contributor

@gasche at that point, why not require all matches to be written as begin match ... end or (match ...), since () and begin...end are supposed to be used interchangeably?

@dbuenzli
Contributor

dbuenzli commented Jul 26, 2016

(I personally like @yallop's proposal since it corresponds to the polyvar definition syntax.)

@gasche
Member

gasche commented Jul 27, 2016

Note that polymorphic variant types accept [ | ...] (an optional leading separator after the opening delimiter), which was my request.
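For reference, the optional leading separator already works in polymorphic variant types today. The two type names below are illustrative and denote the same closed type:

```ocaml
(* Same closed polymorphic variant type, with and without a leading bar. *)
type color = [ `Red | `Green ]
type color' = [ | `Red | `Green ]

let to_string : color' -> string = function
  | `Red -> "red"
  | `Green -> "green"

(* A value annotated with [color] is accepted where [color'] is expected. *)
let () = print_endline (to_string (`Red : color))
```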

@yminsky

yminsky commented Jul 27, 2016

I think you're overly optimistic that everyone will jump at it and adapt their code base quickly. This would take several years at best (even if automated tools are provided). The thing is that experienced users won't see the benefits of the new syntax and will actually suffer from it: they are usually not bitten by the problem, and the proposed syntax adds weight for simple cases where the closing delimiter is not required. What would be the "massive incentive" to switch?

I quite strongly disagree with the background idea here, which is that solving this syntactic problem doesn't add much and that experienced users don't suffer from the problem. Even though experienced users make this kind of mistake rarely, it is a disturbingly large fraction of the mistakes they do make, and it's a hard mistake to find. Every experienced user I've spoken about this with at Jane Street thinks it's one of the biggest design flaws in the language, and certainly the worst thing about the syntax.

I don't know that the present solution is a good one, and Alain's point that we should expect migration to be slow is reasonable; but this is a serious issue that deserves to be addressed.

@alainfrisch
Contributor

@yminsky Which mistake in particular are you talking about? The one you mentioned above (with the if-then-else and >>=)? I suspect this might depend on the coding style (and use of operators). Do you also often see problems related to pattern matching?

@alainfrisch
Contributor

but this is a serious issue that deserves to be addressed.

Do you think that a solution that notifies users about the problem (e.g. warnings based on indentation or (lack of) explicit grouping) would be sufficient, or do you think the problem deserves a change to the syntax?

@yminsky

yminsky commented Jul 27, 2016

Apologies in advance: this is mostly about the if syntax, and so is a little off topic. But that's mostly because the if syntax is the more severe problem. I suspect that we should approach if and match in very similar ways.

This may go without saying, but warning users is essentially never enough: in practice, I've only seen benefits when you turn warnings into errors. But, since all warnings can be made into errors, that may be enough.

A few comments:

  • Experientially, the if syntax causes way more problems than nested pattern matching, which is why we attacked that problem first.
  • The primary problems with if syntax are with ordinary imperative programming (i.e., use of semi-colon), but we've seen the bind-like version of this problem in the wild as well.
  • Our current solution for the if syntax is to always require explicit demarcation of the then and else blocks, typically by parens, though we consider a few other cases to be clear enough without (lists, records, atoms). This is a little heavy, but is at least clear. For us, this is probably adequate, but the syntax is somewhat awkward.

The thing that's missing from our solution is clear demarcations of the blocks that are semantically distinct. I think the curly-brace languages are better in this regard, e.g.:

if cond { <then-expr> } else { <else-expr>}

is both clearer and lighter than

if cond then ( then-expr ) else ( else-expr )

What's particularly odd about the latter is that it can be broken by an operator, e.g.:

if foo { bar x } else { snoo y } + 5

is clear enough, but

if foo then (bar x) else (snoo y) + 5

won't compile with the warning on, because the true extent of the block is (snoo y) + 5. As such, to make this compile, you have to write:

(if foo then (bar x) else (snoo y)) + 5

None of this is especially tragic, but it highlights that the underlying check is awkward.
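A runnable version of the + 5 example, with the then keyword spelled out and hypothetical definitions of bar and snoo. Since if binds more loosely than +, the + 5 ends up inside the else branch unless the whole conditional is parenthesized:

```ocaml
(* Hypothetical helpers standing in for [bar] and [snoo]. *)
let bar x = x * 10
let snoo y = y * 100

(* Parsed as: if foo then bar x else (snoo y + 5) *)
let implicit foo x y = if foo then (bar x) else (snoo y) + 5

(* The whole conditional, then + 5. *)
let explicit foo x y = (if foo then (bar x) else (snoo y)) + 5

let () =
  Printf.printf "%d %d\n" (implicit true 1 1) (explicit true 1 1)  (* 10 15 *)
```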

That said, my all-in view is that we should try the warning route first. My proposal would be to upstream the rule we've set up internally as a warning, and see if people find it acceptable, and if so, we should move to turn the warning on by default.

For the match syntax, my best guess is that the right rule is to require a begin/end around nearly every match, similar to your proposal in #716

@alainfrisch
Contributor

Thanks @yminsky . This is really not off-topic since this PR is about both if and match.

if foo { bar x } else { snoo y } + 5

I think that even if this were parsed as (if foo { bar x } else { snoo y }) + 5, I would still use parentheses around the if expression, and I wouldn't mind enforcing this style with a warning.

I know you did not actually propose to add such a curly-brace syntax, but it's worth noting that it would heavily conflict with records.

The primary problems with if syntax are with ordinary imperative programming (i.e., use of semi-colon), but we've seen the bind-like version of this problem in the wild as well.

I agree with that. The fact that one can write:

  if ... then
    let () = () in
    foo ();
    bar ();

but removing the let-binding changes the scope is really bad.
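A small sketch of that scoping difference, with a hypothetical log reference standing in for foo () and bar ():

```ocaml
let with_let cond =
  let log = ref [] in
  (* [let () = () in] extends to the closing parenthesis, so both
     updates belong to the then-branch. *)
  (if cond then
     let () = () in
     log := "foo" :: !log;
     log := "bar" :: !log);
  List.rev !log

let without_let cond =
  let log = ref [] in
  (* Without the let, the first [;] ends the if: bar is always logged. *)
  (if cond then
     log := "foo" :: !log;
     log := "bar" :: !log);
  List.rev !log
```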

@bobzhang
Member

@gasche I like your ideas, FYI reason (CC @jordwalke @chenglou) used { as delimiter, since it really does not matter which delimiter is used here, maybe we can make it consistent if we are moving forward

@yallop
Member

yallop commented Jul 27, 2016

I've opened a new PR (#722) to discuss delimiters for match cases.

@hcarty
Member

hcarty commented Jul 27, 2016

@yallop From a language user standpoint, I'm not sure I understand the benefit of adding delimiters around cases rather than an explicit end. Why is

match e with
[ c1 -> e1
| c2 -> e2 ]

better than

match e with
| c1 -> e1
| c2 -> e2
end

particularly when end can be used for if as well?

Is the argument that the internal changes are less invasive for delimited cases?

@bobzhang
Member

@hcarty I like delimiters better than end, it is more concise, and most editors can highlight delimiters without any configuration

@yallop
Member

yallop commented Jul 27, 2016

@hcarty: if end had been required from the start I think that would have worked well. However, at this point the end would have to be optional, for backwards compatibility. But match with optional end is more than a little tricky to parse. Consider the following code:

begin match e with
| c1 -> e1
| c2 -> e2
end

It's not easy to determine whether the end belongs to the match or to the begin.

@bluddy
Contributor Author

bluddy commented Jul 27, 2016

@alainfrisch: noted. I'll correct it above.

@bschommer: my point was that you don't need syntax-aware tools for this change. A simple regex tool is enough, which means that every single build system can easily integrate this change as outlined above, and -safe-syntax code can therefore effectively be compiled in old OCaml versions that don't support it.

@yallop
Member

yallop commented Jul 27, 2016

A simple regex tool is enough

(* I decided to begin try the regex approach, and it doesn't seem to
   begin function as expected on comments and strings. *)

@bluddy
Contributor Author

bluddy commented Jul 27, 2016

@yallop: good point.

@alainfrisch
Contributor

Well, breaking the text in comments is not too bad given the context (allow the code to be compiled by old compilers). Same example with strings, though, and this is more problematic.

@const-rs
Contributor

const-rs commented Jul 27, 2016

The sed solution breaks this program:

if true then
    Printf.printf "HI\n"
else
    Printf.printf "HO\n";

Printf.printf "End program\n"

transforming it into


if true then begin
    Printf.printf "HI\n"
end else begin
    Printf.printf "HO\n";

Printf.printf "End program\n"

@bluddy
Contributor Author

bluddy commented Jul 27, 2016

Just a note: I've added a more complex tool that should handle comments, strings, and quoted strings here. It uses regex + a simple state machine. This wouldn't be hard to use in any build tool, though it's admittedly a little more complex than just using sed.

@const-rs: your safe-syntax program has a syntax error, since you didn't terminate your if with an end. In any case, to handle strings and comments, a slightly more robust solution is needed, which I provided above.

@gasche
Member

gasche commented Jul 27, 2016

I think that such a conversion tool should have the support of an existing OCaml lexer to work correctly (for example the upstream lexer via compiler-libs, or the one of ocp-indent, etc.). Your new tool currently fails on the following input, which shows that the issue is tricky enough:

let () = (* beware of comment/string "nesting *)" *) if true then () else ()
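The example works because OCaml lexes string literals inside comments, so the *) inside the string does not close the comment. A minimal sketch of a scanner that respects this follows. It is deliberately incomplete (character literals and quoted-string literals are not handled), which illustrates why reusing a real lexer is the safer choice:

```ocaml
(* Remove comments from OCaml source text. Comments nest, and string literals
   are skipped both inside and outside comments. Character literals and
   quoted-string literals are NOT handled, so this is only a sketch. *)
let strip_comments (src : string) : string =
  let n = String.length src in
  let out = Buffer.create n in
  (* Given the index just after an opening double quote, return the index
     just after the matching closing quote, honouring backslash escapes. *)
  let rec skip_string i =
    if i >= n then n
    else
      match src.[i] with
      | '\\' -> skip_string (i + 2)
      | '"' -> i + 1
      | _ -> skip_string (i + 1)
  in
  let rec go i depth =
    if i >= n then ()
    else if i + 1 < n && src.[i] = '(' && src.[i + 1] = '*' then
      go (i + 2) (depth + 1)
    else if depth > 0 && i + 1 < n && src.[i] = '*' && src.[i + 1] = ')' then
      go (i + 2) (depth - 1)
    else if src.[i] = '"' then begin
      let j = skip_string (i + 1) in
      (* Keep the literal verbatim only when it is outside any comment. *)
      if depth = 0 then Buffer.add_string out (String.sub src i (j - i));
      go j depth
    end else begin
      if depth = 0 then Buffer.add_char out src.[i];
      go (i + 1) depth
    end
  in
  go 0 0;
  Buffer.contents out
```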

@bluddy
Contributor Author

bluddy commented Jul 27, 2016

@gasche I had no idea we parsed comments that way. Why do we do that?

@bluddy
Contributor Author

bluddy commented Jul 27, 2016

Anyway, I concede that my hacked-together tool isn't sufficient.

@DemiMarie
Contributor

I have thought of this problem before, and come up with a solution: 'end. I believe that 'end is currently always a syntax error, so this is backwards compatible.

One solution would be to issue a warning if the situation was at all ambiguous. But that might require some situations to be like:

(if x then (a + b) else (c + d))

I think that using end is cleaner. Moreover, it would make the language significantly easier to parse. But it does break backwards compatibility. F# solves this problem by using indentation-sensitive syntax.

@DemiMarie
Contributor

While we are at it, could we make trailing semicolons and any appearance of ;; syntax errors? For the toplevel, I think either EOF, 2 consecutive newlines, or a newline with correct syntax up to that point would be enough.

One other wart that I don't like is the double meaning of let. I would much rather see something like val used for toplevel bindings just like in signatures (The revised syntax uses value for both purposes).

@schrodibear

Perhaps Python-style alternatives to if-then-else, match, and function (or even fun) using colons could be introduced:

  if condition then: expression1 else: expression2 end

,

  match expression with: cases end

and

  function: cases end

or

fun: cases end

The colon could be allowed to be separated from the previous keyword, and the then, else and with keywords could even be omitted (since type annotations require parentheses), to encourage use of the new form as a shorter one (e.g. if signed : "s" : "u" end ^ typ instead of (if signed then "s" else "u") ^ typ, and match sign : Plus -> 1 | Minus -> -1 end * n instead of (match sign with Plus -> 1 | Minus -> -1) * n).

@bluddy
Contributor Author

bluddy commented Jul 28, 2016

Update: I now have a tool for converting between 4.03 syntax and safe syntax, and vice versa. It doesn't preserve comments yet since it uses the existing OCaml compiler infrastructure, but that should be good enough for now for converting from safe_syntax to earlier syntax (4.03), which means build tools can use this to provide instant compatibility for new syntax.

Thanks to the compiler devs for creating the really great pprintast.ml! So easy to modify. I'll take advantage of the object-oriented de-duplication aspects once I have the time.

Feel free to check out the tool on your own code and see what it looks like in safe_syntax mode. Given the existence of this tool (and I'm sure it needs some love), I think this proposal becomes viable.

What do you guys think?

@paurkedal

I generally like the proposal, except for changing the current if-then which allows structuring the code as

let rec find x = function
  | Empty -> None
  | Node(l, v, d, r, _) ->
      let c = Ord.compare x v in
      if c < 0 then find x l else
      if c > 0 then find x r else
      Some d

It should be possible to introduce if ... then ... end along with the current if ... then ... else ..., though it only solves half the issue with imperative code.

On another note, the requirement of end can be relaxed for parenthesised expressions, since the compiler can assume it as soon as it hits an end-parenthesis which would otherwise be unbalanced.
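A self-contained version of that find, assuming a hypothetical tree type in place of the Map internals, with the polymorphic compare standing in for Ord.compare and the found value wrapped in Some so both clauses return an option:

```ocaml
type ('k, 'v) tree =
  | Empty
  | Node of ('k, 'v) tree * 'k * 'v * ('k, 'v) tree

(* Each [else] is followed directly by the next [if], so the chain reads
   like a flat list of cases without extra parentheses. *)
let rec find x = function
  | Empty -> None
  | Node (l, v, d, r) ->
    let c = compare x v in
    if c < 0 then find x l else
    if c > 0 then find x r else
    Some d
```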

@paurkedal

paurkedal commented Aug 30, 2016

If we give up the goal of uniformity, and introduce two new keywords to optionally terminate patterns and conditions, I think we could keep backwards compatibility and keep support for undelimited if-then-else. The terminator would attach to the closest construct of the right kind if present. When writing pure code, one would only use the pattern-terminator, which would never terminate a conditional. I don't have good suggestions for what the keywords should be, though. endif is probably ok, but matched or endmatch seem too long.

Oops: I overlooked the fact that an optional endif does not help in the context of semicolons without a prefixed let, due to the lower precedence of the semicolon.

@murmour murmour mentioned this pull request Mar 26, 2017
@damiendoligez damiendoligez added this to the long-term milestone Sep 28, 2017
@alainfrisch
Contributor

I see quite a bit of support for the proposal, but honestly, I don't see it happening that we maintain two versions of the parser in parallel, nor that the community switches to the new syntax fast enough to drop the old one quickly.

Moreover, even if automated tools were available to convert code bases, this would still break available literature and code snippets around.

And, but this is more a matter of taste: I think that adding warnings such as #716 would be enough to avoid mistakes, while retaining the light current syntax for common cases, and without breaking backward compatibility.

So I'm tempted to close the issue, unless another maintainer feels strongly enough in favor of the proposal to shepherd it.

@gasche
Member

gasche commented Nov 6, 2017

Personally I agree with the downsides you mention, but I think that there are changes to the ecosystem that could mitigate them. ocaml-format was just released and brings the hope of having more robust automated code-evolution tools (but this hope is still in the future), and I would be more confident about evolving/maintaining parsers if I manage to propose a Menhir migration. I think today is too early to consider alternative syntaxes (we haven't tended to all the camlp4-deprecation growing pains yet), but that it may change in the medium term.

@alainfrisch
Contributor

Ok, closing for now. If the eco-system moves to a state where changing the concrete syntax becomes significantly easier, we will revisit.

@nobrowser
Contributor

For what it's worth, we're currently trying out a warning version of this, but it's quite tricky. The case we've tackled thus far is the if-then-else case, and the solution has been to require begin/end or parens in nearly every case, and, importantly, to require the parens to actually demarcate the block structure. This is a real subtlety that defeated many of our earlier attempts.

Consider this code snippet:

if some condition then 
   some thing
else (
   do something;
   do something_else)
>>= fun () ->
some other_thing

The confusing bit is that the bind is actually part of the else, rather than being bound on the full if/then/else expression. So the presence of parens isn't enough to make things OK. To make it explicit, we have the rule that the pair of parens starting after then then or else is required to demarcate the full block.

With our check in place, the snippet above does not compile, and one has to write something like:

if some condition then 
   some thing
else (
   do something;
   do something_else
   >>= fun () ->
   some other_thing
)

Whitespace-only solutions don't seem particularly good to me. Forcing indentation helps, but good diff tools sometimes obscure whitespace-only changes (which is a feature, not a bug), which makes it easier to miss whitespace-only cues. That said, requiring indentation definitely helps.

@yminsky, in your example neither the sequence nor the bind is in fact essential, right? I mean, you could just as well say that the following is confusing:

if not flag then 0 else ( i + 1 ) + j

assuming i and j are int of course

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024