New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
-safe-syntax option #715
-safe-syntax option #715
Conversation
May be bash-like "fi" would be better than end? First - it does not conflict with begin...end block, second - it tells that the opening operator is "if". Overall, I've seen the idea that all operators like for, while, if, match and etc should have different endings. |
You should have a look at #278. This was an attempt at adding such a safe construct without breaking backward compatibility. But the requirements for any change to syntax are very high. |
I don't think that fragmenting the syntax is the way to go. Every time you look at a code snippet on Stack Overflow or read some training material, you'd need to be aware of which syntax was used and to do the translation if your local style doesn't match. Add to that the fragmentation caused by "alternative standard libraries" and "concurrency libraries", plus Reason of course, and everyone will end up with its own dialect. Please avoid this Tower of Babel. I think the problem is not extremely acute: automatic indentation by the editor and the OCaml type system would detect most cases. This can still be misleading to beginners, and I agree it might be worth doing something about it. For me, a more realistic direction is to encourage people wrapping their What about adding a warning that would fire e.g. on: let f = function
| C x -> match x with
| A -> 3
| B -> 4
| D -> 2 In addition to the type error (which might be hard to understand), the compiler would report that the inner matching (in tail position under the unique clause of the outermost one) should be delimited by (Technically, the parser could keep the information that a pattern-matching is protected by parenthesis or Alternatively (or in addition to that), the warning could be triggered by non-aligned clauses (I'm less fond of this approach, though). |
For what it's worth, we're currently trying out a warning version of this, but it's quite tricky. The case we've tackled thus far is the if-then-else case, and the solution has been to require begin/end or parens in nearly every case, and, importanlty, to require the parens to actually demarcate the block structure. This is a real subtlety that defeated many of our earlier attempts. Consider this code snippet:
The confusing bit is that the bind is actually part of the else, rather than being bound on the full if/then/else expression. So the presence of parens isn't enough to make things OK. To make it explicit, we have the rule that the pair of parens starting after then then or else is required to demarcate the full block. With our check in place, the following does not compile, and one has to write something like:
Whitespace only solutions don't seem particularly good to me. Forcing indentation helps, but good diff tools sometimes obscure whitespace-only changes (which is a feature, not a bug), which makes it easier to miss whitespace-only clues. That said, requiring indentation definitely helps. |
See #716 for an experiment with a warning for nested pattern matching. |
@const-rs: @garrigue: I'm aware of #278 and I believe trying to add to the language to fix basic design issues to be cumbersome and a mistake. It's much cleaner to make this small change than it is to emphasize backwards-compatibility in this instance IMO. The change is minuscule given the fact that forward-looking programmers wrap their @alainfrisch: It's not currently clear from the description, but this will (eventually) not be fragmentation. I see this as eventually being normative OCaml syntax, just as all strings will be immutable. For example, companies like Jane Street will have a massive incentive to convert all of their internal code to This isn't something that should be done by warning, or relying on a tool. It was a syntax design oversight, and It needs to be addressed at the syntax layer. One cannot mix unterminated |
I think you're overly optimistic that everyone will jump at it and adapt their code base quickly. This would take several years at best (even if automated tools are provided). The thing is that experienced users won't see the benefits of the new syntax and will actually suffer from it: they are usually not bitten by the problem, and the proposed syntax adds weights for simple cases where the closing delimiter is not required. What would be the "massive incentive" to switch? And even if everyone switched, beginners will still be confused by materials referring to the "old" syntax. |
I consider myself a mildly experienced user: I disagree with everything this paragraph contains. I would switch in an heartbeat if backward-compatibility was not a concern. |
It would be useful to have explicit delimiters for the ends of matches. But I don't think forking the syntax is a good idea. Wrapping I'd prefer to add optional delimeters around cases to the language, like this: match e with
( A -> e₁
| B -> e₂ ) or perhaps, following the revised syntax, like this: match e with
[ A -> e₁
| B -> e₂ ] Then the nested case would look like this, which seems like a small but genuine improvement over the current way of working around the problem: match e with
[ A -> match e₁ with
[ A -> e₂
| B -> e₃ ]
| B -> e₄ ] This change would be backwards-compatible, and wouldn't require any new compiler flags. And it would be easy to add a warning as in #716 for undelimited matches in tail position. |
It is too optimistic. Most Linux distros will just stick with ocaml4 for decade. Inner-match warning would be very nice - usually this place triggers confusing error anyway. Btw, issuing new warnings is not very safe. There are many projects that are compiled with smth like -Wall. And when you add new warning, it sometimes breaks the build. Everything is fine, if you are a developer of the program. But if you are just a packager? You have to go deep inside the build system or the program to eliminate the warning. So, may be it is just possible to add couple rules to static analyzer? It can have false positives (unlike compiler). So, it can warn about inconsistent code and formatting, two identical things in if else and etc. |
This is not a very good argument. These projects are doing it wrong, you should not distribute code with |
There is another, easier option - just stop following compiler upstream until all the libraries are get fixed. |
People are even reluctant to use a new stdlib function because they want their libraries to work with OCaml version foobar. So just bringing some extra protection will not be a sufficient argument to have them switched. @yallop's proposal is more realistic and could be easily combined with #716 (simply consider the new form to be delimited, and in the warning message, suggest to use that form instead). |
@yallop: I'm fine with adding delimiters around clauses, but I think that it is important for
|
@gasche at that point why not require all |
(I personally like @yallop's proposal since it corresponds to the polyvar definition syntax.) |
Note that variant polymorphic variant types accept |
I quite strongly disagree with the background idea here, which is that solving this syntactic problem doesn't add much and that experienced users don't suffer from the problem. Even though experienced users make this kind of mistake rarely, it is a disturbingly large fraction of the mistakes they do make, and it's a hard mistake to find. Every experienced user I've spoken about this with at Jane Street thinks it's one of the biggest design flaws in the language, and certainly the worst thing about the syntax. I don't know that the present solution is a good one, and Alain's point that we should expect migration to be slow is reasonable; but this is a serious issue that deserves to be addressed. |
@yminsky Which mistake in particular are you talking about? The one you mentioned above (with the if-then-else and |
Do you think that a solution that notifies users about the problem (e.g. warnings based on indentation or (lack of) explicit grouping) would be sufficient, or do you think the problem deserves a change to the syntax? |
Apologies in advance: this is mostly about the This may go without saying, but warning users is essentially never enough: in practice, I've only seen benefits when you turn warnings into errors. But, since all warnings can be made into errors, that may be enough. A few comments:
The thing that's missing from our solution is clear demarcations of the blocks that are semantically distinct. I think the curly-brace languages are better in this regard, e.g.:
is both clearer and lighter than
What's particularly odd about the latter is that it can be broken by an operator, e.g.:
is clear enough, but
won't compile with the warning on, because the true extent of the block is
None of this is especially tragic, but it highlights that the underlying check is awkward. That said, my all in view is that we should try the warning route first. My proposal would be to upstream the rule we've set up internally as a warning, and see if people find it acceptable, and if so, we should move to turn the warning on by default. For the match syntax, my best guess is that the right rule is to require a begin/end around nearly every match, similar to your proposal in #716 |
Thanks @yminsky . This is really not off-topic since this PR is about both if and match.
I think that even if this were parsed as I know you did not actually propose to add such a curly-brace syntax, but it's worth noting that it would heavily conflict with records.
I agree with that. The fact that one can write: if ... then
let () = () in
foo ();
bar (); but removing the let-binding changes the scope is really bad. |
@gasche I like your ideas, FYI reason (CC @jordwalke @chenglou) used |
I've opened a new PR (#722) to discuss delimiters for |
@yallop From a language user standpoint, I'm not sure I understand the benefit of adding delimiters around cases rather than an explicit
better than
particularly when Is the argument that the internal changes are less invasive for delimited cases? |
@hcarty I like delimiters better than |
@hcarty: if begin match e with
| c1 -> e1
| c2 -> e2
end It's not easy to determine whether the |
@alainfrisch: noted. I'll correct it above. @bschommer: my point was that you don't need syntax-aware tools for this change. A simple regex tool is enough, which means that every single build system can easily integrate this change as outlined above, and |
|
@yallop: good point. |
Well, breaking the text in comments is not too bad given the context (allow the code to be compiled by old compilers). Same example with strings, though, and this is more problematic. |
Sed solution breaks program
transforming it into
|
Just a note: I've added a more complex tool that should handle comments, strings, and quoted strings here. It uses regex + a simple state machine. This wouldn't be hard to use in any build tool, though it's admittedly a little more complex then just using @const-rs: your safe-syntax program has a syntax error, since you didn't terminate your |
I think that such conversion tool should have the support of an existing OCaml lexer to work correctly (for example the upstream lexer via
|
@gasche I had no idea we parsed comments that way. Why do we do that? |
Anyway, I concede that my hacked-together tool isn't sufficient. |
I have thought of this problem before, and come up with a solution: One solution would be to issue a warning if the situation was at all ambiguous. But that might require some situations to be like: (if x then (a + b) else (c + d)) I think that using |
While we are at it, could we make trailing semicolons and any appearance of One other wart that I don't like is the double meaning of |
Probably Python-style alternatives to if condition then: expression1 else: expression2 end , match expression with: cases end and function: cases end or fun: cases end Colon can be allowed to be separated from the previous keyword and the |
Update: I now have a tool for converting between 4.03 syntax and safe syntax, and vice versa. It doesn't preserve comments yet since it uses the existing OCaml compiler infrastructure, but that should be good enough for now for converting from safe_syntax to earlier syntax (4.03), which means build tools can use this to provide instant compatibility for new syntax. Thanks to the compiler devs for creating the really great pprintast.ml! So easy to modify. I'll take advantage of the object-oriented de-duplication aspects once I have the time. Feel free to check out the tool on your own code and see what it looks like in safe_syntax mode. Given the existence of this tool (and I'm sure it needs some love), I think this proposal becomes viable. What do you guys think? |
I generally like the proposal, except for changing the current if-then which allows structuring the code as let rec find x = function
| Empty -> None
| Node(l, v, d, r, _) ->
let c = Ord.compare x v in
if c < 0 then find x l else
if c > 0 then find x r else
d It should be possible to introduce On another note, the requirement of |
If we give up the goal of uniformity, and introduce two new keywords to optionally terminate patterns and conditions, I think we could keep backwards compatibility and keep support of undelimited if-then-else. The terminator would attach to the closet construct of the right kind if present. When writing pure code, one would only use the pattern-terminator, which would never terminate a conditional. I don't have good suggestions for what the keywords should be though. Ups: I overlooked the fact that an optional |
I see quite a bit of support for the proposal, but honestly, I don't see it happening that we maintain two versions of the parser in parallel, nor that the community switch to the new syntax fast enough to drop the old one quickly. Moreover, even if automated tools were available to convert code bases, this would still break available literature and code snippets around. And, but this is more a matter of taste: I think that adding warnings such as #716 would be enough to avoid mistakes, while retaining the light current syntax for common cases, and without breaking backward compatibility. So I'm tempted to close the issue, unless another maintainer feels strongly enough in favor of the proposal enough to shepherd it. |
Personally I agree with the downsides you mention, but I think that there are changes to the ecosystem that could mitigate them. ocaml-format was just released and brings the hope of having more robust automated code-evolution tools (but this hope is still in the future), and I would be more confident about evolving/maintaining parsers if I manage to propose a Menhir migration. I think today is too early to consider alternative syntaxes (we haven't tended to all the camlp4-depreciation growing pains yet), but that it may change in the medium term. |
Ok, closing for now. If the eco-system moves to a state where changing the concrete syntax becomes significantly easier, we will revisit. |
@yminsky , in your example neither the sequence nor the bind is in fact essential, right? I mean, you could just as well say that the following is confusing:
assuming |
Proposal
Two problems presently exist and have been discussed regarding OCaml syntax (see #278 and this reddit thread):
Problem with If...else
Because the
if
expression has no terminator, a common issue is that refactoring of code containing anif
expression can result in incorrect behavior. The scenario generally involves something likeif x = 3 then 12
Later refactoring may add to the
then
clause:if x = 3 then print_string "done"; 12
Which would cause the behavior to be incorrect. Many variations of this problem exist, all of which concern the lack of a terminator. To prevent this issue, many OCaml programmers proactively insert
begin
andend
into every if expression.This PR changes the
if
syntax to beif x = 3 then 12 end
The small, amortized addition of
end
means that theif
expression becomes completely safe in this regard, and that there is no need to ever usebegin...end
in this context.Problem with deep matching
Deep matching also suffers from the lack of a terminator. Specifically, for the example
can later be augmented with some deep matching, which needs to be protected with parentheses when deep matching:
This is an annoyance when refactoring, and can sometimes even cause incorrect behavior that isn't picked up by the type checker. As a result, some people proactively couch every
match
inbegin...end
.This PR adds an
end
construct to each one ofmatch...end
,function...end
, andtry...with...end
. This change makes it easy to refactor matches and to layer deep matches.Examples
Command line argument
The change to the syntax is only in effect when passing the
-safe-syntax
option. This means that no code is impacted by default except for code which the user specifically chooses to convert to using safe syntax. For example, a company could decide that safe syntax is useful for the sake of reducing potential bugs and simplifying code, but reduce exposure to specific modules.Goals
My goals in designing the solution presented here were:
function
vs Caml-light'sfun
, or reversing type application. I believe this would have prompted endless bike-shedding, as it should have -- these changes are far more subjective. Instead, I targeted what I consider to be design issues in the syntax that justify breaking backwards-compatibility, and I think these changes should be popular enough that within a few versions,-safe-syntax
will be the default.end
is the default terminator for OCaml (seestruct
,object
,sig
). While I could see the argument for makingfor
andwhile
also useend
, there is also value in differentiation, and doing so would also increase the surface area of the proposed changes, affecting goal 1.-safe-string
option. Within a few versions, I envision all OCaml code being-safe-syntax
compatible. Additionally, since all forward-looking, refactor-proof OCaml code already wrapsif
andmatch
withbegin...end
, the changes to existing codebases are trivial and the changes to the mindset of programmers are trivial as well.Some have suggested that backwards-compatibility should trump here, and I disagree. #278 adds to the complexity of the language. Appropriate tools are brittle and hard to design, and not everyone can rely on them.
Backwards Compatibility
Of course, changing the syntax in a fundamental way breaks backwards compatibility. I believe this is still the correct approach. For old compilers (due to LTS releases/other reasons), a conversion tool exists, placing the onus on build systems (ocamlbuild/Jenga/omake/ocp-build) to support
-safe-syntax
in old compilers.With this tool, build systems (ocamlbuild, Jenga, ocp-build, omake) could convert every safe-syntax file to unsafe-syntax on the fly and compile it with an older OCaml compiler. In the first stage (when
-safe-syntax
is opt-in in version X), they would do so for specific files that ask to be compiled in-safe-syntax
mode if the system compiler is earlier than version X. In the future (when-unsafe-syntax
is opt-in at version Y), they will do so for all files if the system compiler is less than X, or just supply-safe-syntax
for most files if the version is >= X and < Y.Note
This PR would benefit from #292, which would allow code sharing between parsers and therefore reduced duplication.