Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidier erl #2451

Merged
merged 5 commits into from
Jan 13, 2020
Merged

Tidier erl #2451

merged 5 commits into from
Jan 13, 2020

Conversation

Zalastax
Copy link
Contributor

@Zalastax Zalastax commented Nov 7, 2019

I have a goal to fix erl_prettypr so that I can start using it in the projects I work on. Readability of some of the outputs is the biggest issue and this patch improves the situation a lot.

To see what the second commit does, compare output for sep (new) with par (old).

I also want to start preserving newlines that are used for grouping the code. It's something we see in tools for other languages and it's really useful. Consider

f() ->
        oneaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
        twoaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, threeaaaaaaaaaaaaaaaaaaaaaa,


        later.

I would like it to format to

f() ->
        oneaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
        twoaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
        threeaaaaaaaaaaaaaaaaaaaaaa,

        later.

whereas

f() ->
        one,
        two, three,

        later.

should become

f() -> one, two, three, later.

It's easy to add detection of meaningful newlines but prettypr does not provide a means of outputting an empty line in vertical layouts. I've played around with text("") but it will always add an extra space in horizontal layouts. @kostis, any ideas?

@CLAassistant
Copy link

CLAassistant commented Nov 7, 2019

CLA assistant check
All committers have signed the CLA.

@vladdu
Copy link
Contributor

vladdu commented Nov 7, 2019

Regarding the meaningful empty lines: your transformation will remove those and running the formatter twice will get a different result. Or am I misunderstanding? A formatter should be idempotent.

@Zalastax
Copy link
Contributor Author

Zalastax commented Nov 7, 2019

I wasn't clear enough. Formatting the code below would be an identity operation. We get this behavior by checking if the next expression is more than one lines ahead.

f() ->
        oneaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
        twoaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
        threeaaaaaaaaaaaaaaaaaaaaaa,

        later.

@vladdu
Copy link
Contributor

vladdu commented Nov 7, 2019

In the first example, you go from two empty lines to one; in the second, from one empty line to none. So the first example would also have the empty line removed if formatted twice. Or is this related to the fact that the second example fits on one line? What if the empty line is a meaningful separator?

@Zalastax
Copy link
Contributor Author

Zalastax commented Nov 7, 2019

It's related to it fitting one line 😃

Perhaps expressions should never curl up to one line in function clauses? That would avoid the issue and I don't think it's that useful to compact the code in that way. I don't think gofmt nor prettier does it and that is a compelling argument to me.

@vladdu
Copy link
Contributor

vladdu commented Nov 7, 2019

Personally, I think each expression should be on its own line(s). I see no reason for
{X,_}=Arg, Y=4+X, Z=Other, on one line. It's decreasing readability.

@essen
Copy link
Contributor

essen commented Nov 7, 2019

Sounds like making it configurable would be useful. You can't really assume whether an empty line is meaningful or not, and people may prefer one/multi line clauses when possible. In the case of one line clauses other criteria/options coming to mind would be "if one line doesn't fit then make all clauses multi lines" or "if there's more than one expression in the clause then make it multi lines".

@Zalastax
Copy link
Contributor Author

Zalastax commented Nov 7, 2019

Let's not make it configurable. It's important to keep the number of settings down and here we have a strong case for readability.

It's safe to assume an empty line is meaningful. If it's not, then the developer can take it away by deleting it. This works well in practice and is championed by prettier and gofmt.

I'm going to play around a bit and see if I can make the layout engine produce empty lines for me. It should be simpler now when I won't have to consider horizontal layouts.

@essen
Copy link
Contributor

essen commented Nov 7, 2019

The empty line may have been a left over from a debugging session or unfinished code. I never use meaningful empty lines personally (at least not in my own code bases). Sometimes they get committed, I would rather they don't.

I'd love to use a formatting tool as well but it should support the way I naturally write code and correct my mistakes, not force me to use an arbitrary style. There is no "one true way" that can be decided through argumentation.

@Zalastax
Copy link
Contributor Author

Zalastax commented Nov 7, 2019

Quoting grey-area

This is a philosophical difference, not an oversight.

gofmt imposes a single style on the entire go ecosystem, that's unusual as most previous formatters have lots of options, and allow different teams to adopt different styles (e.g. tabs vs spaces). Turns out it doesn't really matter what style you choose as long as everyone uses the same one, and long lines aren't a problem in practice.

Worse is better.

It might feel radical but it's better to go away with options. You might not like how it changes your code but you'll love how it changes others.

The Elixir code formatter works in the way I suggest: Any continuous blank lines are joined into a single blank line.

@essen
Copy link
Contributor

essen commented Nov 7, 2019

I love how you just assume that our philosophical differences, as you quote it, make your point of view better than mine and that I would just love the result of what you're doing. I don't. And I certainly don't want to impose any style onto others, whether it be my preferred style or something else entirely.

By the way your quote mentions tabs vs spaces, yet gofmt has options to configure tabs/spaces. Guess even they didn't manage to impose all their preferences on everyone else.

I have said nothing about continuous blank lines, it's a good idea to join them. I only commented on the meaningfulness of blank lines which isn't always true (and you have said nothing against that, you just assume they're always meaningful or that the developer would always remove them otherwise, somehow) and on one/multi line clauses where you claim multiple lines are always more readable without any evidence, and despite there being a large amount of "one-line per clause" functions in OTP and elsewhere, to give a few examples: https://github.com/erlang/otp/blob/master/lib/stdlib/src/io_lib_format.erl#L213 https://github.com/erlang/otp/blob/master/lib/compiler/src/cerl.erl#L273

That said, it'd be fine if the tool was not configurable but would apply the preferred OTP style, I just wouldn't use it personally outside the OTP project, but as the examples above show making all clauses multi-lines would not go well with most of the OTP code.

@Zalastax
Copy link
Contributor Author

Zalastax commented Nov 7, 2019

Sorry - I was too harsh. I feel very strongly about this issue and want to keep erl_tidy as opinionated as it already is. I think it has huge value in big projects, e.g. OTP.

The examples that you gave seem to not be affected by the discussion here. The first example has one tuple expression - they would not be affected. In the second example I only see atom expressions - they would also not be affected. I don't think there is an actual issue with making this change. I sampled a several files here in OTP and can't find any cases of multiple clauses on the same line.

@zuiderkwast
Copy link
Contributor

The changes done in the actual PR seem very good and there doesn't seem to be any disagreement about it, but I think it's confusing that you start discussing something "future work" directly.

sep(Es) seems to mean all in one line if it fits, otherwise one element per line, wheareas par(Es) means fit as much as possible on one line and then continue on the next line. Nobody seems to want par(Es) (at least not for tuples).

Can you please edit the description and/or the commit messages to make it more clear? E.g. examples with 1 untidy source, 2 before and 3 after.

@zuiderkwast
Copy link
Contributor

@vladdu:

Personally, I think each expression should be on its own line(s). I see no reason for
{X,_}=Arg, Y=4+X, Z=Other, on one line. It's decreasing readability.

This PR is affecting the elements in tuple elements {A, B, C}, list elements [A, B, C], function application arguments f(A, B, C) and elements in type unions A | B | C. For these cases, we sure want them to fit them on one line sometimes, don't we?

Regarding the example below:

Or is this related to the fact that the second example fits on one line? What if the empty line is a meaningful separator?

f() ->
        one,
        two, three,

        later.

@vladdu, I think you have a point here. If the empty line should be considered meaningful, would that force one element per line here then? And what happens if there is a comment in between, before later? Where does it end up?

@zuiderkwast
Copy link
Contributor

@essen

There is no "one true way" that can be decided through argumentation.

For Erlang, there is no consensus about a true way, so I tend to agree that options might be preferable, if there are any major disagreements.

(For Go, what @Zalastax is quoting, there seems to be "one true way" though. As I understand it, the Go compiler refuses to compile if the code isn't formatted exactly in this fixed way and I guess this rule was there since the beginning. The Go community seems to like it like that.)

On the other hand, I think several improvements can be made to erl_prettypr without any options, e.g.

  • indenting using spaces only instead of replacing 8 spaces with a tab (because almost only Emacs understands this convention; other editors get confused and so does web forums such as stackoverflow, etc.)
  • don't put comma-separated expressions in a block or the body of a clause on the same line (example below, the line with io:format/2)
  • indenting clauses after of with 4 spaces just like everything else, instead of the current 2 (example below)
%% Current output of erl_tidy:
f([X | Xs]) ->
    case bar(X) of
      {ok, Result} ->
          io:format("Result: ~p~n", [Result]), Result;
      not_this_one -> f(Xs)
    end.

@vladdu
Copy link
Contributor

vladdu commented Nov 7, 2019

This PR is affecting the elements in tuple elements {A, B, C}, list elements [A, B, C], function application arguments f(A, B, C) and elements in type unions A | B | C. For these cases, we sure want them to fit them on one line sometimes, don't we?

I referred to the examples in the original comment, where the items were expressions at the top level.

Regarding meaningful empty lines, we can only guess and I think we should keep them as the author intended. Several empty lines however, are probably always not intended and can be compressed into one.

@Zalastax
Copy link
Contributor Author

Zalastax commented Nov 7, 2019

Many thanks to all for the great discussion and thank you @zuiderkwast for facilitating!

From this test file we now get this output. Formatting erl_prettypr.erl gives us this output. Let me know if you want more examples or are unhappy with anything in the output.

My current petpeeve is the three comment variants %, %%, %%% which all require different indentation. I currently think it can't be solved without modifying prettypr further. I leave that for another PR...

@zuiderkwast
Copy link
Contributor

I like it.

Just one concern about removing the concept of sub-indentation. The Emacs mode indents when using two spaces. That would be a reason to keep the concept of sub-indentation:

f(Xxxxx)
  when Yyyyy ->
    Zzzzz

Should the Emacs mode be considered the reference indentation?

@Zalastax
Copy link
Contributor Author

It would be great if erl_tidy indented the same way as Emacs. It would also be nice to get rid of sub-indentation. Should we update the Emacs mode? If not, I'll look through the Emacs mode source to know when to sub-indent...

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Nov 11, 2019
prettypr:par/2 adds a space already

  -record(a, {a :: a, b :: b}).
used to become
  -record(a, {a  :: a, b :: b}).
now it becomes
  -record(a, {a :: a, b :: b}).
par often makes for code that is more compact but less readable.
Similar tools opt for readability and we should too!
@Zalastax
Copy link
Contributor Author

@zuiderkwast are there other places than at a "when" that use sub-indentation?

I force pushed now. I removed the commit that gets rid of sub-indentation. I also re-implemented "Expressions in clauses on separate lines" since the previous version did not work correctly with multi-line expressions. It's not the prettiest code I've written but the change is backwards compatible and I don't see a better alternative at the moment.

@zuiderkwast
Copy link
Contributor

Dialyzer warning: erl_tidy.erl:351: Function check_forms/2 will never be called.

are there other places than at a "when" that use sub-indentation?

Yes, using Emacs with erlang-mode, after ( at end of line:

after_open_paren(
  two,
  spaces) ->
    four,
    spaces.

If erl_prettypr never breaks lines after (, this shouldn't be an issue.

@michalmuskala
Copy link
Contributor

michalmuskala commented Nov 15, 2019

AFAIK erl_tidy also uses sub-indentation when indenting too long operands:

test(X, Y, Z) ->
    a:fooooooooooooooooooooooooooooo(X) ++
      a:fooooooooooooooooooooooooooooo(Y) ++
	a:fooooooooooooooooooooooooooooo(Z) ++
	  a:fooooooooooooooooooooooooooooo(X).

In general, though, erl_tidy has other issues that might require more work, up to almost a complete rewrite. For example, it doesn't preserve formatting of literals, converting, for example, binary data in "\x01\x56\xa6" into unreadable "\001V¦". It also doesn't deal well with macros - ?assertMatch from stdlib is enough to completely throw it off. There are also more minor annoyances.

Before you invest more work into erl_tidy, I just wanted to say I'm working right now on a new formatter from scratch that is in its approach somewhat closer to the Elixir formatter. The current plan is for it to be public in time for Code BEAM SF next year.

@Zalastax
Copy link
Contributor Author

@michalmuskala that's great! Well, then I see little point in me investing more time in cleaning this up. Let me know if I can help out in any way!

With new light on the topic, I think I should remove the sub-indentation changes from this PR and wait for the new tool to come.

@zuiderkwast
Copy link
Contributor

@Zalastax Your changes done so far are still improvements and I think it would be nice if they were merged. It might take some time before a completely new formatter makes it into OTP...

@elbrujohalcon
Copy link
Contributor

As stated in erlang-ls/erlang_ls#15, we (@AdRoll) are also working on a standalone code formatter.
In our case, a plugin for rebar3.
We forked some of the modules that @Zalastax updated here and we'll likely be porting the changes in this PR to them soon.
Our plugin is already open-source, you can find it as http://github.com/AdRoll/rebar3_format (or https://hex.pm/packages/rebar3_format).
Our plan is to formally present it at FOSDEM 2020, but we already accept bug reports, feature requests and, of course, PRs!!

@Zalastax
Copy link
Contributor Author

Zalastax commented Dec 3, 2019

Thanks for the info @elbrujohalcon!

I'm working in a codebase that can't transition to rebar3. Are there many things that rebar3 provide to your formatter or could a standalone (perhaps less featured) version of the formatter be created?

@elbrujohalcon
Copy link
Contributor

@Zalastax as you can see in the repo I linked, the code that does the actual formatting is pretty isolated from the the rebar3-plugin code. You can just use rebar3_formatter:format/2 on your own.
To be clear, we don't have plans to expose that as a standalone thing that's why it's not documented as such. But nothing stops you from using it.

@zuiderkwast
Copy link
Contributor

It would be good to know if there is an OTP code standard at all. Is the Emacs mode the reference indentation? It would be nice if anyone from the OTP team can confirm or reject this assumption.

A formatter implementing such a standard would be nice to have inside OTP itself, preferably as an updated version of erl_prettypr to avoid duplicates.

Btw, my use case is mostly to format abstract code (sometimes extracted from beam files, sometimes generated) and formatting single abstract forms (such as a single function clause) for use in error messages in code analysis tools I'm developing. I definitely do not want to use a rebar plugin for this.

@elbrujohalcon
Copy link
Contributor

From our analysis (and we didn't go too deeply in it), it looks like there was a time when OTP code was indeed formatted in er_prettypr style (with a mixture of tabs and spaces and all)… but at some point (maybe when the project started receiving contributions from the community?)… that was dropped and now the code presents a mismatch of different styles all around.
You can find multiple modules (erl_prettypr among them) that have some lines indented with spaces while others are indented with tabs & spaces (i.e. erl_prettypr style).

@zuiderkwast
Copy link
Contributor

Regarding spaces/tabs, I think it comes from Emacs' default behaviour, which with the default configuration replaces 8 spaces with a tab: https://www.emacswiki.org/emacs/TabsAreEvil#toc3, so not only erl_prettypr does this. This is not really an issue of the Erlang mode but of Emacs itself, so for erl_prettypr it can perhaps be changed to spaces regardless of the Emacs mode.

@elbrujohalcon
Copy link
Contributor

@zuiderkwast well… for rebar3_format we changed it to spaces, indeed. Nevertheless, the code that inserts the tabs is not even in erl_prettypr. It's part of prettypr itself. 🤷‍♂️

@zuiderkwast
Copy link
Contributor

@elbrujohalcon

Nevertheless, the code that inserts the tabs is not even in erl_prettypr. It's part of prettypr itself. 🤷‍♂️

True, but in OTP, prettypr is only used by Dialyzer and HiPE, in addition to erl_prettypr. These and any third party applications would perhaps like spaces too...

@dgud
Copy link
Contributor

dgud commented Dec 4, 2019

The erlang mode for emacs is OTP's reference indentation implementation, it still have some problematic areas that is hard to fix, catch vs try catch and multiline strings and binaries if I remember correctly.

I did write one for https://github.com/erlang/sourcer as well with the hope that that would take off.
Neither emacs nor sourcer adds line breaks, so it's more indentation tools than a code formatters.
I also wrote some tests to ensure that they behaved the same, editors should be able to indent
incomplete (and wrong) code so they are not using OTP's tools.

We have decided to only use spaces inside OTP source, using emacs that is forced via a .dir-locals file in OTP's top directory. That is new code will be indented with spaces, old code left at as it is until edited to have a clean commit history as possible.

@KennethL
Copy link
Contributor

KennethL commented Dec 4, 2019 via email

@zuiderkwast
Copy link
Contributor

That's good news, @dgud and @KennethL! So, if this PR consists entirely of changes that bring erl_prettypr closer to the Emacs mode, it means that it can be accepted? All the changes are in this direction, aren't they @Zalastax?

@garazdawi garazdawi added team:PS Assigned to OTP team PS and removed team:VM Assigned to OTP team VM labels Dec 10, 2019
@dgud
Copy link
Contributor

dgud commented Dec 11, 2019

@Zalastax can you fix dialyzer warning seen in the travis log above

@robertoaloi
Copy link
Contributor

robertoaloi commented Dec 19, 2019

Currently the Emacs Erlang-mode has support for "comma-first" notation, which is used by a few companies/projects. Is the plan to support that?

E.g.:

-export([ foo/1
        , bar/2
        , baz/3
        ]).

Expressions that are separated by at least one empty line will
be separated by exactly one empty line in the output.
@zuiderkwast
Copy link
Contributor

Travis still shows a red X for one of the jobs. I can't really see why it fails. @Zalastax, can you try to re-run that job? (by clicking a link on the Travis page)

@Zalastax
Copy link
Contributor Author

There's no link as far as I can see but I did try to run it again by making a dummy change which I force pushed. Travis still failed. I don't know what might be wrong.

@ferd
Copy link
Contributor

ferd commented Dec 29, 2019

The failure seems to be related to:

1> Testing tests.kernel_test.inet_SUITE.cases: Starting test, 6 test cases
1> Testing tests.kernel_test.inet_SUITE.cases: *** SKIPPED test case 1 of 6 ***
1> Testing tests.kernel_test.inet_SUITE.cases: *** SKIPPED test case 2 of 6 ***
1> Testing tests.kernel_test.inet_SUITE.cases: *** SKIPPED test case 3 of 6 ***
1> Testing tests.kernel_test.inet_SUITE.cases: *** SKIPPED test case 4 of 6 ***
1> 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
inet_SUITE:t_gethostnative failed on line 696
Reason: {case_clause,{error,try_again}}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1> Testing tests.kernel_test.inet_SUITE.cases: *** FAILED test case 5 of 6 ***
1> Testing tests.kernel_test.inet_SUITE.cases: TEST COMPLETE, 1 ok, 1 failed, 4 skipped of 6 test cases

Might just be a container thing for the CI?

@dtip
Copy link
Contributor

dtip commented Jan 6, 2020

@Zalastax this is a great PR!

To throw another formatter into the mix, we (@old-reliable) have been developing a standalone formatter as well. Steamroller is opinionated and works as a rebar3 plugin.

It seems quite similar to the formatter being developed by @michalmuskala! It uses a style and approach similar to the Elixir formatter.

@dgud dgud self-assigned this Jan 8, 2020
@dgud dgud added the testing currently being tested, tag is used by OTP internal CI label Jan 8, 2020
@dgud dgud merged commit 18d3b92 into erlang:master Jan 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.