Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Preserve end of line #390

Closed
nojaf opened this issue Jan 16, 2019 · 15 comments
Closed

Refactor Preserve end of line #390

nojaf opened this issue Jan 16, 2019 · 15 comments

Comments

@nojaf
Copy link
Contributor

nojaf commented Jan 16, 2019

Problem

The setting --preserveEOL will try and preserve the original end of lines.
Meaning if multiple newlines were inserted somewhere in the code, these are preserved after the formatting.

before:

let x=    1

let y=2

after:

let x = 1

let y = 2

Without this settings fantomas would place the two let bindings directly underneath each other.
When the formatted code is constructed purely from the information stored in the AST (strict mode), fantomas will output:

let x = 1
let y = 2

In TokenMatcher.fs, the original tokens are being compared with the newly constructed tokens from AST.
When a newline is found in the old tokens it is being written
and the helper function indentWithNewSpacing will try and set the correct indentation of the next line.

However this helper function is somewhat confusing and does not cover all cases. It is difficult to solve #362 when other settings are also in place.

Proposed solution

Fixing the preservation of newlines in the TokenMatcher might not be the only place where we can fix this.
Given our sample code we can view in the ranges of the AST that there is an additional newline.

[Let
   (false,
    [Binding
       (None,NormalBinding,false,false,[],
        PreXmlDoc ((1,5),Microsoft.FSharp.Compiler.Ast+XmlDocCollector),
        SynValData (None,SynValInfo ([],SynArgInfo ([],false,None)),None),
        Named
          (Wild tmp.fsx (1,4--1,5) IsSynthetic=false,x,false,None,
           tmp.fsx (1,4--1,5) IsSynthetic=false),None,
        Const (Int32 1,tmp.fsx (1,10--1,11) IsSynthetic=false),
        tmp.fsx (1,4--1,5) IsSynthetic=false,
        SequencePointAtBinding tmp.fsx (1,0--1,11) IsSynthetic=false)],
    tmp.fsx (1,0--1,11) IsSynthetic=false); // <== line 1
 Let
   (false,
    [Binding
       (None,NormalBinding,false,false,[],
        PreXmlDoc ((3,5),Microsoft.FSharp.Compiler.Ast+XmlDocCollector),
        SynValData (None,SynValInfo ([],SynArgInfo ([],false,None)),None),
        Named
          (Wild tmp.fsx (3,4--3,5) IsSynthetic=false,y,false,None,
           tmp.fsx (3,4--3,5) IsSynthetic=false),None,
        Const (Int32 2,tmp.fsx (3,6--3,7) IsSynthetic=false),
        tmp.fsx (3,4--3,5) IsSynthetic=false,
        SequencePointAtBinding tmp.fsx (3,0--3,7) IsSynthetic=false)],
    tmp.fsx (3,0--3,7) IsSynthetic=false)] // <== line 3

It would also be possible to add an additional newline in the CodePrinter.fs.

Drawbacks

  • We need to identify all cases where we should check in the AST if a preservation of newlines is required.
  • Big impact, might resolve in a breaking change (new next major version).
  • Some cases would require that we remove newlines.
    F.ex
let x=
    let z =1
    z
let y =2

should we also check in the AST if it is ok to insert an extra newline after z? Or is this overkill?

@s-trooper we (@jindraivanek and myself) would like your opinion as you made the first implementation of preserve newline.

Perhaps we also need some unification of newline break logic.

@s-trooper
Copy link
Contributor

I am bad at English and will have a hard Time to explain such complex things. So i will simplify and assert things for facts that probably are nether simple nor facts are.

The fantomas remove Newlines/Spacing not for a Style reason but because of the a limitation in the implementation. The newlines/spacing should be preserved per default and removed/inserted by Style formatter. And not the other way around, removed and then inserted by formatter, like fantomas does it.

To do so, one need both Lexer/Tokenizer and Parser Information. FSharpTokenInfo for spacing/eol information and AST Info to get symantics of a Token in code, because e.i. a "<" Token could be part of generics or a operator, a "X" Identifier could be a expresion or value, it could be anything. I tried to create such a structure but failed, as the amount of work needed is just to much for me.

It may be not obvious but the TokenMatcher.fs -> "integrateComments" is just a hacky way to get parser and lexer information together, the formatted source part is based on parser and original source is tokenized. My preserve lines solution is build upon it. But i have seen multiple limitation in this approach and i think it is not possible to reverse all needed Information by diff on formatted and original source tokens.

Sofar I do not have a good solution for preserve new lines in fantomas, the reason i wrote my own formatter. But i hope my thoughts about this Problem will help you, or at least entertain you.

@nojaf
Copy link
Contributor Author

nojaf commented Jan 18, 2019

Hello @s-trooper , thanks for the information. Would you agree with the statement that the problem --preserve-eol tries to solve is keeping newlines from disappearing after formatting?
Maybe we should focus on that and that only.

@s-trooper
Copy link
Contributor

s-trooper commented Jan 21, 2019

Sure, from the end user perspective eol disappears after formatting and that to prevent is the goal of "--preserve-eol".
Technically (in fantomas), eol disappears before formatting, as fantomas formats AST tree and that doesnt contains eol tokens. So one have to keep newlines while parsing (transform source code to AST Tree) and then while formating.

I am not sure if this is part of the question but, can "--preserve-eol" be handled in isolation, not as part of style formatters? I doubt it, spacing and most importend indentation (this is spacing after all too) are strong coupled to "preserve eol" problem space.
e.g., original source is:

let lst =
    [ 5
      6
      7 ]

it can/will be formated to:

let lst = [ 5; 6; 7 ]

in this case one have to keep original spacing when eol should be preserved.
So you are right? best "preserve eol" is part of fomatting or one run in Problems me/you have run in.

@nojaf
Copy link
Contributor Author

nojaf commented Jan 21, 2019

Thanks for the sample code, this illustrates the part of eol I would consider not to preserve actually.
I think in your example people should accept that lines between f.ex. 5 and 6 are gone.
What was the point of formatting if you don't want this?

What I do think the setting --preserve-eol should do is keep blank lines from disappearing.
F.ex

let a = 7


let b = 8

Both SynModuleDecl.Let will show a range, and you know the total lines of the document.
So one could determine that lines 2 and 3 and not in use, so they should be added after the first SynModuleDecl.Let.

This is the only case where I should preserve the newlines. @jindraivanek what do you think?

@s-trooper
Copy link
Contributor

I would argue about "point of formatting", but i got your idea. You like to preserve blank lines only and that was the main issue anyway. What about statements? like:

printfn "a"

printfn "b"

But the idea is nice! the let expression is way the most used one.

@nojaf
Copy link
Contributor Author

nojaf commented Jan 21, 2019

Yes, it should be more elaborate than just let statements. (open, do, use, ...)
At some points in the AST we should do this check. It will require some investigation of course.
Let us focus on preserving those newlines and remove the existing behavior as it is to error prone.

Question is if we should bound the proposed behavior to a setting? I'm considering no as I think it won't upset people that this blank lines are preserved.

@jindraivanek
Copy link
Contributor

I think there is 3 separate things that preserve-eol try to do today:

  1. preserve blank lines
  2. preserve line break when fantomas don't do line break
  3. preserve no line break when fantomas want to do line break

Problem with 2) and 3) is that to do them correctly we need to know correct indentation and that need to be done in CodePrinter.

I think 1) can be done in TokenMatcher like now.

One more complexity with moving logic to CodePrinter is that we need to know Range of expr following nln. That means reworking all sepNln to get it, and that could be tricky.
Alternatively, we could store "current" position in original source code in Context. That is maybe easier.

I think main question is if we can change preserve-eol to do only 1), I'm not against it, but maybe there is some people that using 2) as workaround for:

  • fantomas doing vertical alignment too much in right
  • they want only opening bracket of list on first line
  • they want opening bracket of list on new line
  • they want closing bracket of list on new line

Not sure if 3) have some good use.

With all that in mind, I propose we:

  • add preserve-empty-lines to do only 1)
  • change preserve-eol to do 2), with note it is remain buggy and it will be replaced with different flags to fine-tune where to do break lines
  • drop 3)

@nojaf
Copy link
Contributor Author

nojaf commented Jan 21, 2019

As hinted before I think we should only be doing 1..

as workaround for ...

to me that sounds like 2. is an excuse to disagree with Fantomas output, which is fine as a concept but there should be more reasoning done when people disagree with the output.

I believe people format code for consistency in their code-base. And if something is bothering people in a persistent way they should raise an issue. It may be discussed and perhaps the F# style guide could provide the necessary guidance. #386 is a good example here.

I suggest we remove --preserve-eol as setting and modify the TokenMatcher to account of additional newlines in the original code.

@jindraivanek
Copy link
Contributor

to me that sounds like 2. is an excuse to disagree with Fantomas output, which is fine as a concept but there should be more reasoning done when people disagree with the output.

I believe people format code for consistency in their code-base. And if something is bothering people in a persistent way they should raise an issue. It may be discussed and perhaps the F# style guide could provide the necessary guidance. #386 is a good example here.

I agree, but I don't want to break existing functionality without providing alternative.

I suggest we remove --preserve-eol as setting and modify the TokenMatcher to account of additional newlines in the original code.

I think "don't care about empty lines" mode is still useful.

@nojaf
Copy link
Contributor Author

nojaf commented Jan 22, 2019

Well I firmly believe that introducing settings is the root of all evil 😋.
However I guess people could be upset if we did break existing functionality.
So we remove --preserve-eol and introduce --keep-additional-newlines?
Other names for the new setting are welcome.

@s-trooper
Copy link
Contributor

  1. keep-empty-lines or no-empty-lines: bool
  2. max-empty-lines: int*
  3. python doesnt have an option and just join multiple empty lines to one, except for "function" separation, there 2 empty lines.**

(*) i would vote for this as most flexible, but it would remove original amount of empty lines or use (-1) for that case?
(**) i personally don't have meet a person that don't like paragraphs in books or empty lines in code, but of course such ppl have to exist. i would ignore them when creating a code formatter.

@jindraivanek
Copy link
Contributor

To clarify, I'm not for removing all empty lines, just for option where existence of empty line don't depend on source.

I like keep-empty-lines or preserve-empty-lines.

Regarding breaking existing functionality, we can also start this as some beta of next major version and release when we have proper solution to newline placement.

@nojaf
Copy link
Contributor Author

nojaf commented Jan 22, 2019

Ok, summarizing the outcome of this:

  • --preserve-eol setting would be completely removed.
  • --keep-empty-lines would be introduced (default false) to keep newlines that were found in the original code but not in the formatted AST.
  • This refactoring would be part of a next major.
  • We will not be solving any issue marked with area-preserve-eol.

@auduchinok
Copy link
Contributor

@nojaf I do like this one as well:

max-empty-lines: int*

I usually place an empty line between top level let bindings and members and two lines between types. Removing extra lines would be handy.

@nojaf
Copy link
Contributor Author

nojaf commented Jan 25, 2019

@auduchinok thanks for this input. We decided that we would focus on bug fixing other things first.
And leaves --preserve-eol issues for what they are. Once we started the next major, we can reconsider the outcome here.

On another note @jindraivanek maybe we should already indicate that --preserve-eol will be deprecated in the next release. Obsolete attributes and warnings in the console.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants