Enhanced EOL support #2685

MysterAitch · 2020-05-19T17:09:57Z

This PR adds an enum to represent different types of end-of-line / line-separator characters.

It also adds a utility method which allows you to "detect" which style of line-separator a given string uses.
This is incorporated into the parsing process by reading in the content of the given provider and adding this as a data field within the top-most node.
This is then used as part of the CSM and pretty printer where new added nodes will reflect this (defaulting to the system's eol character if it is unknown/ambiguous e.g. it is a single line or there are mixed endings).

Reading in the provider content and then creating a brand new StringProvider using this string is a bit "hacky" and I'm sure there must be a better way to do it, I just don't know what it is -- comments and suggestions welcomed!

…`Utils.EOL` as deprecated

… which line ending was found during parsing -- this is then used during the lexical preservation csmtoken insertion/diffing to ensure that the "new" newlines are of the same type as the parent....

… separator..

…r one isn't set or it is a meta-value e.g. EMPTY or NONE

…ystem's line ending

…t should be unescaped line separators, but the lookup returns escaped line separators; deprecated isSpaceOrTab and added isWhitespaceButNotEndOfLine as a replacement;

…pace within the grammar (e.g. non-breaking spaces and tab characters)

javaparser-core/src/main/java/com/github/javaparser/JavaParser.java

javaparser-core/src/main/java/com/github/javaparser/printer/concretesyntaxmodel/CsmElement.java

javaparser-core-testing/src/test/java/com/github/javaparser/utils/TestUtils.java

javaparser-core/src/main/java/com/github/javaparser/JavaParser.java

javaparser-core/src/main/java/com/github/javaparser/ast/Node.java

javaparser-core/src/main/java/com/github/javaparser/TokenTypes.java

matozoid · 2020-05-19T18:43:10Z

Random informational remark: the textblock feature of Java 12 or 13 or so is very explicit about the kind of eols it wants. Be careful not to break it :-)

MysterAitch · 2020-05-19T20:14:58Z

Random informational remark: the textblock feature of Java 12 or 13 or so is very explicit about the kind of eols it wants. Be careful not to break it :-)

It would be a shame if our test suite wasn't able to highlight any regression like this ;)

But yes -- Just had a read through the JEP and found this:

https://openjdk.java.net/jeps/378#1--Line-terminators

The content may include line terminators directly, unlike the characters in a string literal. The use of \n in a text block is permitted, but not necessary or recommended. For example, the text block:

"""
line 1
line 2
line 3
"""

is equivalent to the string literal:

"line 1\nline 2\nline 3\n"

or a concatenation of string literals:

"line 1\n" +
"line 2\n" +
"line 3\n"

If a line terminator is not required at the end of the string, then the closing delimiter can be placed on the last line of content. For example, the text block:

"""
line 1
line 2
line 3"""

is equivalent to the string literal:

"line 1\nline 2\nline 3"

...arser-core/src/main/java/com/github/javaparser/printer/concretesyntaxmodel/CsmTextBlock.java

MysterAitch · 2020-05-22T17:13:10Z

Hmm... as far as I can tell, there is no reason for the tests to fail on appveyor (consistently) but then pass on all other places...

As far as I can see, the "expected" and "actual" sections are identical, but there's no information if there are line separator differences... I'll push a revised test in a moment.

[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   CommentsInserterTest.issue412:114 expected: </* 1
1 */
/*
2
2
 */
package /* a */
/* b */
// c
// d
com.pany/*alma*/
/*k�rte*/
// l�fasz
// j�ska
.experiment;
// z
// x
/*y*/
/*w*/
/*aa*/
/*bb*/
// cc
// dd
import com.github.javaparser.JavaParser;
public class Main {
    public static void main(String[] args) throws FileNotFoundException, IOException, ParseException {
    // try (FileInputStream fisTargetFile = new FileInputStream(new File(FILENAME))) {
    // final String content = IOUtils.toString(fisTargetFile, "UTF-8");
    // System.out.println(content);
    // }
    }
}
> but was: </* 1
1 */
/*
2
2
 */
package /* a */
/* b */
// c
// d
com.pany/*alma*/
/*k�rte*/
// l�fasz
// j�ska
.experiment;
// z
// x
/*y*/
/*w*/
/*aa*/
/*bb*/
// cc
// dd
import com.github.javaparser.JavaParser;
public class Main {
    public static void main(String[] args) throws FileNotFoundException, IOException, ParseException {
    // try (FileInputStream fisTargetFile = new FileInputStream(new File(FILENAME))) {
    // final String content = IOUtils.toString(fisTargetFile, "UTF-8");
    // System.out.println(content);
    // }
    }
}
>
[INFO] 
[ERROR] Tests run: 1085, Failures: 1, Errors: 0, Skipped: 6
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for javaparser-parent 3.15.23-SNAPSHOT:
[INFO] 
[INFO] javaparser-parent .................................. SUCCESS [  1.594 s]
[INFO] javaparser-core .................................... SUCCESS [ 27.814 s]
[INFO] javaparser-core-testing ............................ FAILURE [01:02 min]
[INFO] javaparser-core-testing-bdd ........................ SKIPPED
[INFO] javaparser-core-generators ......................... SKIPPED
[INFO] javaparser-core-metamodel-generator ................ SKIPPED
[INFO] javaparser-core-serialization ...................... SKIPPED
[INFO] javaparser-symbol-solver-core ...................... SKIPPED
[INFO] javaparser-symbol-solver-testing ................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:31 min
[INFO] Finished at: 2020-05-22T16:57:37Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project javaparser-core-testing: There are test failures.
[ERROR]

…arser into issue_2647_eol

… oops!

…r by character). If it is important for you to define the line endings used, provide a LineEnding as a second parameter else use TestUtils#readResourceUsingSystemEol

…ovides more descriptive output from the test's utility method

…`input` meaning that it checks out the code using the repo's line endings not the system's line endings this was discovered / came up as a problem in javaparser#2685 where javaparser started to respect the line endings of test resources instead of reading line-by-line and inserting the system's line endings

MysterAitch · 2020-05-23T09:37:48Z

EDIT: As a reminder to self for something to investigate tomorrow, maybe the appveyor git configuration is not performing autocrlf changes, meaning that the checked out code is unexpectedly using the repo's line endings (\n iirc) as opposed to the system's line endings or something? That might explain why the issues is only being seen on appveyor.

Oh dear, I should have followed up on this passing thought for just a couple of minutes... cursory searching turned up many results about this...

I read on another thread (http://help.appveyor.com/discussions/problems/671-coreautocrlf) the default configuration for the git autocrlf setting is "input". This means AppVeyor is checking out files with LF (\n) endings on Windows. That broke some of my tests.

https://help.appveyor.com/discussions/problems/1119-allow-changing-git-autocrlf-setting

The git core.autocrlf setting defaults to true (ie, all text files get
checked out as CRLF on Windows), except on Appveyor where's set to
"input" (ie, all text files get checked out with the upstream
repository's line endings, which for us typically means LF.)

https://gitlab.freedesktop.org/mesa/mesa/commit/9e5e3a8ead5321a6abc0add8f6e59e50052f9698

Adding this to the appveyor.yml has the tests passing :)

init:
  # Setup autocrlf -- by default, appveyor uses autocrlf input
  # ... This affects tests which expect resource files to have the systems's line separator.
  - git config --global core.autocrlf true

https://ci.appveyor.com/project/ftomassetti/javaparser/builds/33067787

matozoid · 2020-05-23T17:02:56Z

Adding this to the appveyor.yml has the tests passing :)

Yeeeeessss, but the tests passed before, and shouldn't need an AppVeyor change. Can you reason why this PR broke all those tests in the first place?

One place to look for is files with test content. Those have some kind of EOL, and tests probably depend on that, or better: expect certain EOLs to be in there. That means AppVeyor was doing the right thing - leave the EOLs alone.

Note that the AppVeyor build exists for only one thing: check EOL behaviour on Windows!

matozoid · 2020-05-23T17:06:41Z

I was notified that I was mentioned here or so? I don't see anything? Or was it a notification of a new commit maybe?

MysterAitch · 2020-05-23T17:46:35Z

Adding this to the appveyor.yml has the tests passing :)

Yeeeeessss, but the tests passed before, and shouldn't need an AppVeyor change. Can you reason why this PR broke all those tests in the first place?

One place to look for is files with test content. Those have some kind of EOL, and tests probably depend on that, or better: expect certain EOLs to be in there. That means AppVeyor was doing the right thing - leave the EOLs alone.

Note that the AppVeyor build exists for only one thing: check EOL behaviour on Windows!

Basically appveyor checked out the code using the repo's line endings. That meant that the files all have \n endings instead of \r\n.

Once this PR started to respect the file's ACTUAL line endings (e.g. within lexical preservation tests), it broke the tests.

Setting autocrlf to true causes appveyor to checkout the code using the host's line separator again 😄.

MysterAitch · 2020-05-24T18:16:46Z

From chat:

Danny van Bruggen @matozoid 19:02
Yeah, I think you can simply register it as a pre- and post-processor. Look through the stream while preprocessing, store the eol info inside the processor, and in the post-processor you have the result object so you can store the eol info wherever you're storing it currently.

The only difficult thing is handling the JavaCC Provider implementation :-# That interface is begging for one-off errors.

… caller wants the actual line separator "as-is" and without any additional escaping; added human-readable "descriptions" to the LineEnding enums; progress on LineEndingProcessingProvider.java -- now functional and tests passing;

…ost's line separator

…to retaining them

…tring

…to retaining them

matozoid · 2020-05-27T07:22:11Z

Well done @MysterAitch ! This is a fundamental improvement!

MysterAitch added 9 commits May 18, 2020 20:55

added enum to represent different styles of line endings, and marked …

1687acc

…`Utils.EOL` as deprecated

Added test case, comments, and fixed a few bugs...

463d704

fixes javaparser#2647 -- nodes now have a data item with detail about…

7ae8157

… which line ending was found during parsing -- this is then used during the lexical preservation csmtoken insertion/diffing to ensure that the "new" newlines are of the same type as the parent....

added helper method and checks where used to default to system's line…

a383ff4

… separator..

extracted out method which gets a string from a provider

1575ce4

added method that allows passing a default line ending if there eithe…

6094e5a

…r one isn't set or it is a meta-value e.g. EMPTY or NONE

bug fix where the node has no EOL chars already -- default to using s…

0303145

…ystem's line ending

fixed the CsmToken based only on a given token type int -- the conten…

b64e895

…t should be unescaped line separators, but the lookup returns escaped line separators; deprecated isSpaceOrTab and added isWhitespaceButNotEndOfLine as a replacement;

added comments to the variety of whitespace characters treated as a s…

fa48fe4

…pace within the grammar (e.g. non-breaking spaces and tab characters)