ParseException on unusual characters #186

Closed
frederik-vaassen opened this issue Sep 9, 2015 · 13 comments

@frederik-vaassen

Ever since I updated the CheckStyle-IDEA plugin to version 4.19.1, I have been unable to scan my project for violations when some files contain unusual characters. The scan always fails with an exception of the following form (I have blanked out some sensitive references):

An error occurred while scanning a file.: An error occurred during a file scan.
org.infernus.idea.checkstyle.exception.CheckStylePluginParseException: An error occurred during a file scan.
    at org.infernus.idea.checkstyle.exception.CheckStylePluginException.wrap(CheckStylePluginException.java:32)
    at org.infernus.idea.checkstyle.checker.CheckFilesThread.run(CheckFilesThread.java:43)
Caused by: com.puppycrawl.tools.checkstyle.api.CheckstyleException: TokenStreamRecognitionException occurred during the analysis of file ****\AnnotatorRep.java.
    at com.puppycrawl.tools.checkstyle.TreeWalker.processFiltered(TreeWalker.java:218)
    at com.puppycrawl.tools.checkstyle.api.AbstractFileSetCheck.process(AbstractFileSetCheck.java:79)
    at com.puppycrawl.tools.checkstyle.Checker.process(Checker.java:265)
    at org.infernus.idea.checkstyle.checker.CheckStyleChecker.processAndAudit(CheckStyleChecker.java:64)
    at org.infernus.idea.checkstyle.checker.CheckStyleChecker.scan(CheckStyleChecker.java:40)
    at org.infernus.idea.checkstyle.checker.FileScanner.lambda$checkPsiFile$12(FileScanner.java:90)
    at org.infernus.idea.checkstyle.checker.FileScanner$$Lambda$30/1070323125.apply(Unknown Source)
    at java.util.Optional.map(Optional.java:215)
    at org.infernus.idea.checkstyle.checker.FileScanner.checkPsiFile(FileScanner.java:90)
    at org.infernus.idea.checkstyle.checker.FileScanner.run(FileScanner.java:42)
    at com.intellij.openapi.application.impl.ApplicationImpl.runReadAction(ApplicationImpl.java:872)
    at org.infernus.idea.checkstyle.checker.CheckFilesThread.runFileScanner(CheckFilesThread.java:30)
    at org.infernus.idea.checkstyle.checker.AbstractCheckerThread.processFilesForModuleInfoAndScan(AbstractCheckerThread.java:113)
    at org.infernus.idea.checkstyle.checker.CheckFilesThread.run(CheckFilesThread.java:39)
Caused by: ****\AnnotatorRep.java:12:7: Unexpected character 0x2030 in identifier
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaLexer.nextToken(GeneratedJavaLexer.java:405)
    at antlr.TokenStreamHiddenTokenFilter.consume(TokenStreamHiddenTokenFilter.java:38)
    at antlr.TokenStreamHiddenTokenFilter.nextToken(TokenStreamHiddenTokenFilter.java:134)
    at antlr.TokenBuffer.fill(TokenBuffer.java:69)
    at antlr.TokenBuffer.LA(TokenBuffer.java:80)
    at antlr.LLkParser.LA(LLkParser.java:52)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.postfixExpression(GeneratedJavaRecognizer.java:7637)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.unaryExpressionNotPlusMinus(GeneratedJavaRecognizer.java:7298)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.unaryExpression(GeneratedJavaRecognizer.java:7114)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.multiplicativeExpression(GeneratedJavaRecognizer.java:6980)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.additiveExpression(GeneratedJavaRecognizer.java:6931)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.shiftExpression(GeneratedJavaRecognizer.java:6874)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.relationalExpression(GeneratedJavaRecognizer.java:6697)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.equalityExpression(GeneratedJavaRecognizer.java:6648)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.andExpression(GeneratedJavaRecognizer.java:6619)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.exclusiveOrExpression(GeneratedJavaRecognizer.java:6590)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.inclusiveOrExpression(GeneratedJavaRecognizer.java:6561)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.logicalAndExpression(GeneratedJavaRecognizer.java:6532)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.logicalOrExpression(GeneratedJavaRecognizer.java:6503)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.conditionalExpression(GeneratedJavaRecognizer.java:2146)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.assignmentExpression(GeneratedJavaRecognizer.java:6265)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.expression(GeneratedJavaRecognizer.java:4782)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.expressionList(GeneratedJavaRecognizer.java:6039)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.argList(GeneratedJavaRecognizer.java:3436)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.enumConstant(GeneratedJavaRecognizer.java:3143)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.enumBlock(GeneratedJavaRecognizer.java:2656)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.enumDefinition(GeneratedJavaRecognizer.java:719)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.typeDefinitionInternal(GeneratedJavaRecognizer.java:569)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.typeDefinition(GeneratedJavaRecognizer.java:388)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.compilationUnit(GeneratedJavaRecognizer.java:201)
    at com.puppycrawl.tools.checkstyle.TreeWalker.parse(TreeWalker.java:470)
    at com.puppycrawl.tools.checkstyle.TreeWalker.processFiltered(TreeWalker.java:205)
    ... 13 more

I think the important bit here is:
Caused by: ****\AnnotatorRep.java:12:7: Unexpected character 0x2030 in identifier

Another file shows a similar failure

An error occurred while scanning a file.: An error occurred during a file scan.
org.infernus.idea.checkstyle.exception.CheckStylePluginParseException: An error occurred during a file scan.
    at org.infernus.idea.checkstyle.exception.CheckStylePluginException.wrap(CheckStylePluginException.java:32)
    at org.infernus.idea.checkstyle.checker.CheckFilesThread.run(CheckFilesThread.java:43)
Caused by: com.puppycrawl.tools.checkstyle.api.CheckstyleException: TokenStreamRecognitionException occurred during the analysis of file ****\characternormalization\CharacterMappings.java.
    at com.puppycrawl.tools.checkstyle.TreeWalker.processFiltered(TreeWalker.java:218)
    at com.puppycrawl.tools.checkstyle.api.AbstractFileSetCheck.process(AbstractFileSetCheck.java:79)
    at com.puppycrawl.tools.checkstyle.Checker.process(Checker.java:265)
    at org.infernus.idea.checkstyle.checker.CheckStyleChecker.processAndAudit(CheckStyleChecker.java:64)
    at org.infernus.idea.checkstyle.checker.CheckStyleChecker.scan(CheckStyleChecker.java:40)
    at org.infernus.idea.checkstyle.checker.FileScanner.lambda$checkPsiFile$12(FileScanner.java:90)
    at org.infernus.idea.checkstyle.checker.FileScanner$$Lambda$30/1070323125.apply(Unknown Source)
    at java.util.Optional.map(Optional.java:215)
    at org.infernus.idea.checkstyle.checker.FileScanner.checkPsiFile(FileScanner.java:90)
    at org.infernus.idea.checkstyle.checker.FileScanner.run(FileScanner.java:42)
    at com.intellij.openapi.application.impl.ApplicationImpl.runReadAction(ApplicationImpl.java:872)
    at org.infernus.idea.checkstyle.checker.CheckFilesThread.runFileScanner(CheckFilesThread.java:30)
    at org.infernus.idea.checkstyle.checker.AbstractCheckerThread.processFilesForModuleInfoAndScan(AbstractCheckerThread.java:113)
    at org.infernus.idea.checkstyle.checker.CheckFilesThread.run(CheckFilesThread.java:39)
Caused by: ****\characternormalization\CharacterMappings.java:31:7: expecting ''', found '€'
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaLexer.nextToken(GeneratedJavaLexer.java:405)
    at antlr.TokenStreamHiddenTokenFilter.consume(TokenStreamHiddenTokenFilter.java:38)
    at antlr.TokenStreamHiddenTokenFilter.nextToken(TokenStreamHiddenTokenFilter.java:134)
    at antlr.TokenBuffer.fill(TokenBuffer.java:69)
    at antlr.TokenBuffer.LA(TokenBuffer.java:80)
    at antlr.LLkParser.LA(LLkParser.java:52)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.expression(GeneratedJavaRecognizer.java:4781)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.expressionList(GeneratedJavaRecognizer.java:6039)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.argList(GeneratedJavaRecognizer.java:3436)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.postfixExpression(GeneratedJavaRecognizer.java:7605)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.unaryExpressionNotPlusMinus(GeneratedJavaRecognizer.java:7298)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.unaryExpression(GeneratedJavaRecognizer.java:7114)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.multiplicativeExpression(GeneratedJavaRecognizer.java:6980)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.additiveExpression(GeneratedJavaRecognizer.java:6931)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.shiftExpression(GeneratedJavaRecognizer.java:6874)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.relationalExpression(GeneratedJavaRecognizer.java:6697)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.equalityExpression(GeneratedJavaRecognizer.java:6648)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.andExpression(GeneratedJavaRecognizer.java:6619)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.exclusiveOrExpression(GeneratedJavaRecognizer.java:6590)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.inclusiveOrExpression(GeneratedJavaRecognizer.java:6561)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.logicalAndExpression(GeneratedJavaRecognizer.java:6532)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.logicalOrExpression(GeneratedJavaRecognizer.java:6503)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.conditionalExpression(GeneratedJavaRecognizer.java:2146)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.assignmentExpression(GeneratedJavaRecognizer.java:6265)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.expression(GeneratedJavaRecognizer.java:4782)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.traditionalStatement(GeneratedJavaRecognizer.java:5349)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.statement(GeneratedJavaRecognizer.java:4252)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.compoundStatement(GeneratedJavaRecognizer.java:3918)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.field(GeneratedJavaRecognizer.java:3320)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.classBlock(GeneratedJavaRecognizer.java:2542)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.classDefinition(GeneratedJavaRecognizer.java:633)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.typeDefinitionInternal(GeneratedJavaRecognizer.java:555)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.typeDefinition(GeneratedJavaRecognizer.java:388)
    at com.puppycrawl.tools.checkstyle.grammars.GeneratedJavaRecognizer.compilationUnit(GeneratedJavaRecognizer.java:201)
    at com.puppycrawl.tools.checkstyle.TreeWalker.parse(TreeWalker.java:470)
    at com.puppycrawl.tools.checkstyle.TreeWalker.processFiltered(TreeWalker.java:205)
    ... 13 more

where again, the cause seems to be a non-standard character:
Caused by: ****\characternormalization\CharacterMappings.java:31:7: expecting ''', found '€'

Since we work on language-focused applications, non-standard characters are simply part of our workflow, so we can't just remove or replace them. I would expect the Checkstyle plugin to either handle this properly or simply skip the file, but instead it stops scanning the project altogether.

@jshiell
Owner

jshiell commented Sep 9, 2015

Hmm. What's your source file encoding?

@frederik-vaassen
Author

UTF-8, both of them.

@jshiell
Owner

jshiell commented Sep 9, 2015

Guess something has broken then, as anything UTF-8 should be fine. I'll have a butcher's. Thanks for the report.

@frederik-vaassen
Author

Thanks! If you need a file to test on, I can send you a class file. I'd rather not put it up in public, though. :)

I've managed to work around the problem by replacing all non-standard characters with their escaped forms (\u2122 etc), but that does make it a lot less visual.
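
The workaround above (swapping literal characters for their escaped forms) can be sketched roughly as follows. This is a minimal illustration, not the tool or script actually used in this thread; the class name `UnicodeEscaper` and the method `escapeNonAscii` are hypothetical. It assumes every UTF-16 code unit above 0x7F should be escaped, which keeps the resulting source pure ASCII and therefore independent of the encoding the parser uses to read the file.

```java
public final class UnicodeEscaper {
    private UnicodeEscaper() {}

    /** Replaces every UTF-16 code unit above 0x7F with a backslash-u escape. */
    public static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c > 0x7F) {
                // Four lowercase hex digits, zero padded, as Java source expects.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input line contains the trade mark sign (U+2122) as a real character.
        System.out.println(escapeNonAscii("char tm = '\u2122';"));
    }
}
```

Characters outside the Basic Multilingual Plane come out as two escapes (one per surrogate), which Java source also accepts. The cost is the one already noted: the escaped file is much harder to read.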

@jshiell
Owner

jshiell commented Sep 9, 2015

Thanks. I'll have a play myself first - from the description it should be fairly straightforward to trigger. I'll shout if I have problems 😄

@jshiell
Owner

jshiell commented Sep 9, 2015

Hmm. I stand corrected. The following is working fine for me, on OS X 10.10.4 (en_GB), IDEA 14.1.4/JDK 8u40 (JetBrains build) and project JDK 8u60.

public class ChårTést {


    public void méthød() {
        String süß = "Kuchen";
        System.out.println(süß + " ist süß");
        double € = 1.2;
        double £ = 2.4;
        double $ = (€ * 1.17) + (£ * 1.54);
    }

}

Any hints you can offer would be much appreciated. Anything you don't want public can be sent to james at infernus dot org.

It's curious that 4.19.1 triggered this as well, as the only things that changed were how we handle exceptions and upgrading CheckStyle. Or I'm missing something blatantly obvious, which wouldn't be the first time...

@frederik-vaassen
Author

I can't say for sure that it's the upgrade to 4.19.1 that triggered this. It's possible I hadn't upgraded in a while (though I'm quite conscientious about upgrades usually). I'll shoot you an e-mail with those two classes.

@jshiell
Owner

jshiell commented Sep 12, 2015

Thanks for the files, much obliged. The bad news is it all works perfectly here.

So I've tried revisiting my assumptions instead. The (rather old) code that wrote out the temporary files takes the content of the file, and writes it out as UTF-8. I've added a new branch so that if a virtual file exists it'll instead write it as binary, which will hopefully preserve the character set and so on without any fiddling about.

It works nicely here (i.e. as it did before), but given I can't reproduce the problem that's really rather meaningless. I've uploaded a test build with this change to my public Dropbox - if you have a chance, I'd appreciate it if you could give this build a try and report your results (as I don't have a Win64 dev box available to play on here).
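
The difference between the two temp-file strategies described above can be sketched like this. It is not the plugin's actual code: the class `TempCopySketch` and its methods are hypothetical, and the sketch assumes the original file is reachable as a plain `java.nio.file.Path`. Writing the in-memory content re-encodes it as UTF-8, whereas a binary copy carries the original bytes over untouched, whatever encoding they happen to be in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class TempCopySketch {
    private TempCopySketch() {}

    /** Older approach: take the text content and write it out encoded as UTF-8. */
    public static Path writeAsUtf8(String content) {
        try {
            Path temp = Files.createTempFile("scan-", ".java");
            Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Newer approach: copy the bytes of the file on disk without decoding them. */
    public static Path copyAsBinary(Path original) {
        try {
            Path temp = Files.createTempFile("scan-", ".java");
            Files.copy(original, temp, StandardCopyOption.REPLACE_EXISTING);
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for comparing results: reads a file's raw bytes. */
    public static byte[] bytesOf(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Either way, whatever later reads the temporary file still has to decode it with the right charset, so a byte-exact copy only helps if the reader's charset matches the original file's.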

@frederik-vaassen
Author

Haha, looks like this is turning out to be a tricky one. :)

I've updated to your Dropbox version. Now when I run Checkstyle on one of the problematic files, it just gets stuck in "Scanning current file..." forever.

If there's no way for you to reproduce it, I understand this is a tricky one to fix. I can just carry on by replacing the literal characters with their escaped variants for now.

@jshiell
Owner

jshiell commented Sep 14, 2015

I've had a productive night 😄 I downloaded one of the test.ie VMs (which are 32bit which is a little retro) and set up IDEA and gave it a try.

The good news:

  • I can reproduce the problem
  • The hang in 'scanning current file..' was another fix I was trying for parse errors. Mea culpa.

The bad news:

  • It looks to be a Checkstyle problem - it seems to occur on the fast path (i.e. the file hasn't been modified, so we just give the existing file name to Checkstyle instead of creating a temp copy).
  • It seems to happen via the Checkstyle command line tool as well.

Could you please try a scan with the Checkstyle 6.10.1 CLI tool - my expectation is that you'll see a TokenStreamRecognitionException. Given how this has gone so far, I wouldn't be surprised if I'm wrong!

If it does happen though, we might need to raise a bug with the Checkstyle team - I've had a quick bounce around their issue list, but can't see anything likely at present.

@frederik-vaassen
Author

Wow, you've really gone all the way to reproduce this one :)

Confirmed, running the CLI tool gives me the same exception!

> java -jar checkstyle-6.10.1-all.jar -c checkstyle.xml CharacterMappings.java

TokenStreamRecognitionException occurred during the analysis of file CharacterMappings.java.
Checkstyle ends with 1 errors.
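
One explanation that fits both error messages, though nothing in this thread confirms it, is that the UTF-8 files are being decoded with a Windows single-byte charset such as windows-1252. The sketch below illustrates that assumption only; `MojibakeDemo` and its methods are hypothetical names. It shows that a UTF-8 character whose encoding contains byte 0x89 turns into U+2030 (the "Unexpected character 0x2030" case), and that one whose encoding contains byte 0x80 turns into the euro sign (the "found '€'" case).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class MojibakeDemo {
    private MojibakeDemo() {}

    /** Encodes text as UTF-8, then decodes the same bytes with another charset. */
    public static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    /** Renders each UTF-16 code unit as U+XXXX so the output is console-safe. */
    public static String codePoints(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+00C9 is C3 89 in UTF-8; windows-1252 maps byte 89 to U+2030.
        System.out.println(codePoints(misdecode("\u00c9", cp1252)));
        // U+2026 is E2 80 A6 in UTF-8; windows-1252 maps byte 80 to U+20AC (euro sign).
        System.out.println(codePoints(misdecode("\u2026", cp1252)));
    }
}
```

If this is what is happening, setting the `charset` property of the Checker module to UTF-8 in checkstyle.xml may be worth trying; that suggestion is an assumption and was not tested in this thread.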

@jshiell
Owner

jshiell commented Sep 15, 2015

I've a bugbear about character set bugs - the last one I had took me ages to track down, so I'm keen on sorting them as soon as they appear now!

I've raised a bug over on the Checkstyle side. Once they've sorted it and released a new Checkstyle, I'll get the plugin updated ASAP.

Thanks for your help with this!

@wdonet

wdonet commented May 3, 2016

Adding sonar.sourceEncoding=UTF-8 to the Jenkins Analysis properties, or to the sonar-project.properties file, worked for me.

@jshiell closed this as completed Mar 29, 2024