JPlag crashes when parsing UTF-16 encoded files #1528

Open
jheinzel opened this issue Feb 5, 2024 · 1 comment
Labels
bug: Issue/PR that involves a bug
language: PR / Issue deals (partly) with new and/or existing languages for JPlag
minor: Minor issue/feature/contribution/change

Comments

jheinzel commented Feb 5, 2024

When JPlag tries to parse files encoded in UTF-16, it crashes.

When checking C++ projects it terminates with the following error log:

java -jar ..\lib\jplag.jar -l cpp2 -r .\jplag-bug\plagiarism-report .\jplag-bug\
2024-02-05-07:14:13_596 [main] [INFO] LanguageLoader - Available languages: '[C/C++ Scanner [basic markup], C/C++ Parser, C# 6 Parser, EMF metamodel, Go Parser, Javac based AST plugin, Kotlin Parser, Python3 Parser, R Parser, Rust Language Module, Scala parser, SchemeR4RS Parser [basic markup], Swift Parser, Text Parser (naive)]'
2024-02-05-07:14:13_650 [main] [INFO] ParallelComparisonStrategy - Start comparing...
line 1:0 token recognition error at: '?'
line 1:1 token recognition error at: '?'
line 2:0 token recognition error at: ''
line 3:0 token recognition error at: ''
line 4:0 token recognition error at: ''
line 4:2 token recognition error at: ''
line 5:0 token recognition error at: ''
...

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "java.util.List.iterator()" because "next.children" is null        
        at de.jplag.cpp2.CPPTokenListener.getDescendant(CPPTokenListener.java:375)
        at de.jplag.cpp2.CPPTokenListener.enterSimpleDeclaration(CPPTokenListener.java:326)
        at de.jplag.cpp2.grammar.CPP14Parser$SimpleDeclarationContext.enterRule(CPP14Parser.java:5639)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.enterRule(ParseTreeWalker.java:50)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:33)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at de.jplag.cpp2.CPPParserAdapter.scan(CPPParserAdapter.java:46)
        at de.jplag.cpp2.CPPLanguage.parse(CPPLanguage.java:48)
        at de.jplag.Submission.parse(Submission.java:249)
        at de.jplag.SubmissionSet.parseSubmissions(SubmissionSet.java:147)
        at de.jplag.SubmissionSet.parseAllSubmissions(SubmissionSet.java:103)
        at de.jplag.SubmissionSet.<init>(SubmissionSet.java:45)
        at de.jplag.SubmissionSetBuilder.buildSubmissionSet(SubmissionSetBuilder.java:78)
        at de.jplag.JPlag.run(JPlag.java:55)
        at de.jplag.cli.CLI.main(CLI.java:91)
2024-02-05-07:13:21_529 [main] [INFO] LanguageLoader - Available languages: '[C/C++ Scanner [basic markup], C/C++ Parser, C# 6 Parser, EMF metamodel, Go Parser, Javac based AST plugin, Kotlin Parser, Python3 Parser, R Parser, Rust Language Module, Scala parser, SchemeR4RS Parser [basic markup], Swift Parser, Text Parser (naive)]'
2024-02-05-07:13:21_578 [main] [INFO] ParallelComparisonStrategy - Start comparing...

When checking Java-based projects it terminates with the following error log:

java -jar ..\lib\jplag.jar -l java -r .\jplag-bug\plagiarism-report .\jplag-bug\
2024-02-05-07:09:39_947 [main] [INFO] LanguageLoader - Available languages: '[C/C++ Scanner [basic markup], C/C++ Parser, C# 6 Parser, EMF metamodel, Go Parser, Javac based AST plugin, Kotlin Parser, Python3 Parser, R Parser, Rust Language Module, Scala parser, SchemeR4RS Parser [basic markup], Swift Parser, Text Parser (naive)]'
2024-02-05-07:09:39_997 [main] [INFO] ParallelComparisonStrategy - Start comparing...
2024-02-05-07:09:40_642 [main] [WARN] Submission - Failed to parse submission student1 with error {}
de.jplag.ParsingException: failed to parse '...\jplag-bug\student1\App.java' with reason: error while visiting (ERROR)
failed to parse '...\jplag-bug\student1\App.java' with reason: error while visiting (ERROR)
failed to parse '...\jplag-bug\student1\App.java' with reason: error while visiting (ERROR)
failed to parse '...\jplag-bug\student1\App.java' with reason: error while visiting (ERROR)
failed to parse '...\jplag-bug\student1\App.java' with reason: error while visiting (ERROR)
failed to parse '...\jplag-bug\student1\App.
...

java' with reason: class, interface, enum, or record expected
        at de.jplag.ParsingException.wrappingExceptions(ParsingException.java:70)
        at de.jplag.java.JavacAdapter.parseFiles(JavacAdapter.java:59)
        at de.jplag.java.Parser.parse(Parser.java:25)
        at de.jplag.java.Language.parse(Language.java:47)
        at de.jplag.Submission.parse(Submission.java:249)
        at de.jplag.SubmissionSet.parseSubmissions(SubmissionSet.java:147)
        at de.jplag.SubmissionSet.parseAllSubmissions(SubmissionSet.java:103)
        at de.jplag.SubmissionSet.<init>(SubmissionSet.java:45)
        at de.jplag.SubmissionSetBuilder.buildSubmissionSet(SubmissionSetBuilder.java:78)
        at de.jplag.JPlag.run(JPlag.java:55)
        at de.jplag.cli.CLI.main(CLI.java:91)
2024-02-05-07:09:40_758 [main] [INFO] SubmissionSetBuilder - Summary of all Errors:
2024-02-05-07:09:40_005 [main] [ERROR] SubmissionSetBuilder - Ignore submission with invalid suffix: plagiarism-report.zip
2024-02-05-07:09:40_774 [main] [INFO] Parser - Summary of all Errors:
2024-02-05-07:09:40_601 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: unmappable character (0xFE) for encoding UTF-8
??package queues;
^
2024-02-05-07:09:40_601 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: unmappable character (0xFF) for encoding UTF-8
??package queues;
 ^
2024-02-05-07:09:40_601 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: illegal character: '\ufffd'
??package queues;
^
2024-02-05-07:09:40_604 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: illegal character: '\ufffd'
??package queues;
 ^
2024-02-05-07:09:40_604 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: illegal character: '\u0000'
??package queues;
  ^
2024-02-05-07:09:40_606 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: illegal character: '\u0000'
??package queues;
                  ^
2024-02-05-07:09:40_606 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: class, interface, enum, or record expected
??package queues;
                               ^
2024-02-05-07:09:40_607 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:1: error: illegal character: '\u0000'
??package queues;
                                ^
2024-02-05-07:09:40_607 [main] [ERROR] Parser - ...\jplag-bug\student1\App.java:2: error: illegal character: '\u0000'

...

2024-02-05-07:09:40_920 [main] [INFO] CLI - Summary of all Errors:
2024-02-05-07:09:40_725 [main] [ERROR] CLI - Not enough valid submissions! (found 1 valid submissions)
2024-02-05-07:09:40_920 [main] [INFO] SubmissionSet - Summary of all Errors:
2024-02-05-07:09:40_693 [main] [ERROR] SubmissionSet - ERROR -> Submission student1 removed

tsaglam (Member) commented Feb 6, 2024

Using UTF-8 is recommended for JPlag. If students use different encodings, you can run JPlag with -d, which copies the unparsable submissions into an error folder. You can then convert them to UTF-8, replace the illegal characters, and run JPlag again with the fixed submissions. However, submissions that do not compile will still lead to errors.
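For the conversion step, a minimal sketch could look like the following. It is not part of JPlag; the class name `Utf16ToUtf8` and its methods are hypothetical, and it assumes the affected files really are UTF-16. It relies on the JDK's `StandardCharsets.UTF_16` decoder, which reads a leading byte order mark to pick the byte order and falls back to big-endian when there is none.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of JPlag: re-encodes a UTF-16 file as UTF-8.
final class Utf16ToUtf8 {

    /** Decodes the bytes as UTF-16 and returns the same text encoded as UTF-8 (no BOM). */
    static byte[] transcode(byte[] utf16Bytes) {
        // The UTF_16 decoder consumes a leading BOM and uses it to choose the byte order;
        // without a BOM it assumes big-endian.
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Reads source as UTF-16 and writes it to target as UTF-8. */
    static void convertFile(Path source, Path target) throws IOException {
        Files.write(target, transcode(Files.readAllBytes(source)));
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: first argument is the UTF-16 input, second is the UTF-8 output.
        convertFile(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Running this over the files in the error folder and then re-running JPlag on the converted copies would follow the workflow described above. Files that are not actually UTF-16 would be decoded incorrectly, so the encoding should be checked first.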

There are two underlying issues:

  • For the cpp2 module, we rely on an ANTLR grammar that does not support special characters (see Issues with cpp2 language module #1427).
  • For Java, we try to guess the encoding if it is not UTF-8; however, doing so reliably is non-trivial.
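In this particular report, the javac log shows the bytes 0xFE and 0xFF at the very start of App.java, which matches the UTF-16 big-endian byte order mark. For files that carry a BOM, detection can be done without guessing. The sketch below illustrates that idea only; it is not how JPlag detects encodings, the class name `BomSniffer` is hypothetical, and it deliberately ignores UTF-32 and files without a BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of JPlag: identifies an encoding from a leading byte order mark.
final class BomSniffer {

    /**
     * Returns the charset indicated by a BOM at the start of the given bytes,
     * or an empty Optional if no UTF-8/UTF-16 BOM is present.
     * Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
     */
    static Optional<Charset> detectByBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    // Compares the leading bytes (as unsigned values) against the expected prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

An empty result means the file either is BOM-less UTF-8 or needs heuristic detection, which is the non-trivial case mentioned above.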

tsaglam added the bug, minor, and language labels on Feb 6, 2024