Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c] Parse exception with inline assembly #4635

Open
penguin42 opened this issue Jul 21, 2023 · 4 comments
Open

[c] Parse exception with inline assembly #4635

penguin42 opened this issue Jul 21, 2023 · 4 comments
Labels
a:bug PMD crashes or fails to analyse a file. in:cpd Affects the copy-paste detector in:grammar About the grammar of a lexer or parser, eg, a parse/lex exception

Comments

@penguin42
Copy link

Affects PMD Version:

7.0.0-rc3

Description:

I've been trying pmd on the Linux kernel source, and I'm getting a crash on the arch/arc subdirectory, I've narrowed it down to the file arch/arc/include/asm/entry-arcv2.h
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arc/include/asm/entry-arcv2.h

A clear and concise description of what the bug is.

The following crashes, originally found with just --dir arch but I narrowed it down to that file, which is actually arc asm.

$ pmd cpd --minimum-tokens 100 --language c --dir arch/arc/include/asm/entry-arcv2.h


[main] INFO net.sourceforge.pmd.cli.commands.internal.AbstractPmdSubcommand - Log level is at INFO
java.lang.IllegalArgumentException: Begin column must be >= 1, got 0
	at net.sourceforge.pmd.util.AssertionUtil.mustBe(AssertionUtil.java:203)
	at net.sourceforge.pmd.util.AssertionUtil.mustBe(AssertionUtil.java:199)
	at net.sourceforge.pmd.util.AssertionUtil.requireOver1(AssertionUtil.java:132)
	at net.sourceforge.pmd.lang.document.FileLocation.<init>(FileLocation.java:55)
	at net.sourceforge.pmd.lang.document.FileLocation.<init>(FileLocation.java:48)
	at net.sourceforge.pmd.lang.document.FileLocation.caret(FileLocation.java:169)
	at net.sourceforge.pmd.lang.ast.TokenMgrError.location(TokenMgrError.java:61)
	at net.sourceforge.pmd.lang.ast.FileAnalysisException.positionToString(FileAnalysisException.java:79)
	at net.sourceforge.pmd.lang.ast.FileAnalysisException.getMessage(FileAnalysisException.java:63)
	at java.base/java.lang.Throwable.getLocalizedMessage(Throwable.java:397)
	at java.base/java.lang.Throwable.toString(Throwable.java:496)
	at java.base/java.lang.String.valueOf(String.java:4216)
	at java.base/java.lang.StringBuilder.append(StringBuilder.java:173)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2050)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
	at picocli.CommandLine.execute(CommandLine.java:2170)
	at net.sourceforge.pmd.cli.PmdCli.main(PmdCli.java:18)

Code Sample demonstrating the issue:

see above header from kernel

Steps to reproduce:

Please provide detailed steps for how we can reproduce the bug.

  1. Check out a Linux git from the git listed above, I doubt it's actually that version sensitive but I'm at 1ef6663a587ba3e57dc5065a477db1c64481eedd
  2. pmd cpd --minimum-tokens 100 --language c --dir arch/arc/include/asm/entry-arcv2.h

Running PMD through: [CLI | Ant | Maven | Gradle | Designer | Other]

CLI

running on Fedora 38 Linux x86-64
$ java --version
openjdk 17.0.7 2023-04-18
OpenJDK Runtime Environment (Red_Hat-17.0.7.0.7-4.fc38) (build 17.0.7+7)
OpenJDK 64-Bit Server VM (Red_Hat-17.0.7.0.7-4.fc38) (build 17.0.7+7, mixed mode, sharing)

@penguin42 penguin42 added the a:bug PMD crashes or fails to analyse a file. label Jul 21, 2023
@oowekyala oowekyala added the in:cpd Affects the copy-paste detector label Jul 26, 2023
@oowekyala
Copy link
Member

Thanks for the report. I could reproduce it in master and also in #4397.

The problem is the following fragment (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arc/include/asm/entry-arcv2.h#n184):

	;    but that doesn't really matter
	bz	1f

The lexer doesn't know about assembly so it doesn't recognize the first line as a comment. It then tries to parse the part after the apostrophe as a character literal, which fails when it consumes up to the line ending.

There are two problems here

  • The lexer produces an error with a column number of zero, which is invalid (and causes the reported exception). That we can fix easily.
  • The lexer doesn't know about assembly-specific syntax. It looks like the .macro ... .endm delimiters are specific to the GNU assembler. Maybe we could recognize them in the grammar.

@oowekyala oowekyala added the in:grammar About the grammar of a lexer or parser, eg, a parse/lex exception label Jul 26, 2023
@penguin42
Copy link
Author

Thanks for the report. I could reproduce it in master and also in #4397.

The problem is the following fragment (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arc/include/asm/entry-arcv2.h#n184):

	;    but that doesn't really matter
	bz	1f

The lexer doesn't know about assembly so it doesn't recognize the first line as a comment. It then tries to parse the part after the apostrophe as a character literal, which fails when it consumes up to the line ending.

There are two problems here

* The lexer produces an error with a column number of zero, which is invalid (and causes the reported exception). That we can fix easily.

Great.

* The lexer doesn't know about assembly-specific syntax. It looks like the `.macro ... .endm` delimiters are specific to the [GNU assembler](https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC109). Maybe we could recognize them in the grammar.

I wouldn't bother recognising the asm necessarily; I think as long as you don't choke on it, then it's OK.

oowekyala added a commit to oowekyala/pmd that referenced this issue Jul 27, 2023
Ref pmd#4635

Col number of zero may happen when the error is reported
on a newline. In the rest of PMD a newline is considered
to sit at the last index of the previous line. So the index
is now the first character of the next line, which is slightly
wrong but that doesn't matter much.
@adangel
Copy link
Member

adangel commented Sep 28, 2023

I think, this can be reproduced for java with the following snipped:

public class Foo {
    String s = "abc";"
    String s2 = "xyz";
}

Notice the unbalanced quote in line 2...

@adangel adangel changed the title [c] java.lang.IllegalArgumentException: Begin column must be >= 1, got 0 [c] Parse exception with inline assembly Jan 5, 2024
@adangel
Copy link
Member

adangel commented Jan 5, 2024

The lexer produces an error with a column number of zero, which is invalid (and causes the reported exception). That we can fix easily.

This is fixed with #4694

The lexer doesn't know about assembly-specific syntax. It looks like the .macro ... .endm delimiters are specific to the GNU assembler. Maybe we could recognize them in the grammar.

This is still open.

@jsotuyod jsotuyod added the needs:pmd7-revalidation The issue hasn't yet been retested vs PMD 7 and may be stale label Apr 2, 2024
@oowekyala oowekyala removed the needs:pmd7-revalidation The issue hasn't yet been retested vs PMD 7 and may be stale label Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:bug PMD crashes or fails to analyse a file. in:cpd Affects the copy-paste detector in:grammar About the grammar of a lexer or parser, eg, a parse/lex exception
Projects
None yet
Development

No branches or pull requests

4 participants