-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[doc] Clarify how CPD --ignore-literals
and --ignore-identifiers
work
#4676
Comments
--ignore-literals
and --ignore-identifier
do not work
Thanks for reporting! Note, that the CLI syntax has changed with PMD 7 - and the documentation on https://docs.pmd-code.org/latest/pmd_userdocs_cpd.html should be updated to make it clear, how to enable/disable various flags. Probably provide some examples and also mention this in the migration guide. By default, identifiers are supposed to be not ignored. If you want to have identifiers ignored, you just add
Using your test file, I couldn't see any difference when adding |
--ignore-literals
and --ignore-identifier
do not work --ignore-literals
and --ignore-identifiers
do not work
I am very glad to receive your reply! Is there any way to know the similarity rate of the duplicates? Thanks! |
Not sure, if we can calculate this. We use the Rabin-Karp algorithm, which is implemented in MatchAlgorithm. I think, when we find a match, then it is a perfect match (similarity rate 100%). If you use the flags We would need to compare at the end the two code snippets with the original identifiers. Then we probably could calculate something like a similarity rate. |
After calling PMD with the correct parameters and using a proper test file, I can confirm, that everything still works as it should. Extract this ZIP into a directory
They have almost the same content, but each enum value is suffixed with 2 in the second file. When running CPD with the defaults: $ pmd-bin-7.0.0-rc3/bin/pmd cpd --language java --format xml --minimum-tokens 45 -d src
[main] INFO net.sourceforge.pmd.cli.commands.internal.AbstractPmdSubcommand - Log level is at INFO
<?xml version="1.0" encoding="UTF-8"?>
<pmd-cpd>
<file path="/home/andreas/PMD/source/pmd-it/issue-4676-cpd-cli/src/CommonMethodEnum.java"
totalNumberOfTokens="137"/>
<file path="/home/andreas/PMD/source/pmd-it/issue-4676-cpd-cli/src/CommonMethodEnum2.java"
totalNumberOfTokens="137"/>
</pmd-cpd> No duplications are detected, because the enum values differ. But if called with "ignoreIdentifiers": $ pmd-bin-7.0.0-rc3/bin/pmd cpd --language java --format xml --ignore-identifiers --minimum-tokens 45 -d src
[main] INFO net.sourceforge.pmd.cli.commands.internal.AbstractPmdSubcommand - Log level is at INFO
<?xml version="1.0" encoding="UTF-8"?>
<pmd-cpd>
<file path="/home/andreas/PMD/source/pmd-it/issue-4676-cpd-cli/src/CommonMethodEnum.java"
totalNumberOfTokens="137"/>
<file path="/home/andreas/PMD/source/pmd-it/issue-4676-cpd-cli/src/CommonMethodEnum2.java"
totalNumberOfTokens="137"/>
<duplication lines="34" tokens="113">
<file begintoken="0"
column="1"
endcolumn="12"
endline="34"
endtoken="112"
line="1"
path="/home/andreas/PMD/source/pmd-it/issue-4676-cpd-cli/src/CommonMethodEnum.java"/>
<file begintoken="138"
column="1"
endcolumn="12"
endline="34"
endtoken="250"
line="1"
path="/home/andreas/PMD/source/pmd-it/issue-4676-cpd-cli/src/CommonMethodEnum2.java"/>
<codefragment><![CDATA[public enum CommonMethodEnum {
SUB_LIST("subList"), /* List.subList(int start, int end) */
GET_BYTES1("getBytes"), /* String.getBytes(...) */
GET("get"), /* List.get(int index) / FileItem.get() / Properties.get() / Map.get() */
SET("set"), /* set() / Collection.set(int index, E element) */
GET_BYTES("getBytes"), /* String.getBytes(...) */
GET_NAME("getName"), /* ZipEntry.getName(), Class.getName() */
GET_SIZE("getSize"), /* ZipEntry.getSize() */
GET_INSTANCE("getInstance"), /* Calendar.getInstance() */
GET_AND_INCREMENT("getAndIncrement"), /* AtomicInteger.getAndIncrement */
LOCK("lock"), /* Lock.lock() */
TRY_LOCK("tryLock"), /* Lock.tryLock() */
VALIDATE("validate"), /* ValidatorForm.validate */
PARSE_INT("parseInt"), /* Integer.parseInt */
PARSE_BYTE("parseByte"), /* Byte.parseByte */
PARSE_SHORT1("parseShort"), /* Short.parseShort */
PARSE_LONG("parseLong"), /* Long.parseLong */
PARSE_FLOAT("parseFloat"), /* Float.parseFloat */
PARSE_DOUBLE1("parseDouble"), /* Double.parseDouble */
PARSE_BOOLEAN("parseBoolean"), /* Boolean.parseBoolean */
SUBMIT("submit"), /* ThreadPoolExecutor.submit(), ExecutorService.submit() */
EXECUTE("execute"); /* ThreadPoolExecutor.execute() */
private final String name;
private CommonMethodEnum(String name) {]]></codefragment>
</duplication>
</pmd-cpd> A duplication is detected as expected. So, I think this is not a code bug, but just documentation. By the way, you could calculate how much of a file is duplicated: E.g. CommonMethodEnum.java has in total 137 tokens, and it contains a duplication with length 113 tokens, so 82% of the file content is duplication. Let me know, if it still doesn't work for you (e.g. I ran into the error, that I used |
--ignore-literals
and --ignore-identifiers
do not work --ignore-literals
and --ignore-identifiers
work
If tested a bit further, and with the current SNAPSHOT version of PMD 7, the CLI interface is even better unified and behaves more correct (thanks to #4397). E.g. "-language java" throws an error, that the language "anguage" doesn't exist ("-l" is the short option of "--language"). So, you can't call CPD wrongly in that case. I've also verified behavior of PMD 6.55.0, and:
So, The PMD 7 SNAPSHOT version supports such a "main" parameter and interprets it as part of the file list to be analyzed. Since "false" does not exist, it will log an error:
In summary, this is really just a documentation issue. |
Hi, I have called PMD with the correct parameters(pmd-bin-7.0.0-rc3 and pmd-bin-6.55.0), and it's excited that I got the expected result! It's really great! I tried to reflect on why I did not find it was caused by my wrong use of parameters in time. I did not add any redundant parameters at the beginning, but I still seemed to encounter this problem. And then I found that one reason was that I was misleading by the output. The first time I use CPD, the command is: \pmd-bin-7.0.0-rc3\bin> .\pmd.bat cpd --minimum-tokens 45 --dir "...\example" the test file is like: public class Example{
public void theFirstMethod(List<String> strList) {
for (String str : strList) {
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
}
}
public void theSecondMethod(List<String> strList) {
for (String str : strList) {
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
}
}
} and the result is: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/C:/Users.../pmd-bin-7.0.0-rc3/lib/groovy-2.4.21.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[main] INFO net.sourceforge.pmd.cli.commands.internal.AbstractPmdSubcommand - Log level is at INFO
Found a 9 line (58 tokens) duplication in the following files:
Starting at line 2 of ...\example\Example.java
Starting at line 12 of...\example\Example.java
public void theFirstMethod(List<String> strList) {
for (String str : strList) {
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
}
} As we can see, the source codes on line 2(method name is Now I realize it's because the duplicates starts in one of the columns of the starting line. If I use ...\pmd-bin-7.0.0-rc3\bin> .\pmd.bat cpd --minimum-tokens 45 --dir "...\example" --format xml
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:.../pmd-bin-7.0.0-rc3/lib/groovy-2.4.21.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[main] INFO net.sourceforge.pmd.cli.commands.internal.AbstractPmdSubcommand - Log level is at INFO
<?xml version="1.0" encoding="UTF-8"?>
<pmd-cpd>
<file path="...\example\Example.java"
totalNumberOfTokens="127"/>
<duplication lines="9" tokens="58">
<file begintoken="7"
column="31"
endcolumn="6"
endline="10"
endtoken="64"
line="2"
path="...\example\Example.java"/>
<file begintoken="68"
column="32"
endcolumn="6"
endline="20"
endtoken="125"
line="12"
path="...\example\Example.java"/>
<codefragment><![CDATA[ public void theFirstMethod(List<String> strList) {
for (String str : strList) {
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
}
}]]></codefragment>
</duplication>
</pmd-cpd> the column has been shown. However, the codefragment still have How about when the output is accurate to line(with no column), if the repeating code starts at the beginning of the line, the beginning line counts as this line, and when the repeating code starts in the middle of the line, the beginning line counts as the next line? For example, in this case, when the command is \pmd-bin-7.0.0-rc3\bin> .\pmd.bat cpd --minimum-tokens 45 --dir "...\example" the result is: Starting at line 3 of ...\example\Example.java
Starting at line 13 of...\example\Example.java
for (String str : strList) {
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
System.out.println(str);
}
} Best wishes! |
And in
Could you please paste the url here of \pmd-bin-7.0.0-rc3\bin> .\pmd.bat cpd -language java --format xml --minimum-tokens 45 -d "...\CommonMethodEnum"
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:.../pmd-bin-7.0.0-rc3/lib/groovy-2.4.21.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[main] INFO net.sourceforge.pmd.cli.commands.internal.AbstractPmdSubcommand - Log level is at INFO
<?xml version="1.0" encoding="UTF-8"?>
<pmd-cpd/> Thanks a lot! |
I started a separate issue #4725 to track the issue with the text renderer and start / end columns to keep things easier to track. I'm closing this as the original issue was addressed with better documentation already. |
Affects PMD Version:
pmd-bin-7.0.0-rc3
Description:
I use the cpd function, and the command is as follows:
the result is as follows:
The content from row 2 to 16 is different from row 20 to 30. Args
--ignore-literals
and--ignore-identifiers
have been set to false. The expected result is no duplicates be found which contradicts the actual result.Is there something wrong with my args? What should I do if I want to reach my expectations?
And is there any way to know the similarity rate of the duplicates?
Thanks so much!
Exception Stacktrace:
Code Sample demonstrating the issue:
Steps to reproduce:
Please provide detailed steps for how we can reproduce the bug.
mvn clean verify
)Running PMD through: [CLI | Ant | Maven | Gradle | Designer | Other]
The text was updated successfully, but these errors were encountered: