Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cpd] Found match is not maximal #4835

Closed
jdupak opened this issue Feb 25, 2024 · 3 comments
Closed

[cpd] Found match is not maximal #4835

jdupak opened this issue Feb 25, 2024 · 3 comments
Labels
a:false-negative PMD doesn't flag a problematic piece of code in:cpd Affects the copy-paste detector was:invalid

Comments

@jdupak
Copy link

jdupak commented Feb 25, 2024

Affects PMD Version: 6.55.0

image

void print_fence(short current_row, short fence_size) {
  short i = 0;
  if (current_row == 0 ||
      current_row == fence_size) { // checks for first and last row
    if (fence_size % 2 == 0) {     // checks if the fence is even
      for (; i < fence_size;
           i += 2) { // i += 2 because two characters are being printed
        printf("-|");
      }
    } else {
      printf("|");
      i++; // i++ because one character was printed
      for (; i < fence_size;
           i += 2) { // i += 2 because two characters are being printed
        printf("-|");
      }
    }
  } else {
    if (fence_size % 2 == 0) { // checks if the fence is even
      for (; i < fence_size;
           i += 2) { // i += 2 because two characters are being printed
        printf(" |");
      }
    } else {
      printf("|");
      i++; // i++ because one character was printed
      for (; i < fence_size;
           i += 2) { // i += 2 because two characters are being printed
        printf(" |");
      }
    }
  }
  printf("\n");
}

The for loop and if are also identical.

@jdupak jdupak added the a:false-negative PMD doesn't flag a problematic piece of code label Feb 25, 2024
@adangel adangel added the in:cpd Affects the copy-paste detector label Feb 25, 2024
@jsotuyod jsotuyod added the needs:pmd7-revalidation The issue hasn't yet been retested vs PMD 7 and may be stale label Mar 17, 2024
@jsotuyod jsotuyod added was:invalid a:false-negative PMD doesn't flag a problematic piece of code and removed needs:pmd7-revalidation The issue hasn't yet been retested vs PMD 7 and may be stale a:false-negative PMD doesn't flag a problematic piece of code labels Apr 2, 2024
@jsotuyod
Copy link
Member

jsotuyod commented Apr 2, 2024

This is actually not a false negative.

Even though the code is extremely similar, the literal strings " |" and "-|" are not identical, and therefore won't be part of the match.

If C/C++ supported --ignore-literals (which it currently doesn't), such differences could be optionally ignored to produce a larger match.

@jsotuyod jsotuyod closed this as not planned Won't fix, can't repro, duplicate, stale Apr 2, 2024
@jdupak
Copy link
Author

jdupak commented Apr 2, 2024

Oh, I see it now. I got confused by my own visualization.

Anyway, could you give me some pointers, where in code the ignore literals is/should be handled. I could really use this so I will try to implement it myself.

@jsotuyod
Copy link
Member

jsotuyod commented Apr 2, 2024

@jdupak sure! Any contributions are more than welcomed!

First of, you would need to declare support for it on

public LanguagePropertyBundle newPropertyBundle() {
LanguagePropertyBundle bundle = super.newPropertyBundle();
bundle.definePropertyDescriptor(CpdLanguageProperties.CPD_IGNORE_LITERAL_SEQUENCES);
bundle.definePropertyDescriptor(CpdLanguageProperties.CPD_IGNORE_LITERAL_AND_IDENTIFIER_SEQUENCES);
bundle.definePropertyDescriptor(CPD_SKIP_BLOCKS);
return bundle;

By adding a line bundle.definePropertyDescriptor(CpdLanguageProperties.CPD_ANONYMIZE_LITERALS);

Then you would have to edit CppCpdLexer to firstly pick up the property value on the constructor, and then add a method override for processToken(TokenFactory tokenEntries, JavaccToken cppToken) to actually implement this.

Doing so basically requires first looking at cppToken.kind and determining if it is a literal of any kind (numbers, strings, etc.) and if so, overriding the image to a standard value for each literal kind (so all strings are seen as equals, all numbers are seen as equals, etc.).

Java implements exactly that here:

protected void processToken(TokenFactory tokenEntries, JavaccToken javaToken) {

For ease of comprehension, these are the relevant bits of the Java implementation (as it also does other transformations):

    @Override
    protected void processToken(TokenFactory tokenEntries, JavaccToken javaToken) {
        String image = javaToken.getImage();

        if (ignoreLiterals && (javaToken.kind == JavaTokenKinds.STRING_LITERAL
                || javaToken.kind == JavaTokenKinds.CHARACTER_LITERAL
                || javaToken.kind == JavaTokenKinds.INTEGER_LITERAL
                || javaToken.kind == JavaTokenKinds.FLOATING_POINT_LITERAL)) {
            image = JavaTokenKinds.describe(javaToken.kind);
        }

        tokenEntries.recordToken(image, javaToken.getReportLocation());
    }

I believe this should be straight-forward given this context and detail, but let me know if you have any questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:false-negative PMD doesn't flag a problematic piece of code in:cpd Affects the copy-paste detector was:invalid
Projects
None yet
Development

No branches or pull requests

3 participants