New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc] Update cpd.md with information about risks #1361

Merged
merged 1 commit into from Oct 27, 2018

Conversation

Projects
None yet
5 participants
@davidmichaelkarr
Contributor

davidmichaelkarr commented Sep 25, 2018

The current CPD documentation covers very well how to use it, but there was no information here that describes why someone would care about how much duplicate code they have. The developers I work with want to "cheat" on their SonarQube quality gates by excluding some code from CPD checks, because they don't really understand what the problem is. Ironically, the main risk from duplicate code is when that code unintentionally becomes not duplicate anymore, creating a hidden problem that may or may not erupt on you.

The text I've added here is excerpted from a short tech note I wrote to my team on the subject.

Fixes #1349

Before submitting a PR, please check that:

  • The PR is submitted against master. The PMD team will merge back to support branches as needed.
  • ./mvnw test passes.
  • ./mvnw pmd:check passes.
  • ./mvnw checkstyle:check passes. Check this for more info

PR Description:

Update cpd.md with information about risks
The current CPD documentation covers very well how to use it, but there was no information here that describes why someone would care about how much duplicate code they have.  The developers I work with want to "cheat" on their SonarQube quality gates by excluding some code from CPD checks, because they don't really understand what the problem is. Ironically, the main risk from duplicate code is when that code unintentionally becomes not duplicate anymore, creating a hidden problem that may or may not erupt on you.

The text I've added here is excerpted from a short tech note I wrote to my team on the subject.
@pmd-test

This comment has been minimized.

pmd-test commented Sep 25, 2018

1 Message
📖 No java rules are changed!

Generated by 🚫 Danger

@jsotuyod jsotuyod added this to the 6.8.0 milestone Sep 25, 2018

If you're analyzing code that never changes, it's reasonable to assume that you don't need to care too much about this. Risk comes in when you're working with complex code that can change over time.
#### Refactoring known duplications can be messy

This comment has been minimized.

@oowekyala

oowekyala Sep 28, 2018

Member

Don't these sections basically say the same thing? I think your case may be summed up in a few bullet points:

  • Assuming duplicated blocks of code are supposed to do the same thing, any refactoring, even simple, must be duplicated too -- which is unrewarding grunt work, and puts pressure on the developer to find every place in which to perform the refactoring.
    • Failure to keep the code in sync may mean automated tools will no longer recognise these blocks as duplicates. This means the task of finding duplicates to keep them in sync when doing subsequent refactorings can no longer be entrusted to an automated tool -- and humans do mistakes. Segments of code supposed to do the same thing may grow apart undetected upon further refactoring.
  • If the code may never change in the future, then this is not a problem
  • Otherwise, the most viable solution is to not duplicate. Possible ways to remove duplicates:
    • When the duplication is local to a method or single class: extract a local variable/ method
    • When the duplication occurs in siblings within a class hierarchy: extract a method and pull it up in the class hierarchy/ use the Template Method design pattern
    • When the duplication occurs in unrelated hierarchies: use the Strategy pattern
    • ...?

This comment has been minimized.

@davidmichaelkarr

davidmichaelkarr Oct 1, 2018

Contributor

Which specific sections are you suggesting say the same thing?

I don't know that I would be willing to "sum up" these ideas.

Your mention of refactoring strategies is a good idea, but that calls for spelling out that information in text, not just summarizing in bullet points.

This comment has been minimized.

@oowekyala

oowekyala Oct 12, 2018

Member

These excerpts from the section about the first risk just states that the blocks need to be kept in sync without exactly explaining why it would be bad for them not to be dups anymore:

you still have to make sure that you’ve found all the duplicated blocks and made the exact same change in each one, not just so they are logically correct, but so that they are still duplicate blocks.

you have to ensure that they are still recognized as duplicate blocks,

At this point we could ask ourselves: why would we want them to stay duplicates? The next section clarifies that by describing consequences of syncing mistakes:

QUOTE A: making it possible that the person doing the refactoring simply won’t know about that code that needs to change. After that refactoring, you now have two blocks of code that were supposed to do the same thing, but now they are not.

But it also kind of repeats what was stated in the first section, which adds noise IMHO:

If you are not careful when refactoring one or more of the code blocks, you could accidentally leave them so they are no longer recognized as duplicates by the scanning technology, but they are still functionally identical.

In the third section:

If someone makes logic changes to a block of code that has exact or logical duplicates somewhere else, while not considering those duplicates in any way, leads to the same situation at the end of the previous risk, being blocks that should have been doing the same thing, but now are not, and it will be very difficult for anyone to know that.

It seems to me that the bold part is the risk meant to be exposed by this sentence, correct? In that case, I think it was already stated in the previous section (italic part of quote A). For this reason, I don't really know how the third section explains a different scenario from the second. Would you care to explain about that?


IMHO the problem is that the fact they must be kept in sync is not a risk per se, it's a fact. The risk stems from syncing mistakes though, and is that duplicates may not do the same thing after a while. This is undesirable for many reasons, and aggravated by the fact that automated tools don't help anymore (which is not really a risk either).

I don't intend the doc to be made in bullet points, I just used them to present another structure for this documentation, that avoids repetition. E.g. the first bullet point may be a section, and similarly for others.

This comment has been minimized.

@adangel

adangel Oct 14, 2018

Member

Maybe we should mention the DRY-Principle, since that's the thing, that CPD tries to enforce: avoiding repetition/duplication of parts, that look the same (and hence seem to be related in some way) and should therefore be refactored, to avoid the risks you mentioned.

@adangel adangel modified the milestones: 6.8.0, 6.9.0 Sep 29, 2018

@davidmichaelkarr

This comment has been minimized.

Contributor

davidmichaelkarr commented Oct 14, 2018

@adangel adangel changed the title from Update cpd.md with information about risks to [doc] Update cpd.md with information about risks Oct 17, 2018

@oowekyala oowekyala merged commit b80b99b into pmd:master Oct 27, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@davidmichaelkarr

This comment has been minimized.

Contributor

davidmichaelkarr commented Oct 28, 2018

Oh. I was sort of surprised to see this merged, considering the discussion that had been going on about it. I'm not complaining, but I did think some reasonable points were being presented. On the other hand, maybe it's better to just get this out there and see how it feels.

@oowekyala

This comment has been minimized.

Member

oowekyala commented Oct 29, 2018

@davidmichaelkarr You're absolutely welcome to propose new changes to the documentation if you think it makes sense. The reason I went ahead was that we wanted to include this in release 6.9.0, which went out today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment