Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[apex] Add validation of ApexDoc comments #1314

Merged
merged 7 commits into from
Sep 23, 2018
Merged

Conversation

jeffhube
Copy link
Contributor

@jeffhube jeffhube commented Aug 21, 2018

Please, prefix the PR title with the language it applies to within brackets, such as [java] or [apex]. If not specific to a language, you can use [core]

Before submitting a PR, please check that:

  • The PR is submitted against master. The PMD team will merge back to support branches as needed.
  • ./mvnw test passes.
  • ./mvnw pmd:check passes.
  • ./mvnw checkstyle:check passes. Check this for more info

PR Description:

This rule validates that:

  • ApexDoc comments are present for classes, methods, and properties that are public or global, excluding overrides and test classes (as well as the contents of test classes).
  • ApexDoc comments should contain @description.
  • ApexDoc comments on non-void, non-constructor methods should contain @return.
  • ApexDoc comments on void or constructor methods should not contain @return.
  • ApexDoc comments on methods with parameters should contain @param for each parameter, in the same order as the method signature.

Since comments are not present in the AST, this rule expects the ApexDoc comment to be the first non-whitespace token that starts above the line containing either the first annotation if present, otherwise the class/method/property name (since that is considered to be the start of the class/method/property node). This does require the source code in string form, which I am capturing in ApexRootNode, and also tokenizing in the rule using ApexLexer. Suggestions of better ways to handle this are welcome.

Finds ApexDoc Comment:

/**
 * @description Foo
 */
public class Foo { }

/**
 * @description Foo
 */
@Deprecated
public class Foo { }

/**
 * @description Foo
 */

public class Foo { }

Does not find ApexDoc comment:

/**
 * @description Foo
 */
// foo
public class Foo { }

/** @description Foo */ public class Foo { }

/**
 * @description Foo
 */
public class
Foo { }

@@ -8,4 +8,23 @@
<description>
Rules that are related to code documentation.
</description>

<rule name="ApexDoc"
since="9.9.9"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time, the next release would be 6.7.0

<properties>
<!-- relevant for Code Climate output only -->
<property name="cc_categories" value="Style" />
<property name="cc_remediation_points_multiplier" value="50" />
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the multiplier be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for explanation of remediation points: https://github.com/codeclimate/spec/blob/master/SPEC.md#remediation-points

In the implementation of our CodeClimateRenderer, we use the baseline of 50_000 (which is the estimated required time for a trivial fix) and multiply it with the property defined here: https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/renderers/CodeClimateRenderer.java#L124-L132

I see, that we use multipliers between 1 and 250. Fixing missing documentation can be time consuming and we provide here a rough guess across all types of missing doc (class documentation might take more effort than a property documentation).

IMHO, 50 sounds like a reasonable value to me.

@pmd-test
Copy link

1 Message
📖 No java rules are changed!

Generated by 🚫 Danger

@jsotuyod
Copy link
Member

@jeffhube thanks for the contribution. This is a very interesting approach to rebuilding comment info from source, I didn't realize we had that handleSourceCode override...

The approach however seems not very extensible (having the rule parse everything and find comments), so we should probably analyze these very interesting ideas you bring and see if we can work out a more extensible approach. Let us ponder a few days on this and come back to you with our thoughts.

@jsotuyod
Copy link
Member

jsotuyod commented Aug 27, 2018

I've been thinking about this, and I believe we should move the parsing to ApexTreeBuilder, and have the doc (comment starting with /**) be an actual AST tree (ASTFormalComment for consistency with Java). Other comments can be dropped for the time being (but this would be groundwork for implementing #1087).

Doing so would require:

  1. Have the lexer initialized in the ApexTreeBuilder constructor
  2. Have the ASTFormalComment node extend AbstractNode and implement Node. Notice we can't extend AbstractApexNode nor implement ApexNode as there is no underlying node from Jorje. This would probably need a new abstract class above AbstractApexNode to implement all Node methods without repetition between ASTFormalComment and AbstractApexNode.
  3. Have ApexTreeBuilder.build check if a node has a ASTFormalComment on top, and have it added to the tree structure.

Once that's done, the rule could be updated to simply visiting the nodes that require doc, and transversing the tree to look for those nodes without parsing.

@jsotuyod jsotuyod added the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Aug 28, 2018
@jeffhube
Copy link
Contributor Author

jeffhube commented Aug 28, 2018

Here is where I am at so far:

  • Added a class between AbstractApexNode<T> and AbstractNode named AbstractApexNodeBase (naming?)
  • Moved methods that aren't dependent on an underlying node from Jorje from AbstractApexNode<T> to AbstractApexNodeBase
  • Created ASTFormalComment which extends AbstractApexNodeBase
  • Added visit() for ASTFormalComment and AbstractApexNodeBase to ApexParserVisitor, ApexParserVisitorAdapter, and AbstractApexRule
  • ApexTreeBuilder tokenizes the source code in its constructor
  • ApexTreeBuilder constructs ASTFormalComment nodes in visitEnd() for UserClass / UserInterface / Method / Property, see more notes below.
  • Added test cases for interfaces and for methods and properties preceded by annotations
  • TODO: ASTFormalComment does not have its location set
  • TODO: ASTFormalComment should probably be the first child, currently it is the last

Construction of ASTFormalComment nodes

Currently the ASTFormalComment is built in visitEnd() because I need to look for an ASTAnnotation, and if I find one, adjust the location of where I look for a comment. The ASTAnnotation has been built and visited by the time visitEnd() is called for classes etc, so I can find it easily. Another approach would be to call traverse() on the Jorje node and find the Jorje annotation node before the ASTAnnotation is built.

And the reason for this mess is the incorrect positioning on ASTUserClass, ASTMethod, ASTModifierNode, etc. In the following code snippet, both ASTUserClass and its child ASTModifierNode start at the first character of the class name, Foo, after the class token and any modifiers. The ASTAnnotation, which is also a child of ASTUserClass , starts before it on the @ as you would expect. So that is why building the ASTFormalComment is currently handled in the relevant visit() methods and not in build() as you mentioned above.

@Deprecated
public class Foo { }

@jsotuyod
Copy link
Member

@jeffhube I'm impressed by your progress so far! Thanks for being so awesome.

TODO: ASTFormalComment should probably be the first child, currently it is the last

yes! definitely! We want the tree to mirror the code as much as possible!

One more thing I've just noticed, you seem to assume any previous formal comment belongs to a given class / method / property / etc. So in an example such as:

/**
 * Some comment
 */
public with sharing class Simple {
    public String someParam { get; set; }

    public void getInit() {
        someParam = "test"; 
    }
}

All 3, the class, the property AND the method will have an ASTFormalComment attached, with the "some comment" content.

@jeffhube
Copy link
Contributor Author

Thanks, @jsotuyod !

I've fixed the outstanding items I mentioned above. The ASTFormalComment is now the first child and has positioning info.

I've not been able to reproduce the issue you mentioned. Looking at that code snippet in the designer, I only see a single comment node, parented to the class. That case should also be covered by unit tests like this one:

    <test-code>
        <description>public method should have comment</description>
        <expected-problems>1</expected-problems>
        <code><![CDATA[
/**
 * @description Foo
 */
public class Foo {
    public void bar() { }
}
        ]]></code>
    </test-code>

Copy link
Member

@adangel adangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!
I've just a few minor things, but we can fix them upon merge.
This looks ready, WDYT @jsotuyod ?

return null;
}

private class ApexDocComment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a private static class

@@ -8,4 +8,23 @@
<description>
Rules that are related to code documentation.
</description>

<rule name="ApexDoc"
since="6.7.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is by now 6.8.0 😄

<properties>
<!-- relevant for Code Climate output only -->
<property name="cc_categories" value="Style" />
<property name="cc_remediation_points_multiplier" value="50" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for explanation of remediation points: https://github.com/codeclimate/spec/blob/master/SPEC.md#remediation-points

In the implementation of our CodeClimateRenderer, we use the baseline of 50_000 (which is the estimated required time for a trivial fix) and multiply it with the property defined here: https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/renderers/CodeClimateRenderer.java#L124-L132

I see, that we use multipliers between 1 and 250. Fixing missing documentation can be time consuming and we provide here a rough guess across all types of missing doc (class documentation might take more effort than a property documentation).

IMHO, 50 sounds like a reasonable value to me.

super(klass.hashCode());
}

public void calculateLineNumbers(SourceCodePositioner positioner, int startOffset, int endOffset) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should not be public - it's enough, if it can be called from classes within the same package (net.sourceforge.pmd.lang.apex.ast).

class="net.sourceforge.pmd.lang.apex.rule.documentation.ApexDocRule"
externalInfoUrl="${pmd.website.baseurl}/pmd_rules_apex_documentation.html#apexdoc">
<description>
Identifies missing or incorrect ApexDoc comments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should enhance the description from your PR description:

This rule validates that:

  • ApexDoc comments are present for classes, methods, and properties that are public or global, excluding
    overrides and test classes (as well as the contents of test classes).
  • ApexDoc comments should contain @description.
  • ApexDoc comments on non-void, non-constructor methods should contain @return.
  • ApexDoc comments on void or constructor methods should not contain @return.
  • ApexDoc comments on methods with parameters should contain @param for each parameter, in the same
    order as the method signature.

@adangel adangel removed the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Sep 11, 2018
@adangel adangel added this to the 6.8.0 milestone Sep 11, 2018
@jsotuyod
Copy link
Member

@jeffhube well done!

@adangel I'm having a hell of a week, but I'll review during the weekend at the very latest.

Copy link
Member

@jsotuyod jsotuyod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll edit when I merge.

}

public AbstractApexNodeBase(Class<?> klass) {
super(klass.hashCode());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all instances of a node type using this constructor would have the same id.. I'm not sure it would affect our current use, but maybe we should avoid this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, we already did it this way.. to be revised in a different PR

private TokenLocation getApexDocTokenLocation(int index) {
TokenLocation last = null;
for (TokenLocation location : tokenLocations) {
if (location.index >= index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tokenLocations list is sorted, we can probably be smarter about this search

ApexLexer lexer = new ApexLexer(stream);

ArrayList<TokenLocation> tokenLocations = new ArrayList<>();
Integer startIndex = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a primitive

ArrayList<TokenLocation> tokenLocations = new ArrayList<>();
Integer startIndex = 0;
Token token = lexer.nextToken();
Integer endIndex = lexer.getCharIndex();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a primitive

@Override
public Object visit(ASTAnnotation node, Object data) {
if (node.getImage().equals("IsTest")) {
inTestClass = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, this variable is never being reset

@jsotuyod jsotuyod merged commit 86dc44d into pmd:master Sep 23, 2018
jsotuyod added a commit that referenced this pull request Sep 23, 2018
@jsotuyod
Copy link
Member

Merged! Thanks @jeffhube for the amazing job!

I fixed all comments and reworked the parsing a little...

  1. I do not take all token locations, only comments
  2. I create formal comments at visit start rather than visitEnd, which allows me to create them in order…
  3. … which allowed me to drop the searching and simply keep an iterator on the TokenLocations and consume them in order

This will be part of PMD 6.8.0. The groundwork for #1087 is now in place, I left a TODO there.

Thanks again for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants