Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[java] New expression and type grammar #1759

Merged
merged 197 commits into from
May 26, 2019

Conversation

oowekyala
Copy link
Member

@oowekyala oowekyala commented Apr 4, 2019

(This is testable with the jar in the wiki)

Main changes

There are many things still to do to have a consistent AST, eg the new ArrayTypeDims should be used in the other places it's meant for, Dimensionable should be removed, etc. This is already big enough (keep in mind, about half of the line changes are tests).

All the changes to the AST structure are already described on this wiki page in the category "Done".

There's one change that was not in the preview version you've already seen: the parameters of lambda expressions are now their own node, which removes inconsistencies between the case when there's a single VariableDeclaratorId and the case where there's a full FormalParameter.

Temporary measures

  • I didn't fix any tests of rules, I just kept them compilable, which sometimes involved shamelessly commenting out some code
  • The only tests I made sure did pass are the kotlin tests of the ast package
  • Nodes that are not pushed anymore by the parser are not removed, just deprecated (for the rules to compile).
  • They're not on the visitor interface anymore but there are some deprecated methods on the adapter and AbstractJavaRule to keep implementors compilable

PrimaryExpression

JavaCC doesn't support left-recursion, so the left recursive nodes are built by manipulating the jjtree stack.

You can see in alljavacc.xml that the method extendLeft is inserted in JJTreeJavaParserState.java. This bumps the arity of the current node (the one being built, ie is not on the stack) by one, which means that if the node ends up being closed, the node that was pushed immediately before the current one is added as a child of the current node.

The productions PrimaryPrefix and PrimarySuffix are kept but made #void.

  • The first parses the expressions that can only occur in start position, eg array allocation. It tries its best to not push an AmbiguousName with some lookaheads
  • The second parses a segment that needs a left-hand side, e.g. .foo() or ::new. The extendLeft method is called systematically by all of those productions to enclose the previous node on the stack, which necessarily exists because we've already parsed a PrimaryPrefix (or another suffix).

Disambiguation

The parser needs to parse AmbiguousNames to keep lookaheads to a minimum. The disambiguation that uses only syntactic context is implemented at various points in the parser and the initialization code of the nodes

  • The javadoc of ASTAmbiguousName gives an overview of that with relevant links.

Since this version already needs to reclassify some ambiguous names, the machinery to actually replace a name with an unambiguous version is already in place. A later PR can add a naive disambiguation pass that uses the current symbol table in order to remove most of the ambiguous names

Operator nodes (#1661)

That's implemented using basically the same template for all the relevant nodes. See AbstractLrBinaryExpr and eg the production ShiftExpression

Optional vs @Nullable

One thing I decided which may be controversial is not to use java.util.Optional in the AST API. Instead I went with JSR 305's @Nullable and @Nonnull annotations, for several reasons:

  • The Optional type was initially added as a form of documentation, specifically for return types (eg not for fields or method parameters). Nullable annotations achieve the same goal but may be used in all places where it's useful, without having to explicitly wrap and unwrap optionals everywhere.
  • Using annotations makes it possible to incrementally migrate the codebase to this better documented form, without introducing API breaking changes. Many existing methods that have a nullable result won't need to be broken.
  • Using annotations improves the Kotlin compatibility by a long range. The AST API can be deemed idiomatic in both Java and Kotlin.
  • The Optional API is very incomplete anyway, eg it lacks an isEmpty method, ifPresentOrElse, or or (added in Java 11, 9 and 9 respectively), there's still no ifEmpty. Many times you just want to check whether a node is present, in which case dealing with Optional instead of doing a simple null check is obnoxious. When you do want to use Optional features like map or stuff then Optional.ofNullable is easy to add.
  • The difference that made me switch is that it's possible to override a Nullable method with a Nonnull method, which is of course not possible with Optional. This allows more things to be abstracted. E.g. you can have an interface QualifiableExpr { @Nullable getLhs(); } and implement it in ASTArrayAccess with a @Nonnull, instead of having to return a never-empty Optional.

@NoAttribute

This is used to remove the XPath attributes of ASTAmbiguousName to the minimum. It has many irrelevant attributes because it implements ASTExpression and ASTType (maybe that should be changed though). We can eg also use it to remove the @Image attribute from the new nodes to avoid people using them. It's pretty simple but not tested yet.

Other stuff

  • I made it so that the children table is never null (just use an empty array). That makes it easier to work with and also ensures that when trying to fetch a child that doesn't exist, only ArrayIndexOutOfBoundsException is thrown, and never NullPointerException
  • The test DSL now asserts the text bounds of nodes are ordered by inclusion (which is messed up by touching with the jjtree stack). What's currently done on master with SwitchLabeledRule breaks this invariant, so that I can't extract those changes into a separate PR without bringing in some more changes from here.
  • I removed the SemanticCheck attribute from AmbiguousName, we can reuse the removed code to do the temporary disambiguation pass later
  • JavaParserVisitorAdapter now delegates by default to abstract supertypes of nodes. So eg overriding visit(ASTExpression) effectively visits all expressions.
    • This is error prone to maintain manually so we should probably generate the delegation code somehow (would be nice to have it in the default interface methods so that AbstractJavaRule can have the same behaviour without duplication).
    • I deprecated JavaParserVisitorReducedAdapter because it doesn't serve any purpose now. MethodLikeNode introduces a discrepancy because now LambdaExpression delegates to visit(ASTExpression) and not visit(ASTMethodLikeNode). I think it should be removed entirely, it makes LambdaExpression implement AccessNode (!). This can go with a simplification of the metrics framework. There's no reason that metrics be constrained to be computed on methods, most of them can actually be computed on any block of code...

@oowekyala oowekyala added the in:ast About the AST structure or API, the parsing step label Apr 4, 2019
@oowekyala oowekyala added this to the 7.0.0 milestone Apr 4, 2019
Copy link
Member

@jsotuyod jsotuyod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have to look into the grammar itself in more detail, but this is looking really good so far

@jsotuyod
Copy link
Member

jsotuyod commented Apr 8, 2019

I'd love to see test cases for the resolved issues, as some of the new added methods for completeness.

@adangel adangel added the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label May 1, 2019
@adangel
Copy link
Member

adangel commented May 1, 2019

So far I have separated these out (for 7.0.x):

The rebased version of this PR is in my branch pr-1759-rebased2. Unfortunately not much smaller yet.
(Edit: need to compare pr-1759-newbase...pr-1759-rebased2, otherwise the two separated changes would be included).

all are generic (aka not java related) - although one or two changes need some adjustments as well to fix compile errors or tests.

Other things, that are generic, so should be separated and go into pmd/7.0.x:

  • the @Nonull stuff / checkersframework. It doesn't seem to have anything to do with expressions/grammar changes. Do we need it now? Or can we do this later?

Then there are the actual grammar changes, which should go into the java-grammar branch:

  • Array changes
  • Support for "LeftRecursiveNode" - fixing the node positions
  • Dealing with operators
  • Annotation changes
  • Literal changes
  • Lambda Expressions. The plan is, to move them in the grammar from being a child of PrimaryExpression to Expression (a expression is either a LambdaExpression or a AssignmentExpression).
  • Switch Expressions. Should be separated and an done in an extra PR. There is some TODO in the grammar. Needs lambda expressions to be finished, to be able to remove the workaround "inSwitchLabel"
  • Type changes
  • PrimaryExpression changes

@adangel
Copy link
Member

adangel commented May 3, 2019

Re: checker framework

I think, this is useful and we should use it in general as one of api design goals. Let me know if you agree/disagree.
That's why I created this section in the wiki https://github.com/pmd/pmd/wiki/PMD-7.0.0-API#api-general-design-goals

For this PR, I tend to keep the changes related to it in - although it modifies pmd-core/pom.xml ... But I think, adding the checker dependency in pmd-core is the correct way.

Update: Actually, I moved adding the dependency to pmd-core out, but left everything else in.

oowekyala added a commit that referenced this pull request May 18, 2019
From PR #1826

* This PR is for PMD 7
* It's extracted from #1759

Changes:
* Ensures that a node has it's parent, when it is added as a child.
* Makes `getXPathNodeName()` abstract by removing the default
* implementation.
oowekyala added a commit that referenced this pull request May 18, 2019
From PR #1824

* This PR is for PMD 7
* It's extracted from #1759

Changes:
* adds the new `assertTextRangeIsOk`, that is executed on every node in
* the kotlin-based tests
* this ensures, that the position of the nodes in the AST is the same as
* they appear in the source code
* For the new switch labeled rules, we have the following grammar

```
void SwitchStatement():
{}
{
  "switch" "(" Expression() ")" SwitchBlock()
}

void SwitchBlock() #void :
{}
{
    "{"
        (
            SwitchLabel()
            (
                "->" SwitchLabeledRulePart() (SwitchLabeledRule())*
            |
                ":" (LOOKAHEAD(2) SwitchLabel() ":")*
(BlockStatement())* (SwitchLabeledStatementGroup())*
            )
        )?
    "}"
}

void SwitchLabeledRulePart() #void:
{checkForSwitchRules();}
{
    (
        ( Expression() ";" ) #SwitchLabeledExpression(2)
      |
        ( Block() ) #SwitchLabeledBlock(2)
      |
        ( ThrowStatement() ) #SwitchLabeledThrowStatement(2)
    )
}
```

`#SwitchLabeledBlock(2)` takes the last two nodes (SwitchLabel + Block)
and adds them as SwitchLabeledBlock to SwitchStatement. JavaCC sets the
first token of SwitchLabeledBlock to be the Block node, but it should be
the SwitchLabel node.

This is fixed in the `jjtClose` methods in the three related
SwitchLabeled* nodes.

On the expression grammar PR, there is a better solution (you can mark
the nodes via the interface `LeftRecursiveNode`).

* So, this actually fixes a bug, that is present in PMD 6.
oowekyala added a commit that referenced this pull request May 18, 2019
From PR #1823

* This PR is for PMD 7.
* It's extracted from #1759

Changes:
* The children array in AbstractNode is now initialized with an empty
* array.
* This means, it is now never null, thus the null checks can be removed.
* The only change to the children array is, when adding a new child
* (`jjtAddChild`), or removing a child (`removeChildAtIndex`).

Future:
* the children array is protected. This means, sub classes could assign
* null to it... do we really need field available in subclasses? I'd
* assume, the already available methods are enough. -> this is something
* for defining a general AST API (which methods should be package
* private only, as they are only used by the parser, which methods
* define the API, when do we implement iterator, etc..)
oowekyala added a commit that referenced this pull request May 18, 2019
From PR #1827

* This PR is for PMD 7
* It's extracted from #1759

This only adds the dependency (compile time), and does not make use of
it yet.
I've added a section in the wiki:
https://github.com/pmd/pmd/wiki/PMD-7.0.0-API
We'll need to flesh out the details over time and verify our APIs, that
we have properly annotated them (if we all agree, that we use
`@Nullable`).
@oowekyala
Copy link
Member Author

@adangel I investigated the failing tests and fixed the following problems:

  • NumericLiteral really had a problem, the tests were not well written
  • The optimisation in 96ea140 wasn't tested well-enough, it made the parser parse (a) * 2 as a cast, then choke on the * token.
  • ReportTest and similar were failing, because violation suppression with annotation depends on type resolution for annotations. Since there is no Name anymore in annotations, it didn't work anymore. I added a sketchy fix in ClassTypeResolver: d2f3968
  • ASTTryStatementTest was failing with a ClassCastException. That is because the grammar for concise try-with-resources was changed in this PR. Instead of pushing a raw Name, the parser now pushes an Expression. The exception is fixed in 72c931c even though the occurrence finder probably doesn't work.

I also ignored some tests:

  • Metrics test, because metrics are just like rules
  • DocumentNavigator tests: because they relied on PrimarySuffix, and anyway will probably end up being replaced with [core] NodeStream API #1622
  • Some tests that depend on type resolution

With those changes, the remaining failing tests are either tests of type resolution, or of the symbol table. I think we should ignore them too while we're not done with the AST. And I think we should proceed with #1566 instead of fixing the current symbol table.

No numeric promotion is performed when comparing numeric
values.

Some implicit assertions are added to check an invariant
about literals.
Copy link
Member

@adangel adangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I finally got through the changes. I think we should merge this PR and then work step by step on the open TODOs.

Here are some general notes/TODOs:

Deprecations due on PMD 6.x

  • constructors or methods, that are package private now. We need to mark them on the master branch first.

  • maybe we should simply go through all AST nodes on master and mark them accordingly for once

  • Classes

    • ASTAdditiveExpression (constructor, getOperator())
    • ASTArgumentList (constructor)
    • ASTArrayInitializer (constructor)
    • ASTBooleanLiteral (constructor, setTrue())
    • ASTCastExpression (constructor, setIntersectionTypes/hasIntersectionType())
    • ASTCatchStatement (constructor)
    • ASTClassOrInterfaceType (constructor, isArray() and getArrayDepth())
    • ASTConditionalAndExpression (constructor)
    • ASTConditionalExpression (constructor)
    • ASTEnumConstant (constructor, getQualifiedName())
    • ASTEqualityExpression (constructor, getOperator())
    • ASTExplicitConstructorInvocation (constructor, setIsThis(), setIsSuper())
    • ASTLambdaExpression (constructor)
    • ASTMarkerAnnotation (constructor, getAnnotationName())
    • ASTMemberValueArrayInitializer (constructor)
    • ASTMemberValuePair (constructor)
    • ASTMethodDeclarator (constructor)
    • ASTMethodReference (constructor)
    • ASTMultiplicativeExpression (constructor, getOperator())
    • ASTName (constructor)
    • ASTNormalAnnotation (constructor, getAnnotationName())
    • ASTNullLiteral (constructor)
    • ASTPostfixExpression (constructor)
    • ASTPrimitiveType (constructor)
    • ASTRelationalExpression (constructor)
    • ASTShiftExpression (constructor, getOperator())
    • ASTSingleMemberAnnotation (constructor, getAnnotationName())
    • ASTSwitchStatement (constructor)
    • ASTTypeBound (getBoundTypeNodes())
    • ASTTypeParameter (constructor)
    • ASTUnaryExpression (constructor)

Nodes/classes that are deprecated on PMD 7, but not on PMD 6

  • should we remove them directly in PMD 7 and deprecate in 6?

  • we need to deprecate them at least in PMD 6 Since we have no replacement for them on 6.0.x, we won't deprecate them (it would only be noise to users). Instead we should keep next_major_development.md up-to-date with the planned deprecations, and maybe link to it in the minor release notes to keep users informed.

    • ASTAllocationExpression
    • ASTArguments
    • ASTArrayDimsAndInits
    • ASTAssignmentOperator
    • ASTMemberSelector
    • ASTMemberValuePairs
    • ASTTypeArgument
    • ASTUnaryExpression (method getOperator())
    • ASTUnaryExpressionNotPlusMinus
    • ASTVariableInitializer
    • ASTWildcardBounds
    • JavaParserVisitorReducedAdapter

Nodes that are moved from class to interface

  • How do we mark these? Should we mark these nodes in PMD 6 as deprecated as well (is this useful at all?)?
    • ASTAnnotation
    • ASTLiteral
    • ASTMemberValue
    • ASTPrimaryExpression
    • ASTReferenceType
    • ASTType
    • ASTVariableInitializer (note: this is also deprecated in PMD 7...) Yes this is kept for compatibility with rules but is not necessary

  • Double File License: Some AST nodes have the license header twice. Not sure, if I introduced this during rebase... oowekyala: This is because when I copy paste node files to create new ones, IntelliJ ignores the copyright comment if it starts with a double star like a javadoc one, and inserts a new comment.
  • Commented out code and removed test cases - we need to revisit later and not to forget
    • NameFinder
    • ClassTypeResolver
    • ASTAssignmentOperatorTest
    • ASTFieldDeclarationTest.testGetVariableName()
    • ASTLiteralTest
    • ASTPrimarySuffixTest

docs/pages/next_major_development.md Show resolved Hide resolved
docs/pages/next_major_development.md Outdated Show resolved Hide resolved
docs/pages/next_major_development.md Outdated Show resolved Hide resolved
docs/pages/next_major_development.md Outdated Show resolved Hide resolved
docs/pages/next_major_development.md Outdated Show resolved Hide resolved
*
* @throws NoSuchElementException If there's less than n tokens to the left of the anchor.
*/
// test only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean, this method is public only for tests? Do we need an annotation @TestOnly?

@adangel adangel merged commit d75b6de into pmd:java-grammar May 26, 2019
@adangel adangel removed the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Jun 3, 2019
@oowekyala oowekyala deleted the grammar-expressions branch June 4, 2019 01:57
oowekyala added a commit to oowekyala/pmd that referenced this pull request Jul 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in:ast About the AST structure or API, the parsing step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants