Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Remove ParserOptions #2602

Closed
4 tasks done
oowekyala opened this issue Jun 18, 2020 · 5 comments
Closed
4 tasks done

[core] Remove ParserOptions #2602

oowekyala opened this issue Jun 18, 2020 · 5 comments
Labels
in:ast About the AST structure or API, the parsing step
Milestone

Comments

@oowekyala
Copy link
Member

oowekyala commented Jun 18, 2020

Parser options for now are just implemented for decoration. There's no API for PMD to configure them, and they can only be used if you use the parser directly.

  • It will likely be never supported to have a rule choose special parser options. This is because, if several rules need different sets of options, we'd have to parse files several time.
  • There would also be several ways to parse trees, meaning it's harder to get a hand on the API and start writing rules.
  • Nontrivial ParserOptions are implemented in the following modules only:
    • Ecmascript:
      public static final BooleanProperty RECORDING_COMMENTS_DESCRIPTOR = new BooleanProperty("recordingComments",
      "Specifies that comments are produced in the AST.", Boolean.TRUE, 3.0f);
      public static final BooleanProperty RECORDING_LOCAL_JSDOC_COMMENTS_DESCRIPTOR = new BooleanProperty(
      "recordingLocalJsDocComments", "Specifies that JsDoc comments are produced in the AST.", Boolean.TRUE,
      4.0f);
      public static final EnumeratedProperty<Version> RHINO_LANGUAGE_VERSION = new EnumeratedProperty<>(
      "rhinoLanguageVersion",
      "Specifies the Rhino Language Version to use for parsing. Defaults to Rhino default.", VERSION_LABELS,
      Version.values(), 0, Version.class, 5.0f);
      • The rhino version should be translated to a LanguageVersion, not a ParserOption
      • Preserving comments is done by default and we could just remove those options
    • [javascript] Remove JS parser options #2837
    • XML:
      public static final BooleanProperty COALESCING_DESCRIPTOR = new BooleanProperty("coalescing",
      "Specifies that the XML parser convert CDATA nodes to Text nodes and append it to the adjacent (if any) text node.",
      Boolean.FALSE, 3.0f);
      public static final BooleanProperty EXPAND_ENTITY_REFERENCES_DESCRIPTOR = new BooleanProperty(
      "expandEntityReferences", "Specifies that the XML parser expand entity reference nodes.", Boolean.TRUE,
      4.0f);
      public static final BooleanProperty IGNORING_COMMENTS_DESCRIPTOR = new BooleanProperty("ignoringComments",
      "Specifies that the XML parser ignore comments.", Boolean.FALSE, 5.0f);
      public static final BooleanProperty IGNORING_ELEMENT_CONTENT_WHITESPACE_DESCRIPTOR = new BooleanProperty(
      "ignoringElementContentWhitespace",
      "Specifies that the XML parser eliminate whitespace in element content. Setting this to 'true' will force validating.",
      Boolean.FALSE, 6.0f);
      public static final BooleanProperty NAMESPACE_AWARE_DESCRIPTOR = new BooleanProperty("namespaceAware",
      "Specifies that the XML parser will provide support for XML namespaces.", Boolean.TRUE, 7.0f);
      public static final BooleanProperty VALIDATING_DESCRIPTOR = new BooleanProperty("validating",
      "Specifies that the XML parser will validate documents as they are parsed. This only works for DTDs.",
      Boolean.FALSE, 8.0f);
      public static final BooleanProperty XINCLUDE_AWARE_DESCRIPTOR = new BooleanProperty("xincludeAware",
      "Specifies that the XML parser will process XInclude markup.", Boolean.FALSE, 9.0f);
      public static final BooleanProperty LOOKUP_DESCRIPTOR_DTD = new BooleanProperty("xincludeAware",
      "Specifies whether XML parser will attempt to lookup the DTD.", Boolean.FALSE, 10.0f);
      • The ASTs PMD produces should be "low-level" enough that code style issues can be detected. Coalescing CDATA nodes, or expanding entity references, defeats this purpose.
      • PMD should probably not be attempting to validate schemas in the parser. If anything schema validation should maybe be a rule.
      • Namespace awareness is kind of like symbol resolution in other languages and should be kept enabled by default
    • [xml] Remove xml dom rule and parser options #2971 [xml] Remove XML parser options

I think parser options can go away completely, and language properties (#2518) may be used to replace its only functionality (the comment marker). This would also allow suppress markers to be set per-language, as a happy consequence.

On master we need to

It's still ok to use ParserOptions on master (though not the language-specific subclasses), because LanguageVersionHandler::getParser takes a ParserOptions parameter for the suppress marker. In 7.0 we should however introduce an overload that takes no parameter.

@oowekyala oowekyala added the in:ast About the AST structure or API, the parsing step label Jun 18, 2020
@oowekyala oowekyala added this to the 7.0.0 milestone Jun 18, 2020
@adangel
Copy link
Member

adangel commented Jul 31, 2020

@oowekyala Is this necessary for the java-grammar branch? I'd really like to have the java-grammar branch finished to get rid of one branch before changing the API of PMD....

@oowekyala
Copy link
Member Author

It's not strictly necessary. But my understanding was that we would be rewriting Java rules to use the newer APIs (AST, typeres, symtable) on the java-grammar branch. The day we start doing that is probably the day we freeze the PMD 6 branch, because merge conflicts would become hard to handle. And also, I would rather rewrite rules once the complete API is reasonably done (ie also changes to core).

So in that understanding, we complete the API on 7.0.x while keeping java-grammar up until the API is done, and the new pmd-java APIs are stable, and we have ported rules (which will take a lot of time).

This assumes we want to keep all tests green in the 7.0.x branch, which is what we set out for in the beginning. But I agree that keeping java-grammar separate is really costly and it should be merged sooner rather than later.

If you're ok with ignoring the rule tests for pmd-java on the 7.0.x branch, then we could merge java-grammar into 7.0.x soon. Then we'd have time to figure out the API used by rules, and improving other areas of the core API, before rewriting rules (and with no separate branch). I'd still like to merge the type resolution PR into java-grammar though.

@adangel
Copy link
Member

adangel commented Aug 2, 2020

I seem to have a slightly different understanding :)

But my understanding was that we would be rewriting Java rules to use the newer APIs (AST, typeres, symtable) on the java-grammar branch.

Exactly. That means, we fix the Java rule to adapt to the new APIs in the java module, which is mostly these parts you mentioned: AST, typeres, symtable. AST is only affecting java module, typeres is a core API, but only used in java (should it even be a core API for other languages?), as well as symtable (same question...)

we complete the API on 7.0.x while keeping java-grammar up until the API is done

I'm not sure, we should do this, since then we need to update the API always twice: once in pmd/7.0.x and again in java-grammar
The question is also: How do we know, that we have the API on 7.0.x completed?

If you're ok with ignoring the rule tests for pmd-java on the 7.0.x branch, then we could merge java-grammar into 7.0.x soon.

IMHO we should not do this. Disabled tests means, we don't see, if something breaks. Not good.

And also, I would rather rewrite rules once the complete API is reasonably done (ie also changes to core).

That's understandable. The problem I see here is however, that we are changing too many things at once. If we change stuff in smaller steps, then we never need to rewrite rules, we "just" need to refactor them (that's of course just my theory, telling you "it won't be as bad as you expect" without proof...).

I'd suggest to finish first the java-grammar stuff and post-pone anything, that is not absolutely necessary. The branch java-grammar is mergeable, as soon as we have enabled the tests and the tests run successful again. As I understand, that last big part is typeres (#2689). Once that is done, we should enable test by test/rule by rule and work our way through.

@oowekyala
Copy link
Member Author

IMHO we should not do this. Disabled tests means, we don't see, if something breaks. Not good.

I don't think this is as bad as you think. The tests on java-grammar are disabled, and they are those that matter. Rules on the 7.0.x branch will see their implementation refactored, so having their tests pass... means nothing.

I see a value in having a branch on which all tests pass, and that is to do refactoring. But if we avoid refactoring on 7.0.x until all rules are ported, we waste this benefit, and we just pay the price of synchronizing the branches.

Compare the relative cost of

  • refactoring something on 7.0.x, then merging into java-grammar. The refactoring may look ok on 7.0.x, but may not even compile on java-grammar and take hours to merge. AND there may still be bugs uncovered by the tests of java-grammar, since the implementation of rules is different on java-grammar. A fix for those is extra costly if it needs to be backported to 7.0.x: you need to reimplement everything, taking care of not using APIs only present on java-grammar, go through review again, re-merge into java-grammar, where it necessarily conflicts...
  • refactoring something on a 7.0.x merged with java-grammar, that has tests disabled. The refactoring may introduce some bugs somewhere in the disabled tests. But at least we have made sure it compiles, and there's no merge. Tests that are disabled are already meaningless anyway. Enabling tests means also finding out the bugs. And we can rapidly ramp up the number of enabled tests for simpler rules.

AFAIU, your solution is to avoid refactoring anything on 7.0.x until java-grammar is done. This doesn't really prevent problems, since there has already been a lot of changes to 7.0.x, and since the 6.x release cycle is going on. We already waste a lot of time merging things.

So I'm all for merging java-grammar into 7.0.x ASAP. If we wait for rules to be done, then I'd like rules to be done ASAP. I don't want to spend 2 years updating rules one by one. I already have the impression, that the current point where java-grammar is could have been reached a year ago.

The question is also: How do we know, that we have the API on 7.0.x completed?

That's true. All rules need a different subset of features. That's why I think rules that can be ported today should be, and others should be postponed. Eg rules about comments could use the javadoc parser. Does that mean all rules need to wait for the javadoc parser before they're updated? I don't think so.

The main 7.0.x API I was referring to is to change the API of AbstractJavaRule to be more like the AbstractAntlrRule. Instead of making the rule class implement the visitor interface, we have the rule class produce a (generic) visitor, eg JavaVisitor<RuleContext, Void>. This allows us to have type safe access to the RuleContext within rules. This would mean introducing a language independent visitor rule class: since we have #2589, the implementation can be shared. Since we have #2617, the language-specific base rule classes are completely redundant.

If we change stuff in smaller steps, then we never need to rewrite rules, we "just" need to refactor them (that's of course just my theory, telling you "it won't be as bad as you expect" without proof...).

Hmm.. Maybe that's true for some rules, but many will need a near 100% rewrite. I don't have proof either, except that the couple of rules I've tried to port all have needed a 100% rewrite. Those are StringToString, UnnecessaryFullyQualifiedName, ForLoopCanBeForeach and UselessParentheses.

typeres is a core API, but only used in java (should it even be a core API for other languages?)

I don't think it should be. Type systems are completely language-specific and attempts to abstract functionality there are pipe dreams IMO. I think the auxclasspath option should be replaced with a language property.

In any case I'll do a write up about what is still missing in java-grammar. #2689 is the last major part, but there are other things that are incomplete in the symbol and AST API. Many rules don't depend on that though, and those may be already be updated.

@adangel
Copy link
Member

adangel commented Feb 19, 2021

Done with #3085 for PMD 7.

@adangel adangel closed this as completed Feb 19, 2021
@adangel adangel mentioned this issue Jan 23, 2023
55 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in:ast About the AST structure or API, the parsing step
Projects
None yet
Development

No branches or pull requests

2 participants