Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Language properties #2518

Closed
Tracked by #3898
oowekyala opened this issue May 21, 2020 · 17 comments · Fixed by #4060
Closed
Tracked by #3898

[core] Language properties #2518

oowekyala opened this issue May 21, 2020 · 17 comments · Fixed by #4060
Labels
a:RFC A drafted proposal on changes to PMD, up for feedback form the team and community an:enhancement An improvement on existing features / rules
Projects
Milestone

Comments

@oowekyala
Copy link
Member

oowekyala commented May 21, 2020

I think the most generic solution to the problems we have when it comes to configuring languages, is to configure Language instances directly via properties.

Example use cases:

The core of this proposal is to make LanguageRegistry non-static, and giving it a proper lifecycle. This makes it so, that analysis-global state (like the classloader) can be stored in the language instance, and shared with eg the parser and other processing stages, without having to jump through hoops to figure out if we have the same parameters as before.

Consider making the service that is loaded by ServiceLoader not Language directly, but a LanguageLoader. This object creates a Language instance with property descriptors. The initialization takes care of configuring the language with all those things mentioned above.

For example, a LanguageLoader interface could look like so:

interface LanguageLoader<P extends PropertySource> {

    String id();

    P newPropertyBundle();

    Language createLanguage(P properties);

}
  • newPropertyBundle creates a PropertySource and defines the PropertyDescriptors accepted by the language
  • createLanguage creates a language instance given a configuration

Eg the java implementation of this would be

class JavaLanguageLoader implements LanguageLoader<JavaLanguageProperties> {


    @Override
    public String id() {
        return JavaLanguageModule.TERSE_NAME;
    }

    @Override
    public JavaLanguageProperties newPropertyBundle() {
        return new JavaLanguageProperties(); // defines some properties (auxclasspath, suppressMarker, filePatterns)
    }

    @Override
    public Language createLanguage(JavaLanguageProperties properties) {
        // in this language instance we can initialize the special classloader before analysing the first file
        return new JavaLanguageModule(
            properties.getFilePatterns(), // more general than file extensions, eg supports "pom.xml"
            properties.getSuppressMarker(),
            properties.getAuxclasspathClassloader()
        );
    }
}

Language properties can be set with this simple CLI extension: -L<langId>:<propName> <value>.
For example -Lxml:fileNamePatterns 'pom.xml,*.fxml', or -Ljava:auxclasspath 'some;cp'.

The initialization process of a language registry would look like so:

  1. LanguageRegistry uses ServiceLoader to load a bunch of LanguageLoader instances
  2. -L options of the command line args are partitioned by language ID
  3. Foreach language loader, create a language instance like so:
    1. create a property bundle (newPropertyBundle)
    2. call setProperty for each -L switch targeting the given language (this will throw errors for misconfigurations)
    3. call createLanguage with that property bundle
  4. Create a LanguageRegistry wrapping this set of languages

Language and LanguageRegistry can implement AutoCloseable. This allows the language registry to be used in a try statement.
So, very high in the PMD call stack, we can have something like

    public static void main(String[] args) {
           
        // langProperties := properties from the CLI, partitioned by language 
        // serviceClassLoader := classloader used by ServiceLoader
        // LanguageRegistry::loadWithProperties is the loading routine described above

        try (LanguageRegistry registry = LanguageRegistry.loadWithProperties(serviceClassLoader, langProperties)) {
            // load rulesets, execute, etc, using this registry
            // Other analyses use different LanguageRegistries, so are independent from this one
        }
        // classloader & other resources allocated on construction are reclaimed
    }

I think this mechanism solves long-standing issues with our configuration model, namely

  • adding options to a language implementation requires changing everything, from the upper layers like Configuration and the CLI itself (or the schema, similar to [core] Abstract away optional AST traversals (first step) #1426), to the implementation of the feature, that is spread ad-hoc over all the pmd-core codebase. Most languages probably don't need all those options anyway (eg the auxclasspath).
  • there's no "analysis scoped" objects, which means we have to use dirty tricks like threadlocals or
    the horror that's PmdAsmClassLoader to share objects through an analysis. Such "tricks" make PMD harder to use in a concurrent setting. There's no cleanup of static state at all.
  • ParserOptions is a pain to use and could be removed (see wiki)

It gives us a simple and general extension mechanism to configure language-specific behavior,
without affecting other languages, and without needing changes to pmd-core. For example, we could use that to enable logging of detailed type-resolution-specific debug information (missing classes, type inference failures), which would fit well into the updated java module

@oowekyala oowekyala added an:enhancement An improvement on existing features / rules a:RFC A drafted proposal on changes to PMD, up for feedback form the team and community labels May 21, 2020
@oowekyala oowekyala added this to the 7.0.0 milestone May 21, 2020
@adangel
Copy link
Member

adangel commented Aug 20, 2020

Thanks for writing this up! These are very good ideas.

I've right now only one comment regarding file extensions/patterns: I think, the reason, why PMD actually looks at the file extension, is this: it was envisioned to have a ruleset that references rules of different languages and you would call PMD on a project directory which contains files of these different languages. Now PMD has to figure out, which rules should be applied to which files - since trying to parse a file with the wrong parser will result in many errors, which are not real errors.
When calling PMD from CLI, our CLI impl will find the files and feed it into analysis. When calling PMD from ant, a ant fileset is used. the maven-pmd-plugin searches the file on its own.
The only benefit I see of this is, that you need to run PMD only once and you could get a single report for all the different languages in the project versus calling PMD multiple times with rulesets tailored for each language and getting multiple reports.
This language property would definitively be a solution for this, I just wanted to make sure, we actually need it...
If we provide sensible defaults for everything, I could even imagine that one could just run PMD CLI without any arguments and we'll run analysis of all files we understand and provide a nice (html) report - that would be an easy way to start getting to know PMD... and that would require, that we are able to figure out which rule should be applied to which file.

@linusjf
Copy link

linusjf commented Nov 27, 2020

What's the assumption made about the rules file? Is PMD expecting separate rule config files for each language? This could be the norm but it's quite likely that a project would support multiple languages such as java, JSP, xml in the same project.

Wouldn't it be better to have these language properties reside in a properties file unique to that project and have the ruleset file refer to these properties somewhat like Ant does? This lends itself to some extensibility and the language properties can be prefixed by the language identifier. This only becomes complex if you are supporting multiple versions of the same language i.e., jdk ; some naming convention can be arrived at to resolve this.

@linusjf

This comment has been minimized.

@oowekyala

This comment has been minimized.

@linusjf

This comment has been minimized.

@oowekyala

This comment has been minimized.

@linusjf

This comment has been minimized.

@linusjf

This comment has been minimized.

@linusjf

This comment has been minimized.

@linusjf

This comment has been minimized.

@linusjf

This comment has been minimized.

@oowekyala
Copy link
Member Author

oowekyala commented Dec 10, 2020

My point is simply that language properties ought to reside with the language rules.

I don't think this necessarily holds. More importantly this issue is mostly about the lifecycle, and the cli extension is just the easiest way to close the loop. I suggest we start by implementing that, and then discuss better ergonomics if need be -> but somewhere else.

@linusjf

This comment has been minimized.

@oowekyala

This comment has been minimized.

@linusjf

This comment has been minimized.

@oowekyala

This comment has been minimized.

@oowekyala oowekyala added this to To do in PMD 7 Apr 3, 2021
@oowekyala oowekyala mentioned this issue Apr 7, 2022
55 tasks
@oowekyala oowekyala linked a pull request Sep 21, 2022 that will close this issue
8 tasks
adangel added a commit to oowekyala/pmd that referenced this issue Feb 2, 2023
@adangel
Copy link
Member

adangel commented Feb 10, 2023

Done for PMD 7 via #4060

@adangel adangel closed this as completed Feb 10, 2023
PMD 7 automation moved this from To do to Done Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:RFC A drafted proposal on changes to PMD, up for feedback form the team and community an:enhancement An improvement on existing features / rules
Projects
No open projects
PMD 7
  
Done
Development

Successfully merging a pull request may close this issue.

3 participants