[core] Language properties #2518

oowekyala · 2020-05-21T12:16:22Z

I think the most generic solution to the problems we have when it comes to configuring languages, is to configure Language instances directly via properties.

Example use cases:

We currently have a CLI switch for the suppress marker, another for the auxclasspath.
- The auxclasspath is only used by JVM languages (currently only Java). We try to reuse our special PmdAsmClassLoader, bound to the classloader created for the auxclasspath. That classloader is stored in a static variable. It's overwritten if you launch an analysis with another auxclasspath. This means, analysing different projects concurrently will overwrite each other's classloader repeatedly, undermining reuse. This also means the only way to reclaim the memory for this classloader is to launch another analysis, otherwise it will stay in the static variable. Clearly there's a problem with lifecycle here. See here
- The suppress marker is set for every language globally. It's the only use case for ParserOptions, which otherwise could be removed. It would make sense to configure this language by language
Filename extensions are not configurable, though it has been a feature request for a long time: [xml] Please externalize the mapping of file extensions to languages as preference file #461, [pom] Maven POM Rules don't check files named "pom.xml" #1540, [xml] Allow to check Salesforce XML Metadata using XPath rules #2133
Language-specific tabsizes [core] Support configurable tabsize for AST node position in reports #292
Apex-specific configuration [apex] Apexlink POC #2830
Visualforce-specific configuration [vf] Provide expression type information to Visualforce rules to avoid false positives #2864

The core of this proposal is to make LanguageRegistry non-static, and giving it a proper lifecycle. This makes it so, that analysis-global state (like the classloader) can be stored in the language instance, and shared with eg the parser and other processing stages, without having to jump through hoops to figure out if we have the same parameters as before.

Consider making the service that is loaded by ServiceLoader not Language directly, but a LanguageLoader. This object creates a Language instance with property descriptors. The initialization takes care of configuring the language with all those things mentioned above.

For example, a LanguageLoader interface could look like so:

interface LanguageLoader<P extends PropertySource> {

    String id();

    P newPropertyBundle();

    Language createLanguage(P properties);

}

newPropertyBundle creates a PropertySource and defines the PropertyDescriptors accepted by the language
createLanguage creates a language instance given a configuration

Eg the java implementation of this would be

class JavaLanguageLoader implements LanguageLoader<JavaLanguageProperties> {


    @Override
    public String id() {
        return JavaLanguageModule.TERSE_NAME;
    }

    @Override
    public JavaLanguageProperties newPropertyBundle() {
        return new JavaLanguageProperties(); // defines some properties (auxclasspath, suppressMarker, filePatterns)
    }

    @Override
    public Language createLanguage(JavaLanguageProperties properties) {
        // in this language instance we can initialize the special classloader before analysing the first file
        return new JavaLanguageModule(
            properties.getFilePatterns(), // more general than file extensions, eg supports "pom.xml"
            properties.getSuppressMarker(),
            properties.getAuxclasspathClassloader()
        );
    }
}

Language properties can be set with this simple CLI extension: -L<langId>:<propName> <value>.
For example -Lxml:fileNamePatterns 'pom.xml,*.fxml', or -Ljava:auxclasspath 'some;cp'.

The initialization process of a language registry would look like so:

LanguageRegistry uses ServiceLoader to load a bunch of LanguageLoader instances
-L options of the command line args are partitioned by language ID
Foreach language loader, create a language instance like so:
1. create a property bundle (newPropertyBundle)
2. call setProperty for each -L switch targeting the given language (this will throw errors for misconfigurations)
3. call createLanguage with that property bundle
Create a LanguageRegistry wrapping this set of languages

Language and LanguageRegistry can implement AutoCloseable. This allows the language registry to be used in a try statement.
So, very high in the PMD call stack, we can have something like

    public static void main(String[] args) {
           
        // langProperties := properties from the CLI, partitioned by language 
        // serviceClassLoader := classloader used by ServiceLoader
        // LanguageRegistry::loadWithProperties is the loading routine described above

        try (LanguageRegistry registry = LanguageRegistry.loadWithProperties(serviceClassLoader, langProperties)) {
            // load rulesets, execute, etc, using this registry
            // Other analyses use different LanguageRegistries, so are independent from this one
        }
        // classloader & other resources allocated on construction are reclaimed
    }

I think this mechanism solves long-standing issues with our configuration model, namely

adding options to a language implementation requires changing everything, from the upper layers like Configuration and the CLI itself (or the schema, similar to [core] Abstract away optional AST traversals (first step) #1426), to the implementation of the feature, that is spread ad-hoc over all the pmd-core codebase. Most languages probably don't need all those options anyway (eg the auxclasspath).
there's no "analysis scoped" objects, which means we have to use dirty tricks like threadlocals or
the horror that's PmdAsmClassLoader to share objects through an analysis. Such "tricks" make PMD harder to use in a concurrent setting. There's no cleanup of static state at all.
ParserOptions is a pain to use and could be removed (see wiki)

It gives us a simple and general extension mechanism to configure language-specific behavior,
without affecting other languages, and without needing changes to pmd-core. For example, we could use that to enable logging of detailed type-resolution-specific debug information (missing classes, type inference failures), which would fit well into the updated java module

The text was updated successfully, but these errors were encountered:

adangel · 2020-08-20T14:25:57Z

Thanks for writing this up! These are very good ideas.

I've right now only one comment regarding file extensions/patterns: I think, the reason, why PMD actually looks at the file extension, is this: it was envisioned to have a ruleset that references rules of different languages and you would call PMD on a project directory which contains files of these different languages. Now PMD has to figure out, which rules should be applied to which files - since trying to parse a file with the wrong parser will result in many errors, which are not real errors.
When calling PMD from CLI, our CLI impl will find the files and feed it into analysis. When calling PMD from ant, a ant fileset is used. the maven-pmd-plugin searches the file on its own.
The only benefit I see of this is, that you need to run PMD only once and you could get a single report for all the different languages in the project versus calling PMD multiple times with rulesets tailored for each language and getting multiple reports.
This language property would definitively be a solution for this, I just wanted to make sure, we actually need it...
If we provide sensible defaults for everything, I could even imagine that one could just run PMD CLI without any arguments and we'll run analysis of all files we understand and provide a nice (html) report - that would be an easy way to start getting to know PMD... and that would require, that we are able to figure out which rule should be applied to which file.

linusjf · 2020-11-27T09:30:52Z

What's the assumption made about the rules file? Is PMD expecting separate rule config files for each language? This could be the norm but it's quite likely that a project would support multiple languages such as java, JSP, xml in the same project.

Wouldn't it be better to have these language properties reside in a properties file unique to that project and have the ruleset file refer to these properties somewhat like Ant does? This lends itself to some extensibility and the language properties can be prefixed by the language identifier. This only becomes complex if you are supporting multiple versions of the same language i.e., jdk ; some naming convention can be arrived at to resolve this.

oowekyala · 2020-12-10T21:49:44Z

My point is simply that language properties ought to reside with the language rules.

I don't think this necessarily holds. More importantly this issue is mostly about the lifecycle, and the cli extension is just the easiest way to close the loop. I suggest we start by implementing that, and then discuss better ergonomics if need be -> but somewhere else.

adangel · 2023-02-10T09:52:33Z

Done for PMD 7 via #4060

oowekyala added an:enhancement An improvement on existing features / rules a:RFC A drafted proposal on changes to PMD, up for feedback form the team and community labels May 21, 2020

oowekyala added this to the 7.0.0 milestone May 21, 2020

oowekyala mentioned this issue Jun 18, 2020

[core] Remove ParserOptions #2602

Closed

4 tasks

This was referenced Jul 29, 2020

[apex] Integrate nawforce/ApexLink to build robust Unused rule #2667

Closed

[java] New type resolution #2689

Merged

oowekyala mentioned this issue Aug 22, 2020

[apex] Fix False Negatives with Salesforce-specific Dataflow/Multifile #2717

Closed

oowekyala mentioned this issue Sep 17, 2020

[java] Merge Java-grammar into 7.0.x #2788

Merged

4 tasks

oowekyala mentioned this issue Oct 28, 2020

[vf] Provide expression type information to Visualforce rules to avoid false positives #2864

Merged

4 tasks

oowekyala mentioned this issue Nov 10, 2020

[java] ClassTypeResolver#searchNodeNameForClass leaks memory #2911

Closed

oowekyala mentioned this issue Nov 27, 2020

[java] UnusedFormalParameter could adhere to Sonar spec #2944

Open

This comment has been minimized.

Sign in to view

oowekyala mentioned this issue Nov 27, 2020

[core] Language properties CLI surface syntax #2947

Open

6 tasks

This comment has been minimized.

Sign in to view

oowekyala mentioned this issue Feb 19, 2021

[apex] Apexlink POC #2830

Merged

oowekyala mentioned this issue Mar 28, 2021

[doc] Document language module features #3175

Closed

oowekyala added this to To do in PMD 7 Apr 3, 2021

oowekyala mentioned this issue Feb 12, 2022

[core] Language lifecycle #3782

Closed

oowekyala mentioned this issue Feb 25, 2022

[core] Remove processing stages #3810

Merged

4 tasks

oowekyala mentioned this issue Mar 22, 2022

[core] Turn PropertySource into PropertyBundle #3869

Open

oowekyala mentioned this issue Apr 7, 2022

PMD 7 Tracking Issue #3898

Closed

55 tasks

oowekyala mentioned this issue Jul 22, 2022

[core] Language lifecycle #4060

Merged

8 tasks

oowekyala linked a pull request Sep 21, 2022 that will close this issue

[core] Language lifecycle #4060

Merged

8 tasks

adangel added a commit to oowekyala/pmd that referenced this issue Feb 2, 2023

[doc] Update release notes (pmd#4060, pmd#2518, pmd#3782)

a864087

adangel closed this as completed Feb 10, 2023

PMD 7 automation moved this from To do to Done Feb 10, 2023

adangel mentioned this issue Jun 29, 2023

[core] Fix loading language properties from env vars #4611

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core] Language properties #2518

[core] Language properties #2518

oowekyala commented May 21, 2020 •

edited

Loading

adangel commented Aug 20, 2020

linusjf commented Nov 27, 2020

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

oowekyala commented Dec 10, 2020 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

adangel commented Feb 10, 2023

[core] Language properties #2518

[core] Language properties #2518

Comments

oowekyala commented May 21, 2020 • edited Loading

adangel commented Aug 20, 2020

linusjf commented Nov 27, 2020

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

oowekyala commented Dec 10, 2020 • edited Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

adangel commented Feb 10, 2023

oowekyala commented May 21, 2020 •

edited

Loading

oowekyala commented Dec 10, 2020 •

edited

Loading