Merge pull request #4766 from adangel:issue-4319-typeres-symbols-api
[doc] Document TypeRes API and Symbols API (#4319) #4766
adangel committed Jan 12, 2024
2 parents acb3fd8 + 8ce3176 commit e65f10b
Showing 4 changed files with 162 additions and 27 deletions.
@@ -3,7 +3,7 @@ title: Adding PMD support for a new ANTLR grammar based language
short_title: Adding a new language with ANTLR
tags: [devdocs, extending]
summary: "How to add a new language to PMD using ANTLR grammar."
last_updated: December 2023 (7.0.0)
sidebar: pmd_sidebar
permalink: pmd_devdocs_major_adding_new_language_antlr.html
folder: pmd/devdocs
@@ -51,23 +51,25 @@ definitely don't come for free. It is much effort and requires perseverance to i

" %}

## Steps

### 1. Start with a new sub-module
* See pmd-swift for examples.
* Make sure to add your new module to PMD's parent pom as `<module>` entry, so that it is built alongside the
other languages.
* Also add your new module to the dependencies list in "pmd-languages-deps/pom.xml", so that the new language
is automatically available in the binary distribution (pmd-dist).


### 2. Implement an AST parser for your language
* ANTLR will generate the parser for you based on the grammar file. The grammar file needs to be placed in the
folder `src/main/antlr4` in the appropriate sub package `ast` of the language. E.g. for swift, the grammar
file is [Swift.g4](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/antlr4/net/sourceforge/pmd/lang/swift/ast/Swift.g4)
and is placed in the package `net.sourceforge.pmd.lang.swift.ast`.
* Configure the options "superClass" and "contextSuperClass". These are the base classes for the generated
classes.

### 3. Create AST node classes
* The individual AST nodes are generated, but you need to define the common interface for them.
* You need to define the supertype interface for all nodes of the language. For that, we provide
[`AntlrNode`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrNode.java).
@@ -106,7 +108,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
* You can add additional methods in your "InnerNode" (e.g. `SwiftInnerNode`) that are available on all nodes.
But in most cases you won't need to do anything.

### 4. Generate your parser (using ANTLR)
* Make sure you have the property `<antlr4.visitor>true</antlr4.visitor>` in your `pom.xml` file.
* This is just a matter of building the language module. ANTLR is called via Ant, and this step is added
to the phase `generate-sources`. So you can just call e.g. `./mvnw generate-sources -pl pmd-swift` to
@@ -115,7 +117,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
source control.
* You should review [`pmd-swift/pom.xml`](https://github.com/pmd/pmd/blob/master/pmd-swift/pom.xml).

### 5. Create a TokenManager
* This is needed to support CPD (copy paste detection)
* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java).
* You must create your own "AntlrTokenizer" such as we do with
@@ -130,13 +132,13 @@ definitely don't come for free. It is much effort and requires perseverance to i
If you don't need a custom token filter, you don't need to override the method. It returns the default
`AntlrTokenFilter` which doesn't filter anything.

### 6. Create a PMD parser “adapter”
* Create your own parser that adapts the ANTLR interface to PMD's parser interface.
* We provide an [`AntlrBaseParser`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrBaseParser.java)
implementation that you need to extend to create your own adapter as we do with
[`PmdSwiftParser`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/ast/PmdSwiftParser.java).

### 7. Create a language version handler
* Now you need to create your version handler, as we did with [`SwiftHandler`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/SwiftHandler.java).
* This class is sort of a gateway between PMD and all parsing logic specific to your language.
* For a minimal implementation, it just needs to return a parser *(see step #6)*.
@@ -148,7 +150,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
* metrics
* custom XPath functions

### 8. Create a base visitor
* A parser visitor adapter is not needed anymore with PMD 7. The visitor interface now provides a default
implementation.
* The visitor for ANTLR based AST is generated along the parser from the ANTLR grammar file. The
@@ -158,7 +160,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
See [`SwiftVisitorBase`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/ast/SwiftVisitorBase.java)
as an example.
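
The following is a hypothetical, self-contained sketch of why default methods make a separate adapter class unnecessary. The node and visitor classes are made up for illustration and are not the classes ANTLR generates for PMD. Every `visitXxx` method has a default that simply visits the children, so a concrete visitor only overrides the callbacks it cares about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a visitor interface with default methods, so no separate adapter class is needed. */
final class DefaultVisitorSketch {

    /** Toy AST node (real nodes would be generated from the grammar). */
    static class Node {
        final List<Node> children;

        Node(Node... children) {
            this.children = Arrays.asList(children);
        }

        <R> R accept(ToyVisitor<R> visitor, R data) {
            return visitor.visitNode(this, data);
        }
    }

    /** Toy node for a function declaration. */
    static final class FuncDecl extends Node {
        final String name;

        FuncDecl(String name, Node... children) {
            super(children);
            this.name = name;
        }

        @Override
        <R> R accept(ToyVisitor<R> visitor, R data) {
            return visitor.visitFuncDecl(this, data);
        }
    }

    /** Every method has a default, so unknown nodes simply have their children visited. */
    interface ToyVisitor<R> {
        default R visitNode(Node node, R data) {
            for (Node child : node.children) {
                data = child.accept(this, data);
            }
            return data;
        }

        default R visitFuncDecl(FuncDecl node, R data) {
            return visitNode(node, data); // fall back to the generic traversal
        }
    }

    /** A concrete visitor overrides only what it needs: here it collects function names. */
    static final class FuncNameCollector implements ToyVisitor<List<String>> {
        @Override
        public List<String> visitFuncDecl(FuncDecl node, List<String> names) {
            names.add(node.name);
            return visitNode(node, names); // keep descending into nested declarations
        }
    }

    static List<String> collectNames(Node root) {
        return root.accept(new FuncNameCollector(), new ArrayList<>());
    }
}
```

Without the default methods, every concrete visitor would have to implement all callbacks or extend a base class that does. This is the role the adapter used to play.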

### 9. Make PMD recognize your language
* Create your own subclass of `net.sourceforge.pmd.lang.impl.SimpleLanguageModuleBase`, see Swift as an example:
[`SwiftLanguageModule`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/SwiftLanguageModule.java).
* For each version of your language, add a call to `addVersion` in your language module’s constructor.
@@ -167,7 +169,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
* Create the service registration via the text file `src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language`.
Add your fully qualified class name as a single line into it.

### 10. Create an abstract rule class for the language
* You need to create your own abstract rule class in order to interface your language with PMD's generic rule
execution.
* See [`AbstractSwiftRule`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/rule/AbstractSwiftRule.java) as an example.
@@ -184,7 +186,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
interface of the specific language). Now the rule just provides a visitor, which can be hidden and potentially
shared between rules.

### 11. Create rules
* Creating rules is already pretty well documented in PMD - and it’s no different for a new language, except you
may have different AST nodes.
* PMD supports two types of rules: rules implemented as visitors, and XPath rules.
@@ -207,7 +209,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
</resources>
```

### 12. Test the rules
* Testing rules is described in depth in [Testing your rules](pmd_userdocs_extending_testing.html).
* Each rule has its own test class: Create a test class for your rule extending `PmdRuleTst`
*(see
@@ -231,7 +233,7 @@ definitely don't come for free. It is much effort and requires perseverance to i

*Note:* You'll need to add your ruleset to `categories.properties`, so that it can be found.

### 13. Create documentation page
Finish up your new language module by adding a page to the documentation. Create a new markdown file
`<langId>.md` in `docs/pages/pmd/languages/`. This file should have the following frontmatter:

@@ -252,3 +254,10 @@ There is also the following Jekyll Include, that creates summary box for the lan
{% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true supports_pmd=true %}
{% endraw %}
```

## Optional features

See [Optional features in JavaCC based languages](pmd_devdocs_major_adding_new_language_javacc.html#optional-features).

To implement these, you will most likely need to develop an AST first. The parse tree (CST, concrete
syntax tree) is not suitable for adding methods such as `getSymbol()` to the node classes.
@@ -265,3 +265,52 @@ If you want to add support for computing metrics:
* Implement {% jdoc core::lang.LanguageVersionHandler#getLanguageMetricsProvider() %}, to make the metrics available in the designer.

See {% jdoc java::lang.java.metrics.JavaMetrics %} for an example.

### Symbol table

A symbol table keeps track of variables and their usages. It is part of semantic analysis and would
be executed in your parser adapter as an additional pass after you have obtained the initial AST.

There is no general, language-independent API in PMD core. For now, each language will need to implement
its own solution. The symbol information that has been resolved in the additional parser pass
can be made available on the AST nodes via extra methods, e.g. `getSymbolTable()`, `getSymbol()`, or
`getUsages()`.
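
As an illustration only, here is a minimal sketch of such an extra pass for a made-up, block-structured toy language. All class names are hypothetical; this is not a PMD API. The pass binds each reference to the innermost declaration visible at that point and exposes the result through `getSymbol()` and `getUsages()` accessors like the ones mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a symbol table pass that runs after parsing (toy language, not a PMD API). */
final class SymbolPassSketch {

    interface Stmt { }

    /** Variable declaration; after the pass, getUsages() lists the references bound to it. */
    static final class Decl implements Stmt {
        final String name;
        private final List<Ref> usages = new ArrayList<>();

        Decl(String name) { this.name = name; }

        List<Ref> getUsages() { return usages; }
    }

    /** Variable reference; after the pass, getSymbol() is its declaration, or null if unresolved. */
    static final class Ref implements Stmt {
        final String name;
        private Decl symbol;

        Ref(String name) { this.name = name; }

        Decl getSymbol() { return symbol; }
    }

    /** A block opens a new scope. */
    static final class Block implements Stmt {
        final List<Stmt> body;

        Block(Stmt... body) { this.body = Arrays.asList(body); }
    }

    /** The extra pass: bind each reference to the innermost declaration visible at that point. */
    static void resolve(Block root) {
        resolveBlock(root, new ArrayDeque<>());
    }

    private static void resolveBlock(Block block, Deque<Map<String, Decl>> scopes) {
        scopes.push(new HashMap<>());
        for (Stmt stmt : block.body) {
            if (stmt instanceof Decl) {
                Decl decl = (Decl) stmt;
                scopes.peek().put(decl.name, decl); // visible from here to the end of the block
            } else if (stmt instanceof Ref) {
                Ref ref = (Ref) stmt;
                for (Map<String, Decl> scope : scopes) { // iterates innermost (last pushed) first
                    Decl decl = scope.get(ref.name);
                    if (decl != null) {
                        ref.symbol = decl;
                        decl.usages.add(ref);
                        break;
                    }
                }
            } else if (stmt instanceof Block) {
                resolveBlock((Block) stmt, scopes);
            }
        }
        scopes.pop();
    }
}
```

In this toy model, a declaration only becomes visible after it appears. Shadowing falls out of searching the scope stack from the innermost scope outwards.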

Currently only Java provides a symbol table implementation,
see [Java-specific features and guidance](pmd_languages_java.html).

{% capture deprecated_symbols_api_note %}
With PMD 7.0.0, the symbol table and type resolution implementation has been
rewritten from scratch. There is still an old API for symbol table support, which is used by PLSQL;
see {% jdoc_package core::lang.symboltable %}. This will be deprecated and should not be used.
{% endcapture %}
{% include note.html content=deprecated_symbols_api_note %}

### Type resolution

For typed languages like Java, type information can be useful for writing rules that trigger only on
specific types. Resolving the types of expressions and variables would be done in your parser
adapter as yet another pass, potentially after resolving the symbol table.

Type resolution tries to find the actual class type of each used type, following along method calls
(including overloaded and overridden methods), allowing rules to query subtypes and the type hierarchy.
This might require additional configuration for the language, e.g. in Java you need
to configure an auxiliary classpath.

There is no general, language-independent API in PMD core. For now, each language will need to implement
its own solution. The type information can be made available on the AST nodes via extra methods,
e.g. `getType()`.
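
Here is a minimal, hypothetical sketch of such a pass for a toy expression language. Names such as `Type`, `Expr` and `resolve` are made up for illustration and are not PMD classes. Types are computed bottom-up and stored on the nodes, so rules can later call `getType()` and query the type hierarchy.

```java
import java.util.Objects;

/** Hypothetical sketch of a type resolution pass for a toy expression language. */
final class TypeResPassSketch {

    /** A toy nominal type with a single supertype (null for the root). */
    static final class Type {
        final String name;
        final Type superType;

        Type(String name, Type superType) { this.name = name; this.superType = superType; }

        /** Walks up the hierarchy: the kind of query rules typically want to make. */
        boolean isSubtypeOf(Type other) {
            for (Type t = this; t != null; t = t.superType) {
                if (t == other) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public String toString() { return name; }
    }

    static final Type OBJECT = new Type("Object", null);
    static final Type NUMBER = new Type("Number", OBJECT);
    static final Type INT = new Type("Int", NUMBER);
    static final Type STRING = new Type("String", OBJECT);

    /** Expression nodes expose getType(), which stays null until the pass has run (or if it failed). */
    abstract static class Expr {
        Type type;

        Type getType() { return type; }
    }

    static final class IntLit extends Expr { }

    static final class StrLit extends Expr { }

    static final class Plus extends Expr {
        final Expr left, right;

        Plus(Expr left, Expr right) {
            this.left = Objects.requireNonNull(left);
            this.right = Objects.requireNonNull(right);
        }
    }

    /** The extra pass: compute types bottom-up and store them on the nodes. */
    static Type resolve(Expr e) {
        if (e instanceof IntLit) {
            e.type = INT;
        } else if (e instanceof StrLit) {
            e.type = STRING;
        } else if (e instanceof Plus) {
            Plus p = (Plus) e;
            Type l = resolve(p.left);
            Type r = resolve(p.right);
            if (l == STRING || r == STRING) {
                e.type = STRING;   // string concatenation
            } else if (l == INT && r == INT) {
                e.type = INT;      // numeric addition
            } else {
                e.type = null;     // unresolved: rules must cope with missing types
            }
        }
        return e.type;
    }
}
```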

Currently only Java provides an implementation for type resolution,
see [Java-specific features and guidance](pmd_languages_java.html).

### Call and data flow analysis

Call and data flow analysis keep track of data as it moves through the different execution paths
of a program. This would be yet another analysis pass.

There is no general, language-independent API in PMD core. For now, each language will need to implement
its own solution.

Currently Java has some limited support for data flow analysis,
see [Java-specific features and guidance](pmd_languages_java.html).
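
To give an idea of what such a pass computes, here is a hypothetical sketch of a classic backward liveness analysis. It is restricted to straight-line code without branches or loops and flags assignments whose value is never read before being overwritten. This is roughly the kind of issue an unused-assignment rule looks for, but the sketch is not PMD's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: backward liveness analysis over straight-line code (no branches, no loops). */
final class LivenessSketch {

    /** Toy statement "target = f(reads...)"; the reads happen before the write. */
    static final class Assign {
        final String target;
        final List<String> reads;

        Assign(String target, String... reads) {
            this.target = target;
            this.reads = Arrays.asList(reads);
        }
    }

    /**
     * Returns the indices (in source order) of assignments whose value is never read before
     * being overwritten or reaching the end. liveAtExit holds variables still needed afterwards.
     */
    static List<Integer> deadAssignments(List<Assign> code, Set<String> liveAtExit) {
        Set<String> live = new HashSet<>(liveAtExit);
        List<Integer> dead = new ArrayList<>();
        for (int i = code.size() - 1; i >= 0; i--) { // values flow forward, so liveness flows backward
            Assign a = code.get(i);
            if (!live.contains(a.target)) {
                dead.add(0, i);         // prepend to keep source order
            }
            live.remove(a.target);      // this write kills the previous value...
            live.addAll(a.reads);       // ...but the reads on the right-hand side happen first
        }
        return dead;
    }
}
```

With branches and loops, the same transfer function has to be applied over a control flow graph and iterated until a fixed point is reached.
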
101 changes: 88 additions & 13 deletions docs/pages/pmd/languages/java.md
@@ -2,15 +2,13 @@
title: Java support
permalink: pmd_languages_java.html
author: Clément Fournier
last_updated: December 2023 (7.0.0)
tags: [languages, PmdCapableLanguage, CpdCapableLanguage]
summary: "Java-specific features and guidance"
---

{% include language_info.html name='Java' id='java' implementation='java::lang.java.JavaLanguageModule' supports_pmd=true supports_cpd=true since='1.0.0' %}

## Overview of supported Java language versions

Usually the latest non-preview Java Version is the default version.
@@ -55,24 +53,94 @@ See [Java language properties](pmd_languages_configuration.html#java-language-pr

## Type and symbol resolution

Java being a statically typed language, a Java program contains more information than just its syntax tree;
for instance, every expression has a static type, and every method call is bound to a method overload
statically (even if that overload is virtual). In PMD, much of this information is resolved from the AST
by additional passes, which run after parsing, and before rules can inspect the tree.

The semantic analysis roughly works like so:
1. The first passes resolve *symbols*, which are a model of the named entities that Java programs declare,
like classes, methods, and variables.
2. Then, each name in the tree is resolved to a symbol, according to the language's scoping rules. This may
modify the tree to remove *ambiguous names* (names which could be either a type, package, or variable).
3. The last pass resolves the types of expressions, which performs overload resolution on method calls,
and type inference.
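
To make step 2 more concrete, here is a heavily simplified, hypothetical sketch of how an ambiguous dotted name such as `a.b.c` can be reclassified. It loosely follows the Java Language Specification (§6.4.2 on obscuring and §6.5.2 on reclassifying ambiguous names): a variable in scope is preferred over a type, and a type over a package. This is not PMD's implementation. Member types and many other cases are deliberately ignored, and all names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, heavily simplified sketch of reclassifying an ambiguous name like "a.b.c". */
final class AmbiguousNameSketch {

    enum Kind { EXPRESSION, TYPE, PACKAGE }

    /**
     * @param segments     the dotted name split on '.'
     * @param varsInScope  simple names of variables (locals, parameters, fields) in scope
     * @param typesInScope simple names of types in scope (declared or imported)
     * @param packageTypes package name -> simple names of the types it contains
     * @return the kind assigned to each segment, left to right
     */
    static List<Kind> classify(String[] segments, Set<String> varsInScope, Set<String> typesInScope,
                               Map<String, Set<String>> packageTypes) {
        List<Kind> kinds = new ArrayList<>();
        // Leftmost identifier: a variable obscures a type, which obscures a package.
        Kind kind = varsInScope.contains(segments[0]) ? Kind.EXPRESSION
                  : typesInScope.contains(segments[0]) ? Kind.TYPE
                  : Kind.PACKAGE;
        kinds.add(kind);
        String prefix = segments[0];
        for (int i = 1; i < segments.length; i++) {
            if (kind == Kind.PACKAGE) {
                // p.X is a type if package p is known to contain X, otherwise a subpackage
                boolean isType = packageTypes.getOrDefault(prefix, Set.of()).contains(segments[i]);
                kind = isType ? Kind.TYPE : Kind.PACKAGE;
            } else {
                // Simplification: after a variable or a type, treat the rest as field accesses.
                // Real Java also allows member types here and must look at the type's members.
                kind = Kind.EXPRESSION;
            }
            kinds.add(kind);
            prefix = prefix + "." + segments[i];
        }
        return kinds;
    }
}
```

The `packageTypes` lookup is where the analysis depends on knowing the classes that exist. If a referenced class is missing, a name that should be a type may end up classified as a package.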

The analyzed code might reference types from other parts of the project or even from external
dependencies. For example, if the code extends a class from an external dependency, PMD needs access to
that dependency in order to figure out that a method is actually an override.

To resolve such types, a complete so-called auxiliary classpath needs to be provided.
Technically, PMD uses the [ASM framework](https://asm.ow2.io/index.html) to read the bytecode and build
up its own representation to resolve names and types. It also reads the bytecode of the Java runtime
in order to resolve Java API references.

## Providing the auxiliary classpath

The auxiliary classpath ("auxClasspath" for short) is configured via the
[Language Property "auxClasspath"](pmd_languages_configuration.html#java-language-properties).
It is a string containing multiple paths separated by either a colon (`:`) under Linux/macOS
or a semicolon (`;`) under Windows.
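
Because the separator differs per platform, a tool that assembles the auxClasspath value can use `java.io.File.pathSeparator` instead of hard-coding it. Here is a minimal sketch; the helper name is hypothetical and not part of PMD.

```java
import java.io.File;
import java.util.List;

/** Hypothetical helper: build an auxClasspath value using the platform's path separator. */
final class AuxClasspathSketch {

    /** Joins entries with File.pathSeparator, which is ":" on Linux/macOS and ";" on Windows. */
    static String auxClasspath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }
}
```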

To resolve the types of the Java API correctly, the Java runtime must be on the
auxClasspath as well. As the Java API and runtime evolve from version to version, it is important
to use the Java version that matches the code being analyzed. This is not necessarily the
same Java runtime version that is used to run PMD.

Up to and including Java 8, the jar file `rt.jar` exists in `${JAVA_HOME}/jre/lib`. It is enough to include
this jar file in the auxClasspath. Usually, you would add it as the first entry in the auxClasspath.

Beginning with Java 9, the Java platform has been modularized and [Modular Run-Time Images](https://openjdk.org/jeps/220)
have been introduced. The file `${JAVA_HOME}/lib/modules` now contains all the classes, but it is no longer a jar file.
However, each Java installation provides an implementation for reading such run-time images in
`${JAVA_HOME}/lib/jrt-fs.jar`. This is an implementation of the `jrt:/` filesystem, through which the bytecode
of the Java runtime classes can be loaded. To use this with PMD, the file `${JAVA_HOME}/lib/jrt-fs.jar`
needs to be added to the auxClasspath as the first entry. PMD will then load the Java runtime classes
using the jrt filesystem.
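
The following sketch (Java 9+, JDK classes only) shows what the `jrt:/` filesystem provides. It reads the bytecode of `java.lang.String` from the run-time image of the JVM that runs the code. The class and method names are hypothetical, and this is not how PMD itself is wired up. Reading the image of a *different* JDK goes through that JDK's own `jrt-fs.jar`, which is why it is the file to put on the auxClasspath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch (Java 9+): read a class file from the current JVM's run-time image via the jrt filesystem. */
final class JrtSketch {

    /** E.g. readClassFile("java.base", "java/lang/String.class"). */
    static byte[] readClassFile(String module, String classFile) {
        // The jrt filesystem is always available on Java 9+; paths look like /modules/<module>/<class file>.
        FileSystem jrt = FileSystems.getFileSystem(URI.create("jrt:/"));
        Path path = jrt.getPath("/modules", module, classFile);
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```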

If neither `${JAVA_HOME}/jre/lib/rt.jar` nor `${JAVA_HOME}/lib/jrt-fs.jar` is added to the auxClasspath, PMD falls
back to loading the Java runtime classes **from the current runtime**, that is, the runtime that was used to
execute PMD. This might not be the correct version: e.g. you might run PMD with Java 8 but analyze code
written for Java 21. In that case, you have to provide the `jrt-fs.jar` of a Java 21 installation on the auxClasspath.
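
A hypothetical helper that applies these rules to pick the runtime entry for a given Java home could look like the following. It only checks the two file locations described above and is not part of PMD.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: pick the Java runtime entry to put first on the auxClasspath. */
final class RuntimeEntrySketch {

    /** Returns jre/lib/rt.jar (Java 8 and older) or lib/jrt-fs.jar (Java 9+), if present under javaHome. */
    static Optional<Path> runtimeEntry(Path javaHome) {
        Path rtJar = javaHome.resolve("jre").resolve("lib").resolve("rt.jar");
        if (Files.isRegularFile(rtJar)) {
            return Optional.of(rtJar);
        }
        Path jrtFs = javaHome.resolve("lib").resolve("jrt-fs.jar");
        if (Files.isRegularFile(jrtFs)) {
            return Optional.of(jrtFs);
        }
        // Neither found: PMD would fall back to the runtime it runs on, which may be the wrong version.
        return Optional.empty();
    }
}
```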

## Symbol table APIs

{% jdoc_nspace :ast java::lang.java.ast %}
{% jdoc_nspace :symbols java::lang.java.symbols %}

Symbol table API related classes are in the package {% jdoc_package :symbols %}.
The root interface for symbols is {% jdoc symbols::JElementSymbol %}.

The symbol table can be requested on any node with the method {% jdoc ast::AbstractJavaNode#getSymbolTable() %}.
This returns a {% jdoc symbols::table.JSymbolTable %} which gives you access to variables, methods and types that are
within scope.

A {% jdoc ast::ASTExpression %} might be an {% jdoc ast::ASTAssignableExpr.ASTNamedReferenceExpr %},
e.g. if it references a variable by name. In that case, you can access the referenced variable symbol
with the method {% jdoc ast::ASTAssignableExpr.ASTNamedReferenceExpr#getReferencedSym() %}.

Declaration nodes, such as {% jdoc ast::ASTVariableDeclaratorId %}, implement the interface
{% jdoc ast::SymbolDeclaratorNode %}. Through the method
{% jdoc ast::SymbolDeclaratorNode#getSymbol() %}, you can also access the symbol.

To find usages, you can call {% jdoc ast::ASTVariableDeclaratorId#getLocalUsages() %}.

## Type resolution APIs

{% jdoc_nspace :types java::lang.java.types %}

Type resolution API related classes are in the package {% jdoc_package :types %}.

The core of the framework is a set of interfaces to represent types. The root interface is
{% jdoc types::JTypeMirror %}. Type mirrors are created by a
{% jdoc types::TypeSystem %} object. This object is analysis-global.

The utility class {% jdoc types::TypeTestUtil %} provides simple methods to check types,
e.g. `TypeTestUtil.isA(String.class, variableDeclaratorIdNode)` tests whether the type of the given
variable declarator id is `String` (or a subtype of it).

Any {% jdoc ast::TypeNode %} provides access to the type with the method {% jdoc ast::TypeNode#getTypeMirror() %}.
E.g. this can be called on {% jdoc ast::ASTMethodCall %} to retrieve the return type of the called method.

## Metrics framework

@@ -105,3 +173,10 @@ Java does this by adding the following additional information for each reported
* {% jdoc core::RuleViolation#PACKAGE_NAME %}

You can access these via {% jdoc core::RuleViolation#getAdditionalInfo() %}.

## Dataflow

There is no API yet for dataflow analysis. However, some rules such as {% rule java/bestpractices/UnusedAssignment %}
or {% rule java/design/ImmutableField %} use an internal implementation of an additional
AST pass that adds dataflow information. The implementation can be found in
[net.sourceforge.pmd.lang.java.rule.internal.DataflowPass](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/rule/internal/DataflowPass.java).
