[core] Refactor CPD #4397

Merged
merged 97 commits into from
Aug 29, 2023
97 commits
d4c05d1
Make pmd language have a hook to launch CPD
oowekyala Feb 10, 2023
cf81809
Change a ton of stuff in CPD
oowekyala Feb 10, 2023
27a4aba
Progress
oowekyala Feb 11, 2023
65d953b
Progress
oowekyala Feb 11, 2023
0cab976
Remove SourceCode
oowekyala Feb 11, 2023
e7503d9
Fix CPDReport
oowekyala Feb 11, 2023
f2cfd8f
More refactorings
oowekyala Feb 11, 2023
d972e4a
Fix scala module
oowekyala Feb 11, 2023
9e9de58
Delete CPD
oowekyala Feb 11, 2023
1828fae
Fix some modules
oowekyala Feb 11, 2023
add5970
Don't forget EOF token
oowekyala Feb 12, 2023
9f35966
Refactor EOF handling
oowekyala Feb 12, 2023
8541fb7
Fix pmd-core
oowekyala Feb 12, 2023
8fbd830
Style and renamings
oowekyala Feb 12, 2023
fb9f496
Delete old CPD Language interface
oowekyala Feb 12, 2023
ddbfc90
Fix build
oowekyala Feb 12, 2023
519e9d3
Fix java tests
oowekyala Feb 12, 2023
51b5016
Cleanups
oowekyala Feb 12, 2023
d6ec427
Doc
oowekyala Feb 12, 2023
9c3434a
Split cpd/pmd specific methods into...
oowekyala Feb 13, 2023
c572cb8
Rename package cpd.internal to cpd.impl
oowekyala Feb 13, 2023
30a7f07
Cleanups
oowekyala Feb 13, 2023
2ef44be
Update doc
oowekyala Feb 15, 2023
6c27d46
Cleanup
oowekyala Feb 18, 2023
b7a3f80
Merge branch '7.0.x' into clem.pmd7-refactor-cpd
oowekyala Feb 18, 2023
62beb2b
Update TSQL module
oowekyala Feb 18, 2023
eb37388
Fix problem on windows
oowekyala Feb 18, 2023
de7ff21
Fix bug with renderer encoding
oowekyala Feb 19, 2023
046812c
Remove useless methods on Match
oowekyala Feb 19, 2023
0134f5e
Use Path instead of File in CPDConf
oowekyala Feb 19, 2023
a3831e9
move more things into AbstractConfiguration
oowekyala Feb 19, 2023
a12bbf8
Remove duplicated options in AbstractConfiguration
oowekyala Feb 19, 2023
11e2a97
Introduce ts language module
oowekyala Feb 20, 2023
60f28c5
Fix cli tests
oowekyala Feb 20, 2023
40aa9de
Checkstyle
oowekyala Feb 20, 2023
6eb5086
Fix last tests
oowekyala Feb 20, 2023
8770ad6
Merge branch '7.0.x' into clem.pmd7-refactor-cpd
oowekyala Feb 26, 2023
a8a6dd2
Merge branch '7.0.x' into clem.pmd7-refactor-cpd
oowekyala Feb 26, 2023
0b2f151
Merge branch '7.0.x' into clem.pmd7-refactor-cpd
oowekyala Feb 26, 2023
b00e152
Merge branch 'master' into clem.pmd7-refactor-cpd
oowekyala Mar 4, 2023
c44ce26
Revert forgotten thing
oowekyala Mar 4, 2023
255fdf0
Fix compil
oowekyala Mar 4, 2023
0f17cc8
Add back default version for CPD languages
oowekyala Mar 11, 2023
d6de5ca
Fix VF module
oowekyala Mar 14, 2023
fae08a8
delete leftover file
oowekyala Mar 14, 2023
590c46b
Fix reported CPD languages test
oowekyala Mar 17, 2023
5db8be4
Merge remote-tracking branch 'upstream/master' into clem.pmd7-refacto…
oowekyala Mar 17, 2023
344b2cc
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala Mar 20, 2023
6eabac7
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala Mar 20, 2023
f2dc380
Cleanups
oowekyala Mar 20, 2023
837c795
Merge branch 'master' into clem.pmd7-refactor-cpd
oowekyala Apr 4, 2023
0233897
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala Apr 20, 2023
e66f78b
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala Apr 20, 2023
60f313f
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala Apr 20, 2023
b89970d
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala Apr 29, 2023
b297538
Fix merge
oowekyala Apr 29, 2023
72740a8
Lint
oowekyala Apr 29, 2023
4034c3d
Merge remote-tracking branch 'origin/clem.pmd7-refactor-cpd' into cle…
oowekyala Apr 29, 2023
f2cd5ab
Merge branch 'master' into clem.pmd7-refactor-cpd
oowekyala May 26, 2023
2f067dd
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala May 26, 2023
bf64735
Merge branch 'pmd7-textfile-display-name' into clem.pmd7-refactor-cpd
oowekyala May 28, 2023
aa716ac
Fix CPD renderer tests on windows
oowekyala May 28, 2023
ac33663
Fix CPD cli tests
oowekyala May 29, 2023
bd42296
Fix distribution IT
oowekyala May 29, 2023
5c4a566
Doc for CPD
oowekyala May 29, 2023
5c436c7
Fix cpd outputting unix paths on windows
oowekyala May 29, 2023
885ab6c
Lint
oowekyala May 29, 2023
c54d3bb
Merge branch 'master' into clem.pmd7-refactor-cpd
oowekyala Jun 10, 2023
4ef43e9
Fixups
oowekyala Jun 10, 2023
629e3b4
Consolidate CPD packages
oowekyala Jun 10, 2023
287a9a2
Move forgotten things into language specific packages
oowekyala Jun 10, 2023
6f6608d
Delete cpp default version
oowekyala Jun 10, 2023
efecee4
Add deprecated to Tokens ctor
oowekyala Jun 10, 2023
894d9fb
Merge branch 'master' into pr-4397
adangel Aug 17, 2023
86fdc44
Suppress PMD warnings (UnnecessaryConstructor)
adangel Aug 17, 2023
a9ed11b
[doc] Fix dead links
adangel Aug 17, 2023
10a50b8
[ci] Use adjusted m-pmd-p for dogfood
adangel Aug 17, 2023
9ce9b24
[doc] Mention pmd-languages-deps and module
adangel Aug 24, 2023
df08d08
[apex] Remove cpd property CASE_SENSITIVE
adangel Aug 24, 2023
67cbb94
[cli] Move option "--relativize-paths-with" up to AbstractAnalysisPmd…
adangel Aug 24, 2023
c6a63da
[cli] Remove todo about slf4j
adangel Aug 24, 2023
5200cc9
[cli] Fix javadoc
adangel Aug 24, 2023
3984dc2
[core] Configurations - keep fields private
adangel Aug 24, 2023
932ac33
Merge branch 'master' into pr-4397
adangel Aug 24, 2023
681c528
[core] Create CpdLanguageProperties
adangel Aug 24, 2023
6298d87
[core] Add minimal javadoc for CpdAnalysis
adangel Aug 24, 2023
8511c7b
[core] Move PmdCapableLanguage to n.sf.pmd.lang
adangel Aug 24, 2023
41ff4be
[php] Recognize "//" as eol comment
adangel Aug 24, 2023
8085673
Fix checkstyle
adangel Aug 24, 2023
32afa53
All language modules: getInstance()
adangel Aug 24, 2023
908f480
[core] CPD GUI - fix empty SourceManager
adangel Aug 26, 2023
c1109b4
[core] Fix potential NPE in SourceManager
adangel Aug 26, 2023
1aefe47
Add correct tokenizers for vm, pom, wsdl and xsl
adangel Aug 26, 2023
a2765a1
[doc] Update CPD supported languages
adangel Aug 27, 2023
ad4a19f
[doc] CPD Language Properties
adangel Aug 27, 2023
f65a7cb
[doc] Release notes and API changes (#3919, #4204, #4323, #4397)
adangel Aug 27, 2023
fc0b3ad
[doc] Release notes (#4397)
adangel Aug 27, 2023
2 changes: 1 addition & 1 deletion .ci/build.sh
@@ -284,7 +284,7 @@ function pmd_ci_dogfood() {
sed -i 's/<version>[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}.*<\/version>\( *<!-- pmd.dogfood.version -->\)/<version>'"${PMD_CI_MAVEN_PROJECT_VERSION}"'<\/version>\1/' pom.xml
if [ "${PMD_CI_MAVEN_PROJECT_VERSION}" = "7.0.0-SNAPSHOT" ]; then
sed -i 's/pmd-dogfood-config\.xml/pmd-dogfood-config7.xml/' pom.xml
mpmdVersion=(-Denforcer.skip=true -Dpmd.plugin.version=3.21.1-pmd-7-SNAPSHOT)
mpmdVersion=(-Denforcer.skip=true -Dpmd.plugin.version=3.21.1-pmd-7.0.0-SNAPSHOT)
fi
./mvnw verify --show-version --errors --batch-mode "${PMD_MAVEN_EXTRA_OPTS[@]}" \
"${mpmdVersion[@]}" \
3 changes: 2 additions & 1 deletion README.md
@@ -22,7 +22,8 @@ Scala is supported, but there are currently no Scala rules available.

Additionally, it includes **CPD**, the copy-paste-detector. CPD finds duplicated code in
C/C++, C#, Dart, Fortran, Gherkin, Go, Groovy, HTML, Java, JavaScript, JSP, Kotlin, Lua, Matlab, Modelica,
Objective-C, Perl, PHP, PLSQL, Python, Ruby, Salesforce.com Apex and Visualforce, Scala, Swift, T-SQL and XML.
Objective-C, Perl, PHP, PLSQL, Python, Ruby, Salesforce.com Apex and Visualforce, Scala, Swift, T-SQL,
Apache Velocity, and XML.

In the future we hope to add support for data/control flow analysis and automatic (quick) fixes where
it makes sense.
4 changes: 2 additions & 2 deletions docs/_plugins/jdoc_namespace_tag.rb
@@ -99,9 +99,9 @@ def self.parse_fqcn(fqcn, var_ctx, allow_sym = true)
private

JDOC_NAMESPACE_MAP = "jdoc_nspaces"
RESERVED_NSPACES = ['ant', 'apex', 'core', 'cpp', 'cs', 'dart', 'dist', 'doc', 'fortran', 'go', 'groovy', 'java',
RESERVED_NSPACES = ['ant', 'apex', 'cli', 'core', 'cpp', 'cs', 'dart', 'dist', 'doc', 'fortran', 'go', 'groovy', 'java',
'javascript', 'jsp',
'kotlin', 'lua', 'matlab', 'objectivec', 'perl', 'php', 'plsql', 'python', 'ruby', 'scala', 'swift',
'kotlin', 'lang-test', 'lua', 'matlab', 'objectivec', 'perl', 'php', 'plsql', 'python', 'ruby', 'scala', 'swift',
'test', 'test-schema', 'ui',
'modelica', 'visualforce', 'vm', 'xml'].flat_map {|m| [m, "pmd-" + m]}

@@ -56,8 +56,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
* Make sure to add your new module to PMD's parent pom as `<module>` entry, so that it is built alongside the
other languages.
* Also add your new module to the dependencies list in "pmd-languages-deps/pom.xml", so that the new language
is automatically available in the binary distribution (pmd-dist) as well as for the shell-completion
in the pmd-cli module.
is automatically available in the binary distribution (pmd-dist).


## 2. Implement an AST parser for your language
@@ -120,13 +119,13 @@ definitely don't come for free. It is much effort and requires perseverance to i
* This is needed to support CPD (copy paste detection)
* We provide a default implementation using [`AntlrTokenManager`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java).
* You must create your own "AntlrTokenizer" such as we do with
[`SwiftTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/cpd/SwiftTokenizer.java).
[`SwiftTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/cpd/SwiftTokenizer.java).
* If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON")
you can create your own implementation of
[`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/token/AntlrTokenFilter.java).
[`AntlrTokenFilter`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenFilter.java).
You'll need to override then the protected method `getTokenFilter(AntlrTokenManager)`
and return your custom filter. See the tokenizer for C# as an exmaple:
[`CsTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/cpd/CsTokenizer.java).
[`CsTokenizer`](https://github.com/pmd/pmd/blob/master/pmd-cs/src/main/java/net/sourceforge/pmd/lang/cs/cpd/CsTokenizer.java).

If you don't need a custom token filter, you don't need to override the method. It returns the default
`AntlrTokenFilter` which doesn't filter anything.
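
The "CPD-OFF" / "CPD-ON" suppression mentioned above can be pictured with a small standalone sketch. It does not use PMD's `AntlrTokenFilter` API: `SimpleToken` and `SuppressionFilterSketch` are hypothetical names made up for this illustration, and dropping every comment token is an assumption of the sketch rather than a statement about what PMD does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-ins only: SimpleToken and SuppressionFilterSketch are
// hypothetical and are not PMD classes.
public class SuppressionFilterSketch {

    /** A token reduced to its text and a flag telling whether it is a comment. */
    public static final class SimpleToken {
        final String image;
        final boolean comment;

        public SimpleToken(String image, boolean comment) {
            this.image = image;
            this.comment = comment;
        }
    }

    /**
     * Returns the images of the tokens that stay visible to duplicate detection.
     * Comment tokens are always dropped (an assumption of this sketch); other
     * tokens are dropped from a "CPD-OFF" comment until the next "CPD-ON" comment.
     */
    public static List<String> visibleImages(List<SimpleToken> tokens) {
        List<String> kept = new ArrayList<>();
        boolean suppressed = false;
        for (SimpleToken token : tokens) {
            if (token.comment) {
                if (token.image.contains("CPD-OFF")) {
                    suppressed = true;
                } else if (token.image.contains("CPD-ON")) {
                    suppressed = false;
                }
                continue; // the comment itself is never emitted
            }
            if (!suppressed) {
                kept.add(token.image);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = Arrays.asList(
            new SimpleToken("a", false),
            new SimpleToken("// CPD-OFF", true),
            new SimpleToken("b", false),
            new SimpleToken("// CPD-ON", true),
            new SimpleToken("c", false));
        System.out.println(visibleImages(tokens)); // prints [a, c]
    }
}
```

A real filter would make the same keep-or-skip decision on the lexer's own token type and would be returned from `getTokenFilter(AntlrTokenManager)`, as described in the bullet above.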
@@ -169,7 +168,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
## 10. Create an abstract rule class for the language
* You need to create your own abstract rule class in order to interface your language with PMD's generic rule
execution.
* See [`AbstractSwiftRule`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/AbstractSwiftRule.java) as an example.
* See [`AbstractSwiftRule`](https://github.com/pmd/pmd/blob/master/pmd-swift/src/main/java/net/sourceforge/pmd/lang/swift/rule/AbstractSwiftRule.java) as an example.
* The rule basically just extends
[`AbstractVisitorRule`](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/lang/rule/AbstractVisitorRule.java)
and only redefines the abstract `buildVisitor()` method to return our own type of visitor.
@@ -39,8 +39,7 @@ definitely don't come for free. It is much effort and requires perseverance to i
* Make sure to add your new module to PMD's parent pom as `<module>` entry, so that it is built alongside the
other languages.
* Also add your new module to the dependencies list in "pmd-languages-deps/pom.xml", so that the new language
is automatically available in the binary distribution (pmd-dist) as well as for the shell-completion
in the pmd-cli module.
is automatically available in the binary distribution (pmd-dist).

## 2. Implement an AST parser for your language
* Ideally an AST parser should be implemented as a JJT file *(see VmParser.jjt or Java.jjt for example)*
161 changes: 81 additions & 80 deletions docs/pages/pmd/devdocs/major_contributions/adding_new_cpd_language.md
@@ -2,105 +2,111 @@
title: How to add a new CPD language
short_title: Add a new CPD language
tags: [devdocs, extending]
summary: How to add a new CPD language
last_updated: March 18, 2019 (6.13.0)
summary: How to add a new language module with CPD support.
last_updated: 2023-02-13 (7.0.0)
permalink: pmd_devdocs_major_adding_new_cpd_language.html
author: Matías Fraga <fragamati@gmail.com>
author: Matías Fraga, Clément Fournier
---

First of all, thanks for the contribution!
## Adding support for a CPD language

Happily for you, to add CPD support for a new language is now easier than ever!
CPD works generically on the tokens produced by a {% jdoc core::cpd.Tokenizer %}.
To add support for a new language, the crucial piece is writing a tokenizer that
splits the source file into the tokens specific to your language. Thankfully you
can use a stock [Antlr grammar](https://github.com/antlr/grammars-v4) or JavaCC
grammar to generate a lexer for you. If you cannot use a lexer generator, for
instance because you are wrapping a lexer from another library, it is still relatively
easy to implement the Tokenizer interface.

{% include callout.html content="**Pro Tip**: If you wish to add a new language, there are more than 50 languages you could easily add with just an [Antlr grammar](https://github.com/antlr/grammars-v4)." type="primary" %}
Use the following guide to set up a new language module that supports CPD.
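
To see why a token stream is enough for duplicate detection, here is a minimal standalone sketch. It does not implement PMD's `Tokenizer` interface: `TokenSequenceSketch` and its regular expression are hypothetical, and a real language would use a generated lexer instead. The point it illustrates is that two snippets with different layout produce the same token sequence and can therefore be compared as equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not a PMD class: a toy lexer for a toy language.
public class TokenSequenceSketch {

    // One token is an identifier, an integer literal, or any single
    // non-whitespace character. Whitespace never matches, so it is skipped.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|[0-9]+|\\S");

    /** Splits the source text into token images, ignoring whitespace. */
    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> compact = tokenize("x = y + 1;");
        List<String> spread = tokenize("x=y\n   + 1 ;");
        System.out.println(compact);                // prints [x, =, y, +, 1, ;]
        System.out.println(compact.equals(spread)); // prints true
    }
}
```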

All you need to do is follow this few steps:
1. Create a new Maven module for your language. You can take [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml) as an example.
- Make sure to add your new module to the parent pom as `<module>` entry, so that it is built alongside the
other languages.
- Also add your new module to the dependencies list in "pmd-languages-deps/pom.xml", so that the new language
is automatically available in the binary distribution (pmd-dist).

1. Create a new module for your language, you can take [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go) as an example
* Make sure to add your new module to the parent pom as `<module>` entry, so that it is built alongside the
other languages.
* Also add your new module to the dependencies list in "pmd-languages-deps/pom.xml", so that the new language
is automatically available in the binary distribution (pmd-dist) as well as for the shell-completion
in the pmd-cli module.
2. Implement a {% jdoc core::cpd.Tokenizer %}.
- For Antlr grammars you can take the grammar from [antlr/grammars-v4](https://github.com/antlr/grammars-v4) and place it in `src/main/antlr4` followed by the package name of the language. You then need to call the appropriate ant wrapper to generate
the lexer from the grammar. To do so, edit `pom.xml` (eg like [the Golang module](https://github.com/pmd/pmd/tree/master/pmd-go/pom.xml)).
Once that is done, `mvn generate-sources` should generate the lexer sources for you.

2. Create a Tokenizer

- For Antlr grammars you can take the grammar from [here](https://github.com/antlr/grammars-v4) and
extend [AntlrTokenizer](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/AntlrTokenizer.java)
taking Go as an example

```java
public class GoTokenizer extends AntlrTokenizer {

@Override protected AntlrTokenManager getLexerForSource(SourceCode sourceCode) {
CharStream charStream = AntlrTokenizer.getCharStreamFromSourceCode(sourceCode);
return new AntlrTokenManager(new GolangLexer(charStream), sourceCode.getFileName());
}
}
You can now implement a tokenizer, for instance by extending {% jdoc core::cpd.impl.AntlrTokenizer %}. The following reproduces the Go implementation:
```java
// mind the package convention if you are going to make a PR
package net.sourceforge.pmd.lang.go.cpd;

public class GoTokenizer extends AntlrTokenizer {

@Override
protected Lexer getLexerForSource(CharStream charStream) {
return new GolangLexer(charStream);
}
}
```

- For JavaCC grammars you should subclass [JavaCCTokenizer](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/impl/JavaCCTokenizer.java)
which has many examples you could follow, you should also take the
[Python implementation](https://github.com/pmd/pmd/blob/master/pmd-python/src/main/java/net/sourceforge/pmd/cpd/PythonTokenizer.java) as reference
- For any other scenario you can use [AnyTokenizer](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/AnyTokenizer.java)

If you're using Antlr or JavaCC, update the pom.xml of your submodule to use the appropriate ant wrapper. See `pmd-go/pom.xml` and `pmd-python/pom.xml` for examples.
- For JavaCC grammars, place your grammar in `etc/grammar` and edit the `pom.xml` like the [Python implementation](https://github.com/pmd/pmd/blob/master/pmd-python/pom.xml) does.
You can then subclass {% jdoc core::cpd.impl.JavaCCTokenizer %} instead of AntlrTokenizer.
- For any other scenario just implement the interface however you can. Look at the Scala or Apex module for existing implementations.

3. Create your [Language](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/AbstractLanguage.java) class
3. Create a {% jdoc core::lang.Language %} implementation, and make it implement {% jdoc core::cpd.CpdCapableLanguage %}.
If your language only supports CPD, then you can subclass {% jdoc core::lang.impl.CpdOnlyLanguageModuleBase %} to get going:

```java
public class GoLanguage extends AbstractLanguage {
```java
// mind the package convention if you are going to make a PR
package net.sourceforge.pmd.lang.go;

public class GoLanguageModule extends CpdOnlyLanguageModuleBase {

public GoLanguage() {
super("Go", "go", new GoTokenizer(), ".go");
}
// A public noarg constructor is required.
public GoLanguageModule() {
super(LanguageMetadata.withId("go").name("Go").extensions("go"));
}

@Override
public Tokenizer createCpdTokenizer(LanguagePropertyBundle bundle) {
// This method should return an instance of the tokenizer you created.
return new GoTokenizer();
}
}
```

{% include callout.html content="**Pro Tip**: Yes, keep looking at Go!" type="primary" %}

**You are almost there!**

4. Update the list of supported languages
```

- Write the fully-qualified name of your Language class to the file `src/main/resources/META-INF/services/net.sourceforge.pmd.cpd.Language`
To make PMD find the language module at runtime, write the fully-qualified name of your language class into the file `src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language`.

- Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java)
At this point the new language module should be available in {% jdoc core::lang.LanguageRegistry#CPD %} and usable by CPD like any other language.

5. Please don't forget to add some test, you can again.. look at Go implementation ;)

If you read this far, I'm keen to think you would also love to support some extra CPD configuration (ignore imports or crazy things like that)
If that's your case , you came to the right place!

6. You can add your custom properties using a Token filter

- For Antlr grammars all you need to do is implement your own [AntlrTokenFilter](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/token/AntlrTokenFilter.java)

And by now, I know where you are going to look...

**WRONG**

Why do you want GO to solve all your problems?

You should take a look to [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/cpd/KotlinTokenizer.java)

- For non-Antlr grammars you can use [BaseTokenFilter](https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/cpd/token/internal/BaseTokenFilter.java) directly or take a peek to [Java's token filter](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/cpd/JavaTokenizer.java)
4. Update the test that asserts the list of supported languages by updating the `SUPPORTED_LANGUAGES` constant in [BinaryDistributionIT](https://github.com/pmd/pmd/blob/master/pmd-dist/src/test/java/net/sourceforge/pmd/it/BinaryDistributionIT.java).

5. Add some tests for your tokenizer by following the [section below](#testing-your-implementation).

### Declaring tokenizer options

To make the tokenizer configurable, first define some property descriptors using
{% jdoc core::properties.PropertyFactory %}. Look at {% jdoc core::cpd.Tokenizer %}
for some predefined ones which you can reuse (prefer reusing property descriptors if you can).
You need to override {% jdoc core::Language#newPropertyBundle() %}
and call `definePropertyDescriptor` to register the descriptors.
After that you can access the values of the properties from the parameter
of {% jdoc core::cpd.CpdCapableLanguage#createCpdTokenizer(core::lang.LanguagePropertyBundle) %}.

To implement simple token filtering, you can use {% jdoc core::cpd.impl.BaseTokenFilter %}
as a base class, or another base class in {% jdoc_package core::cpd.impl %}.
Take a look at the [Kotlin token filter implementation](https://github.com/pmd/pmd/blob/master/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/cpd/KotlinTokenizer.java), or the [Java one](https://github.com/pmd/pmd/blob/master/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/cpd/JavaTokenizer.java).


### Testing your implementation

Add a Maven dependency on `pmd-lang-test` (scope `test`) in your `pom.xml`.
This contains utilities to test your Tokenizer.

For simple tests, create a test class extending from `CpdTextComparisonTest`.
That class is written in Kotlin, but you can extend it in Java as well.
This contains utilities to test your tokenizer.

Create a test class extending from {% jdoc lang-test::cpd.test.CpdTextComparisonTest %}.
To add tests, you need to write regular JUnit `@Test`-annotated methods, and
call the method `doTest` with the name of the test file.

For example, for the Dart language:

```java
package net.sourceforge.pmd.lang.dart.cpd;

public class DartTokenizerTest extends CpdTextComparisonTest {

@@ -110,20 +116,15 @@ public class DartTokenizerTest extends CpdTextComparisonTest {


public DartTokenizerTest() {
super(".dart"); // the file extension for the dart language
super("dart", ".dart"); // the ID of the language, then the file extension used by test files
}

@Override
protected String getResourcePrefix() {
// If your class is in src/test/java /some/package
// you need to place the test files in src/test/resources/some/package/cpdData
return "cpdData";
}

@Override
public Tokenizer newTokenizer() {
// Override this abstract method to return the correct tokenizer
return new DartTokenizer();
// "testdata" is the default value, you don't need to override.
// This specifies that you should place the test files in
// src/test/resources/net/sourceforge/pmd/lang/dart/cpd/testdata
return "testdata";
}

/**************