
[core] Improve benchmarking #4365

Merged: 5 commits merged into pmd:pmd/7.0.x on Jan 29, 2023

Conversation

@jsotuyod (Member) commented Jan 26, 2023

Describe the PR

Rule benchmarking:

  • Restore the old behavior, where the number of rule applications is reported separately from the number of tree traversals.
    • Rules that were applied but didn't match any node were previously unlisted; they now show that they were applied to all files and evaluated 0 nodes.
    • This also makes it clearer which rules make use of the rulechain and which don't.
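To illustrate the reporting change, here is a minimal sketch of a per-rule timing entry that keeps the file-application count separate from the node-evaluation count. The class and method names are purely illustrative, not PMD's real benchmarking classes:

```java
// Hypothetical sketch: a rule's timing entry tracks how many files it was
// applied to separately from how many nodes it actually evaluated, so a rule
// that never matched still appears in the report with 0 nodes.
public class RuleTimingEntry {
    final String ruleName;
    long filesApplied;   // how many files the rule ran against
    long nodesEvaluated; // how many nodes it was handed (0 if it never matched)

    RuleTimingEntry(String ruleName) {
        this.ruleName = ruleName;
    }

    void recordApplication(long nodes) {
        filesApplied++;
        nodesEvaluated += nodes;
    }

    // Previously an entry with nodesEvaluated == 0 was dropped from the
    // report; always listing it also reveals rulechain usage, since a
    // rulechain rule only sees the node types it registered for.
    String reportLine() {
        return String.format("%s: applied to %d files, evaluated %d nodes",
                ruleName, filesApplied, nodesEvaluated);
    }

    public static void main(String[] args) {
        RuleTimingEntry entry = new RuleTimingEntry("UnusedLocalVariable");
        entry.recordApplication(0); // ran on a file, matched nothing
        entry.recordApplication(3); // ran on another file, evaluated 3 nodes
        System.out.println(entry.reportLine());
    }
}
```

With the old behavior the first call would have left the rule invisible in the report; with the split counts it shows up as applied twice with 3 nodes evaluated in total.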

Language Processing Stages:

  • Remove numbers from Java LPS names
    • Only Java used them, so they were inconsistent
    • The numbers reflect the order in which the stages execute, but don't necessarily imply a required order (i.e., Comment Assignment could in reality be done at any point)
    • Since the benchmark report sorts by time spent on each stage, the numbered labels are simply confusing
    • AST Disambiguation is only tracked in the dedicated phase started by the AstProcessor; having the AstDisambiguation pass track this as well was inconsistent.
      • Since these passes can be triggered by the symbol table LPS phase, we ended up double counting total time (2 nested "LPS phases" were each counting total time on top of the other).
      • This, however, attributes all disambiguation passes done for the symbol table to symbol table cost. While accurate, that
        may not help to identify speed-up opportunities as clearly, but the benchmark is not a profiling tool.

Unaccounted:

  • Unaccounted time was significantly larger than it used to be, amounting to about 10% of total execution time. Properly account for obtaining ruleset copies, creating TextDocuments, and overall parser and rule setup as processing time.
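The setup work listed above can be attributed to named phases instead of falling into the "unaccounted" bucket. The following is a hypothetical sketch of that idea using try-with-resources; the class, phase names, and API are illustrative and differ from PMD's actual time tracker:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: setup work (ruleset copies, TextDocument creation,
// parser and rule setup) is wrapped in a named timed phase so its cost is
// attributed rather than left as unaccounted time.
public class PhaseTracker {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // AutoCloseable so a phase can be timed with try-with-resources.
    class Timed implements AutoCloseable {
        private final String phase;
        private final long start = System.nanoTime();
        Timed(String phase) { this.phase = phase; }
        @Override public void close() {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    Timed start(String phase) { return new Timed(phase); }
    boolean hasPhase(String phase) { return nanosByPhase.containsKey(phase); }

    public static void main(String[] args) {
        PhaseTracker tracker = new PhaseTracker();
        try (Timed t = tracker.start("rule-setup")) {
            // ... copy the ruleset for this thread, initialize rules ...
        }
        try (Timed t = tracker.start("text-document")) {
            // ... load file contents into a TextDocument ...
        }
        System.out.println("tracked phases: " + tracker.nanosByPhase.keySet());
    }
}
```

Anything not wrapped in such a phase ends up in the "unaccounted" line of the report, which is why instrumenting the setup steps shrinks that bucket.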

Ready?

  • Added unit tests for fixed bug/feature
  • Passing all unit tests
  • Complete build ./mvnw clean verify passes (checked automatically by GitHub Actions)
  • Added (in-code) documentation (if needed)

 - Recover the old behavior, where the number of rule applications is split from
the number of tree traversals.
 - This means that rules that are applied but didn't match any node were previously
unlisted, but now show they were applied to all files, and evaluated 0 nodes.
 - We also count towards the rule's benchmark the cost of getting to the nodes it wants
to evaluate, so poor XPath selectors actually show up for the particular rule.
 - The numbers relate to how they are executed, but don't necessarily imply a required order
(i.e., Comment Assignment could be done at any point)
 - Since the benchmark report sorts by time spent on each one, the numbered labels are simply confusing
 - First off, all LPS are benchmarked in the AstProcessor; having this here was inconsistent
 - Since these passes can be triggered by the symbol table LPS phase, we ended up double counting total time.
 - This, however, puts all disambiguation passes done for the symbol table under symbol table cost, which, although accurate,
may not help to identify speed-up opportunities as clearly, but the benchmark is not a profiling tool.
 - Unaccounted time was significantly larger than it used to be,
and in proportion took about 10% of total execution time. Properly account for
obtaining ruleset copies, TextDocuments, and overall parser and rule setup
as processing time.
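The point about poor XPath selectors can be illustrated with a toy traversal. A rule evaluated by walking the whole tree pays for every node it passes through, while a rulechain rule is only dispatched the node types it registered for; counting the nodes each strategy touches makes an expensive selector visible. The classes below are an illustrative sketch, not PMD's real AST or rulechain API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of why the cost of *reaching* the nodes matters:
// a full-tree walk touches every node, while rulechain dispatch only
// hands the rule the node kinds it registered for.
public class TraversalCost {
    static class Node {
        final String kind;
        final List<Node> children = new ArrayList<>();
        Node(String kind) { this.kind = kind; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Full traversal: every node is visited to evaluate the selector,
    // so this count is charged to the rule's benchmark.
    static long fullTraversalCost(Node root) {
        long count = 1;
        for (Node c : root.children) {
            count += fullTraversalCost(c);
        }
        return count;
    }

    // Rulechain: only nodes of the registered kind reach the rule.
    static long ruleChainCost(Node root, String registeredKind) {
        long count = root.kind.equals(registeredKind) ? 1 : 0;
        for (Node c : root.children) {
            count += ruleChainCost(c, registeredKind);
        }
        return count;
    }

    public static void main(String[] args) {
        Node root = new Node("CompilationUnit")
                .add(new Node("ClassDeclaration")
                        .add(new Node("MethodDeclaration"))
                        .add(new Node("FieldDeclaration")));
        System.out.println("full walk touches " + fullTraversalCost(root) + " nodes");
        System.out.println("rulechain sees " + ruleChainCost(root, "MethodDeclaration") + " node(s)");
    }
}
```

On real code bases the gap between the two counts is what a slow descendant-axis selector looks like in the benchmark report.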
@jsotuyod jsotuyod added this to the 7.0.0 milestone Jan 26, 2023
@jsotuyod jsotuyod added this to In progress in PMD 7 via automation Jan 26, 2023
@pmd-test commented Jan 26, 2023

2 Messages
📖 Compared to pmd/7.0.x:
This changeset changes 6 violations,
introduces 1 new violations, 0 new errors and 0 new configuration errors,
removes 1 violations, 0 errors and 0 configuration errors.
Full report
📖 Compared to master:
This changeset changes 49641 violations,
introduces 33909 new violations, 1446 new errors and 0 new configuration errors,
removes 195793 violations, 4 errors and 7 configuration errors.
Full report

Generated by 🚫 Danger

@oowekyala (Member) left a comment

LGTM, thanks!

@oowekyala oowekyala merged commit cde72a6 into pmd:pmd/7.0.x Jan 29, 2023
PMD 7 automation moved this from In progress to Done Jan 29, 2023
@jsotuyod jsotuyod deleted the improve-benchmarking branch January 29, 2023 16:17