Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Text documents #3893

Merged
merged 240 commits into from
Jul 17, 2022
Merged

[core] Text documents #3893

merged 240 commits into from
Jul 17, 2022

Conversation

oowekyala
Copy link
Member

@oowekyala oowekyala commented Apr 1, 2022

Describe the PR

(this now contains #3892)

Changelog

  • pmd-apex

    • Text position handling is simplified (SourceCodePositioner is now part of TextDocument)
    • The Location attribute is removed
    • EmptyCatchBlock now considers doc comments as regular comments
    • Some nodes now have a different report location: eg for methods and classes it's just their identifier, like for pmd-java, ref [java] Single Line Warnings #730
  • pmd-core

    • New API: Chars, a string slice type. Supports efficient string-like operations without copying the char array of the file. Makes comment handling more efficient, in a future PR, will be used by Javacc tokens.
    • TextFile is improved
      • Line terminators are normalized to \n
      • Reading the file computes a checksum which is used by incremental analysis later
      • TextFile replaces DataSource everywhere in the internals of PMD
    • New API: TextDocument, built on top of TextFile
    • New API: FileLocation. This represents the line/column coordinates of a place in a TextDocument. Used for reporting.
    • New API: Reportable. This is an interface for anything that can provide a FileLocation (Nodes and Tokens currently)
    • New API: TextRegion. This represents a range of text in a TextDocument with offset/length, which is useful for tokens and for TextAvailableNodes in general
    • changes to Node interface:
      • getTextRegion is a new abstract method. getReportLocation is not abstract anymore. TextAvailableNode refines the contract of getTextRegion to mean that the text region is the exact boundaries of the node in the text, which is eg not the case for the apex module, but is the case for Javacc-based modules.
  • Some usage of Path and File are replaced with String in eg Ruleset#applies (file name filters) because TextFile uses strings as path names.

  • CPD

    • CPD uses FileLocation and TextDocument
    • The end goal is to replace SourceCode with TextDocument and TextFile
    • This refactoring is incomplete for now, some transitional APIs are in the class CpdCompat. Removing them requires [core] Merge CPD and PMD language #3919
    • Antlr tokens are now easier to build because they can use a TextDocument
  • pmd-java

  • pmd-lang-test

    • Things to represent and test text locations are replaced with standard pmd-core APIs (FileLocation/TextRegion)
  • pmd-test

    • Change to how the "code" element is trimmed, before it was just String::trim() which doesn't preserve indent on the first line. Now it uses basically a clone of String::stripIndent (method used by text blocks in JDK 16+) so that the code behaves like a text block (without escapes)
  • pmd-cpp

    • The "skip block" feature is temporarily dysfunctional, it's fixed in the followup branch, I just ignored the tests

Future refactorings:

  • violation decorators are needed to remove RuleViolationFactory
  • finish the CPD refactoring (SourceCode -> TextDocument)
  • integrate TextDocument more cleanly within our javacc support

Related issues

Ready?

  • Added unit tests for fixed bug/feature
  • Passing all unit tests
  • Complete build ./mvnw clean verify passes (checked automatically by github actions)
  • Added (in-code) documentation (if needed)
  • Investigate required changes in the designer -> Compat with pmd/pmd#3893 pmd-designer#51

Copy link
Member

@adangel adangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave merging this up to you.
Thanks for the work!

@adangel
Copy link
Member

adangel commented May 6, 2022

Ok, the regression tester ran now... but with some more work to do: It seems, the reported file names are now different.... all violations have been removed and added.

It's easier to look at a smaller project, like https://chunk.io/pmd/23054915090a48adb1f094dcfc0e981a/diff1/Schedul-o-matic-9000/index.html

The xml report (baseline): https://chunk.io/pmd/23054915090a48adb1f094dcfc0e981a/diff1/Schedul-o-matic-9000/base_pmd_report.xml

<file name="/home/runner/work/pmd/target/repositories/Schedul-o-matic-9000/force-app/main/default/classes/Dao.cls">
<violation beginline="8" endline="8" begincolumn="14" endcolumn="16" rule="ApexSharingViolations" ruleset="Security" externalInfoUrl="https://pmd.github.io/pmd/pmd_rules_apex_security.html#apexsharingviolations" priority="3">
Apex classes should declare a sharing model if DML or SOQL/SOSL is used
</violation>
...

The xml report (patch): https://chunk.io/pmd/23054915090a48adb1f094dcfc0e981a/diff1/Schedul-o-matic-9000/patch_pmd_report.xml

<file name="target/repositories/Schedul-o-matic-9000/force-app/main/default/classes/Dao.cls">
<violation beginline="8" endline="8" begincolumn="14" endcolumn="17" rule="ApexSharingViolations" ruleset="Security" externalInfoUrl="https://pmd.github.io/pmd/pmd_rules_apex_security.html#apexsharingviolations" priority="3">
Apex classes should declare a sharing model if DML or SOQL/SOSL is used
</violation>
...

The filenames are now relativized already... but only to the current working directory (probably).

@oowekyala
Copy link
Member Author

oowekyala commented May 6, 2022

Thanks for looking into this Andreas. One thing to keep in mind is that this PR changes the implementation of the "short file name" functionality. The TextFiles are all created with a "display name", which is relativized if the short file name option is enabled, but it's not absolutized otherwise -> this is probably the reason for the difference. So the display name is absolute only if you use an absolute path as a CLI argument, which the regression tester is probably not doing.

I think this is sensible behaviour (at least it's consistent), but maybe we should think more about this... I think the good thing about this is that the report is maybe more portable. But some renderers might mandate relative or absolute paths, like #3798 shows

@adangel
Copy link
Member

adangel commented May 6, 2022

I think, both behaviors make sense. But since we are on PMD 7, we can change it...

I think the good thing about this is that the report is maybe more portable.

👍

Hm... theoretically, we don't need to change the regression tester. It would work after this PR is merged, then the baseline also has the same paths in the report...

@oowekyala oowekyala mentioned this pull request May 7, 2022
4 tasks
@oowekyala oowekyala marked this pull request as ready for review July 17, 2022 11:02
@oowekyala oowekyala merged commit 451cbb6 into pmd:pmd/7.0.x Jul 17, 2022
PMD 7 automation moved this from In progress to Done Jul 17, 2022
@oowekyala oowekyala deleted the text-utils-simple branch July 18, 2022 10:44
@adangel adangel mentioned this pull request Jan 23, 2023
55 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
PMD 7
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants