
[java][core] Provide report statistics at multiple files or directory level #2116

Open
linusjf opened this issue Nov 16, 2019 · 4 comments
Labels
a:RFC: A drafted proposal on changes to PMD, up for feedback from the team and community
a:suggestion: An idea, with little analysis on feasibility, to be considered

Comments


linusjf commented Nov 16, 2019

Affects PMD Version:
All

Rule:
All.

Description:
PMD should generate report statistics at the project level. This issue can be used to discuss which statistics are to be generated and to triage the most relevant and useful ones first.

#2033 (comment)

adangel changed the title from "[core] PMD must generate report statistics at the project level." to "[java][core] Provide report statistics at project level" on Nov 22, 2019
adangel added the a:RFC label on Nov 22, 2019
adangel (Member) commented Nov 22, 2019

From #2033 (comment):

Another point that's unrelated: is PMD, at some point, going to introduce a feature that allows users to track rules and the metrics associated with each rule, as to how many times each occurs in the code base? This would be internal, with an option to enable sharing that data with PMD via some sort of web service. Just a thought.

Maybe. I guess that requires some sort of "multi file analysis" then, since you want to get an overview of the whole project. Currently the rules operate only on single files and don't see the big picture. For certain metrics this is enough, but for others (e.g. calculating the average class size and comparing each class to this average) we need this multi file analysis. Reporting these metrics is then another point: currently PMD is a standalone application, so we would probably extend/add report formats for metrics.

https://docs.pylint.org/en/1.6.0/output.html#reports-section

Pylint's reports are quite impressive.

What I'd really like is a way to send an error report to a PMD web service that logs the error information from a PMD error, with the user only having to say yes or no. It should default to No, but there could be a way for the user to configure this to Yes. Would that be too intrusive?

Why wouldn't users want an overview of the whole project? I, for one, would be very interested in knowing which rule is violated most. Is that a pointer to inadequate training or coding skills, or does it mean the rule is irrelevant or unworkable as it is and can be dropped or needs to be modified?
What are the other advantages of having an overview?
A grouping of the number of rule violations by severity level would also be useful.
Which metrics would be more useful?

adangel (Member) commented Nov 22, 2019

This requirement is pretty vague. What we would need to know in order to decide is a more detailed specification about: which statistics (we could start with implementing one at a time), what "project level" means (PMD has no understanding of a project...), and how these statistics would be presented (reporting).

Statistics: Are these PMD processing statistics (benchmarks, performance counters, so kind of "internal" statistics), or "project processing statistics" such as how many files have been analyzed, or a statistic about how many violations per rule have been found, or project metrics such as average LOC per class or per method, average number of methods per class, average number of classes per package, and so on?

E.g. if you are only interested in "how many violations per rule have been found", that can be solved in the reporters - since it is just a statistic over the generated report.
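A rough sketch of what such a reporter-side aggregation could look like, assuming PMD 6's Report can be iterated for its RuleViolations (exact types and method names may differ between versions):

```java
import java.util.Map;
import java.util.TreeMap;

import net.sourceforge.pmd.Report;
import net.sourceforge.pmd.RuleViolation;

// Illustrative sketch only: count how often each rule appears in a finished report.
// Assumes Report is Iterable<RuleViolation>, as in PMD 6.x.
public final class ViolationsPerRule {

    public static Map<String, Integer> count(Report report) {
        Map<String, Integer> countsByRule = new TreeMap<>();
        for (RuleViolation violation : report) {
            countsByRule.merge(violation.getRule().getName(), 1, Integer::sum);
        }
        return countsByRule;
    }
}
```

A renderer could print such a map at the end of its output, which would cover the "violations per rule" statistic without any multi-file analysis.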

Update: Another question: for which language? I've added Java for now, but depending on the statistics, this might be cross-language.

adangel added the a:suggestion label on Nov 22, 2019
linusjf (Author) commented Nov 22, 2019 via email

linusjf (Author) commented Dec 11, 2019

I've taken a shot at listing the kind of stats that could be reported. It's not fully mature but let's take it as a starting point:

Suppressions:
Total number of suppressions:
Grouping by rule category and, within that, by rule name

Violations:
Total number of violations
Grouping by severity
Grouping by category and, within that, by rule name

Files:
Total number of files
Grouping by package name + file name, listing the number of errors
Drill down to severities in each file

Rules:
Total number of rules violated
Grouping by rule name and count of each rule violation

Rule categories:
Total number of categories violated
Grouping by rule category name and count of violations in each group
Drill down to rules in each group and the count for each rule (see the sketch after this list)
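To make the groupings above concrete, here is a small, purely illustrative sketch; the ViolationRecord type is hypothetical and stands in for whatever a renderer would extract from each report entry:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical flat view of one reported violation, as a renderer might extract it.
// (A plain class with getters works just as well on older JDKs.)
record ViolationRecord(String ruleName, String category, int priority, String fileName) {}

final class ViolationBreakdown {

    // Total number of violations grouped by priority (1 = highest in PMD's scheme).
    static Map<Integer, Long> bySeverity(List<ViolationRecord> violations) {
        return violations.stream()
                .collect(Collectors.groupingBy(ViolationRecord::priority,
                        TreeMap::new, Collectors.counting()));
    }

    // Violations grouped by category, drilled down to the individual rules.
    static Map<String, Map<String, Long>> byCategoryAndRule(List<ViolationRecord> violations) {
        return violations.stream()
                .collect(Collectors.groupingBy(ViolationRecord::category, TreeMap::new,
                        Collectors.groupingBy(ViolationRecord::ruleName,
                                TreeMap::new, Collectors.counting())));
    }
}
```

The same pattern extends to the file and suppression groupings above.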

Total number of lines of code scanned (NCSSCount)
Maximum file size scanned, with the name of the largest file provided.
Median number of lines of code in a file.

Total number of classes in code base
Maximum size of class
Minimum size of class
Median size of class (see the summary sketch after this list)
Maximum number of methods in class
Median number of methods in a class
Maximum number of fields in a class
Median number of fields in a class
Maximum size of methods in classes
Median size of methods in classes
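A quick illustration of how these size summaries could be computed once per-class (or per-file) line counts are available; the input list here is hypothetical, e.g. NCSS per class collected during analysis:

```java
import java.util.IntSummaryStatistics;
import java.util.List;

final class SizeSummary {

    // Maximum, minimum and average come directly from IntSummaryStatistics.
    static IntSummaryStatistics summarize(List<Integer> sizes) {
        return sizes.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    // Median of the sizes; averages the two middle values for an even-sized input.
    static double median(List<Integer> sizes) {
        int[] sorted = sizes.stream().mapToInt(Integer::intValue).sorted().toArray();
        if (sorted.length == 0) {
            return 0.0;
        }
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```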

Summary stats for packages, classes, interfaces.
How many interfaces defined?
How many interfaces implemented?
How many abstract classes?
How many stand-alone classes, i.e., classes not extending or implementing anything?
How many final classes?
How many main classes?
How many utility classes?
How many unit test classes?
How many unit test suites?
How many classes use loggers?
How many data classes? Include direct access structures and unsuppressed Data Class types as identified by PMD.
How many lambdas?
How many switch statements?
How many try-with-resource statements?
How many anonymous classes?
How many inner classes?
How many nested classes?
How many library classes, i.e., non-JDK classes, are used? How many packages are referenced?
How many classes are Serializable?
List of annotations used.
List of annotations defined.

Stats for EJBs:
Number of session, entity and message beans.
More?

(You might want to weigh in about Unit tests).

You could similarly draw up stats for Java metrics, and likewise for Apex and JSP.

https://pmd.github.io/pmd-6.20.0/pmd_java_metrics_index.html

The report can be broken up into two versions: an abbreviated format and a long or full listing.

PMD could also consider incorporating CKJM metrics. ckjm-ext, however, functions at the bytecode level.

http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/metric.html

SonarSource has ckjm integrated. I'm not sure whether that's the basic or extended version, since I haven't used it.

https://www.spinellis.gr/sw/ckjm/

http://sonarqube-archive.15.x6.nabble.com/sonar-dev-Sonar-2-0-CKJM-td4528938.html

Probably not any longer.

linusjf changed the title from "[java][core] Provide report statistics at project level" to "[java][core] Provide report statistics at multiple files or directory level" on Dec 19, 2019