
[java][core] Provide report statistics at multiple files or directory level #2116

Open
linusjf opened this issue Nov 16, 2019 · 4 comments
Labels
a:RFC: A drafted proposal on changes to PMD, up for feedback from the team and community
a:suggestion: An idea, with little analysis on feasibility, to be considered

Comments


linusjf commented Nov 16, 2019

Affects PMD Version:
All

Rule:
All.

Description:
PMD should generate report statistics at the project level. This issue can be used to discuss which statistics are to be generated and to triage the most relevant and useful ones first.

#2033 (comment)

adangel changed the title from "[core] PMD must generate report statistics at the project level." to "[java][core] Provide report statistics at project level" on Nov 22, 2019
adangel added the a:RFC label on Nov 22, 2019
adangel (Member) commented Nov 22, 2019

From #2033 (comment):

Another point that's unrelated: is PMD, at some point, going to introduce a feature that allows users to track rules and the metrics associated with each rule, as to how many times each occurs in the code base? This would be internal, with an option to enable sharing that data with PMD via some sort of web service. Just a thought.

Maybe. I guess that requires some sort of "multi file analysis" then, since you want to get an overview of the whole project. Currently the rules operate only on single files and don't see the big picture. For certain metrics this is enough, but for others (e.g. calculating the average class size and comparing each class to this average) we need this multi file analysis. Reporting these metrics is then another point: currently PMD is a standalone application, so we would probably extend/add report formats for metrics.

https://docs.pylint.org/en/1.6.0/output.html#reports-section

Pylint's reports are quite impressive.

What I'd really like is a way to send an error report to a PMD web service that logs the error information from a PMD error, with the user only having to say yes or no. It should default to No, but there could be a way for the user to configure this to Yes. Would that be too intrusive?

Why wouldn't users want an overview of the whole project? I, for one, would be very interested in knowing which rule is violated most. Is that a pointer to inadequate training or coding skills, or does it mean the rule is irrelevant or unworkable as it is and can be dropped or needs to be modified?
What are the other advantages of having an overview?
A grouping of the number of rule violations by severity level would also be useful.
Which metrics would be more useful?

adangel (Member) commented Nov 22, 2019

This requirement is pretty vague. What we would need to know in order to decide is a more detailed specification about: which statistics (we could start with implementing one at a time), what "project level" means (PMD has no understanding of a project...), and how these statistics would be presented (reporting).

Statistics: Are these PMD processing statistics (benchmarks, performance counters, so kind of "internal" statistics), or "project processing statistics" such as how many files have been analyzed, or a statistic about how many violations per rule have been found, or project metrics such as average LOC per class or per method, average number of methods per class, average number of classes per package, and so on?

E.g. if you are only interested in "how many violations per rule have been found", that can be solved in the reporters - since it is just a statistic over the generated report.
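A rough sketch of what such a reporter-side aggregation could look like, assuming PMD 6's Report can be iterated for its RuleViolations (exact types and method names may differ between versions):

```java
import java.util.Map;
import java.util.TreeMap;

import net.sourceforge.pmd.Report;
import net.sourceforge.pmd.RuleViolation;

// Illustrative sketch only: count how often each rule appears in a finished report.
// Assumes Report is Iterable<RuleViolation>, as in PMD 6.x.
public final class ViolationsPerRule {

    public static Map<String, Integer> count(Report report) {
        Map<String, Integer> countsByRule = new TreeMap<>();
        for (RuleViolation violation : report) {
            countsByRule.merge(violation.getRule().getName(), 1, Integer::sum);
        }
        return countsByRule;
    }
}
```

A renderer could print such a map at the end of its output, which would cover the "violations per rule" statistic without any multi-file analysis.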

Update: Another question: for which language? I've added Java for now, but depending on the statistics, this might be cross-language.

adangel added the a:suggestion label on Nov 22, 2019
linusjf (Author) commented Nov 22, 2019 via email

linusjf (Author) commented Dec 11, 2019

I've taken a shot at listing the kind of stats that could be reported. It's not fully mature but let's take it as a starting point:

Suppressions:
Total number of suppressions:
Grouping by rule category and, within that, by rule name

Violations:
Total number of violations
Grouping by severity
Grouping by category and, within that, by rule name

Files:
Total number of files
Grouping by package name + file name, listing the number of errors
Drill down to severities in each file

Rules:
Total number of rules violated
Grouping by rule name and count of each rule violation

Rule categories:
Total number of categories violated
Grouping by rule category name and count of violations in each group
Drill down to rules in each group and the count for each rule (see the sketch after this list)
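To make the groupings above concrete, here is a small, purely illustrative sketch; the ViolationRecord type is hypothetical and stands in for whatever a renderer would extract from each report entry:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical flat view of one reported violation, as a renderer might extract it.
// (A plain class with getters works just as well on older JDKs.)
record ViolationRecord(String ruleName, String category, int priority, String fileName) {}

final class ViolationBreakdown {

    // Total number of violations grouped by priority (1 = highest in PMD's scheme).
    static Map<Integer, Long> bySeverity(List<ViolationRecord> violations) {
        return violations.stream()
                .collect(Collectors.groupingBy(ViolationRecord::priority,
                        TreeMap::new, Collectors.counting()));
    }

    // Violations grouped by category, drilled down to the individual rules.
    static Map<String, Map<String, Long>> byCategoryAndRule(List<ViolationRecord> violations) {
        return violations.stream()
                .collect(Collectors.groupingBy(ViolationRecord::category, TreeMap::new,
                        Collectors.groupingBy(ViolationRecord::ruleName,
                                TreeMap::new, Collectors.counting())));
    }
}
```

The same pattern extends to the file and suppression groupings above.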

Total number of lines of code scanned (NCSSCount)
Maximum file size scanned, with the name of the largest file provided.
Median number of lines of code in a file.

Total number of classes in code base
Maximum size of class
Minimum size of class
Median size of class (see the summary sketch after this list)
Maximum number of methods in class
Median number of methods in a class
Maximum number of fields in a class
Median number of fields in a class
Maximum size of methods in classes
Median size of methods in classes
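A quick illustration of how these size summaries could be computed once per-class (or per-file) line counts are available; the input list here is hypothetical, e.g. NCSS per class collected during analysis:

```java
import java.util.IntSummaryStatistics;
import java.util.List;

final class SizeSummary {

    // Maximum, minimum and average come directly from IntSummaryStatistics.
    static IntSummaryStatistics summarize(List<Integer> sizes) {
        return sizes.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    // Median of the sizes; averages the two middle values for an even-sized input.
    static double median(List<Integer> sizes) {
        int[] sorted = sizes.stream().mapToInt(Integer::intValue).sorted().toArray();
        if (sorted.length == 0) {
            return 0.0;
        }
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```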

Summary stats for packages, classes, interfaces.
How many interfaces defined?
How many interfaces implemented?
How many abstract classes?
How many stand-alone classes, i.e., classes not extending or implementing anything?
How many final classes?
How many main classes?
How many utility classes?
How many unit test classes?
How many unit test suites?
How many classes use loggers?
How many data classes? Include direct access structures and unsuppressed Data Class types as identified by PMD.
How many lambdas?
How many switch statements?
How many try-with-resource statements?
How many anonymous classes?
How many inner classes?
How many nested classes?
How many library classes, i.e., non-JDK classes, are used? How many packages are referenced?
How many classes are Serializable?
List of annotations used.
List of annotations defined.

Stats for EJBs:
Number of session, entity and message beans.
More?

(You might want to weigh in about Unit tests).

You could similarly draw up stats for Java metrics, and likewise for Apex and JSP.

https://pmd.github.io/pmd-6.20.0/pmd_java_metrics_index.html

The report can be broken up into two versions: an abbreviated format and a long or full listing.

PMD could also consider incorporating CKJM metrics. ckjm-ext, however, functions at the bytecode level.

http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/metric.html

SonarSource has ckjm integrated. I'm not sure whether that's the basic or extended version, since I haven't used it.

https://www.spinellis.gr/sw/ckjm/

http://sonarqube-archive.15.x6.nabble.com/sonar-dev-Sonar-2-0-CKJM-td4528938.html

Probably not any longer.

linusjf changed the title from "[java][core] Provide report statistics at project level" to "[java][core] Provide report statistics at multiple files or directory level" on Dec 19, 2019