[core] Add integration tests against real projects #360

Closed
adangel opened this issue Apr 21, 2017 · 10 comments
Labels
in:pmd-internals Affects PMD's internals

Comments

@adangel
Member

adangel commented Apr 21, 2017

Ideally we would run the current PMD against a few other (open-source) projects like Spring, Solr, openjdk, ...

This should help us find NPEs, ClassCastExceptions, and parser errors earlier.

@ryan-gustafson
Contributor

ryan-gustafson commented Apr 21, 2017

I'm experimenting with a Gradle build script to try this, since it's pretty easy to pull down and analyze dependencies dynamically, allowing multiple versions of PMD and real projects. My thought would be to run a combination matrix of sorts, with some analysis of results between PMD versions. Not far into it, I'll share when I have something interesting. Anyone, feel free to reach out if you'd like to pick it up in the meantime.
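As a rough illustration of the "combination matrix" idea, here is a minimal Python sketch that plans one run per (PMD version, project) pair. It is not the Gradle script discussed in this thread; the function name, the project labels, and the output directory layout are all assumptions made for the example.

```python
from itertools import product
from pathlib import PurePosixPath


def build_run_matrix(pmd_versions, projects, out_root="regression"):
    """Plan one run per (PMD version, project) pair.

    The directory layout (out_root/pmd-<version>/<project>/...) is a
    hypothetical convention chosen for this sketch.
    """
    runs = []
    for version, project in product(pmd_versions, projects):
        run_dir = PurePosixPath(out_root) / f"pmd-{version}" / project
        runs.append({
            "pmd_version": version,
            "project": project,
            # Each run keeps the PMD report plus the captured output streams.
            "report": str(run_dir / "report.txt"),
            "stdout": str(run_dir / "stdout.txt"),
            "stderr": str(run_dir / "stderr.txt"),
        })
    return runs


# Hypothetical project labels; the version strings are ones named in the thread.
matrix = build_run_matrix(["5.0.0", "5.6.0", "5.6.1"], ["hibernate", "solr"])
for run in matrix:
    print(run["pmd_version"], run["project"], run["report"])
```

The number of runs is the product of both list lengths, which is why the total runtime grows quickly as versions or projects are added.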

@ryan-gustafson
Contributor

I've got something working and producing a report. Many things I could keep polishing and improving, but I'd like to get feedback before putting further effort in. To that end, this weekend I'm going to clean it up, collect and organize the loose ends, and submit a PR for discussion purposes.

In the meantime, I've attached an example report, produced using PMD versions 5.0.0 to 5.6.1, against some Hibernate, Solr, and Spring Framework dependencies, using all the Java rules grabbed from java-core/.../rulesets.properties. It took about 40 minutes to produce on my box (which is a bit dated). The report has 4 sections:

  1. The PMDs used.
  2. The Source used.
  3. A matrix of PMD vs Source runs, including the PMD text report, stdout, stderr.
  4. A matrix of diffs between the PMD text reports between adjacent PMD versions and Source.

No analysis is performed, although if a file is empty a link is omitted from the table.
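The fourth report section (diffs between adjacent PMD versions) could be sketched roughly as follows. This is an illustrative Python approximation using the standard `difflib` module, not the code that produced the attached report; the function names and the sample report lines are made up. An empty diff is mapped to `None`, mirroring the "no link for empty files" behaviour described above.

```python
import difflib


def version_key(version):
    """Sort key so that versions are ordered numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def adjacent_diffs(reports):
    """Diff the text reports of adjacent PMD versions for one source project.

    `reports` maps a PMD version string to that version's text report.
    Returns {(older, newer): unified diff text, or None if identical}.
    """
    ordered = sorted(reports, key=version_key)
    result = {}
    for older, newer in zip(ordered, ordered[1:]):
        diff = list(difflib.unified_diff(
            reports[older].splitlines(),
            reports[newer].splitlines(),
            fromfile=f"pmd-{older}",
            tofile=f"pmd-{newer}",
            lineterm="",
        ))
        # An empty diff means nothing changed; the report would omit the link.
        result[(older, newer)] = "\n".join(diff) if diff else None
    return result


# Made-up report lines, for illustration only.
sample_reports = {
    "5.6.1": "Foo.java:10: SomeRule\nFoo.java:42: OtherRule",
    "5.5.7": "Foo.java:10: SomeRule",
    "5.6.0": "Foo.java:10: SomeRule",
}
diffs = adjacent_diffs(sample_reports)
for pair, text in diffs.items():
    print(pair, "unchanged" if text is None else text)
```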

Feel free to share thoughts!

Also, the good news is that I was able to reliably reproduce the problem from #364 in 5.6.0 when running with more than one thread.

The attached file, reports.zip, was compressed with 7-Zip to get it under 3MB (it's over 30MB with plain ZIP). Extract it and open the index.html file.

@jsotuyod
Member

@ryan-gustafson sorry it took me this long to look at it, but that report is amazing! It would be of great help to both avoid regressions, and battle test fixes and improvements beyond our test cases before a release.

One interesting thing about those diffs is the number of differences between builds in the DFA results, especially considering we haven't touched that code directly in quite some time (check this and this). It seems that module is more fragile than I ever thought...

We definitely need to move this forward with master vs PR for PRs. We could upload diffs to chunk.io for free from Travis :) Please, contact me if you need help to set this up.
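A master-vs-PR comparison could start from something as simple as the following Python sketch, which treats each non-empty line of a PMD text report as one violation and reports what the PR build added or removed. The function name and sample lines are hypothetical, and a set-based comparison deliberately ignores ordering and duplicate lines, which may or may not be acceptable for a real setup.

```python
def compare_reports(baseline_text, candidate_text):
    """Compare two text reports line by line, ignoring order and duplicates.

    Returns the lines only present in the candidate ("added") and the
    lines only present in the baseline ("removed").
    """
    baseline = {line.strip() for line in baseline_text.splitlines() if line.strip()}
    candidate = {line.strip() for line in candidate_text.splitlines() if line.strip()}
    return {
        "added": sorted(candidate - baseline),
        "removed": sorted(baseline - candidate),
    }


# Made-up report contents, for illustration only.
master_report = "A.java:1: RuleX\nB.java:7: RuleY\n"
pr_report = "B.java:7: RuleY\nC.java:3: RuleZ\n"

changes = compare_reports(master_report, pr_report)
print("new violations:", changes["added"])
print("fixed or lost violations:", changes["removed"])
```

A CI job could then flag a PR for review whenever either list is non-empty.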

@ryan-gustafson
Contributor

@jsotuyod I've been so busy lately I've not been able to get back to this. This weekend however is looking rather clear, so I'll see about getting a PR up, that should enable progress on other fronts. I've not looked at all into chunk.io or Travis, I assume one of you guys could work on that.

Glad you found it interesting. My hope is that it has practical promise for greatly expanding coverage and regression detection, not just between releases, but also for comparing your local dev build against the latest CI SNAPSHOT build, or on a per-PR basis.

The two things I know I'd like to add, but likely not before I send a PR, would be:

  • Multiple language support. Currently just doing Java, but that's a gimme with Gradle and Maven repositories. Any non-Java repositories out there?
  • The recent CPD-related issue "[cpp] CPD gives wrong duplication blocks for CPP code" (#431) makes me think it would be good to add a CPD report too.

As for DFA, it's heavily dependent upon analysis of the symbol table data (here), so changes there could indirectly change DFA results (for better or worse).

@jsotuyod
Member

> @jsotuyod I've been so busy lately I've not been able to get back to this. This weekend however is looking rather clear, so I'll see about getting a PR up, that should enable progress on other fronts. I've not looked at all into chunk.io or Travis, I assume one of you guys could work on that.

That's exactly the kind of things I was offering my assistance with. Let me know if you need anything.

> Glad you found it interesting. My hope is that it has practical promise for greatly expanding coverage and regression detection, not just between releases, but also for comparing your local dev build against the latest CI SNAPSHOT build, or on a per-PR basis.

Definitely. As I said, I'm really looking forward to having this on all PRs by making Travis do PR vs master.

>   • Multiple language support. Currently just doing Java, but that's a gimme with Gradle and Maven repositories. Any non-Java repositories out there?

Not sure how you are getting the sources now, but for JS at least there are several big open source projects to look at. For Apex, Visualforce, PLSQL, Apache Velocity, XML and XSL things may be harder... But we can ask our Salesforce guys if there is a good OSS project for Apex / VF to use as a benchmark.

Definitely, but as you said, at a later stage. Just rolling this out as-is is the most valuable thing.

> As for DFA, it's heavily dependent upon analysis of the symbol table data (here), so changes there could indirectly change DFA results (for better or worse).

I had no idea, good to know. Reports should be better then, assuming the DFA code is right, since we improved the symbol table a lot for some scenarios such as anonymous inner classes.

@jsotuyod jsotuyod changed the title from "[core] Add integration tests agains real projects" to "[core] Add integration tests against real projects" on Jan 8, 2018
@jsotuyod
Member

@ryan-gustafson any chance we can get our hands on this, whatever state it's in? We would love for this to see the light, maybe even as part of GSoC 2018, and what you had shown us would be an amazing starting point.

oowekyala added a commit to oowekyala/pmd that referenced this issue Mar 26, 2018
This could be extended to other languages once we
tackle pmd#360
@ryan-gustafson
Contributor

@jsotuyod I totally missed the ask for the code on this. My apologies! GSoC is in flight already, is it too late for the code to be useful? I could dig it up this week sometime yet.

@jsotuyod
Member

jsotuyod commented May 8, 2018

@ryan-gustafson it's never too late! @djydewang has already started on his own version, but yours may give him some insight or ideas.

@ryan-gustafson
Contributor

See the attached ZIP. It is based on Gradle (Groovy), version 3.5, using the Gradle wrapper. Depending on the available Source dependencies/configurations and PMD versions, it dynamically creates the appropriate tasks (a lot of them!). The pmdRegressionDiffReport task runs everything; it can take a long time, depending on the product of the number of PMD versions and the number of Source dependencies. There are other, smaller tasks for the different parts; you need to look at the code to understand how they are all wired together. Roughly, the parts are:

  • a clean target to delete the regression directory
  • extract source code for all dependencies into regression directory
  • run all PMD versions setup against all dependencies, save off stdout, stderr, and pmd report
  • generate diffs between adjacent PMD versions for a given dependency
  • generate the HTML report
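To make the shape of such a dynamically generated task graph concrete, here is a small Python sketch (Python 3.9+ for `graphlib`). It is only a guess at how the parts listed above might depend on each other, not the wiring in the attached ZIP: apart from pmdRegressionDiffReport, every task name is hypothetical, and the clean target is left out because the thread does not say what depends on it.

```python
from graphlib import TopologicalSorter


def build_task_graph(pmd_versions, dependencies):
    """Map each task name to the set of tasks it depends on.

    `pmd_versions` must already be in release order so that neighbours
    in the list are "adjacent" versions.
    """
    final_task = "pmdRegressionDiffReport"  # name taken from the comment above
    graph = {final_task: set()}
    for dep in dependencies:
        extract = f"extract_{dep}"  # hypothetical task name
        graph[extract] = set()
        for version in pmd_versions:
            # Running PMD needs the extracted sources of that dependency.
            graph[f"pmd_{version}_{dep}"] = {extract}
        for older, newer in zip(pmd_versions, pmd_versions[1:]):
            diff = f"diff_{older}_{newer}_{dep}"
            # A diff needs both PMD reports it compares.
            graph[diff] = {f"pmd_{older}_{dep}", f"pmd_{newer}_{dep}"}
            # The HTML report is generated once all diffs exist.
            graph[final_task].add(diff)
    return graph


graph = build_task_graph(["5.6.0", "5.6.1"], ["solr", "spring"])
order = list(TopologicalSorter(graph).static_order())
print(len(graph), "tasks")
print(" -> ".join(order))
```

Even with two PMD versions and two dependencies this yields nine tasks, which gives a feel for how quickly the task count grows.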

The code isn't pretty, but it worked. There's no shortage of kludges and workarounds, and not all PMD versions worked right. The FIXME comments in the code that creates the task that runs PMD outline quite a few of the considerations I never got to.

pmd-regression.zip

@djydewang
Member

@ryan-gustafson Amazing! I had never thought of using dependencies to generate PMD reports. Maybe I can refactor my code to generate PMD reports this way. But since I'm not familiar with Gradle, I may not be able to use the code directly. There is no doubt that your code has inspired me a lot; e.g., the TODO and FIXME comments in the code are worth thinking about. Thank you for showing us an amazing starting point again :)

@jsotuyod jsotuyod closed this as completed Jan 9, 2019