Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for cache results #1011

Merged
merged 59 commits into from Jul 2, 2023

Conversation

frankdekker
Copy link
Contributor

Type: feature
Issue: Resolves #534
Breaking change: no.

Feature:

Added the ability to enable (default disabled, enable via --cache) result caching. This will write on the first run a cache file (.phpmd.result-cache.php, configurable via --cache-file) and will be used in the next run. Any files that have not been changed since then based on either the sha1 of content or file modified timestamp (can be set via --cache-strategy). Any changes to the configuration will invalidate the entire cache. Cache key is based on: strict mode, baseline file, composer.json, composer.lock modified, any modification to the rules and changes to PHP version.

Command line options:

  • --cache: if set will enable reading and writing the result cache.
  • --cache-file <file>: will change the result cache file location. Default ./.phpmd.result-cache.php
  • --cache-strategy [content|timestamp]: changes the strategy if a file is modified is determined. Default content

Performance:

Some basic performance tests on a large codebase result in:

Cache disabled:             2m54.095s
Cache enabled, cache miss:  2m53.310s
Cache enabled, cache hit:   0m0.789s

Technical

As the structure of PHPMD didnt quite allow a clean integration for result caching, I looked into PDepend how to hook into the file inspection foreach. I found PDepend\Input\Filter, which I added as last filter in PHPMD. When PDepend now iterates over each file, the ResultCacheFileFilter will check if the file is modified or not. If the file is modified, the file will not be filtered out and be inspected by PDepend. If however a file is not modified, we keep track of this file to later add the possible stored violations to the final report.

The ResultCacheEngine is the main control of the cache, and consists of 3 stages:

  • first the FileFilter that hooks into PDepend.
  • second the Updater that will transfer the violations stored in the cache to the report.
  • and third the Writer, which will write the result cache state back to file.

Technical challenges

As described above, I needed to add RuleViolation class to the report based on data from the cache. I changed the RuleViolation constructor signature to avoid including the AbstractNode type and replaced it with NodeInfo class. This class contains the same information that was used from the AbstractNode, and is far easier to construct.

Second problem was that RuleViolation also requires a Rule class. As PHPMD::processFiles method was passed a RuleSetFactory, which I also needed before PHPMD::processFiles was invoked, I moved the method
calls to RuleSetFactory outside the PHPMD::processFiles method.

Changes:

  • Added /Cache/ namespace with several classes to read, write, and calculate the result cache.
  • Added /Cache/Model for some data models.
  • Added coverage for my changes
  • Updated documentation regarding the cli options
  • See my comments on the specific files for the reason of the changes.

@@ -396,7 +397,7 @@ protected function addViolation(
'args' => $args,
);

$ruleViolation = new RuleViolation($this, $node, $message, $metric);
$ruleViolation = new RuleViolation($this, NodeInfoFactory::fromNode($node), $message, $metric);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to create a RuleViolation without Node, changed signature to NodeInfo object that's serializable.

* @param \PHPMD\RuleSetFactory $ruleSetFactory
* @param \PHPMD\Report $report
* @param \PHPMD\RuleSet[] $ruleSetList
* @param \PHPMD\Report $report
* @return void
*/
public function processFiles(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed signature to allow $ruleSetFactory->createRuleSet be called before processFiles is called.

@@ -88,6 +89,7 @@ private function init(Engine $pdepend, PHPMD $phpmd)
$this->initInput($pdepend, $phpmd);
$this->initIgnores($pdepend, $phpmd);
$this->initExtensions($pdepend, $phpmd);
$this->initResultCache($pdepend, $phpmd);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hook into pdepend filter system to filter files from pdepend inspection if cache is fresh.

@frankdekker
Copy link
Contributor Author

frankdekker commented Jun 18, 2023

No idea why SARIFRendererTest is failing in appveyor. I didnt do any changes near that test.

@frankdekker frankdekker marked this pull request as ready for review June 18, 2023 17:39
@kylekatarnls
Copy link
Member

No worry for the sarif test it will be handled in #1007

Copy link
Member

@AJenbo AJenbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice, I didn't find anything worth mentioning

@codecov-commenter
Copy link

Codecov Report

Patch coverage: 90.07% and project coverage change: +0.44 🎉

Comparison is base (b4d6e1c) 88.65% compared to head (efab9d0) 89.10%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1011      +/-   ##
============================================
+ Coverage     88.65%   89.10%   +0.44%     
- Complexity     1076     1160      +84     
============================================
  Files            96      107      +11     
  Lines          2698     2956     +258     
============================================
+ Hits           2392     2634     +242     
- Misses          306      322      +16     
Impacted Files Coverage Δ
src/main/php/PHPMD/TextUI/CommandLineOptions.php 67.48% <34.61%> (-3.94%) ⬇️
src/main/php/PHPMD/PHPMD.php 82.35% <76.92%> (-3.37%) ⬇️
src/main/php/PHPMD/ParserFactory.php 90.90% <80.00%> (-1.40%) ⬇️
src/main/php/PHPMD/Cache/ResultCacheKeyFactory.php 85.71% <85.71%> (ø)
...rc/main/php/PHPMD/Cache/Model/ResultCacheState.php 95.00% <95.00%> (ø)
src/main/php/PHPMD/AbstractRule.php 76.66% <100.00%> (ø)
src/main/php/PHPMD/Cache/Model/ResultCacheKey.php 100.00% <100.00%> (ø)
src/main/php/PHPMD/Cache/ResultCacheEngine.php 100.00% <100.00%> (ø)
.../main/php/PHPMD/Cache/ResultCacheEngineFactory.php 100.00% <100.00%> (ø)
src/main/php/PHPMD/Cache/ResultCacheFileFilter.php 100.00% <100.00%> (ø)
... and 8 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kylekatarnls kylekatarnls merged commit 9d9e29f into phpmd:master Jul 2, 2023
26 checks passed
@frankdekker frankdekker deleted the Add-support-for-cache-results branch July 2, 2023 20:21
@mvoelker
Copy link

Hi @frankdekker,

thanks a lot for implementing the phpmd result cache feature! Good job.

Happily set the new options in our (huge) project ... but without any speed improvement.
Cloned the phpmd repo and used the source code to run it to be able to find reasons.

It might be a bit specific for our project but others might have the same problems?

a) We run phpmd from a scripts folder on our CI system to analyse multiple paths - neither composer.json nor composer.lock are located in the scripts folder.

b) We do not use a baseline file.

Tracked the problem down to /Cache/ResultCacheStateFactory.php::createCacheKey. As the files from a) and b) are not available the built cache[key] array structure contains NULL for baselineHash and composer.
As the isset call in the mentioned method returns false for array keys with value NULL ... the whole caching does not work (in our case).

Any chances to make both composer files and the baseline file optional?

(Commented the isset checks for $data['baselineHash'] and $data['composer'] and the caching worked as expected.)

Best,

mvoelker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Option to cache results in between the runs
6 participants