
Faster IgnoredError::shouldIgnore() #737

Closed
wants to merge 3 commits

Conversation

staabm
Contributor

@staabm staabm commented Oct 25, 2021

This fix is not very elegant, but since it improves analysis time by 7-8%, I think it's at least worth discussing.

Running phpstan at 1505153 leads to the following profile:

[profile screenshot]

This means we still see a bottleneck caused by regex matching of ignore rules.

This PR suggests a fast-exit condition based on the error message's first character:
if the first character does not match, and it is not a regex meta-character, the subsequent regex-matching logic can never match.
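The idea can be sketched roughly like this (a simplified illustration only, not the actual patch; the function shape and delimiter handling are assumptions):

```php
<?php

// Simplified sketch of the proposed fast-exit condition (illustration
// only, not the actual patch; the function shape is an assumption).
function shouldIgnore(string $errorMessage, string $pattern): bool
{
    // Strip the '#' delimiters and look for a leading '^' anchor to find
    // the first literal character the pattern expects.
    $body = trim($pattern, '#');
    if ($body !== '' && $body[0] === '^') {
        $first = $body[1] ?? '';
        // If that character is a plain literal (not a regex meta-char) and
        // the message starts with something else, the regex can never match.
        $isMeta = $first === '' || strpos('\\.[](){}*+?|$', $first) !== false;
        if (!$isMeta && ($errorMessage === '' || $errorMessage[0] !== $first)) {
            return false; // fast exit: skip preg_match() entirely
        }
    }

    return preg_match($pattern, $errorMessage) === 1;
}
```

The fast path only fires when the pattern is anchored at a literal character, so it can never reject an error that the full regex would have matched.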

This optimization leads to the following comparison profile:

[comparison profile screenshot]

We can see a ~1s improvement while also saving ~20% memory.

@dktapps
Contributor

dktapps commented Oct 25, 2021

I think at this point, considering the prevalence of generated baselines, we should start considering a non-regex-based ignoreErrors, if regex is this much of a bottleneck.

Regex-based error filtering made sense when you had to write all error patterns by hand, but now that we're generating baselines so much, it doesn't make much sense anymore.

@staabm
Contributor Author

staabm commented Oct 25, 2021

Agreed. For the baseline case we should not need regex matching at all. @ondrejmirtes, do you agree?

Would you accept a PR changing baseline generation and matching to use plain strings instead of regexes?

@dktapps
Contributor

dktapps commented Oct 25, 2021

To be clear, regex should still be supported, but using regex for the baseline makes no sense.
Also, supporting plain strings in ignoreErrors would make it easier to manually ignore specific errors (no need to fiddle around manually adding \ to things; just copy-paste the error directly).
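A plain-string entry could be matched alongside regex entries along these lines (a hypothetical sketch; the `rawMessage` key and entry shape are assumptions, not PHPStan's actual config format):

```php
<?php

// Hypothetical sketch: match a baseline entry against an error either as
// a regex (current behaviour) or as a plain string (proposed). The
// 'rawMessage' key is an assumption, not PHPStan's actual config format.
function matchesIgnoreEntry(string $errorMessage, array $entry): bool
{
    if (isset($entry['rawMessage'])) {
        // Plain-string entry: exact comparison. No escaping is needed when
        // the baseline is generated, and no preg_match() at analysis time.
        return $errorMessage === $entry['rawMessage'];
    }

    // Regex entry: today's behaviour, which requires escaping on generation.
    return preg_match($entry['message'], $errorMessage) === 1;
}

// Baseline generation with plain strings would also avoid preg_quote():
//   ['rawMessage' => $error->getMessage()]
// instead of
//   ['message' => '#^' . preg_quote($error->getMessage(), '#') . '$#']
```

This is exactly the copy-paste benefit: messages containing `(`, `$`, or `\` go into the baseline verbatim.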

@staabm staabm closed this Oct 27, 2021
@staabm staabm deleted the faster-ignore branch October 27, 2021 05:49
@staabm
Contributor Author

staabm commented Oct 27, 2021

I will give it a try.

@ondrejmirtes
Member

Hi, how many ignoreErrors do you have? I don't really consider regexes a bottleneck.

@staabm
Contributor Author

staabm commented Oct 27, 2021

> how many ignoreErrors do you have?

~15,000 (as can be seen in the profiles in this PR's description; in my case it's the slowest part of the analysis run)

@ondrejmirtes
Member

Wouldn't it be better to have a lower level and a smaller baseline? As it stands, I'd say you have to regenerate it very often.

In my experience the baseline works really well if you have hundreds of errors in it, not thousands, because it assumes you are able to write new code cleanly, which is hard on levels 7 and 8 if the rest of the application isn't typed well.

@staabm
Contributor Author

staabm commented Oct 27, 2021

To be precise: we have 15,000 lines of baseline, which means ~3,750 errors.
Our codebase is on level 6, and we are in the process of adding types everywhere.

IMO, if the codebase is big enough you will have that many errors. You have to start somewhere, and going below level 6 would leave too much room for problems in new code, IMO.
(We fixed big chunks of errors on levels 1-4 beforehand, though.)

@ondrejmirtes
Member

Alright, so how much time does the analysis of the whole codebase take (without the result cache), and how much of that is spent filtering ignored errors?

@staabm
Contributor Author

staabm commented Oct 27, 2021

php ../phpstan-src/bin/phpstan clear-result-cache -c phpstan.neon.dist &&
blackfire run --ignore-exit-status php ../phpstan-src/bin/phpstan analyze -c phpstan.neon.dist

The whole run takes 27-28 seconds; error ignoring takes 2 seconds on 2e32030.

[profile screenshot]


With a populated result cache, relatively speaking, it's much more:

the whole run takes 11 seconds; error ignoring still takes 2 seconds on 2e32030.

[profile screenshot]

@dktapps
Contributor

dktapps commented Oct 27, 2021

FWIW, I didn't see any clear evidence that the regex is a bottleneck in PM's case (less than 1% of CPU time, even when the result cache is fully hit). But PM has somewhere in the ballpark of 1-2K errors, widely distributed over many files, with usually only a handful of errors per affected file. Given that segregation by file, I'd imagine it wouldn't have much impact unless you have a large ratio of errors to files, since it only tries to match the error patterns designated for that specific file.

Some numbers for context (these are from Windows, both assuming that the nette.configurator cache is not flooded):

  • Full analysis ~13sec (with 16 workers)
  • Analysis with result cache 1.3sec

Less than 1% in either of these cases is pretty insignificant, so I don't really have a dog in this fight.

@staabm
Contributor Author

staabm commented Nov 1, 2021

btw: seems like we are not the only ones with such a big baseline: https://twitter.com/Kalle_/status/1455146801632272393

@dktapps
Contributor

dktapps commented Nov 1, 2021

I don't think the size of the baseline matters. What matters more is the number of ignored errors per file. You can see here that it only attempts to check against the ignoreErrors patterns registered for the error's file:

$errors = array_values(array_filter($errors, function (Error $error) use ($processIgnoreError): bool {
    $filePath = $this->fileHelper->normalizePath($error->getFilePath());
    if (isset($this->ignoreErrorsByFile[$filePath])) {
        foreach ($this->ignoreErrorsByFile[$filePath] as $ignoreError) {
            $i = $ignoreError['index'];
            $ignore = $ignoreError['ignoreError'];
            $result = $processIgnoreError($error, $i, $ignore);
            if (!$result) {
                return false;
            }
        }
    }
    // ...

However, the algorithm used does appear to have an O(n^2) complexity.

Some off-the-bat maths:

  • Assume you have 1 file with 1000 unique errors.
  • You have 1000 unique error patterns for this file.
  • You could end up matching 1,000,000 times (1000 errors × 1000 patterns, worst case).

I think at most PM has maybe 20-30 errors in the worst file, so this doesn't really affect PM's analysis time at all.
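Combining the plain-string idea from upthread with a hash lookup would collapse that worst case from n × m regex matches to one array lookup per error. A minimal sketch, assuming baseline entries are exact messages (the `isIgnored` helper and the data shape are hypothetical):

```php
<?php

// Sketch under the assumption that baseline entries are exact messages:
// the per-file matching loop becomes a hash lookup instead of up to
// n x m preg_match() calls. The data shape here is hypothetical.
// $ignoredMessagesByFile maps file path => set of exact error messages.
function isIgnored(string $filePath, string $message, array $ignoredMessagesByFile): bool
{
    return isset($ignoredMessagesByFile[$filePath][$message]);
}

// 1000 errors against 1000 patterns: 1000 O(1) lookups instead of up to
// 1,000,000 regex matches in the worst case.
$ignored = [
    'src/Foo.php' => [
        'Call to an undefined method Foo::bar().' => true,
    ],
];
```

Regex-based entries would still need the linear scan, so the benefit applies only to the exact-message portion of the baseline.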
