Perf: cache UseStatementsTrait getUseStatements + bound throw class-name lookup#64
Merged
Merged
Conversation
UseStatementsTrait::getUseStatements() iterates every token in the file via foreach($tokens as ...) looking for top-level T_USE tokens. The result is stable across calls during a single phpcs pass over a file, but three sniffs (DocBlockThrowsSniff, DocBlockVarSniff, UseWithAliasingSniff) call it once per T_FUNCTION. On a large method- heavy controller that turns a per-file O(file_size) walk into per- method O(methods * file_size). Cache the parsed result so each file is parsed at most once. Cache invalidation has to survive phpcbf's fix loop, which re-tokenizes the same file in-place on the same File object after each fix iteration. Token count alone is not sufficient because an in-place alias rename (use Foo as Bar; -> use Foo as Baz;) leaves the count unchanged. The cache therefore also stores a content fingerprint of each use statement range and re-verifies it against the live tokens on every cache hit. Verification is O(num_uses * avg_use_length) - a handful of token reads. Separately, DocBlockThrowsSniff::extractExceptions() looked up T_DOUBLE_COLON with an upper bound of $scopeCloser (the method's closing brace). On long bodies that meant scanning past the current throw statement and picking up an unrelated `::` from a later statement, then silently using it to gate exception collection for this throw. Bound the lookup to the next `;` instead - both faster and more correct. Measured on a CakePHP 5 app whose biggest controller is 11k lines with ~100 actions, parallel=1, --report=performance: DocBlockThrows on ResidentsController.php alone: 3.25s -> 0.08s DocBlockThrows across 1095-file codebase: 11.32s -> 0.86s Existing test suite (100 tests / 122 assertions) passes unchanged. Output diff across all 1095 files: identical violations before vs after.
This was referenced May 13, 2026
dereuromark
added a commit
that referenced
this pull request
May 13, 2026
UseStatementSniff has its own getUseStatements() implementation that duplicates UseStatementsTrait::getUseStatements() (cached in #64). It already has an instance-level cache via existingStatements, which covers repeated calls within a single phpcs pass; the new static cache adds coverage across phpcbf fix iterations, where populateTokenListeners() creates fresh sniff instances per pass and resets the instance cache. Cache invalidation follows the same fingerprint-based scheme as #64: token count alone is not strong enough, since an alias rename keeps it constant. Cached entries record a content fingerprint of each use statement range and re-verify them against the live tokens before being trusted. The cache also refuses to serve an empty result so it cannot return stale state for a file where a fix added a first use statement while another simultaneous fix happened to keep the file's overall token count unchanged. Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65 (parallel=1, --report=performance): PhpCollective.Namespaces.UseStatement 3.36s -> 2.08s Existing test suite (100 tests / 122 assertions) passes unchanged. The FQCN cache changes from the earlier draft of this PR were dropped in favour of the cache that landed via #65.
dereuromark
added a commit
that referenced
this pull request
May 13, 2026
…#66) UseStatementSniff has its own getUseStatements() implementation that duplicates UseStatementsTrait::getUseStatements() (cached in #64). It already has an instance-level cache via existingStatements, which covers repeated calls within a single phpcs pass; the new static cache adds coverage across phpcbf fix iterations, where populateTokenListeners() creates fresh sniff instances per pass and resets the instance cache. Cache invalidation follows the same fingerprint-based scheme as #64: token count alone is not strong enough, since an alias rename keeps it constant. Cached entries record a content fingerprint of each use statement range and re-verify them against the live tokens before being trusted. The cache also refuses to serve an empty result so it cannot return stale state for a file where a fix added a first use statement while another simultaneous fix happened to keep the file's overall token count unchanged. Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65 (parallel=1, --report=performance): PhpCollective.Namespaces.UseStatement 3.36s -> 2.08s Existing test suite (100 tests / 122 assertions) passes unchanged. The FQCN cache changes from the earlier draft of this PR were dropped in favour of the cache that landed via #65.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #62 and #63. Two changes in
UseStatementsTrait/DocBlockThrowsSniff:getUseStatements()per file inUseStatementsTrait. The result is currently rebuilt once perT_FUNCTIONby every sniff that uses the trait, but it's a per-file quantity. On a large method-heavy controller this turns a per-file O(file_size) walk into per-method O(methods × file_size). The trait is used byDocBlockThrowsSniff,DocBlockVarSniff, andUseWithAliasingSniff, so all three benefit.T_DOUBLE_COLONlookup inextractExceptions()to the current statement. The original code searched up to$scopeCloser(the method's closing brace), which on long bodies could pick up an unrelated::operator from a later statement and silently use it to gate exception collection for this throw. Bounding to the next;is both faster and more correct.Why this was slow
getUseStatements()iterates every token in the file:That's an O(file_size) walk per call.
DocBlockThrowsSniff::compareExceptionsAndAnnotations()calls it once per method, so an 11k-token controller with ~100 methods does ~1.1M token reads per file just to repeatedly rebuild the same use list. Same shape of bug we caught inAbstractSniff::getClassName()in #62.Cache invalidation
The static cache has to survive phpcbf's fix loop, which calls
$file->reloadContent(); $file->process();on the same File object after each fix iteration. Token count alone isn't enough: a fix that renames an alias (use Foo as Bar;->use Foo as Baz;) leaves the count unchanged, and other sniffs would then read stale aliases out of the cache. (Caught in codex review on the first draft.)The cache therefore stores a content fingerprint of each use statement range (
start..endtoken contents concatenated) and re-verifies it against the live tokens on every cache hit. The verification is O(num_uses × avg_use_length) — typically a handful of token reads — so the cache stays cheap.Benchmarks
Measured on the same 1095-file CakePHP 5 app from #62 / #63 (
ResidentsController.php, 11k LOC, ~100 actions). Thephpcs --report=performancefigures are the per-sniff CPU times reported by phpcs itself, so they're not affected by background system load the way wall-clock figures are:ResidentsController.php(median, 3 runs)parallel=1(median, 3 runs)DocBlockThrows was the #1 hot spot in the perf report after #62 / #63 landed (~19% of total time). It's now down to ~1–2%.
Why not the combined-scan refactor
An earlier draft of this PR replaced the four overlapping body scans in
process()(extractExceptions/containsComplexThrowToken/containsThrowToken/containsCatchToken) with a single combined pass that tracked nested-closure boundaries via a running cursor. It worked correctly, but a controlled 3-run measurement showed it brought the sniff from 3.25s to 3.21s onResidentsController.php— completely within noise. The four-scan structure looks expensive but the inner-loop short-circuits made it cheap in practice; the actual hot path turned out to be the un-cachedgetUseStatements()call called fromcompareExceptionsAndAnnotations(). So the refactor was dropped; the existing four helpers are kept untouched.Verification
composer cs-checkandcomposer stanon this repo: clean.codex review --uncommittedon the final patch: addressed the one P2 finding (cache invalidation strength) before commit.