Skip to content

Perf: cache UseStatementsTrait getUseStatements + bound throw class-name lookup#64

Merged
dereuromark merged 1 commit into
masterfrom
perf-docblock-throws-combined-scan
May 13, 2026
Merged

Perf: cache UseStatementsTrait getUseStatements + bound throw class-name lookup#64
dereuromark merged 1 commit into
masterfrom
perf-docblock-throws-combined-scan

Conversation

@dereuromark
Copy link
Copy Markdown
Contributor

Summary

Follow-up to #62 and #63. Two changes in UseStatementsTrait / DocBlockThrowsSniff:

  1. Cache getUseStatements() per file in UseStatementsTrait. The result is currently rebuilt once per T_FUNCTION by every sniff that uses the trait, but it's a per-file quantity. On a large method-heavy controller this turns a per-file O(file_size) walk into per-method O(methods × file_size). The trait is used by DocBlockThrowsSniff, DocBlockVarSniff, and UseWithAliasingSniff, so all three benefit.
  2. Bound the T_DOUBLE_COLON lookup in extractExceptions() to the current statement. The original code searched up to $scopeCloser (the method's closing brace), which on long bodies could pick up an unrelated :: operator from a later statement and silently use it to gate exception collection for this throw. Bounding to the next ; is both faster and more correct.

Why this was slow

getUseStatements() iterates every token in the file:

foreach ($tokens as $index => $token) {
    if ($token['code'] !== T_USE || $token['level'] > 0) {
        continue;
    }
    // ... parse the use line
}

That's an O(file_size) walk per call. DocBlockThrowsSniff::compareExceptionsAndAnnotations() calls it once per method, so an 11k-token controller with ~100 methods does ~1.1M token reads per file just to repeatedly rebuild the same use list. Same shape of bug we caught in AbstractSniff::getClassName() in #62.

Cache invalidation

The static cache has to survive phpcbf's fix loop, which calls $file->reloadContent(); $file->process(); on the same File object after each fix iteration. Token count alone isn't enough: a fix that renames an alias (use Foo as Bar; -> use Foo as Baz;) leaves the count unchanged, and other sniffs would then read stale aliases out of the cache. (Caught in codex review on the first draft.)

The cache therefore stores a content fingerprint of each use statement range (start..end token contents concatenated) and re-verifies it against the live tokens on every cache hit. The verification is O(num_uses × avg_use_length) — typically a handful of token reads — so the cache stays cheap.

Benchmarks

Measured on the same 1095-file CakePHP 5 app from #62 / #63 (ResidentsController.php, 11k LOC, ~100 actions). The phpcs --report=performance figures are the per-sniff CPU times reported by phpcs itself, so they're not affected by background system load the way wall-clock figures are:

Before (master) After this PR
DocBlockThrows on ResidentsController.php (median, 3 runs) ~3.25s ~0.08s (≈ 40× speedup)
DocBlockThrows across all 1095 files, parallel=1 (median, 3 runs) ~11.32s ~0.86s (≈ 13× speedup)

DocBlockThrows was the #1 hot spot in the perf report after #62 / #63 landed (~19% of total time). It's now down to ~1–2%.

Why not the combined-scan refactor

An earlier draft of this PR replaced the four overlapping body scans in process() (extractExceptions / containsComplexThrowToken / containsThrowToken / containsCatchToken) with a single combined pass that tracked nested-closure boundaries via a running cursor. It worked correctly, but a controlled 3-run measurement showed it brought the sniff from 3.25s to 3.21s on ResidentsController.php — completely within noise. The four-scan structure looks expensive but the inner-loop short-circuits made it cheap in practice; the actual hot path turned out to be the un-cached getUseStatements() call called from compareExceptionsAndAnnotations(). So the refactor was dropped; the existing four helpers are kept untouched.

Verification

  • Existing PHPUnit suite passes (100 tests / 122 assertions).
  • composer cs-check and composer stan on this repo: clean.
  • Output diff across all sniffs and all 1095 files of the activity-pro test corpus: identical violations before vs after.
  • Cache invalidation: a fixture that renames an aliased use statement in place (same token count, different alias) correctly produces fresh parse output on the next call; the cache rejects the stale entry via fingerprint mismatch.
  • codex review --uncommitted on the final patch: addressed the one P2 finding (cache invalidation strength) before commit.

UseStatementsTrait::getUseStatements() iterates every token in the
file via foreach($tokens as ...) looking for top-level T_USE tokens.
The result is stable across calls during a single phpcs pass over a
file, but three sniffs (DocBlockThrowsSniff, DocBlockVarSniff,
UseWithAliasingSniff) call it once per T_FUNCTION. On a large method-
heavy controller that turns a per-file O(file_size) walk into per-
method O(methods * file_size).

Cache the parsed result so each file is parsed at most once. Cache
invalidation has to survive phpcbf's fix loop, which re-tokenizes
the same file in-place on the same File object after each fix
iteration. Token count alone is not sufficient because an in-place
alias rename (use Foo as Bar; -> use Foo as Baz;) leaves the count
unchanged. The cache therefore also stores a content fingerprint of
each use statement range and re-verifies it against the live tokens
on every cache hit. Verification is O(num_uses * avg_use_length) -
a handful of token reads.

Separately, DocBlockThrowsSniff::extractExceptions() looked up
T_DOUBLE_COLON with an upper bound of $scopeCloser (the method's
closing brace). On long bodies that meant scanning past the current
throw statement and picking up an unrelated `::` from a later
statement, then silently using it to gate exception collection for
this throw. Bound the lookup to the next `;` instead - both faster
and more correct.

Measured on a CakePHP 5 app whose biggest controller is 11k lines
with ~100 actions, parallel=1, --report=performance:

  DocBlockThrows on ResidentsController.php alone:  3.25s -> 0.08s
  DocBlockThrows across 1095-file codebase:        11.32s -> 0.86s

Existing test suite (100 tests / 122 assertions) passes unchanged.
Output diff across all 1095 files: identical violations before vs
after.
@dereuromark dereuromark merged commit dcc5efc into master May 13, 2026
4 checks passed
@dereuromark dereuromark deleted the perf-docblock-throws-combined-scan branch May 13, 2026 21:32
dereuromark added a commit that referenced this pull request May 13, 2026
UseStatementSniff has its own getUseStatements() implementation that
duplicates UseStatementsTrait::getUseStatements() (cached in #64). It
already has an instance-level cache via existingStatements, which
covers repeated calls within a single phpcs pass; the new static
cache adds coverage across phpcbf fix iterations, where
populateTokenListeners() creates fresh sniff instances per pass and
resets the instance cache.

Cache invalidation follows the same fingerprint-based scheme as #64:
token count alone is not strong enough, since an alias rename keeps
it constant. Cached entries record a content fingerprint of each
use statement range and re-verify them against the live tokens
before being trusted. The cache also refuses to serve an empty
result so it cannot return stale state for a file where a fix added
a first use statement while another simultaneous fix happened to
keep the file's overall token count unchanged.

Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65
(parallel=1, --report=performance):

  PhpCollective.Namespaces.UseStatement   3.36s -> 2.08s

Existing test suite (100 tests / 122 assertions) passes unchanged.

The FQCN cache changes from the earlier draft of this PR were
dropped in favour of the cache that landed via #65.
dereuromark added a commit that referenced this pull request May 13, 2026
…#66)

UseStatementSniff has its own getUseStatements() implementation that
duplicates UseStatementsTrait::getUseStatements() (cached in #64). It
already has an instance-level cache via existingStatements, which
covers repeated calls within a single phpcs pass; the new static
cache adds coverage across phpcbf fix iterations, where
populateTokenListeners() creates fresh sniff instances per pass and
resets the instance cache.

Cache invalidation follows the same fingerprint-based scheme as #64:
token count alone is not strong enough, since an alias rename keeps
it constant. Cached entries record a content fingerprint of each
use statement range and re-verify them against the live tokens
before being trusted. The cache also refuses to serve an empty
result so it cannot return stale state for a file where a fix added
a first use statement while another simultaneous fix happened to
keep the file's overall token count unchanged.

Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65
(parallel=1, --report=performance):

  PhpCollective.Namespaces.UseStatement   3.36s -> 2.08s

Existing test suite (100 tests / 122 assertions) passes unchanged.

The FQCN cache changes from the earlier draft of this PR were
dropped in favour of the cache that landed via #65.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant