Perf: O(1) conditions checks + cached arrow-fn scopes in ConsistentIndentSniff by dereuromark · Pull Request #63 · php-collective/code-sniffer

dereuromark · 2026-05-13T19:41:41Z

Summary

Follow-up to #62. Replaces three expensive backward token scans in ConsistentIndentSniff with O(1) lookups against the pre-computed conditions map, plus a per-file cache of T_FN scope ranges. No behaviour change; substantially faster on large files.

Background

ConsistentIndentSniff registers on T_WHITESPACE, so it runs against every whitespace token in the file. For each token at the start of a line it called three helpers that walked the token stream backwards:

Helper	Original scan window	What it looked for
`isInsideClosure`	up to 2000 tokens	T_CLOSURE / T_FN / named-function boundary
`isInsideSwitchCase`	up to 500 tokens	T_SWITCH
`couldBeInCaseBlock`	up to 500 tokens	T_CASE / T_DEFAULT

Each of those tokens IS in fact tracked by phpcs's PHP tokenizer in the per-token conditions map:

T_CLOSURE — PHP::processAdditional() rewrites the conditions of every inner token from T_FUNCTION to T_CLOSURE for anonymous functions (tokenizer source: PHP.php around line 2840).
T_SWITCH, T_CASE, T_DEFAULT — all scope openers in PHP::$scopeOpeners, so createLevelMap() already adds them to inner tokens' conditions.

So the inner-loop scans were redundant work. An empirical probe sniff confirms it:

Line  4 inside method:                       T_CLASS, T_FUNCTION
Line  7 inside switch + case:                T_CLASS, T_FUNCTION, T_SWITCH, T_CASE
Line 11 inside closure:                      T_CLASS, T_FUNCTION, PHPCS_T_CLOSURE
Line 14 inside closure + switch + case:      T_CLASS, T_FUNCTION, PHPCS_T_CLOSURE, T_SWITCH, T_CASE

The outdated comment "Case blocks don't show up in conditions, so expected might be wrong" was wrong for current phpcs; it has been removed.

Changes

isInsideClosure: O(1) walk of conditions for T_CLOSURE. For T_FN, which is the one case where phpcs does not propagate the token into inner tokens' conditions, build a per-file cache of [scope_opener, scope_closer] ranges on first use (single O(file) walk, then O(arrow_count) per query — typically a handful per file). The cache stores the token count alongside the ranges so it self-invalidates if the file is re-tokenized during a phpcbf fix loop.
isInsideSwitchCase: pure O(1) walk of conditions for T_SWITCH. The old backward scan was completely replaceable.
couldBeInCaseBlock: pure O(1) walk of conditions for T_CASE / T_DEFAULT. Same.

Why a cache and not a scan limit

An earlier draft used a 500-token bounded scan for the T_FN fallback. Codex review correctly flagged that as a behaviour regression: arrow function bodies can legitimately exceed 500 tokens (a long match expression, a deeply nested array literal as the body, a long method chain), and a bounded scan would then fail to detect that we're inside the arrow and could emit false-positive over-indentation errors on lines the original implementation correctly skipped. The cached-ranges approach is both unbounded and faster, so it strictly dominates.

Benchmarks

Measured on the same 1095-file CakePHP 5 app from #62 (ResidentsController.php, 11k LOC):

Run	Before #62	After #62 (master)	After this PR	Total delta
`parallel=1`, `ResidentsController.php`	40s	20s	6.5s	-84%
`parallel=16`, whole codebase wall	2m08s	1m30s	27.9s	-78%

Sniff-level perf report on ResidentsController.php:

Sniff	Before this PR	After this PR
`PhpCollective.WhiteSpace.ConsistentIndent`	4.12s (24.2%)	0.1s (~2%)

The full-codebase speedup is larger than just "the sniff got faster on the slow file" — much of it comes from the cheap empty-array iteration in files with no arrow functions at all, which previously paid for a backward scan per indented line.

Verification

Existing PHPUnit suite passes (100 tests / 122 assertions). ConsistentIndentSniffTest exercises the closure + switch/case scenarios.
composer cs-check and composer stan on this repo: clean.
Output diff (all sniffs, all files) on a real 1095-file codebase: identical violations before vs after.
Hand-crafted fixture with intentional over-indents inside named method / closure / switch case / default / arrow function: identical results in both versions.
Long-arrow-body stress fixture (multi-thousand-token match arrow expression): both versions correctly skip indent checks throughout — confirming the codex regression concern is addressed by the cached-ranges approach.
codex review --uncommitted on the final patch: clean.

Three helpers in ConsistentIndentSniff (isInsideClosure, isInsideSwitchCase, couldBeInCaseBlock) walked the token stream backwards up to 2000 / 500 / 500 tokens for every indented line. The tokens they searched for are already tracked in the per-token 'conditions' map built by the phpcs PHP tokenizer: - T_CLOSURE: PHP::processAdditional() rewrites the conditions of every token inside an anonymous function from T_FUNCTION to T_CLOSURE (PHP.php around line 2840). - T_SWITCH / T_CASE / T_DEFAULT: all scope openers, so createLevelMap() adds them to inner tokens' conditions. So the inner-loop scans were redundant. Replace with O(depth) walks of the conditions array (typically <10 entries). T_FN is the one exception — arrow functions get scope_opener/scope_closer but don't propagate to inner tokens' conditions — so keep a bounded fallback scan for it. Removed the outdated "Case blocks don't show up in conditions" comment; they do. Measured on the same workload as #62 (CakePHP 5 app, 1095 files, biggest controller 11k lines): parallel=1, single file: 20s -> 16.7s parallel=16, whole codebase: 1m30s -> 1m16s Sniff perf on the slowest file: 4.12s (24%) -> 1.09s (8%). Existing test suite (100 tests / 122 assertions) passes unchanged. Output diff across 1095 real-world files: identical violations.

Initial draft used a 500-token backward scan to find a containing T_FN (arrow function) when the conditions-based fast path didn't match. Codex review correctly flagged that as a behaviour regression: arrow function bodies can legitimately exceed 500 tokens (a long match expression, a deeply nested array as the body, a long method chain), and the bounded scan would then fail to detect that we're inside the arrow and could emit false-positive over-indentation errors on lines the original implementation correctly skipped. Replace the bounded scan with a per-file cache of [scope_opener, scope_closer] ranges for every T_FN token. Built lazily on first use in O(file) and checked in O(arrow_count) per query (typically a handful of arrows per file). No scan limit, no false negatives. Cache stores the token count alongside the ranges so it self- invalidates if the file is re-tokenized inside a phpcbf fix loop. Net effect vs the bounded-scan draft: parallel=1 ResidentsController.php: 16.7s -> 6.5s parallel=16 whole codebase wall: 1m16s -> 27.9s The big extra speedup comes from cheap empty-array iteration in files with no arrow functions at all, which was previously paying for a 500-token scan per indented line.

UseStatementSniff has its own getUseStatements() implementation that duplicates UseStatementsTrait::getUseStatements() (cached in #64). It already has an instance-level cache via existingStatements, which covers repeated calls within a single phpcs pass; the new static cache adds coverage across phpcbf fix iterations, where populateTokenListeners() creates fresh sniff instances per pass and resets the instance cache. Cache invalidation follows the same fingerprint-based scheme as #64: token count alone is not strong enough, since an alias rename keeps it constant. Cached entries record a content fingerprint of each use statement range and re-verify them against the live tokens before being trusted. The cache also refuses to serve an empty result so it cannot return stale state for a file where a fix added a first use statement while another simultaneous fix happened to keep the file's overall token count unchanged. Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65 (parallel=1, --report=performance): PhpCollective.Namespaces.UseStatement 3.36s -> 2.08s Existing test suite (100 tests / 122 assertions) passes unchanged. The FQCN cache changes from the earlier draft of this PR were dropped in favour of the cache that landed via #65.

…#66) UseStatementSniff has its own getUseStatements() implementation that duplicates UseStatementsTrait::getUseStatements() (cached in #64). It already has an instance-level cache via existingStatements, which covers repeated calls within a single phpcs pass; the new static cache adds coverage across phpcbf fix iterations, where populateTokenListeners() creates fresh sniff instances per pass and resets the instance cache. Cache invalidation follows the same fingerprint-based scheme as #64: token count alone is not strong enough, since an alias rename keeps it constant. Cached entries record a content fingerprint of each use statement range and re-verify them against the live tokens before being trusted. The cache also refuses to serve an empty result so it cannot return stale state for a file where a fix added a first use statement while another simultaneous fix happened to keep the file's overall token count unchanged. Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65 (parallel=1, --report=performance): PhpCollective.Namespaces.UseStatement 3.36s -> 2.08s Existing test suite (100 tests / 122 assertions) passes unchanged. The FQCN cache changes from the earlier draft of this PR were dropped in favour of the cache that landed via #65.

dereuromark added 2 commits May 13, 2026 21:31

dereuromark merged commit 78e4146 into master May 13, 2026
4 checks passed

dereuromark deleted the perf-consistent-indent-conditions branch May 13, 2026 19:51

This was referenced May 13, 2026

Perf: cache UseStatementsTrait getUseStatements + bound throw class-name lookup #64

Merged

Perf: cache docblock FQCN lookups #65

Merged

Perf: cache UseStatementSniff getUseStatements across fix iterations #66

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf: O(1) conditions checks + cached arrow-fn scopes in ConsistentIndentSniff#63

Perf: O(1) conditions checks + cached arrow-fn scopes in ConsistentIndentSniff#63
dereuromark merged 2 commits into
masterfrom
perf-consistent-indent-conditions

dereuromark commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dereuromark commented May 13, 2026

Summary

Background

Changes

Why a cache and not a scan limit

Benchmarks

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant