Skip to content

Perf: O(1) conditions checks + cached arrow-fn scopes in ConsistentIndentSniff#63

Merged
dereuromark merged 2 commits into
masterfrom
perf-consistent-indent-conditions
May 13, 2026
Merged

Perf: O(1) conditions checks + cached arrow-fn scopes in ConsistentIndentSniff#63
dereuromark merged 2 commits into
masterfrom
perf-consistent-indent-conditions

Conversation

@dereuromark
Copy link
Copy Markdown
Contributor

Summary

Follow-up to #62. Replaces three expensive backward token scans in ConsistentIndentSniff with O(1) lookups against the pre-computed conditions map, plus a per-file cache of T_FN scope ranges. No behaviour change; substantially faster on large files.

Background

ConsistentIndentSniff registers on T_WHITESPACE, so it runs against every whitespace token in the file. For each token at the start of a line it called three helpers that walked the token stream backwards:

Helper Original scan window What it looked for
isInsideClosure up to 2000 tokens T_CLOSURE / T_FN / named-function boundary
isInsideSwitchCase up to 500 tokens T_SWITCH
couldBeInCaseBlock up to 500 tokens T_CASE / T_DEFAULT

Each of those tokens IS in fact tracked by phpcs's PHP tokenizer in the per-token conditions map:

  • T_CLOSUREPHP::processAdditional() rewrites the conditions of every inner token from T_FUNCTION to T_CLOSURE for anonymous functions (tokenizer source: PHP.php around line 2840).
  • T_SWITCH, T_CASE, T_DEFAULT — all scope openers in PHP::$scopeOpeners, so createLevelMap() already adds them to inner tokens' conditions.

So the inner-loop scans were redundant work. An empirical probe sniff confirms it:

Line  4 inside method:                       T_CLASS, T_FUNCTION
Line  7 inside switch + case:                T_CLASS, T_FUNCTION, T_SWITCH, T_CASE
Line 11 inside closure:                      T_CLASS, T_FUNCTION, PHPCS_T_CLOSURE
Line 14 inside closure + switch + case:      T_CLASS, T_FUNCTION, PHPCS_T_CLOSURE, T_SWITCH, T_CASE

The outdated comment "Case blocks don't show up in conditions, so expected might be wrong" was wrong for current phpcs; it has been removed.

Changes

  • isInsideClosure: O(1) walk of conditions for T_CLOSURE. For T_FN, which is the one case where phpcs does not propagate the token into inner tokens' conditions, build a per-file cache of [scope_opener, scope_closer] ranges on first use (single O(file) walk, then O(arrow_count) per query — typically a handful per file). The cache stores the token count alongside the ranges so it self-invalidates if the file is re-tokenized during a phpcbf fix loop.
  • isInsideSwitchCase: pure O(1) walk of conditions for T_SWITCH. The old backward scan was completely replaceable.
  • couldBeInCaseBlock: pure O(1) walk of conditions for T_CASE / T_DEFAULT. Same.

Why a cache and not a scan limit

An earlier draft used a 500-token bounded scan for the T_FN fallback. Codex review correctly flagged that as a behaviour regression: arrow function bodies can legitimately exceed 500 tokens (a long match expression, a deeply nested array literal as the body, a long method chain), and a bounded scan would then fail to detect that we're inside the arrow and could emit false-positive over-indentation errors on lines the original implementation correctly skipped. The cached-ranges approach is both unbounded and faster, so it strictly dominates.

Benchmarks

Measured on the same 1095-file CakePHP 5 app from #62 (ResidentsController.php, 11k LOC):

Run Before #62 After #62 (master) After this PR Total delta
parallel=1, ResidentsController.php 40s 20s 6.5s -84%
parallel=16, whole codebase wall 2m08s 1m30s 27.9s -78%

Sniff-level perf report on ResidentsController.php:

Sniff Before this PR After this PR
PhpCollective.WhiteSpace.ConsistentIndent 4.12s (24.2%) 0.1s (~2%)

The full-codebase speedup is larger than just "the sniff got faster on the slow file" — much of it comes from the cheap empty-array iteration in files with no arrow functions at all, which previously paid for a backward scan per indented line.

Verification

  • Existing PHPUnit suite passes (100 tests / 122 assertions). ConsistentIndentSniffTest exercises the closure + switch/case scenarios.
  • composer cs-check and composer stan on this repo: clean.
  • Output diff (all sniffs, all files) on a real 1095-file codebase: identical violations before vs after.
  • Hand-crafted fixture with intentional over-indents inside named method / closure / switch case / default / arrow function: identical results in both versions.
  • Long-arrow-body stress fixture (multi-thousand-token match arrow expression): both versions correctly skip indent checks throughout — confirming the codex regression concern is addressed by the cached-ranges approach.
  • codex review --uncommitted on the final patch: clean.

Three helpers in ConsistentIndentSniff (isInsideClosure,
isInsideSwitchCase, couldBeInCaseBlock) walked the token stream
backwards up to 2000 / 500 / 500 tokens for every indented line.

The tokens they searched for are already tracked in the per-token
'conditions' map built by the phpcs PHP tokenizer:

  - T_CLOSURE: PHP::processAdditional() rewrites the conditions
    of every token inside an anonymous function from T_FUNCTION
    to T_CLOSURE (PHP.php around line 2840).
  - T_SWITCH / T_CASE / T_DEFAULT: all scope openers, so
    createLevelMap() adds them to inner tokens' conditions.

So the inner-loop scans were redundant. Replace with O(depth) walks
of the conditions array (typically <10 entries). T_FN is the one
exception — arrow functions get scope_opener/scope_closer but don't
propagate to inner tokens' conditions — so keep a bounded fallback
scan for it.

Removed the outdated "Case blocks don't show up in conditions"
comment; they do.

Measured on the same workload as #62 (CakePHP 5 app, 1095 files,
biggest controller 11k lines):

  parallel=1, single file:       20s -> 16.7s
  parallel=16, whole codebase:  1m30s -> 1m16s

Sniff perf on the slowest file: 4.12s (24%) -> 1.09s (8%).

Existing test suite (100 tests / 122 assertions) passes unchanged.
Output diff across 1095 real-world files: identical violations.
Initial draft used a 500-token backward scan to find a containing
T_FN (arrow function) when the conditions-based fast path didn't
match. Codex review correctly flagged that as a behaviour regression:
arrow function bodies can legitimately exceed 500 tokens (a long
match expression, a deeply nested array as the body, a long method
chain), and the bounded scan would then fail to detect that we're
inside the arrow and could emit false-positive over-indentation
errors on lines the original implementation correctly skipped.

Replace the bounded scan with a per-file cache of [scope_opener,
scope_closer] ranges for every T_FN token. Built lazily on first use
in O(file) and checked in O(arrow_count) per query (typically a
handful of arrows per file). No scan limit, no false negatives.

Cache stores the token count alongside the ranges so it self-
invalidates if the file is re-tokenized inside a phpcbf fix loop.

Net effect vs the bounded-scan draft:
  parallel=1 ResidentsController.php:  16.7s -> 6.5s
  parallel=16 whole codebase wall:     1m16s -> 27.9s

The big extra speedup comes from cheap empty-array iteration in
files with no arrow functions at all, which was previously paying
for a 500-token scan per indented line.
@dereuromark dereuromark merged commit 78e4146 into master May 13, 2026
4 checks passed
@dereuromark dereuromark deleted the perf-consistent-indent-conditions branch May 13, 2026 19:51
dereuromark added a commit that referenced this pull request May 13, 2026
UseStatementSniff has its own getUseStatements() implementation that
duplicates UseStatementsTrait::getUseStatements() (cached in #64). It
already has an instance-level cache via existingStatements, which
covers repeated calls within a single phpcs pass; the new static
cache adds coverage across phpcbf fix iterations, where
populateTokenListeners() creates fresh sniff instances per pass and
resets the instance cache.

Cache invalidation follows the same fingerprint-based scheme as #64:
token count alone is not strong enough, since an alias rename keeps
it constant. Cached entries record a content fingerprint of each
use statement range and re-verify them against the live tokens
before being trusted. The cache also refuses to serve an empty
result so it cannot return stale state for a file where a fix added
a first use statement while another simultaneous fix happened to
keep the file's overall token count unchanged.

Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65
(parallel=1, --report=performance):

  PhpCollective.Namespaces.UseStatement   3.36s -> 2.08s

Existing test suite (100 tests / 122 assertions) passes unchanged.

The FQCN cache changes from the earlier draft of this PR were
dropped in favour of the cache that landed via #65.
dereuromark added a commit that referenced this pull request May 13, 2026
…#66)

UseStatementSniff has its own getUseStatements() implementation that
duplicates UseStatementsTrait::getUseStatements() (cached in #64). It
already has an instance-level cache via existingStatements, which
covers repeated calls within a single phpcs pass; the new static
cache adds coverage across phpcbf fix iterations, where
populateTokenListeners() creates fresh sniff instances per pass and
resets the instance cache.

Cache invalidation follows the same fingerprint-based scheme as #64:
token count alone is not strong enough, since an alias rename keeps
it constant. Cached entries record a content fingerprint of each
use statement range and re-verify them against the live tokens
before being trusted. The cache also refuses to serve an empty
result so it cannot return stale state for a file where a fix added
a first use statement while another simultaneous fix happened to
keep the file's overall token count unchanged.

Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65
(parallel=1, --report=performance):

  PhpCollective.Namespaces.UseStatement   3.36s -> 2.08s

Existing test suite (100 tests / 122 assertions) passes unchanged.

The FQCN cache changes from the earlier draft of this PR were
dropped in favour of the cache that landed via #65.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant