Skip to content

Fix GH-8739: opcache_reset()/invalidate() races via epoch-based reclamation#21778

Open
superdav42 wants to merge 1 commit intophp:PHP-8.4from
superdav42:fix/opcache-ebr-reset-safety-84
Open

Fix GH-8739: opcache_reset()/invalidate() races via epoch-based reclamation#21778
superdav42 wants to merge 1 commit intophp:PHP-8.4from
superdav42:fix/opcache-ebr-reset-safety-84

Conversation

@superdav42
Copy link
Copy Markdown

Summary

Fixes the long-standing race in opcache_reset()/opcache_invalidate() that causes zend_mm_heap corrupted crashes under concurrent load in both ZTS (FrankenPHP, parallel) and FPM configurations.

The writer destructively mutates ZCSG(hash) (memset) and the interned strings buffer while readers may still hold pointers into that memory. The fix uses epoch-based reclamation (EBR) — a lock-free pattern used by RCU in the Linux kernel and by Crossbeam in Rust — to defer the destructive cleanup until all pre-reset readers have completed their requests.

Why EBR (and not PR #14803's locking)

@dstogov rejected the prior attempt (#14803) because it took zend_shared_alloc_lock() on every cache lookup:

Please think about a different solution. This one should make a big slowdown. opcache should be able to do lock-free cache lookups. Probably, the modification of ZCSG(hash) should be done more careful (to keep consistency for concurrent readers)

This patch satisfies that constraint: readers remain lock-free. The per-request cost is two atomic stores to cache-line-padded slots (no false sharing). Writers wait for readers to drain instead of the other way around.

Design

Readers (accel_activate, accel_post_deactivate):

  • On request start: atomically publish ZCSG(current_epoch) to this thread's slot.
  • On request end: atomically store ACCEL_EPOCH_INACTIVE to this thread's slot.

Writers (the deferred-restart path in accel_activate when accel_is_inactive() is false):

  • Record drain_epoch = current_epoch, increment current_epoch (new readers publish a higher epoch).
  • Set accelerator_enabled = false, reset_deferred = true.
  • Do not call zend_accel_hash_clean() or accel_interned_strings_restore_state() yet.

Drain completion (called from both accel_activate and accel_post_deactivate):

  • If min(slot[*].epoch) > drain_epoch and no overflow readers are active, take the SHM lock and run the deferred cleanup.

Safety under slot exhaustion

The per-slot pool is 256 slots × 64 cache-padded bytes = 16 KiB of SHM. Installations with more than 256 concurrent threads/processes (large FPM pools) hit the overflow path: the aggregate ZCSG(epoch_overflow_active) counter is incremented instead, and deferred reclamation is held back until it reaches zero. This preserves correctness — overflow readers never disappear silently.

Test results

All tests pass with the change applied (--enable-debug --enable-zts --enable-opcache):

Suite Result
ext/opcache/tests 876/876 pass
Zend/tests (with opcache loaded) 4963/4963 pass
Full make test (28 extensions built) 14643 pass, 1 pre-existing environmental failure (verified fails without opcache too)

Concurrent dev-server test: ~250 parallel requests with mixed opcache_reset() calls — server remained responsive, no crashes, no heap corruption.

Regression tests added

  • ext/opcache/tests/gh8739_reset_epoch_safety.phpt — cached class remains accessible after reset
  • ext/opcache/tests/gh8739_reset_multiple.phpt — multiple resets don't crash
  • ext/opcache/tests/gh8739_invalidate_epoch_safety.phpt — invalidate doesn't crash while holding cached pointer

Fixes

Test plan

…clamation

opcache_reset() and opcache_invalidate() destroy shared memory (hash
table memset, interned strings restore, SHM watermark reset) while
concurrent reader threads/processes still hold pointers into that
memory. This causes zend_mm_heap corrupted crashes under concurrent
load in both ZTS (FrankenPHP, parallel) and FPM configurations.

A prior fix attempt (PR php#14803) wrapped cache lookups in
zend_shared_alloc_lock(), which was rejected by Dmitry Stogov because
readers must remain lock-free. This patch introduces epoch-based
reclamation (EBR), a proven lock-free pattern also used by RCU in the
Linux kernel and by Crossbeam in Rust:

- Readers publish their epoch on request start (one atomic store) and
  clear it on request end (one atomic store). No locks acquired.
- Writers (opcache_reset path) increment the global epoch and defer the
  actual hash/interned-string cleanup until all pre-epoch readers have
  completed. The deferred reset completes at the next request boundary
  after the drain is complete.
- The existing immediate-cleanup fast path is retained for the case
  when accel_is_inactive() reports no active readers.

Additional safety:

- When the per-slot pool (256 slots) is exhausted, readers fall back
  to an aggregate overflow counter that unconditionally blocks
  deferred reclamation while non-zero. This preserves safety for
  installations with more concurrent readers than slots.
- UNASSIGNED vs OVERFLOW sentinels distinguish "slot not yet attempted"
  from "allocation failed", avoiding per-request atomic contention on
  the slot-next counter once slots are exhausted.
- A full memory barrier after zend_accel_hash_clean()'s memset ensures
  readers on weak-memory architectures see the zeroed table before any
  subsequent insert.
- A defensive ZCG(locked) check in accel_try_complete_deferred_reset()
  prevents the ZEND_ASSERT(!ZCG(locked)) failure that would otherwise
  fire if the completer is re-entered while the SHM lock is held.

Fixes phpGH-8739
Fixes phpGH-14471
Fixes phpGH-18517

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@superdav42 superdav42 force-pushed the fix/opcache-ebr-reset-safety-84 branch from 048c11e to e110a90 Compare April 16, 2026 20:31
superdav42 added a commit to superdav42/static-php-cli that referenced this pull request Apr 16, 2026
Bundles the EBR reset-safety patch from upstream PR #21778 as a
source-extract hook so tarball builds pick it up automatically. The
patch fixes zend_mm_heap corruption crashes triggered by
opcache_reset()/opcache_invalidate() racing concurrent readers under
ZTS (FrankenPHP) and FPM.

- src/globals/patch/php84_opcache_ebr_reset_safety.patch: verbatim
  from PR #21778 commit e110a901 (NEWS hunk stripped for release
  tarball compatibility)
- SourcePatcher::patchPhpOpcacheEbrResetSafety gated on 80400 <= ver
  < 80500 with an idempotency probe against accel_try_complete_deferred_reset

Remove once the fix is merged and tagged into an 8.4.x release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant