Skip to content

JIT: Inline ZEND_BW_NOT#22560

Open
millerphp wants to merge 1 commit into
php:PHP-8.4from
millerphp:gh-22559-jit-inline-bw-not
Open

JIT: Inline ZEND_BW_NOT#22560
millerphp wants to merge 1 commit into
php:PHP-8.4from
millerphp:gh-22559-jit-inline-bw-not

Conversation

@millerphp

Copy link
Copy Markdown

JIT: inline ZEND_BW_NOT (fixes tracing-JIT miscompilation of ~ in a masked expression)

Fixes #22559.

Summary

Under the tracing JIT, the bitwise-NOT operator ~ is not inlined - it is emitted as a call to the ZEND_BW_NOT_SPEC VM helper. In a side trace, the helper's result is written to a stack slot that can alias a spilled, loop-carried CV, silently clobbering that CV. The result is a wrong value for code as ordinary as a flags computation, e.g. an integer that is provably a byte comes out negative.

This PR inlines ZEND_BW_NOT for the LONG case (as x ^ -1), the same way ZEND_BW_OR/AND/XOR are already inlined. That removes the helper call and its temporary result slot, so the collision can't happen - and it's a small perf
win. A regression test is included.

Interpreter and function-JIT were always correct; only the tracing JIT was wrong.

1. Symptom

<?php
final class C {
    /** @var int[] */ public array $t = [];
    public int $a = 0;
    public int $f = 0;
    public function __construct() { for ($i = 0; $i < 256; $i++) $this->t[$i] = $i & 0xA8; }
    public function add8(int $value, int $carry): void {
        $a = $this->a;
        $total = $a + $value + $carry;
        $result = $total & 0xFF;
        // Every OR term is masked to a byte, so $this->f is provably in 0..255.
        $this->f = $this->t[$result]
            | (($total & 0x100) ? 0x01 : 0)
            | (((($a & 0x0F) + ($value & 0x0F) + $carry) & 0x10) ? 0x10 : 0)
            | ((~($a ^ $value) & ($a ^ $result) & 0x80) ? 0x04 : 0); // <- masked to bit 7
        $this->a = $result;
    }
}
$c = new C();
for ($i = 0; $i < 100000; $i++) {
    $c->a = $i & 0xFF;
    $c->add8(($i >> 8) & 0xFF, 0);
    if (($c->f & ~0xFF) !== 0) {                 // $f must never have bits above the low byte
        printf("MISCOMPILED at i=%d: \$f = %d (0x%X)\n", $i, $c->f, $c->f);
        break;
    }
}
echo "done\n";
$ php -d opcache.jit=tracing  …   MISCOMPILED at i=638: $f = -105 (0x…FF97)
$ php -d opcache.jit=function …   done      # correct
$ php -d opcache.enable_cli=0  …   done      # correct

Every operand of the | chain is byte-sized (the ~…&0x80 term is 0 or 0x04), so $f is provably in 0..0xBD. Under the tracing JIT it becomes negative - the correct low byte (0x97) with all upper bits set, i.e. a full-width ~ value.

2. Research — pinning the affected versions

Built each branch from git and tested with the JIT verified active
(opcache_get_status()['jit']['on'] === true):

PHP JIT backend tracing JIT
8.3 DynASM PASS (not affected)
8.4 IR FAIL
8.5 IR FAIL
8.6 IR FAIL

So this is an IR-JIT regression, introduced with the IR JIT in 8.4 - 8.3's DynASM JIT is fine. (Two methodology notes that cost us time and are worth recording: a --disable-all build omits opcache on optional-opcache versions, so the JIT never runs and the repro falsely passes — always verify jit.on; and the bug needs opcache.jit_hot_side_exit at its default (8), because the harness default of 1 compiles side traces so eagerly the buggy one never forms.)

This is not #22115 (loop-PHI CV register dropped from a side-exit SNAPSHOT). Building that fix resolves its reproducer but leaves this one failing - same subsystem, different defect.

3. Proof — from the disassembly

The buggy side trace (opcache.jit_debug + capstone), annotated:

movq  %r13, 0xa0(%r14)      ; spill the loop-carried $f accumulator -> frame slot 0xa0
orq   $0x10, %r13
movq  %rbx, %rax
xorq  0x50(%r14), %rax      ; ($a ^ $value)
movq  %rax, 0xc0(%r14)      ; stage it as the '~' operand
callq ZEND_BW_NOT_SPEC_TMPVARCV_HANDLER   ; '~' is a HELPER CALL; result -> slot 0xa0
xorq  %r12, %rbx            ; ($a ^ $result)
leaq  0xa0(%r14), %rdx
andq  (%rdx), %rbx          ; & the '~' result
testq $0x80, %rbx
je    jit$$trace_exit_0
movq  0xa0(%r14), %r13      ; reload $f FROM 0xa0 — but 0xa0 now holds ~($a^$value)!
orq   $0x14, %r13
movq  %r13, (%rdx)          ; store the corrupted $f

The register allocator assigned the same frame slot 0xa0 to (a) the spilled loop-carried CV holding $f and (b) the result of the ZEND_BW_NOT helper call. The helper result clobbers the spilled $f; the reload then loads ~($a^$value)

  • exactly the observed negative value. (IR confirms it: the CV RLOAD … {%r13} # BIND(0xa0) shares slot 0xa0 with the BW_NOT operand/result.)

This is ~-specific because ~ is the only bitwise op the JIT does not inline; AND/OR/XOR are inlined, so (x ^ 0x80) in place of ~x avoids the helper and works - which is the one-operator source-level workaround.

4. The fix

Inline ZEND_BW_NOT for the LONG case, mirroring the other bitwise ops:

  • ext/opcache/jit/zend_jit_ir.c - new zend_jit_bw_not(): emits res = ir_XOR_L(op1, -1) (~x == x ^ -1), stores the LONG result. No helper, no temp slot.
  • ext/opcache/jit/zend_jit.c - function-JIT dispatch: case ZEND_BW_NOT (definitely-LONG fast path; otherwise falls through to the VM handler as before).
  • ext/opcache/jit/zend_jit_trace.c - tracing-JIT codegen dispatch case ZEND_BW_NOT (with CHECK_OP1_TRACE_TYPE()), plus ZEND_BW_NOT added to the ADD_OP1_TRACE_GUARD() group so the trace guards op1's type.

55 insertions, no deletions. The inlined path is taken only when op1 is definitely LONG; any other type keeps the previous helper-based behaviour.

5. Why this fix

  • It's the natural fix: BW_NOT is the only bitwise operator not inlined by the JIT (BW_OR/AND/XOR already are). Inlining it removes the helper call entirely - so there is no result slot to alias a spilled CV - and it's a small performance win, not just a correctness patch.
  • Minimal and low-risk: It reuses the existing, well-tested inlined-XOR path (ir_XOR_L), touches only the JIT, and gates on a known-LONG operand; anything else is unchanged.
  • Alternative considered: fixing the trace register allocator so a helper-call result can never alias a live spilled CV (zend_jit_trace_allocate_registers). That is more general but deeper and riskier; inlining BW_NOT fixes the observed
    bug directly and is the more upstream-friendly change. (Maintainers may of course prefer to also harden the allocator.)

Testing

  • New regression test ext/opcache/tests/jit/gh22558.phpt (overrides opcache.jit_hot_side_exit back to 8 so the bug reproduces under run-tests).
  • Repro now prints ok; the ~ result is byte-identical between the tracing JIT and the interpreter across -300..300 and edge values (PHP_INT_MIN/MAX).
  • Full ext/opcache/tests suite on 8.4: 883 tests, 0 failed (853 passed, 29 skipped).

Under the tracing JIT, ~ (ZEND_BW_NOT) was not inlined and fell back to the ZEND_BW_NOT_SPEC helper, whose result could be allocated to a stack slot aliasing a spilled loop-carried CV, clobbering it and producing a wrong
result. Inline ZEND_BW_NOT for the LONG case (x ^ -1) like the other bitwise ops, removing the helper call and its temporary slot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants