Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize observers #13649

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Optimize observers #13649

wants to merge 11 commits into from

Conversation

bwoebi
Copy link
Member

@bwoebi bwoebi commented Mar 9, 2024

Inline the lookup whether a function is observed at all. This strategy is also used for FRAMELESS calls. If the frameless call is observed, we instead allocate a call frame and push the arguments, to call the the function afterwards. Doing so is still a performance benefit as opposed to executing individual INIT_FCALL+SEND_VAL ops. Thus, even if the frameless call turns out to be observed, the call overhead is slightly lower than before. If the internal function is not observed at all, the unavoidable overhead is fetching the FLF zend_function pointer and the run-time cache needs to be inspected.

As part of this work, it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode. This is a bit unusual in comparison to all other ZEND_OP_DATA usages, but seems to not pose problems overall.

There is also a small issue resolved: trampolines would always use the ZEND_CALL_TRAMPOLINE_SPEC_OBSERVER function due to zend_observer_fcall_op_array_extension being set to -1 too late.

The same inlining of operations has been applied to JIT as well.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand what the benefit of moving result to OP_DATA is. Can you elaborate on this point?

I also didn't fully understand why ZEND_OBSERVER_NONE_OBSERVED is needed. From what I understand, zend_observer_remove_handler moves the tail to -1, so if the first handler is unset, no handler should be installed. Tbh the observer.{h,c} code is quite hard to read, but I didn't spend too much time on it.

Did you run some benchmarks? Can you notice a difference for frameless functions, in particular?

Zend/Optimizer/sccp.c Outdated Show resolved Hide resolved
Zend/Optimizer/zend_inference.c Outdated Show resolved Hide resolved
return *handler == ZEND_OBSERVER_NONE_OBSERVED;
}

static zend_always_inline bool zend_observer_fcall_has_no_observers(zend_execute_data *execute_data, bool allow_generator, zend_observer_fcall_begin_handler **handler) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Avoid double negation? E.g. !zend_observer_fcall_has_no_observers.

Copy link
Member Author

@bwoebi bwoebi Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming problem here is that it's not a binary state, but "no observers", "some observers" and "observers undetermined", and I want to distinguish "no observers" vs "some or undetermined observers". I'm not sure how to better name that.

ZEND_API void ZEND_FASTCALL zend_observer_fcall_begin(zend_execute_data *execute_data);
void ZEND_FASTCALL zend_observer_fcall_begin_prechecked(zend_execute_data *execute_data, zend_observer_fcall_begin_handler *observer_data);

static zend_always_inline bool zend_observer_handler_is_unobserved(zend_observer_fcall_begin_handler *handler) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, avoid double negation?

@bwoebi
Copy link
Member Author

bwoebi commented Mar 9, 2024

I did not understand what the benefit of moving result to OP_DATA is. Can you elaborate on this point?

Hm, have I been unclear in the PR description? "it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode" - DO_ICALL cannot jump an extra opline (skipping OP_DATA that is) at its end, and accesses result on its opline.

I also didn't fully understand why ZEND_OBSERVER_NONE_OBSERVED is needed

Simply an optimization to avoid reading both start and end handlers when none is set, i.e. saves a read in the hot path of checking whether a function is being observed.

@bwoebi
Copy link
Member Author

bwoebi commented Mar 10, 2024

Regarding rough benchmarking numbers, without debug, without opcache (wall time; v: valgrind instructions), with this script:

Baseline of empty loop: 1 sec (v: 1,007,435,926)

(Using 20000000 iterations for valgrind)

$s = 1; for ($i = 0; $i < 500000000; ++$i) { $a = min($s, 2); }

master, without observers: 2.8 sec (v: 2,727,435,251)
master, with observers: 5.7 sec (v: 5,527,657,794)
this PR, with observers: 2.9 sec (v: 3,047,665,900)

So, yes, the frameless functions are much faster to call.

Now, the same code with pow instead of min (which is not using FLF currently):

master, without observers: 4.9 sec (v: 4,687,435,657)
master, with observers: 6.5 sec (v: 6,387,657,790)
this PR, with observers: 5.9 sec (v: 5,587,666,190)

The generic improvement of inlining the runtime cache check (and just checking one value instead of two) is quite significant on its own already, but using frameless calls just completely blows it out of the water.

I'm also amazed that apparently for frameless calls it uses only 0.1 sec (3% of time) more even though it needs 10% more valgrind instructions.

@bwoebi bwoebi force-pushed the optimize-observers branch 2 times, most recently from 0dd6584 to ee5d6ed Compare March 10, 2024 20:14
@iluuu1994
Copy link
Member

it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode

I missed this point. Unfortunately, I don't have a better idea without significant complications that would nullify the performance benefit, or duplicate the DO_ICALL handler.

Simply an optimization to avoid reading both start and end handlers when none is set, i.e. saves a read in the hot path of checking whether a function is being observed.

Thanks for the explanation. 🙂 The naming could probably be improved could be improved. Or at least add a short comment to ZEND_OBSERVER_NONE_OBSERVED ("The function has neither begin nor end observers"). I assumed it had to do with holes in the list, but the code indicated that this can't happen.

Comment on lines -31 to -32
#define ZEND_OBSERVABLE_FN(function) \
(ZEND_MAP_PTR(function->common.run_time_cache) && !(function->common.fn_flags & ZEND_ACC_CALL_VIA_TRAMPOLINE))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this check effectively move to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to zend_observer_fcall_has_no_observers in zend_observer.h

@bwoebi bwoebi force-pushed the optimize-observers branch 7 times, most recently from 2b3c7ec to f623714 Compare March 20, 2024 00:54
@bwoebi bwoebi marked this pull request as ready for review March 20, 2024 00:54
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should be committed. I don't like the fact that the engine started to depend on observer API more and more.

I didn't review the zend_jit_ir.c part yet. Can it be implemented using JIT helpers (C function) to make the most dangerous part of the patch less? I don't think bloating of that JIT code may make any positive effect. I expect this patch should slow-down the real life apps.

As an optimization patch it requires benchmark results to prove its profitability.

@iluuu1994 could you please also take a look.

if (ZEND_OBSERVER_ENABLED || zend_execute_internal) {
if (zend_execute_internal) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had agreed to fallback using ICALL instead of FRAMELESS in case of OBSERVER.

The actual speed-up of FRAMELESS calls wasn't very significant and the increase of the complexity already made questions. I expect, any observer makes much more significant slowdown and I don't see reasons in attempt
to win 0.1% of performance when we already losing 10%.

This patch looks much more complex...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the proper benchmarking homework now - frameless makes about 1% in wall time for wordpress, with observers.
And yes, we were losing 5% (wall time). Now we're losing 2.5%.
I would agree on 0.1% speedup not making sense for 10% loss. But that's not the case here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I remember frameless calls gave ~0.8% win on symfony demo amd wordpress (let it be 1%).
How the observer might lose 5% in the back direction? It should be the same 1%.

Copy link
Member Author

@bwoebi bwoebi Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.

I was just saying that:

I don't see reasons in attempt to win 0.1% of performance when we already losing 10%.

is exaggerated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.

OK. This sounds more fair. Framiteless functions give 1% speedup, but only without observer. This proposal attempts to give the same 1% speedup when observed.

I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED + zend_observer_handler_is_unobserved).
Does this patch makes real improvement without JIT? (Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%).

I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?

I'll try to analyse your benchmark results more careful...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%.

The instr difference on this patch (with frameless) vs current master is 6-7%. The perf difference is 2.5%.
The instr difference on this patch with frameless vs this patch without frameless is 2%. The perf difference is 1%.

For both JIT and non-JIT.

I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?

The generated code is essentially if (needs observing) { push frame (without arg copy); call FLF handler; pop frame } else { call FLF handler; }. So it's not quite falling back to ICALL in JIT.

I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED + zend_observer_handler_is_unobserved).

The ZEND_OBSERVER_ENABLED part is specialized away. Only zend_observer_handler_is_unobserved is checked, and that one is inlined - i.e. the full check for FLF functions is just if (*ZEND_OBSERVER_DATA(ZEND_FLF_FUNC(opline)) == ZEND_OBSERVER_NONE_OBSERVED).

which is a very low-cost check, comparatively to ICALL. We can assume the run time cache for FLF functions being set and the func not be a generator or trampoline, saving a fetch and two comparisons too.

Yes, if we had the same handling than we had for FCALL, then the speedup would not quite be the same.

Comment on lines +1125 to +1129
case ZEND_FRAMELESS_ICALL_3:
// Copy result to OP_DATA to ensure dispatch in observer fallback can be aligned with the last opcode of the call
opline[1].result.var = opline->result.var;
opline[1].result_type = opline->result_type;
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why this is necessary. For me this looks like a hack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the fallback to ZEND_DO_ICALL (when we VM_DISPATCH) to work, the opcode after opline must be the next actual opline after and not OP_DATA. Hence ZEND_DO_ICALL must be dispatched on the OP_DATA opcode for ZEND_FRAMELESS_ICALL_3. And thus result must be on the OP_DATA opcode as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to avoid this hack.
It may confuse some of Optimizer pass (e.g. optimize_temp_vars_5.c) that don't expect repeatable definition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why this hack is removed in undo_pass_two. I.e. for optimizer purposes it doesn't know about that hack. But it works fine with VM :-)

Comment on lines -9605 to +9618
ZVAL_NULL(EX_VAR(opline->result.var));
zval *result = EX_VAR(opline->result.var);
ZVAL_NULL(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this really change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing, in ZEND_FRAMELESS_ICALL_0 (actually the line below should also use result). For the other frameless opcodes, it was about not refetching from opline->result after the unobserved check.

if (ZEND_OBSERVER_ENABLED) {
zend_function *fbc = ZEND_FLF_FUNC(opline);
if (UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(fbc)) == false)) {
zend_execute_data *call = _zend_vm_stack_push_call_frame_ex(zend_vm_calc_used_stack(0, fbc), ZEND_CALL_NESTED_FUNCTION, fbc, 0, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we avoid call to zend_vm_calc_used_stack(0, fbc). The size of the stack should be known statically.
May be I'm wrong... I don't remember when we got zend_internal_function->T and what does it mean...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the temporary the observers use to store the prev_observed_frame.
For FLF functions we can probably hardcode it to (ZEND_CALL_FRAME_SLOT + X + 1) * sizeof(zval), at least for now given that we don't have a mechanism to dynamically allocate temps for arbitrary functions.

This is outside the hot path of not-observed functions, so I don't consider this as critical, but can change it.

Comment on lines -9634 to +9662
if (EG(exception)) {
if (UNEXPECTED(EG(exception) != NULL)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your patch is already huge enough. Mixing the real changes with unrelated code refactoring is annoying.

@bwoebi
Copy link
Member Author

bwoebi commented Mar 25, 2024

Some valgrind benchmark.php numbers:

This PR (without observers, virtually the same then baseline without observers):

{
    "Zend\/bench.php": {
        "instructions": "2230512408"
    },
    "Zend\/bench.php JIT": {
        "instructions": "608925423"
    },
    "Symfony Demo 2.2.3": {
        "instructions": "98897712"
    },
    "Symfony Demo 2.2.3 JIT": {
        "instructions": "100896997"
    },
    "Wordpress 6.2": {
        "instructions": "265831175"
    },
    "Wordpress 6.2 JIT": {
        "instructions": "224894366"
    }
}

This PR with -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0':

{
    "Zend\/bench.php": {
        "instructions": "2385706119"
    },
    "Zend\/bench.php JIT": {
        "instructions": "656269578"
    },
    "Symfony Demo 2.2.3": {
        "instructions": "103402360"
    },
    "Symfony Demo 2.2.3 JIT": {
        "instructions": "104049744"
    },
    "Wordpress 6.2": {
        "instructions": "281125030"
    },
    "Wordpress 6.2 JIT": {
        "instructions": "231714316"
    }
}

Note that for Zend/bench.php, it adds 155M instrs to have observer enabled at all. For the JIT variant it adds only 48M instrs.
Similarly for Symfony demo it adds 5.6M instrs to have observer enabled. With JIT it adds only 3.2M instrs.

So having extra JIT support halves the overhead for symfony.

Having frameless+observer support in JIT has virtually no improvement for symfony. Having frameless+observer outside of JIT shows about 0.8% improvement on the Symfony Demo (104191895 instructions when frameless is disabled).
For wordpress having frameless disabled is 287232509 instrs; with JIT it's 235009436 instrs. I.e. 2-4% of the improvement comes from frameless.

For comparison the baseline numbers, with observer enabled:

{
    "Zend\/bench.php": {
        "instructions": "2514733906"
    },
    "Zend\/bench.php JIT": {
        "instructions": "805609640"
    },
    "Symfony Demo 2.2.3": {
        "instructions": "106351366"
    },
    "Symfony Demo 2.2.3 JIT": {
        "instructions": "105622352"
    },
    "Wordpress 6.2": {
        "instructions": "297501230"
    },
    "Wordpress 6.2 JIT": {
        "instructions": "246740281"
    }
}

I.e. compared to baseline Symfony is 3% faster. JIT is 1.5% faster.
Wordpress is 6% faster, JIT is 7% faster.
On the bench.php it's 5% faster; bench JIT is 8% faster.

The relative overhead of observer itself on bench.php JIT is only 5M (6%) instrs instead of 20M (25%) instrs now.
On Wordpress (no JIT) it's 15M (6%) instrs instead of 32M (12%) instrs now. With JIT we have 7M (3%) instead of 22M (10%) instrs,
With Symfony (no JIT) the relative gain is less, but still, dropping from 8% to 5%.

Overall it feels worth the bit of extra complexity - bench.php gets significant improvement, wordpress has also quite significant improvement (a 10% overhead is more than halved); Symfony gets a moderate improvement. In terms of wall time it was like nearly ~1% improvement for me on Symfony JIT. Wordpress JIT got a solid 2% faster in wall time (and up to 10% in cpu time according to time!). bench.php is 7% faster in wall time.

@bwoebi bwoebi force-pushed the optimize-observers branch 5 times, most recently from 07ad243 to 343d450 Compare April 2, 2024 16:39
@bwoebi
Copy link
Member Author

bwoebi commented Apr 7, 2024

@dstogov Looks like all tests pass now (minus flaky ones).

Inline the lookup whether a function is observed at all.
This strategy is also used for FRAMELESS calls. If the frameless call is observed, we instead allocate a call frame and push the arguments, to call the the function afterwards.
Doing so is still a performance benefit as opposed to executing individual INIT_FCALL+SEND_VAL ops. Thus, even if the frameless call turns out to be observed, the call overhead is slightly lower than before.
If the internal function is not observed at all, the unavoidable overhead is fetching the FLF zend_function pointer and the run-time cache needs to be inspected.

As part of this work, it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode.
This is a bit unusual in comparison to all other ZEND_OP_DATA usages, but seems to not pose problems overall.

There is also a small issue resolved: trampolines would always use the ZEND_CALL_TRAMPOLINE_SPEC_OBSERVER function due to zend_observer_fcall_op_array_extension being set to -1 too late.

Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make the code-review. I just run tests in different configurations...

make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.observe_all=1 -d zend_test.observer.show_output=0 -d opcache.jit=1205  -j16 Zend tests ext/opcache ext/zend_test"

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
property_exists() tests [Zend/tests/011.phpt]
testing reusing anons that implement an interface [Zend/tests/anon/005.phpt]
Bug #81626: Error on use static:: in __сallStatic() wrapped to Closure::fromCallable() [Zend/tests/bug81626.phpt]
Calling bindTo() on __call() closure [Zend/tests/closure_call_bind.phpt]
Bug #80929: Method name corruption related to zend_closure_call_magic [Zend/tests/closures/bug80929.phpt]
Testing Closure::fromCallable() functionality: Basic [Zend/tests/closures/closure_from_callable_basic.phpt]
Argument unpacking in new arguments in static variable [Zend/tests/constexpr/new_arg_unpack.phpt]
Invalid operation in new arg in const expr [Zend/tests/constexpr/new_invalid_operation_in_arg.phpt]
First Class Callable from Magic [Zend/tests/first_class_callable_005.phpt]
First class callables and &__call() [Zend/tests/first_class_callable_016.phpt]
Argument of new on class without constructor are evaluated [Zend/tests/new_args_without_ctor.phpt]
Trampoline closure created from magic method accepts named arguments [Zend/tests/trampoline_closure_named_arguments.phpt]
Bug #75969: Assertion failure in live range DCE due to block pass misoptimization [ext/opcache/tests/bug75969.phpt]
JIT FETCH_DIM_FUNC_ARG: 002 [ext/opcache/tests/jit/fetch_dim_func_arg_002.phpt]
...
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -d opcache.jit=1205  -j16 Zend tests ext/opcache ext/zend_test"

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
testing reusing anons that implement an interface [Zend/tests/anon/005.phpt]
Bug #81626: Error on use static:: in __сallStatic() wrapped to Closure::fromCallable() [Zend/tests/bug81626.phpt]
Calling bindTo() on __call() closure [Zend/tests/closure_call_bind.phpt]
Bug #80929: Method name corruption related to zend_closure_call_magic [Zend/tests/closures/bug80929.phpt]
Testing Closure::fromCallable() functionality: Basic [Zend/tests/closures/closure_from_callable_basic.phpt]
Argument unpacking in new arguments in static variable [Zend/tests/constexpr/new_arg_unpack.phpt]
Invalid operation in new arg in const expr [Zend/tests/constexpr/new_invalid_operation_in_arg.phpt]
First Class Callable from Magic [Zend/tests/first_class_callable_005.phpt]
First class callables and &__call() [Zend/tests/first_class_callable_016.phpt]
Argument of new on class without constructor are evaluated [Zend/tests/new_args_without_ctor.phpt]
Trampoline closure created from magic method accepts named arguments [Zend/tests/trampoline_closure_named_arguments.phpt]
Bug #75969: Assertion failure in live range DCE due to block pass misoptimization [ext/opcache/tests/bug75969.phpt]
JIT FETCH_DIM_FUNC_ARG: 002 [ext/opcache/tests/jit/fetch_dim_func_arg_002.phpt]
Test type inference of class consts - reference by static and self [ext/opcache/tests/opt/type_inference_class_consts1.phpt]
Test type inference of class consts - reference by parent [ext/opcache/tests/opt/type_inference_class_consts2.phpt]
Test type inference of class consts - reference by class name [ext/opcache/tests/opt/type_inference_class_consts3.phpt]
Test type inference of class consts - interface consts [ext/opcache/tests/opt/type_inference_class_consts4.phpt]
Assign elision exception safety: ICALL [ext/opcache/tests/ssa_bug_004.phpt]
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.observe_all=1 -d zend_test.observer.show_output=0 -d opcache.jit=1254 --repeat 3  -j16 Zend tests ext/opcache ext/zend_test"

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Bug #26737 (Protected and private variables are not saved on serialization when a user defined __sleep is used) [tests/classes/bug26737.phpt]
Bug #79668 (get_defined_functions(true) may miss functions) [Zend/tests/bug79668.phpt]
Backed Enum with multiple implementing interfaces [Zend/tests/enum/backed-implements-multiple.phpt]
Backed Enum implements [Zend/tests/enum/backed-implements.phpt]
Enum implements [Zend/tests/enum/implements.phpt]
Enum __CLASS__ [Zend/tests/enum/__class__.phpt]
Enum __FUNCTION__ [Zend/tests/enum/__function__.phpt]
Enum __invoke [Zend/tests/enum/__invoke.phpt]
Enum __METHOD__ [Zend/tests/enum/__method__.phpt]
Enum namespace [Zend/tests/enum/namespaces.phpt]
Enum methods [Zend/tests/enum/basic-methods.phpt]
Enum supports static methods [Zend/tests/enum/static-methods.phpt]
Enum can use traits having constants [Zend/tests/enum/traits-constants.phpt]
Enum can use traits [Zend/tests/enum/traits.phpt]
GH-13712 (Segmentation fault for enabled observers when calling trait method of internal trait when opcache is loaded) [ext/opcache/tests/gh13712.phpt]
Observer: Basic fiber switching [ext/zend_test/tests/observer_fiber_01.phpt]
Observer: Unfinished fiber [ext/zend_test/tests/observer_fiber_02.phpt]
Observer: Nested fibers [ext/zend_test/tests/observer_fiber_03.phpt]
Observer: Nested fibers with unfinished fiber [ext/zend_test/tests/observer_fiber_04.phpt]
Observer: Nested fibers with both unfinished [ext/zend_test/tests/observer_fiber_05.phpt]
Observer: Throwing fiber [ext/zend_test/tests/observer_fiber_06.phpt]
Observer: Basic observability of includes only (no functions) [ext/zend_test/tests/observer_basic_04.phpt]
Observer: Frameless calls are properly observable [ext/zend_test/tests/observer_frameless.phpt]
Observer: Basic observability of functions only (no includes) [ext/zend_test/tests/observer_basic_05.phpt]
Observer: Basic observability of functions only (with run-time swapping) [ext/zend_test/tests/observer_basic_06.phpt]
Observer: Observe function and class declarations with file_cache_only [ext/zend_test/tests/observer_declarations_file_cache.phpt]
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -d opcache.jit=1254 --repeat 3  -j16 Zend tests ext/opcache ext/zend_test"

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Bug #26737 (Protected and private variables are not saved on serialization when a user defined __sleep is used) [tests/classes/bug26737.phpt]
Bug #79668 (get_defined_functions(true) may miss functions) [Zend/tests/bug79668.phpt]
Backed Enum with multiple implementing interfaces [Zend/tests/enum/backed-implements-multiple.phpt]
Backed Enum implements [Zend/tests/enum/backed-implements.phpt]
Enum implements [Zend/tests/enum/implements.phpt]
Enum __CLASS__ [Zend/tests/enum/__class__.phpt]
Enum __FUNCTION__ [Zend/tests/enum/__function__.phpt]
Enum __invoke [Zend/tests/enum/__invoke.phpt]
Enum __METHOD__ [Zend/tests/enum/__method__.phpt]
Enum methods [Zend/tests/enum/basic-methods.phpt]
Enum namespace [Zend/tests/enum/namespaces.phpt]
Enum supports static methods [Zend/tests/enum/static-methods.phpt]
Enum can use traits having constants [Zend/tests/enum/traits-constants.phpt]
Enum can use traits [Zend/tests/enum/traits.phpt]
=====================================================================

All tests passes if JIT or OBSERVER disabled.

Please, fix the tests failures first...

@bwoebi
Copy link
Member Author

bwoebi commented Apr 9, 2024

Ignore this comment, apparently the order on ir_PHI_2() is relevant.

Changing

run_time_cache = ir_PHI_2(IR_ADDR, run_time_cache2, run_time_cache);

to

run_time_cache = ir_PHI_2(IR_ADDR, run_time_cache, run_time_cache2);

fixes it. I was under the assumption this would be irrelevant? What does the order depend on or is it really an IR bug?


@dstogov I've tried fixing this (see latest push) but the IR library seems to miscompile it. Tried distilling it into a dedicated IR test, but couldn't reproduce it trivially.

Either codegen or the step before is wrong (I believe the inferred DESSA instruction is wrong).

   0x0000555551401c17:  mov    0x18(%r15),%rax
   0x0000555551401c1b:  mov    0x38(%rax),%rax
   0x0000555551401c1f:  test   $0x1,%rax
   0x0000555551401c26:  je     0x555551401d27
   0x0000555551401c2c:  lea    0x5c6f7f5(%rip),%rcx        # 0x555557071428 <compiler_globals+488>
   0x0000555551401c33:  mov    (%rcx),%rcx
   0x0000555551401c36:  mov    (%rcx,%rax,1),%rbx
   0x0000555551401c3a:  mov    %rax,%rbx
   0x0000555551401c3d:  cmpq   $0x3,(%rbx)
   0x0000555551401c41:  je     0x555551401c4e
   0x0000555551401c43:  mov    %r15,%rdi
   0x0000555551401c46:  mov    %rbx,%rsi
   0x0000555551401c49:  call   0x555555ec8563 <zend_observer_fcall_begin_prechecked>
   0x0000555551401c4e:  mov    %rsp,%rbx
...
   0x0000555551401d27:  test   %rax,%rax
   0x0000555551401d2a:  je     0x555551401c4e
   0x0000555551401d30:  jmp    0x555551401c3d

It compiles an erroneous 0x0000555551401c3a: mov %rax,%rbx when resolving PHIs.

From the after register allocation output:

#BB32: end=l_160, idom=BB5(3), pred(1)=[BB5], succ(2)=[BB33, BB34]
        l_156 = IF_FALSE(l_34); # RULE(NOP:SKIPPED);
        l_157 = STORE(l_156, c_21 {%rax}, d_13 {%r15}); # RULE(STORE_INT);
        uint32_t d_158, l_158 = LOAD(l_157, d_20); # RULE(LOAD:FUSED:SIMPLE);
        uint32_t d_159 = AND(d_158, c_32); # RULE(TEST_INT:FUSED);
        l_160 = IF(l_158, d_159); # RULE(TEST_AND_BRANCH_INT);
#BB33: end=l_162, idom=BB32(4), pred(1)=[BB32], succ(1)=[BB42]
        l_161 = IF_TRUE(l_160); # RULE(NOP:SKIPPED);
        l_162 = END(l_161); # RULE(END);
#BB34: end=l_168, idom=BB32(4), pred(1)=[BB32], succ(2)=[BB35, BB36]
        l_163 = IF_FALSE(l_160); # RULE(NOP:SKIPPED);
        uintptr_t d_164 {R18} {%rax}, l_164 = LOAD(l_163, d_18); # RULE(LOAD_INT);
        uintptr_t d_165 = ADD(d_164 {R18} {%rax}, c_17); # RULE(LEA_OB:FUSED:SIMPLE);
        uintptr_t d_166 {R19} {%rax}, l_166 = LOAD(l_164, d_165); # RULE(LOAD_INT);
        uintptr_t d_167 = AND(d_166 {R19} {%rax}, c_18); # RULE(TEST_INT:FUSED);
        l_168 = IF(l_166, d_167); # RULE(TEST_AND_BRANCH_INT);
#BB35: end=l_173, idom=BB34(5), pred(1)=[BB34], succ(1)=[BB39]
        l_169 = IF_TRUE(l_168); # RULE(NOP:SKIPPED);
        uintptr_t d_170 {R20} {%rcx}, l_170 = LOAD(l_169, c_19 {%rcx}); # RULE(LOAD_INT);
        uintptr_t d_171 = ADD(d_170 {R20} {%rcx}, d_166 {R19} {%rax}); # RULE(LEA_IB:FUSED:SIMPLE);
        uintptr_t d_172 {R21} {%rbx}, l_172 = LOAD(l_170, d_171); # RULE(LOAD_INT);
        l_173 = END(l_172); # RULE(END);
        # DESSA MOV d_166 {R19} {%rax} -> d_182 {R21} {%rbx}
#BB36: end=l_176, idom=BB34(5), pred(1)=[BB34], succ(2)=[BB37, BB38]
        l_174 = IF_FALSE(l_168); # RULE(NOP:SKIPPED);
        bool d_175 = EQ(d_166 {R19} {%rax}, c_1); # RULE(CMP_INT:FUSED);
        l_176 = IF(l_174, d_175); # RULE(CMP_AND_BRANCH_INT);
#BB37: end=l_178, idom=BB36(6), pred(1)=[BB36], succ(1)=[BB42]
        l_177 = IF_TRUE(l_176); # RULE(NOP:SKIPPED);
        l_178 = END(l_177); # RULE(END);
#BB38: end=l_180, idom=BB36(6), pred(1)=[BB36], succ(1)=[BB39]
        l_179 = IF_FALSE(l_176); # RULE(NOP:SKIPPED);
        l_180 = END(l_179); # RULE(END);
#BB39: end=l_185, idom=BB34(5), pred(2)=[BB38, BB35], succ(2)=[BB40, BB41]
        l_181 = MERGE(l_180, l_173); # RULE(NOP:SKIPPED);
        uintptr_t d_182 {R21} {%rbx} = PHI(l_181, d_172 {R21} {%rbx}, d_166 {R19} {%rax}); # RULE(PHI);
        uintptr_t d_183, l_183 = LOAD(l_181, d_182 {R21} {%rbx}); # RULE(LOAD:FUSED:SIMPLE);
        bool d_184 = EQ(d_183, c_33); # RULE(CMP_INT:FUSED);
        l_185 = IF(l_183, d_184); # RULE(CMP_AND_BRANCH_INT);
#BB40: end=l_189, idom=BB39(6), pred(1)=[BB39], succ(1)=[BB42]
        l_186 = IF_FALSE(l_185); # RULE(NOP:SKIPPED);
        l_187 = CALL/2(l_186, c_34, d_13 {%r15}, d_182 {R21} {%rbx}); # RULE(CALL);
        l_189 = END(l_187); # RULE(END);
#BB41: end=l_191, idom=BB39(6), pred(1)=[BB39], succ(1)=[BB42]
        l_190 = IF_TRUE(l_185); # RULE(NOP:SKIPPED);
        l_191 = END(l_190); # RULE(END);
#BB42: end=l_203, idom=BB32(4), pred(4)=[BB33, BB37, BB40, BB41], succ(2)=[BB43, BB44]

I see in BB35 a # DESSA MOV d_166 {R19} {%rax} -> d_182 {R21} {%rbx} which is probably the attempt to resolve this PHI, but doesn't work correctly:

run_time_cache = ir_PHI_2(IR_ADDR, run_time_cache2, run_time_cache);

To reproduce: sapi/cli/php run-tests.php -P -q -d zend_extension=$(pwd)/modules/opcache.so -d opcache.enable_cli=1 -d opcache.jit=1205 -d opcache.protect_memory=1 -d opcache.jit_buffer_size=64M -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 --repeat 2 -j16 --show-diff Zend/tests/anon/005.phpt (dump the instructions surrounding $rip at the crash location)

@bwoebi
Copy link
Member Author

bwoebi commented Apr 9, 2024

@dstogov Thanks for the hint about the different configurations, zif_pass handling was missing (i.e. internal functions without run_time_cache, which shall not be observed anyway). Fixed, works for me in the various configurations now.

@dstogov
Copy link
Member

dstogov commented Apr 15, 2024

fixes it. I was under the assumption this would be irrelevant? What does the order depend on or is it really an IR bug?

The order of PHI inputs is important and should be consistent with the order of MERGE or LOOP_BEGIN inputs.
I didn't get how to reproduce the failure with DESSA mis-compilation.

@dstogov
Copy link
Member

dstogov commented Apr 15, 2024

I sill see few tests failures:

make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -d opcache.jit=1254 --repeat 3 -j16 Zend tests ext/opcache ext/zend_test"

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Bug #26737 (Protected and private variables are not saved on serialization when a user defined __sleep is used) [tests/classes/bug26737.phpt]
Bug #79668 (get_defined_functions(true) may miss functions) [Zend/tests/bug79668.phpt]
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -d opcache.jit=1205 -j16 Zend tests ext/opcache ext/zend_test"
=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Assign elision exception safety: ICALL [ext/opcache/tests/ssa_bug_004.phpt]
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.observe_all=1 -d zend_test.observer.show_output=0 -d opcache.jit=1205 -j16 Zend tests ext/opcache ext/zend_test"
=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
property_exists() tests [Zend/tests/011.phpt]
Assign elision exception safety: ICALL [ext/opcache/tests/ssa_bug_004.phpt]
=====================================================================

jit->track_last_valid_opline = track_last_valid_opline;
jit->use_last_valid_opline = use_last_valid_opline;
jit->last_valid_opline = last_valid_opline;
jit->reuse_ip = reuse_ip;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why do you save and restore these variables.
You probably need to reset jit->last_valid_opline to make JIT reload IP when necessary.
reuse_ip should be set if %r15 contains value of EX(call) and reset if it contains the VM opline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar problem to zend_jit_save_call_chain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. If you keep the old value of jit->last_valid_opline, it's going to be invalid for one of the taken paths. It has to be reset for both paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why we have:

// Note: conditional (if_unobserved) opline, zend_jit_set_last_valid_opline() may only be called if the opline is actually unconditionally updated
jit_STORE_IP(jit, ir_CONST_ADDR(jit->last_valid_opline ? jit->last_valid_opline : opline));

So that it's actually pointing to the same opline for both paths, avoiding any burden on the hot (unobserved) path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Now I understood this.
It would be definitely better to keep the related logic together and not spread it around functions.

Comment on lines 16869 to 16876
ir_ref res_ref = jit_ZVAL_ADDR(jit, res_addr);
jit_set_Z_TYPE_INFO(jit, res_addr, IS_NULL);

ir_ref skip_observer = IR_UNUSED;
if (ZEND_OBSERVER_ENABLED) {
skip_observer = jit_frameless_observer(jit, checked_stack, opline, IR_UNUSED, 0, IR_UNUSED, 0, IR_UNUSED, 0, res_ref);
}
ir_CALL_1(IR_VOID, ir_CONST_ADDR((size_t)function), res_ref);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iluuu1994 do frameless functions always have the result?
I remember ICALL might use stack allocated zval for return value.

 cat x.php 
<?php
strpos($x, $y);
?>
$ sapi/cli/php -d opcache.opt_debug_level=0x30000  x.php

$_main:
     ; (lines=3, args=0, vars=2, tmps=1)
     ; (before optimizer)
     ; /home/dmitry/php/php-master/CGI-RELEASE-64/x.php:1-3
     ; return  [] RANGE[0..0]
0000 T2 = FRAMELESS_ICALL_2(strpos) CV0($x) CV1($y)
0001 FREE T2
0002 RETURN int(1)

$_main:
     ; (lines=2, args=0, vars=2, tmps=1)
     ; (after optimizer)
     ; /home/dmitry/php/php-master/CGI-RELEASE-64/x.php:1-3
0000 T2 = FRAMELESS_ICALL_2(strpos) CV0($x) CV1($y)
0001 RETURN int(1)

Is it OK to remove FREE?

Comment on lines 16800 to 16846
// Make call frame externally visible
ir_ref rx = jit_IP(jit);
ir_STORE(jit_CALL(rx, prev_execute_data), jit_FP(jit));
ir_STORE(jit_EG(current_execute_data), rx);

jit_observer_fcall_begin(jit, rx, observer_handler);

// JIT: fbc->internal_function.handler(new_frame, return_value)
ir_CALL_2(IR_VOID, ir_CONST_FC_FUNC(fbc->internal_function.handler), rx, res_ref);

jit_observer_fcall_end(jit, rx, res_ref);

ir_STORE(jit_EG(current_execute_data), ir_LOAD_A(jit_CALL(rx, prev_execute_data)));

for (int i = 0; i < call_num_args; ++i) {
if (zend_jit_needs_arg_dtor(fbc, i, NULL)) {
uint32_t offset = EX_NUM_TO_VAR(i);
zend_jit_addr var_addr = ZEND_ADDR_MEM_ZVAL(ZREG_RX, offset);

jit_ZVAL_PTR_DTOR(jit, var_addr, MAY_BE_ANY|MAY_BE_RC1|MAY_BE_RCN, 0, opline);
}
}

ir_ref allocated_path = IR_UNUSED;

// free the call frame
if (JIT_G(trigger) != ZEND_JIT_ON_HOT_TRACE ||
!JIT_G(current_frame) ||
!JIT_G(current_frame)->call ||
!TRACE_FRAME_IS_NESTED(JIT_G(current_frame)->call)) {
// JIT: zend_vm_stack_free_call_frame(call);
// JIT: if (UNEXPECTED(ZEND_CALL_INFO(call) & ZEND_CALL_ALLOCATED))
ir_ref if_allocated = ir_IF(ir_AND_U8(
ir_LOAD_U8(ir_ADD_OFFSET(rx, offsetof(zend_execute_data, This.u1.type_info) + 2)),
ir_CONST_U8(ZEND_CALL_ALLOCATED >> 16)));
ir_IF_TRUE_cold(if_allocated);

ir_CALL_1(IR_VOID, ir_CONST_FC_FUNC(zend_jit_free_call_frame), rx);

allocated_path = ir_END();
ir_IF_FALSE(if_allocated);
}
ir_STORE(jit_EG(vm_stack_top), rx);

if (allocated_path) {
ir_MERGE_WITH(allocated_path);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use zend_jit_do_fcall(), to minimize the JIT code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicated version doesn't do exactly the same as VM. At least, it misses support for zend_execute_internal().
It's going to be very difficult to maintain this duplicated code.

Copy link
Member Author

@bwoebi bwoebi Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zend_execute_internal prevents frameless alltogether, so we don't need that check here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In VM you falback FRAMELESS_CALL to DO_ICALL so zend_execute_internal() is going to be called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot really use the whole zend_jit_do_fcall(), but I can extract the icall executing part because it's identical (i.e. basically these lines which you highlighted here). Good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this in a separate patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, first merge this one, then improve upon this in a later PR? okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants