New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize observers #13649
base: master
Are you sure you want to change the base?
Optimize observers #13649
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not understand what the benefit of moving result
to OP_DATA
is. Can you elaborate on this point?
I also didn't fully understand why ZEND_OBSERVER_NONE_OBSERVED
is needed. From what I understand, zend_observer_remove_handler
moves the tail to -1, so if the first handler is unset, no handler should be installed. Tbh the observer.{h,c} code is quite hard to read, but I didn't spend too much time on it.
Did you run some benchmarks? Can you notice a difference for frameless functions, in particular?
return *handler == ZEND_OBSERVER_NONE_OBSERVED; | ||
} | ||
|
||
static zend_always_inline bool zend_observer_fcall_has_no_observers(zend_execute_data *execute_data, bool allow_generator, zend_observer_fcall_begin_handler **handler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Avoid double negation? E.g. !zend_observer_fcall_has_no_observers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming problem here is that it's not a binary state, but "no observers", "some observers" and "observers undetermined", and I want to distinguish "no observers" vs "some or undetermined observers". I'm not sure how to better name that.
ZEND_API void ZEND_FASTCALL zend_observer_fcall_begin(zend_execute_data *execute_data); | ||
void ZEND_FASTCALL zend_observer_fcall_begin_prechecked(zend_execute_data *execute_data, zend_observer_fcall_begin_handler *observer_data); | ||
|
||
static zend_always_inline bool zend_observer_handler_is_unobserved(zend_observer_fcall_begin_handler *handler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, avoid double negation?
Hm, have I been unclear in the PR description? "it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode" - DO_ICALL cannot jump an extra opline (skipping OP_DATA that is) at its end, and accesses result on its opline.
Simply an optimization to avoid reading both start and end handlers when none is set, i.e. saves a read in the hot path of checking whether a function is being observed. |
Regarding rough benchmarking numbers, without debug, without opcache (wall time; v: valgrind instructions), with this script: Baseline of empty loop: 1 sec (v: 1,007,435,926) (Using 20000000 iterations for valgrind)
master, without observers: 2.8 sec (v: 2,727,435,251) So, yes, the frameless functions are much faster to call. Now, the same code with pow instead of min (which is not using FLF currently): master, without observers: 4.9 sec (v: 4,687,435,657) The generic improvement of inlining the runtime cache check (and just checking one value instead of two) is quite significant on its own already, but using frameless calls just completely blows it out of the water. I'm also amazed that apparently for frameless calls it uses only 0.1 sec (3% of time) more even though it needs 10% more valgrind instructions. |
0dd6584
to
ee5d6ed
Compare
I missed this point. Unfortunately, I don't have a better idea without significant complications that would nullify the performance benefit, or duplicate the DO_ICALL handler.
Thanks for the explanation. 🙂 The naming could probably be improved could be improved. Or at least add a short comment to |
dfbe08a
to
72d298a
Compare
#define ZEND_OBSERVABLE_FN(function) \ | ||
(ZEND_MAP_PTR(function->common.run_time_cache) && !(function->common.fn_flags & ZEND_ACC_CALL_VIA_TRAMPOLINE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this check effectively move to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to zend_observer_fcall_has_no_observers in zend_observer.h
2b3c7ec
to
f623714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this should be committed. I don't like the fact that the engine started to depend on observer API more and more.
I didn't review the zend_jit_ir.c
part yet. Can it be implemented using JIT helpers (C function) to make the most dangerous part of the patch less? I don't think bloating of that JIT code may make any positive effect. I expect this patch should slow-down the real life apps.
As an optimization patch it requires benchmark results to prove its profitability.
@iluuu1994 could you please also take a look.
if (ZEND_OBSERVER_ENABLED || zend_execute_internal) { | ||
if (zend_execute_internal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had agreed to fallback using ICALL instead of FRAMELESS in case of OBSERVER.
The actual speed-up of FRAMELESS calls wasn't very significant and the increase of the complexity already made questions. I expect, any observer makes much more significant slowdown and I don't see reasons in attempt
to win 0.1% of performance when we already losing 10%.
This patch looks much more complex...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done the proper benchmarking homework now - frameless makes about 1% in wall time for wordpress, with observers.
And yes, we were losing 5% (wall time). Now we're losing 2.5%.
I would agree on 0.1% speedup not making sense for 10% loss. But that's not the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I remember frameless calls gave ~0.8% win on symfony demo amd wordpress (let it be 1%).
How the observer might lose 5% in the back direction? It should be the same 1%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.
I was just saying that:
I don't see reasons in attempt to win 0.1% of performance when we already losing 10%.
is exaggerated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you misunderstand. When enabling a single observer at all have a 5% overall perf loss on the wordpress bench on current master. With this patch, without frameless support, we have a ~3.5% loss. With this patch and frameless support we have ~2.5% loss.
I.e. we do actually gain the same 1% from frameless.
OK. This sounds more fair. Framiteless functions give 1% speedup, but only without observer. This proposal attempts to give the same 1% speedup when observed.
I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED
+ zend_observer_handler_is_unobserved
).
Does this patch makes real improvement without JIT? (Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%).
I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?
I'll try to analyse your benchmark results more careful...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your valgrind numbers show 5% on wordpress, and I can't explain this, because frameless function themselves gave just 1%.
The instr difference on this patch (with frameless) vs current master is 6-7%. The perf difference is 2.5%.
The instr difference on this patch with frameless vs this patch without frameless is 2%. The perf difference is 1%.
For both JIT and non-JIT.
I'm not completely sure what you are doing with JIT. Do you generate code for both frameless and ICALL? Or do you generate the code only for one of them depending on observability at the moment of compilation?
The generated code is essentially if (needs observing) { push frame (without arg copy); call FLF handler; pop frame } else { call FLF handler; }
. So it's not quite falling back to ICALL in JIT.
I don't think it's possible to give the same speedup, because your handler add overhead checking for observability (ZEND_OBSERVER_ENABLED + zend_observer_handler_is_unobserved).
The ZEND_OBSERVER_ENABLED part is specialized away. Only zend_observer_handler_is_unobserved is checked, and that one is inlined - i.e. the full check for FLF functions is just if (*ZEND_OBSERVER_DATA(ZEND_FLF_FUNC(opline)) == ZEND_OBSERVER_NONE_OBSERVED)
.
which is a very low-cost check, comparatively to ICALL. We can assume the run time cache for FLF functions being set and the func not be a generator or trampoline, saving a fetch and two comparisons too.
Yes, if we had the same handling than we had for FCALL, then the speedup would not quite be the same.
case ZEND_FRAMELESS_ICALL_3: | ||
// Copy result to OP_DATA to ensure dispatch in observer fallback can be aligned with the last opcode of the call | ||
opline[1].result.var = opline->result.var; | ||
opline[1].result_type = opline->result_type; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why this is necessary. For me this looks like a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the fallback to ZEND_DO_ICALL (when we VM_DISPATCH) to work, the opcode after opline
must be the next actual opline after and not OP_DATA. Hence ZEND_DO_ICALL must be dispatched on the OP_DATA opcode for ZEND_FRAMELESS_ICALL_3. And thus result must be on the OP_DATA opcode as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to avoid this hack.
It may confuse some of Optimizer pass (e.g. optimize_temp_vars_5.c) that don't expect repeatable definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why this hack is removed in undo_pass_two. I.e. for optimizer purposes it doesn't know about that hack. But it works fine with VM :-)
ZVAL_NULL(EX_VAR(opline->result.var)); | ||
zval *result = EX_VAR(opline->result.var); | ||
ZVAL_NULL(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this really change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing, in ZEND_FRAMELESS_ICALL_0 (actually the line below should also use result
). For the other frameless opcodes, it was about not refetching from opline->result after the unobserved check.
if (ZEND_OBSERVER_ENABLED) { | ||
zend_function *fbc = ZEND_FLF_FUNC(opline); | ||
if (UNEXPECTED(zend_observer_handler_is_unobserved(ZEND_OBSERVER_DATA(fbc)) == false)) { | ||
zend_execute_data *call = _zend_vm_stack_push_call_frame_ex(zend_vm_calc_used_stack(0, fbc), ZEND_CALL_NESTED_FUNCTION, fbc, 0, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we avoid call to zend_vm_calc_used_stack(0, fbc)
. The size of the stack should be known statically.
May be I'm wrong... I don't remember when we got zend_internal_function->T
and what does it mean...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the temporary the observers use to store the prev_observed_frame.
For FLF functions we can probably hardcode it to (ZEND_CALL_FRAME_SLOT + X + 1) * sizeof(zval)
, at least for now given that we don't have a mechanism to dynamically allocate temps for arbitrary functions.
This is outside the hot path of not-observed functions, so I don't consider this as critical, but can change it.
if (EG(exception)) { | ||
if (UNEXPECTED(EG(exception) != NULL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your patch is already huge enough. Mixing the real changes with unrelated code refactoring is annoying.
Some valgrind benchmark.php numbers: This PR (without observers, virtually the same then baseline without observers):
This PR with
Note that for Zend/bench.php, it adds 155M instrs to have observer enabled at all. For the JIT variant it adds only 48M instrs. So having extra JIT support halves the overhead for symfony. Having frameless+observer support in JIT has virtually no improvement for symfony. Having frameless+observer outside of JIT shows about 0.8% improvement on the Symfony Demo (104191895 instructions when frameless is disabled). For comparison the baseline numbers, with observer enabled:
I.e. compared to baseline Symfony is 3% faster. JIT is 1.5% faster. The relative overhead of observer itself on bench.php JIT is only 5M (6%) instrs instead of 20M (25%) instrs now. Overall it feels worth the bit of extra complexity - bench.php gets significant improvement, wordpress has also quite significant improvement (a 10% overhead is more than halved); Symfony gets a moderate improvement. In terms of wall time it was like nearly ~1% improvement for me on Symfony JIT. Wordpress JIT got a solid 2% faster in wall time (and up to 10% in cpu time according to |
07ad243
to
343d450
Compare
389a537
to
a336c47
Compare
a336c47
to
afa2b55
Compare
@dstogov Looks like all tests pass now (minus flaky ones). |
Inline the lookup whether a function is observed at all. This strategy is also used for FRAMELESS calls. If the frameless call is observed, we instead allocate a call frame and push the arguments, to call the the function afterwards. Doing so is still a performance benefit as opposed to executing individual INIT_FCALL+SEND_VAL ops. Thus, even if the frameless call turns out to be observed, the call overhead is slightly lower than before. If the internal function is not observed at all, the unavoidable overhead is fetching the FLF zend_function pointer and the run-time cache needs to be inspected. As part of this work, it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode. This is a bit unusual in comparison to all other ZEND_OP_DATA usages, but seems to not pose problems overall. There is also a small issue resolved: trampolines would always use the ZEND_CALL_TRAMPOLINE_SPEC_OBSERVER function due to zend_observer_fcall_op_array_extension being set to -1 too late. Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
afa2b55
to
9d81640
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make the code-review. I just run tests in different configurations...
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.observe_all=1 -d zend_test.observer.show_output=0 -d opcache.jit=1205 -j16 Zend tests ext/opcache ext/zend_test"
=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
property_exists() tests [Zend/tests/011.phpt]
testing reusing anons that implement an interface [Zend/tests/anon/005.phpt]
Bug #81626: Error on use static:: in __сallStatic() wrapped to Closure::fromCallable() [Zend/tests/bug81626.phpt]
Calling bindTo() on __call() closure [Zend/tests/closure_call_bind.phpt]
Bug #80929: Method name corruption related to zend_closure_call_magic [Zend/tests/closures/bug80929.phpt]
Testing Closure::fromCallable() functionality: Basic [Zend/tests/closures/closure_from_callable_basic.phpt]
Argument unpacking in new arguments in static variable [Zend/tests/constexpr/new_arg_unpack.phpt]
Invalid operation in new arg in const expr [Zend/tests/constexpr/new_invalid_operation_in_arg.phpt]
First Class Callable from Magic [Zend/tests/first_class_callable_005.phpt]
First class callables and &__call() [Zend/tests/first_class_callable_016.phpt]
Argument of new on class without constructor are evaluated [Zend/tests/new_args_without_ctor.phpt]
Trampoline closure created from magic method accepts named arguments [Zend/tests/trampoline_closure_named_arguments.phpt]
Bug #75969: Assertion failure in live range DCE due to block pass misoptimization [ext/opcache/tests/bug75969.phpt]
JIT FETCH_DIM_FUNC_ARG: 002 [ext/opcache/tests/jit/fetch_dim_func_arg_002.phpt]
...
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -d opcache.jit=1205 -j16 Zend tests ext/opcache ext/zend_test"
=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
testing reusing anons that implement an interface [Zend/tests/anon/005.phpt]
Bug #81626: Error on use static:: in __сallStatic() wrapped to Closure::fromCallable() [Zend/tests/bug81626.phpt]
Calling bindTo() on __call() closure [Zend/tests/closure_call_bind.phpt]
Bug #80929: Method name corruption related to zend_closure_call_magic [Zend/tests/closures/bug80929.phpt]
Testing Closure::fromCallable() functionality: Basic [Zend/tests/closures/closure_from_callable_basic.phpt]
Argument unpacking in new arguments in static variable [Zend/tests/constexpr/new_arg_unpack.phpt]
Invalid operation in new arg in const expr [Zend/tests/constexpr/new_invalid_operation_in_arg.phpt]
First Class Callable from Magic [Zend/tests/first_class_callable_005.phpt]
First class callables and &__call() [Zend/tests/first_class_callable_016.phpt]
Argument of new on class without constructor are evaluated [Zend/tests/new_args_without_ctor.phpt]
Trampoline closure created from magic method accepts named arguments [Zend/tests/trampoline_closure_named_arguments.phpt]
Bug #75969: Assertion failure in live range DCE due to block pass misoptimization [ext/opcache/tests/bug75969.phpt]
JIT FETCH_DIM_FUNC_ARG: 002 [ext/opcache/tests/jit/fetch_dim_func_arg_002.phpt]
Test type inference of class consts - reference by static and self [ext/opcache/tests/opt/type_inference_class_consts1.phpt]
Test type inference of class consts - reference by parent [ext/opcache/tests/opt/type_inference_class_consts2.phpt]
Test type inference of class consts - reference by class name [ext/opcache/tests/opt/type_inference_class_consts3.phpt]
Test type inference of class consts - interface consts [ext/opcache/tests/opt/type_inference_class_consts4.phpt]
Assign elision exception safety: ICALL [ext/opcache/tests/ssa_bug_004.phpt]
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.observe_all=1 -d zend_test.observer.show_output=0 -d opcache.jit=1254 --repeat 3 -j16 Zend tests ext/opcache ext/zend_test"
=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Bug #26737 (Protected and private variables are not saved on serialization when a user defined __sleep is used) [tests/classes/bug26737.phpt]
Bug #79668 (get_defined_functions(true) may miss functions) [Zend/tests/bug79668.phpt]
Backed Enum with multiple implementing interfaces [Zend/tests/enum/backed-implements-multiple.phpt]
Backed Enum implements [Zend/tests/enum/backed-implements.phpt]
Enum implements [Zend/tests/enum/implements.phpt]
Enum __CLASS__ [Zend/tests/enum/__class__.phpt]
Enum __FUNCTION__ [Zend/tests/enum/__function__.phpt]
Enum __invoke [Zend/tests/enum/__invoke.phpt]
Enum __METHOD__ [Zend/tests/enum/__method__.phpt]
Enum namespace [Zend/tests/enum/namespaces.phpt]
Enum methods [Zend/tests/enum/basic-methods.phpt]
Enum supports static methods [Zend/tests/enum/static-methods.phpt]
Enum can use traits having constants [Zend/tests/enum/traits-constants.phpt]
Enum can use traits [Zend/tests/enum/traits.phpt]
GH-13712 (Segmentation fault for enabled observers when calling trait method of internal trait when opcache is loaded) [ext/opcache/tests/gh13712.phpt]
Observer: Basic fiber switching [ext/zend_test/tests/observer_fiber_01.phpt]
Observer: Unfinished fiber [ext/zend_test/tests/observer_fiber_02.phpt]
Observer: Nested fibers [ext/zend_test/tests/observer_fiber_03.phpt]
Observer: Nested fibers with unfinished fiber [ext/zend_test/tests/observer_fiber_04.phpt]
Observer: Nested fibers with both unfinished [ext/zend_test/tests/observer_fiber_05.phpt]
Observer: Throwing fiber [ext/zend_test/tests/observer_fiber_06.phpt]
Observer: Basic observability of includes only (no functions) [ext/zend_test/tests/observer_basic_04.phpt]
Observer: Frameless calls are properly observable [ext/zend_test/tests/observer_frameless.phpt]
Observer: Basic observability of functions only (no includes) [ext/zend_test/tests/observer_basic_05.phpt]
Observer: Basic observability of functions only (with run-time swapping) [ext/zend_test/tests/observer_basic_06.phpt]
Observer: Observe function and class declarations with file_cache_only [ext/zend_test/tests/observer_declarations_file_cache.phpt]
=====================================================================
make test TESTS="--asan -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -d opcache.jit=1254 --repeat 3 -j16 Zend tests ext/opcache ext/zend_test"
=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
Bug #26737 (Protected and private variables are not saved on serialization when a user defined __sleep is used) [tests/classes/bug26737.phpt]
Bug #79668 (get_defined_functions(true) may miss functions) [Zend/tests/bug79668.phpt]
Backed Enum with multiple implementing interfaces [Zend/tests/enum/backed-implements-multiple.phpt]
Backed Enum implements [Zend/tests/enum/backed-implements.phpt]
Enum implements [Zend/tests/enum/implements.phpt]
Enum __CLASS__ [Zend/tests/enum/__class__.phpt]
Enum __FUNCTION__ [Zend/tests/enum/__function__.phpt]
Enum __invoke [Zend/tests/enum/__invoke.phpt]
Enum __METHOD__ [Zend/tests/enum/__method__.phpt]
Enum methods [Zend/tests/enum/basic-methods.phpt]
Enum namespace [Zend/tests/enum/namespaces.phpt]
Enum supports static methods [Zend/tests/enum/static-methods.phpt]
Enum can use traits having constants [Zend/tests/enum/traits-constants.phpt]
Enum can use traits [Zend/tests/enum/traits.phpt]
=====================================================================
All tests passes if JIT or OBSERVER disabled.
Please, fix the tests failures first...
Ignore this comment, apparently the order on ir_PHI_2() is relevant. Changing
to
fixes it. I was under the assumption this would be irrelevant? What does the order depend on or is it really an IR bug? @dstogov I've tried fixing this (see latest push) but the IR library seems to miscompile it. Tried distilling it into a dedicated IR test, but couldn't reproduce it trivially. Either codegen or the step before is wrong (I believe the inferred DESSA instruction is wrong).
It compiles an erroneous From the after register allocation output:
I see in BB35 a php-src/ext/opcache/jit/zend_jit_ir.c Line 4519 in b2ade1c
To reproduce: |
@dstogov Thanks for the hint about the different configurations, zif_pass handling was missing (i.e. internal functions without run_time_cache, which shall not be observed anyway). Fixed, works for me in the various configurations now. |
The order of |
I sill see few tests failures:
|
jit->track_last_valid_opline = track_last_valid_opline; | ||
jit->use_last_valid_opline = use_last_valid_opline; | ||
jit->last_valid_opline = last_valid_opline; | ||
jit->reuse_ip = reuse_ip; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why do you save and restore these variables.
You probably need to reset jit->last_valid_opline
to make JIT reload IP when necessary.
reuse_ip
should be set if %r15 contains value of EX(call)
and reset if it contains the VM opline
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar problem to zend_jit_save_call_chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work. If you keep the old value of jit->last_valid_opline
, it's going to be invalid for one of the taken paths. It has to be reset for both paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why we have:
php-src/ext/opcache/jit/zend_jit_ir.c
Lines 16854 to 16855 in a3ec05c
// Note: conditional (if_unobserved) opline, zend_jit_set_last_valid_opline() may only be called if the opline is actually unconditionally updated | |
jit_STORE_IP(jit, ir_CONST_ADDR(jit->last_valid_opline ? jit->last_valid_opline : opline)); |
So that it's actually pointing to the same opline for both paths, avoiding any burden on the hot (unobserved) path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Now I understood this.
It would be definitely better to keep the related logic together and not spread it around functions.
ir_ref res_ref = jit_ZVAL_ADDR(jit, res_addr); | ||
jit_set_Z_TYPE_INFO(jit, res_addr, IS_NULL); | ||
|
||
ir_ref skip_observer = IR_UNUSED; | ||
if (ZEND_OBSERVER_ENABLED) { | ||
skip_observer = jit_frameless_observer(jit, checked_stack, opline, IR_UNUSED, 0, IR_UNUSED, 0, IR_UNUSED, 0, res_ref); | ||
} | ||
ir_CALL_1(IR_VOID, ir_CONST_ADDR((size_t)function), res_ref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iluuu1994 do frameless functions always have the result?
I remember ICALL might use stack allocated zval for return value.
cat x.php
<?php
strpos($x, $y);
?>
$ sapi/cli/php -d opcache.opt_debug_level=0x30000 x.php
$_main:
; (lines=3, args=0, vars=2, tmps=1)
; (before optimizer)
; /home/dmitry/php/php-master/CGI-RELEASE-64/x.php:1-3
; return [] RANGE[0..0]
0000 T2 = FRAMELESS_ICALL_2(strpos) CV0($x) CV1($y)
0001 FREE T2
0002 RETURN int(1)
$_main:
; (lines=2, args=0, vars=2, tmps=1)
; (after optimizer)
; /home/dmitry/php/php-master/CGI-RELEASE-64/x.php:1-3
0000 T2 = FRAMELESS_ICALL_2(strpos) CV0($x) CV1($y)
0001 RETURN int(1)
Is it OK to remove FREE
?
ext/opcache/jit/zend_jit_ir.c
Outdated
// Make call frame externally visible | ||
ir_ref rx = jit_IP(jit); | ||
ir_STORE(jit_CALL(rx, prev_execute_data), jit_FP(jit)); | ||
ir_STORE(jit_EG(current_execute_data), rx); | ||
|
||
jit_observer_fcall_begin(jit, rx, observer_handler); | ||
|
||
// JIT: fbc->internal_function.handler(new_frame, return_value) | ||
ir_CALL_2(IR_VOID, ir_CONST_FC_FUNC(fbc->internal_function.handler), rx, res_ref); | ||
|
||
jit_observer_fcall_end(jit, rx, res_ref); | ||
|
||
ir_STORE(jit_EG(current_execute_data), ir_LOAD_A(jit_CALL(rx, prev_execute_data))); | ||
|
||
for (int i = 0; i < call_num_args; ++i) { | ||
if (zend_jit_needs_arg_dtor(fbc, i, NULL)) { | ||
uint32_t offset = EX_NUM_TO_VAR(i); | ||
zend_jit_addr var_addr = ZEND_ADDR_MEM_ZVAL(ZREG_RX, offset); | ||
|
||
jit_ZVAL_PTR_DTOR(jit, var_addr, MAY_BE_ANY|MAY_BE_RC1|MAY_BE_RCN, 0, opline); | ||
} | ||
} | ||
|
||
ir_ref allocated_path = IR_UNUSED; | ||
|
||
// free the call frame | ||
if (JIT_G(trigger) != ZEND_JIT_ON_HOT_TRACE || | ||
!JIT_G(current_frame) || | ||
!JIT_G(current_frame)->call || | ||
!TRACE_FRAME_IS_NESTED(JIT_G(current_frame)->call)) { | ||
// JIT: zend_vm_stack_free_call_frame(call); | ||
// JIT: if (UNEXPECTED(ZEND_CALL_INFO(call) & ZEND_CALL_ALLOCATED)) | ||
ir_ref if_allocated = ir_IF(ir_AND_U8( | ||
ir_LOAD_U8(ir_ADD_OFFSET(rx, offsetof(zend_execute_data, This.u1.type_info) + 2)), | ||
ir_CONST_U8(ZEND_CALL_ALLOCATED >> 16))); | ||
ir_IF_TRUE_cold(if_allocated); | ||
|
||
ir_CALL_1(IR_VOID, ir_CONST_FC_FUNC(zend_jit_free_call_frame), rx); | ||
|
||
allocated_path = ir_END(); | ||
ir_IF_FALSE(if_allocated); | ||
} | ||
ir_STORE(jit_EG(vm_stack_top), rx); | ||
|
||
if (allocated_path) { | ||
ir_MERGE_WITH(allocated_path); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use zend_jit_do_fcall()
, to minimize the JIT code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicated version doesn't do exactly the same as VM. At least, it misses support for zend_execute_internal()
.
It's going to be very difficult to maintain this duplicated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_execute_internal prevents frameless alltogether, so we don't need that check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In VM you falback FRAMELESS_CALL to DO_ICALL so zend_execute_internal() is going to be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot really use the whole zend_jit_do_fcall(), but I can extract the icall executing part because it's identical (i.e. basically these lines which you highlighted here). Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do this in a separate patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, first merge this one, then improve upon this in a later PR? okay.
db0f653
to
3e9b45c
Compare
Inline the lookup whether a function is observed at all. This strategy is also used for FRAMELESS calls. If the frameless call is observed, we instead allocate a call frame and push the arguments, to call the the function afterwards. Doing so is still a performance benefit as opposed to executing individual INIT_FCALL+SEND_VAL ops. Thus, even if the frameless call turns out to be observed, the call overhead is slightly lower than before. If the internal function is not observed at all, the unavoidable overhead is fetching the FLF zend_function pointer and the run-time cache needs to be inspected.
As part of this work, it turned out to be most viable to put the result operand on the ZEND_OP_DATA instead of ZEND_FRAMELESS_ICALL_3, allowing seamless interoperability with the DO_ICALL opcode. This is a bit unusual in comparison to all other ZEND_OP_DATA usages, but seems to not pose problems overall.
There is also a small issue resolved: trampolines would always use the ZEND_CALL_TRAMPOLINE_SPEC_OBSERVER function due to zend_observer_fcall_op_array_extension being set to -1 too late.
The same inlining of operations has been applied to JIT as well.