-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Indirect call reduction for Jit code #9579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f2ef40a
to
59f0e9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is right. I'm not sure about implementation of IS_NEAR() macro.
ext/opcache/jit/zend_jit_x86.dasc
Outdated
@@ -150,6 +150,9 @@ static size_t tsrm_tls_offset; | |||
|
|||
#define IS_32BIT(addr) (((uintptr_t)(addr)) <= 0x7fffffff) | |||
|
|||
/* Call range is before or after 2GB */ | |||
#define IS_NEAR(end_addr, func) ((((uintptr_t)(end_addr)^(uintptr_t)(func)) >> 33) == 0) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand this condition. Why shift to 33 bits?
I think it also should take care about dasm_buf
I would propose:
#define MAY_USE_32BIT_ADDR(addr) \
(IS_SIGNED_32BIT((char*)(addr) - (char*)dasm_buf) && \
IS_SIGNED_32BIT((char*)(addr) - (char*)dasm_end))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dstogov Thanks for the proposal and we will use it. And we did make a big mistake here for right shifting 33 bits and we actually should right shift 31 bits, because what we are trying to check is that the highest 33 bits must all be zeros after XOR operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dstogov Thanks for your proposal. How about this implementation? Bit operation is a little bit faster ( about 10 TPS in our workload ), though with less code readability.
#define IS_NEAR(end_addr, func) ((((uintptr_t)(end_addr)^(uintptr_t)(func)) >> 31) == 0)
|| if (IS_NEAR(dasm_buf, func) && IS_NEAR(dasm_end, func)) {
| call qword &func
|| }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your implementation guarantees that the top 33 bits are the same, but this check is not accurate enough.
0x80000000 - 0x7fff0000 = 0x10000 (valid 32-bit offset)
(0x80000000 ^ 0x7fff0000) >> 31 = 1 (your implementation thinks it's invalid)
It's possible to create a faster check, but this is not very critical operation.
I wouldn't use name "NEAR" to avoid mess with far/near addresses in 16-bit x86 terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My fault and thanks a lot. I updated it.
Changing indirect call to direct call for Jit code benefits the branch prediction, which gets 1% performance gain in our workload. Similarly, we change indirect jump to direct jump. Signed-off-by: Su, Tao <tao.su@intel.com> Signed-off-by: Wang, Xue <xue1.wang@intel.com>
59f0e9e
to
df19cca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see problems merging this into PHP-8.2 or even early branches
Merged as 52f4ed16. Thank you ! |
Changing indirect call to direct call for Jit code benefits the branch prediction, which gets 1% performance gain in our workload.
Similarly, we change indirect jump to direct jump.
Signed-off-by: Su, Tao tao.su@intel.com
Signed-off-by: Wang, Xue xue1.wang@intel.com