Skip to content

Indirect call reduction for Jit code #9579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

wxue1
Copy link
Contributor

@wxue1 wxue1 commented Sep 20, 2022

Changing indirect call to direct call for Jit code benefits the branch prediction, which gets 1% performance gain in our workload.
Similarly, we change indirect jump to direct jump.

Signed-off-by: Su, Tao tao.su@intel.com
Signed-off-by: Wang, Xue xue1.wang@intel.com

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is right. I'm not sure about implementation of IS_NEAR() macro.

@@ -150,6 +150,9 @@ static size_t tsrm_tls_offset;

#define IS_32BIT(addr) (((uintptr_t)(addr)) <= 0x7fffffff)

/* Call range is before or after 2GB */
#define IS_NEAR(end_addr, func) ((((uintptr_t)(end_addr)^(uintptr_t)(func)) >> 33) == 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand this condition. Why shift to 33 bits?
I think it also should take care about dasm_buf
I would propose:

#define MAY_USE_32BIT_ADDR(addr) \
	(IS_SIGNED_32BIT((char*)(addr) - (char*)dasm_buf) && \
	IS_SIGNED_32BIT((char*)(addr) - (char*)dasm_end))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstogov Thanks for the proposal and we will use it. And we did make a big mistake here for right shifting 33 bits and we actually should right shift 31 bits, because what we are trying to check is that the highest 33 bits must all be zeros after XOR operation.

Copy link
Contributor Author

@wxue1 wxue1 Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstogov Thanks for your proposal. How about this implementation? Bit operation is a little bit faster ( about 10 TPS in our workload ), though with less code readability.

#define IS_NEAR(end_addr, func) ((((uintptr_t)(end_addr)^(uintptr_t)(func)) >> 31) == 0)

||		if (IS_NEAR(dasm_buf, func) && IS_NEAR(dasm_end, func)) {
|			call qword &func
||		}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementation guarantees that the top 33 bits are the same, but this check is not accurate enough.

0x80000000 - 0x7fff0000 =  0x10000 (valid 32-bit offset)
(0x80000000 ^ 0x7fff0000) >> 31 = 1 (your implementation thinks it's invalid)

It's possible to create a faster check, but this is not very critical operation.

I wouldn't use name "NEAR" to avoid mess with far/near addresses in 16-bit x86 terminology.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault and thanks a lot. I updated it.

Changing indirect call to direct call for Jit code
benefits the branch prediction, which gets 1% performance
gain in our workload.
Similarly, we change indirect jump to direct jump.

Signed-off-by: Su, Tao <tao.su@intel.com>
Signed-off-by: Wang, Xue <xue1.wang@intel.com>
@wxue1 wxue1 force-pushed the indirect_call_reduction branch from 59f0e9e to df19cca Compare September 22, 2022 09:26
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see problems merging this into PHP-8.2 or even early branches

@devnexen
Copy link
Member

Merged as 52f4ed16. Thank you !

@devnexen devnexen closed this Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants