Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect script execution time limit when invoking shutdown functions #5543

Closed
wants to merge 2 commits into from

Conversation

alexdowad
Copy link
Contributor

A time limit can be set on PHP script execution via set_time_limit() (or .ini file). When the time limit is reached, the OS will notify PHP and a timed_out flag is set. While this flag is regularly checked when executing PHP code, once the end of the script is reached, it is not checked while invoking shutdown functions (registered via register_shutdown_function).

Of course, if the shutdown functions are implemented in PHP, then the script time limit will be checked while they are running and will take effect. But if the shutdown functions are built-in (implemented in C), it will not.

Since the shutdown functions are invoked through call_user_function_ex, add a check of the timed_out flag there. Then, the script time limit will be respected when entering each shutdown function. The fact still remains that if a shutdown function is built-in and runs for a long time, script execution will not time out until it finishes and the interpreter tries to invoke the next one.

Still, the behavior of scripts with execution time limits will be more consistent after this patch. To make the execution time-out feature work even more precisely, it would be necessary to scrutinize all the built-in functions and add checks of the timed_out flag in any which can run for a long time. That might not be worth the effort, though.

Zend/zend_execute_API.c Outdated Show resolved Hide resolved
@alexdowad alexdowad force-pushed the timeout-in-shutdown-functions branch 2 times, most recently from 392cb1d to 33d8ce6 Compare May 8, 2020 13:16
@alexdowad
Copy link
Contributor Author

Grr. There is what looks like a spurious failure in one of the test jobs on Windows (the other one passed). Ran the same test case more than 10,000 times on my Linux box and still couldn't reproduce the failure. (Though the test case is not skipped on Linux.)

Zend/zend_execute_API.c Outdated Show resolved Hide resolved
@alexdowad alexdowad force-pushed the timeout-in-shutdown-functions branch from 0df76cd to a24b789 Compare May 11, 2020 11:18
@nikic
Copy link
Member

nikic commented May 11, 2020

Diff for the AppVeyor failure was:

========DIFF========
004+ 
004- Shutdown
005+ Fatal error: Maximum execution time of 1 second exceeded in Unknown on line 0
========DONE========

So we have the time limit triggered twice. Something definitely not right there. The flag not getting reset somehow?

@alexdowad
Copy link
Contributor Author

Diff for the AppVeyor failure was:

========DIFF========
004+ 
004- Shutdown
005+ Fatal error: Maximum execution time of 1 second exceeded in Unknown on line 0
========DONE========

So we have the time limit triggered twice. Something definitely not right there. The flag not getting reset somehow?

I'm just studying this one.

@alexdowad
Copy link
Contributor Author

And we're green!

I adjusted the call to zend_timeout in zend_call_function so it is only triggered when invoking an internal function, not a user function.

@nikic
Copy link
Member

nikic commented May 11, 2020

The Windows situation here still looks fishy. Looking at the zend_timeout implementation:

ZEND_API ZEND_NORETURN void ZEND_FASTCALL zend_timeout(int dummy) /* {{{ */

It's quite different on Windows, and indeed doesn't seem to reset timed_out.

@alexdowad
Copy link
Contributor Author

The Windows situation here still looks fishy. Looking at the zend_timeout implementation:

ZEND_API ZEND_NORETURN void ZEND_FASTCALL zend_timeout(int dummy) /* {{{ */

It's quite different on Windows, and indeed doesn't seem to reset timed_out.

Indeed. I have looked at that function many times but haven't completely figured out what is happening on Windows. The unfortunate thing is that since I don't have a Windows box, I can't run it in a debugger and see what it actually does.

* So see whether we timed out while the function was running... */
if (EG(timed_out)) {
zend_timeout();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this a bit, I think instead of checking timed_out specifically here, we should do a VM interrupt check as in

ZEND_VM_HELPER(zend_interrupt_helper, ANY, ANY)
. After all the core issue here applies not just to internal timeouts, but also pcntl signal handling, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think I have figured out what you mean...

pcntl sets a signal handler which sets vm_interrupt to 1. That causes execution to bounce into pcntl's zend_interrupt_function next time the VM checks the interrupt flag. But that only happens when the VM is running PHP bytecode.

If you are running native shutdown functions, and pcntl is enabled, and a signal comes along, it will set the vm_interrupt flag but pcntl's zend_interrupt_function will never be called.

Do you think pcntl really should care about what is happening during shutdown, though? Maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think pcntl really should care about what is happening during shutdown, though? Maybe?

During shutdown probably isn't important, but there are other cases where we could run into this, at least in theory. For example, consider something like array_map('expensive_internal_function', $array). This would allow a signal handler to be executed in the middle of the array_map, rather than at the end.

Maybe it's possible to test this with a combination of InfiniteIterator and iterator_apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. Wow, I never thought of that. OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an extra test in 7f35610. I was hoping to show that this leaks the time_nanosleep return value (which should be an array in this case), but due to the way array_map is implemented this doesn't happen...

I think we should be repeating the EG(exception) check from a few lines above if the VM interrupt is executed, to make sure that code not freeing the retval on exception doesn't leak...

@alexdowad
Copy link
Contributor Author

alexdowad commented May 11, 2020

The Windows situation here still looks fishy. Looking at the zend_timeout implementation:

ZEND_API ZEND_NORETURN void ZEND_FASTCALL zend_timeout(int dummy) /* {{{ */

It's quite different on Windows, and indeed doesn't seem to reset timed_out.

That code was added by @weltling in 2016... wonder if he can comment on why the timed_out flag is not reset on Windows?

@cmb69
Copy link
Contributor

cmb69 commented May 11, 2020

It seems to me that EG(timed_out) is never used on Windows after zend_timeout() has been called (unless for the case where it is reset). Note that zend_timeout_handler() is POSIX only.

Anyhow, CreateTimerQueueTimer() is legacy (imported from threadpoollegacyapiset.h), and setitimer() is obsolescent, so we should consider to replace them. Also, tq_timer_cb() doesn't look thread-safe.

@alexdowad
Copy link
Contributor Author

@nikic @cmb69 As usual, I appreciate the thorough review. I also appreciate your approach of ensuring that the issue at hand is thoroughly understood before merging any attempted solution, even if it seems to work.

It is a fact that I don't understand everything about the existing code. For example, I am puzzled as to why tq_timer_cb accesses executor globals with eq->global = value rather than the EG macro.

I may study it a bit more, do some refactoring, and if these are accepted, rebase this PR on the refactored master.

@cmb69
Copy link
Contributor

cmb69 commented May 12, 2020

For example, I am puzzled as to why tq_timer_cb accesses executor globals with eq->global = value rather than the EG macro.

As I understand it, the callback can be run by an arbitrary thread (i.e. not necessarily the thread which scheduled the timer).

@nikic
Copy link
Member

nikic commented May 12, 2020

Anyhow, CreateTimerQueueTimer() is legacy (imported from threadpoollegacyapiset.h), and setitimer() is obsolescent, so we should consider to replace them. Also, tq_timer_cb() doesn't look thread-safe.

There's also https://bugs.php.net/bug.php?id=79464. On Linux we could switch from setitimer to timer_create(), but this functionality is not available on macos...

@alexdowad
Copy link
Contributor Author

For example, I am puzzled as to why tq_timer_cb accesses executor globals with eq->global = value rather than the EG macro.

As I understand it, the callback can be run by an arbitrary thread (i.e. not necessarily the thread which scheduled the timer).

Wow, I never thought of that. It makes sense now.

I wonder if there should be some kind of memory barrier or something to make sure the changes are visible on the thread which "owns" the executor globals in question.

@alexdowad
Copy link
Contributor Author

There's also https://bugs.php.net/bug.php?id=79464.

Interesting. I think I'll use the OP's C++ code when investigating the issue of script execution timeouts on ZTS builds...

@cmb69
Copy link
Contributor

cmb69 commented May 12, 2020

I wonder if there should be some kind of memory barrier or something to make sure the changes are visible on the thread which "owns" the executor globals in question.

It seems to me this is not a problem (presuming that our TLS usage is correct). However, I think we need some form of synchronization to avoid potential race conditions regarding tq_timer_cb().

@nikic
Copy link
Member

nikic commented May 12, 2020

I went ahead and committed f096087 and f83a23e separately, as those are not really related to this change. I think for the purpose of this PR I'd mainly like to see the switch to a vm_interrupt rather than timed_out check, the other issues are only tangentially related.

@nikic
Copy link
Member

nikic commented May 12, 2020

Might be worthwhile to run PHP under thread sanitizer sometime to check for synchronization issues...

@cmb69
Copy link
Contributor

cmb69 commented May 12, 2020

I'm afraid TSan is not available on Windows.

A time limit can be set on PHP script execution via `set_time_limit` (or .ini file).
When the time limit is reached, the OS will notify PHP and `timed_out` and `vm_interrupt`
flags are set. While these flags are regularly checked when executing PHP code, once the
end of the script is reached, they are not checked while invoking shutdown functions
(registered via `register_shutdown_function`).

Of course, if the shutdown functions are implemented *in* PHP, then the interrupt flag
will be checked while the VM is running PHP bytecode and the timeout will take effect.
But if the shutdown functions are built-in (implemented in C), it will not.

Since the shutdown functions are invoked through `zend_call_function`, add a check of the
`vm_interrupt` flag there. Then, the script time limit will be respected when *entering*
each shutdown function. The fact still remains that if a shutdown function is built-in and
runs for a long time, script execution will not time out until it finishes and the
interpreter tries to invoke the next one.

Still, the behavior of scripts with execution time limits will be more consistent after
this patch. To make the execution time-out feature work even more precisely, it would
be necessary to scrutinize all the built-in functions and add checks of the `vm_interrupt`
flag in any which can run for a long time. That might not be worth the effort, though.

It should be mentioned that this patch does not solely affect shutdown functions, neither
does it solely allow for interruption of running code due to script execution timeout.
Anything else which causes `vm_interrupt` to be set, such as the PHP interpreter receiving
a signal, will take effect when exiting from an internal function. And not just internal
functions which are called because they were registered to run at shutdown; there are
other cases where a series of internal functions might run in the midst of a script. In
all such cases, it will be possible to interrupt the interpreter now.
@alexdowad alexdowad force-pushed the timeout-in-shutdown-functions branch from a24b789 to b9fc273 Compare May 12, 2020 20:49
@alexdowad
Copy link
Contributor Author

I think for the purpose of this PR I'd mainly like to see the switch to a vm_interrupt rather than timed_out check, the other issues are only tangentially related.

OK, trying something along those lines...

@alexdowad
Copy link
Contributor Author

Please have a look now.

@alexdowad
Copy link
Contributor Author

alexdowad commented May 13, 2020 via email

@nikic
Copy link
Member

nikic commented May 13, 2020

Hmm. So both zend_timeout and zend_interrupt_function never return? I believe that that extensions can install their own zend_interrupt_function. As such, we may not be able to tell whether it will return or not. So is it not possible that the retval might still be needed? I think that zend_timeout never returns, so we could certainly free it in that case. Tell me if what I'm saying makes sense or not.

zend_timeout() is noreturn and causes unclean shutdown, so it's okay to leak. The case I have in mind is if zend_interrupt_function throws an exception, which means that zend_call_function throws an exception, and the caller might not free the return value. Though after checking use-sites, it looks like code does free the return value independently of whether an exception was thrown or not. I'll go ahead and merge this in its current form...

@php-pulls php-pulls closed this in 555489d May 13, 2020
@alexdowad
Copy link
Contributor Author

I wonder if there should be some kind of memory barrier or something to make sure the changes are visible on the thread which "owns" the executor globals in question.

It seems to me this is not a problem (presuming that our TLS usage is correct). However, I think we need some form of synchronization to avoid potential race conditions regarding tq_timer_cb().

@cmb69, any suggestion of what kind of synchronization to use? I don't know the codebase well enough yet to know what is already there.

@cmb69
Copy link
Contributor

cmb69 commented May 13, 2020

There is a set of "mutex" functions (actually, these are implemented as critical sections on Windows) which would be appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants