True async api stable #19142
base: master
Conversation
Initial version of the asynchronous API for PHP. Includes only the API itself and the necessary core changes required for the API to function.
…ized into a method for retrieving any ClassEntry. Now, using this API function, you can obtain the required class entry from a type descriptor.
Added API functions for coroutine context management:
- zend_async_context_set_t: set value by string or object key
- zend_async_context_get_t: get value by string or object key
- zend_async_context_has_t: check if key exists
- zend_async_context_delete_t: delete key-value pair
Includes convenience macros for easier usage: ZEND_ASYNC_CONTEXT_SET_STR/OBJ, GET_STR/OBJ, HAS_STR/OBJ, DELETE_STR/OBJ
Added complete Context API implementation:
- Updated zend_async_scheduler_register signature to accept context functions
- Added context function pointers and registration in zend_async_API.c
- Context API supports both string and object keys
- Includes convenience macros for easy usage
This completes the core API infrastructure for coroutine context management.
- Remove separate context_set, context_get, context_has, context_delete functions
- Add single zend_async_new_context_fn function to create context instances
- Move context implementation to ext/async module using zend_async_context_t structure
- Update scheduler registration to include new_context_fn parameter
- Add context field to zend_async_globals_t and zend_coroutine_s
- Simplify Context API macros to ZEND_ASYNC_NEW_CONTEXT and ZEND_ASYNC_CURRENT_CONTEXT
Removal of the current Scope from the global structure
… always needs to be explicitly known. Macros like ZEND_ASYNC_CURRENT_SCOPE were updated. A new macro ZEND_ASYNC_MAIN_SCOPE was added for the main Scope.
…main coroutine can correctly return control.
* add macro START_REACTOR_OR_RETURN for reactor autostart
… is now passed to the main coroutine’s finalize method instead of being displayed on screen.
* Fixed an issue with correctly passing the exception into the coroutine.
… captures the exception, marking it as handled in another coroutine.
ee7eb25 to 9486c42
ZEND_API bool zend_async_reactor_register(char *module, bool allow_override,
		zend_async_reactor_startup_t reactor_startup_fn,
		zend_async_reactor_shutdown_t reactor_shutdown_fn,
		zend_async_reactor_execute_t reactor_execute_fn,
		zend_async_reactor_loop_alive_t reactor_loop_alive_fn,
		zend_async_new_socket_event_t new_socket_event_fn,
		zend_async_new_poll_event_t new_poll_event_fn,
		zend_async_new_timer_event_t new_timer_event_fn,
		zend_async_new_signal_event_t new_signal_event_fn,
		zend_async_new_process_event_t new_process_event_fn,
		zend_async_new_thread_event_t new_thread_event_fn,
		zend_async_new_filesystem_event_t new_filesystem_event_fn,
		zend_async_getnameinfo_t getnameinfo_fn, zend_async_getaddrinfo_t getaddrinfo_fn,
		zend_async_freeaddrinfo_t freeaddrinfo_fn, zend_async_new_exec_event_t new_exec_event_fn,
		zend_async_exec_t exec_fn, zend_async_new_trigger_event_t new_trigger_event_fn)
Might make sense to put this into a struct so it can be extended without necessarily breaking the api.
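For illustration, a minimal sketch of the struct-based variant; the struct name and the size field are hypothetical, only the callback typedefs come from the signature quoted above:

/* Hypothetical sketch: bundle the reactor callbacks so that new fields can be
 * appended later without changing the registration signature. */
typedef struct _zend_async_reactor_callbacks {
	uint32_t size;  /* sizeof() as seen by the caller, for forward compatibility */
	zend_async_reactor_startup_t    startup_fn;
	zend_async_reactor_shutdown_t   shutdown_fn;
	zend_async_reactor_execute_t    execute_fn;
	zend_async_reactor_loop_alive_t loop_alive_fn;
	zend_async_new_timer_event_t    new_timer_event_fn;
	/* ... the remaining zend_async_*_t callbacks ... */
} zend_async_reactor_callbacks;

ZEND_API bool zend_async_reactor_register(
	char *module, bool allow_override, const zend_async_reactor_callbacks *callbacks);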
Yes, that’s probably a good idea and it’s worth implementing.
I did a partial review so far. I have at least one important comment, but otherwise this looks great. I hope that True Async eventually passes.
However I'm not convinced that this should go into core yet, for the following reasons:
- The C API is large and reflects the same concepts and behaviors as the userland API, while ext/async is the backend. So further discussions on the userland APIs will necessitate changes in the C API as well.
- If we iterate on the API in the 8.5 branch, this will create incompatibilities between different versions of the extension and different PHP releases. People will have a hard time testing the extension, which is the main goal of merging the API into core early. If we don't iterate in the 8.5 branch, then we don't need to merge the C API into 8.5.
- There are no tests, so this code will eventually break during maintenance of 8.5 and master.
- There are actually very few changes in core, so it seems that we could add a few extension points in core, and move most of this code to the extension. Then it will be easier to iterate, and for people to test. These extension points could take the form of function pointers. For example we could make shutdown_destructors a function pointer (see the sketch after this list).
- I understand that we don't expect other extensions to try to integrate with the API. Therefore the API can be in ext/async, and we don't need this amount of decoupling and indirection.
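As an illustration of the kind of extension point being suggested, a hedged sketch; the handler typedef and the global pointer are hypothetical, only shutdown_destructors() itself exists in core:

/* Hypothetical hook: core calls through a pointer that ext/async can
 * overwrite during its MINIT, keeping the default synchronous behavior
 * when no async backend is loaded. */
typedef void (*zend_shutdown_destructors_handler_t)(void);
ZEND_API zend_shutdown_destructors_handler_t zend_shutdown_destructors_handler = NULL;

ZEND_API void shutdown_destructors(void)
{
	if (zend_shutdown_destructors_handler) {
		zend_shutdown_destructors_handler();  /* e.g. run destructors in coroutines */
		return;
	}
	/* ... existing synchronous implementation ... */
}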
uint32_t zend_async_internal_context_key_alloc(const char *key_name)
{
#ifdef ZTS
	tsrm_mutex_lock(zend_async_context_mutex);
#endif
It would make sense to allow usage of zend_async_internal_context_key_alloc() only during the startup phase, so that no mutex is needed.
A mutex is needed because the initialization code may run in different PHP threads; if I remember correctly, that's exactly how it works in extensions during RINIT. I understand what you're getting at, though: using an earlier phase. I'll think about that. Overall, however, I don't see anything critical in using a mutex: such code is universal and its cost is negligible.
Mostly a nit pick, but one advantage of allowing zend_async_internal_context_key_alloc() only in the startup phase, in addition to not requiring a mutex, is that you can pre-allocate a fixed-size context array that you could address directly by offset in the rest of the code, instead of using a hash table. See zend_get_op_array_extension_handles(), for example.
I didn’t think of that. That’s a good idea for optimization.
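A rough sketch of the startup-only variant discussed above; the guard flag and counter are hypothetical, and the pattern merely mirrors zend_get_op_array_extension_handles():

/* Hypothetical sketch: keys are handed out only while PHP is still starting
 * up (single-threaded), so no mutex is needed, and the final counter fixes
 * the size of a per-context array that is addressed directly by key. */
static uint32_t zend_async_context_key_count = 0;
static bool zend_async_startup_finished = false;  /* hypothetical flag, set at the end of MINIT */

uint32_t zend_async_internal_context_key_alloc(const char *key_name)
{
	ZEND_ASSERT(!zend_async_startup_finished);
	(void)key_name;  /* could be kept for diagnostics */
	return zend_async_context_key_count++;
}

/* Later, per context: values[key] instead of a hash table lookup. */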
ZEND_API int zend_gc_collect_cycles(void)
{
	int total_count = 0;
	bool should_rerun_gc = 0;
	bool did_rerun_gc = 0;
	if (UNEXPECTED(ZEND_ASYNC_IS_ACTIVE && ZEND_ASYNC_CURRENT_COROUTINE != GC_G(dtor_coroutine))) {

		if (GC_G(dtor_coroutine)) {
			return 0;
		}

		start_gc_in_coroutine();
The new behavior seems complex. Let me know if I understand it correctly:
- zend_gc_collect_cycles() is now asynchronous when ZEND_ASYNC_IS_ACTIVE. It creates a coroutine (GC_G(dtor_coroutine)) and returns. (This implies that the userland function gc_collect_cycles() is async as well and always returns 0.)
- The coroutine re-enters zend_gc_collect_cycles(), but this time it doesn't create a coroutine as it detects that it's running in GC_G(dtor_coroutine) already
- The coroutine creates a microtask that re-creates the coroutine in case the first one suspends. This re-enters zend_gc_collect_cycles() again.
- Some of zend_gc_collect_cycles' local variables are moved to GC_G() because they are shared between multiple invocations
I understand the intent is that if a destructor suspends, we continue in an other coroutine.
I'm not entirely sure of the priority that is given to the GC: Are coroutines added to the front or back of the queue? (I don't see where it is enqueued.) Does the microtask execute immediately after the coroutine is suspended?
I think it's not necessary that the entire GC runs in a coroutine: the scanning part is inherently synchronous, non-reentrant, must not be interrupted, and does not call any userland code; so it's not necessary and it also does not improve concurrency. It is enough that just the "Actually call destructors" part runs in a coroutine, like we do for fibers.
Can this be simplified like this?
- Run zend_gc_collect_cycles() synchronously, outside of a coroutine, as usual
- Invoke destructors from a coroutine if necessary, and call the rest of the destructors in another one in case a destructor suspends
- This process should not allow zend_gc_collect_cycles() to return before all destructors have been invoked, so it's not necessary to move its local variables to GC_G(). (Returning before all destructors have returned is fine.)
The algorithm you see is possibly the “simplest” one. But I hate it. I hate it, yet I can’t come up with anything simpler — at least not without rewriting the entire GC from scratch.
And the most important thing is that it’s essentially no different from the analogous algorithm for Fibers, where the same approach is used.
Let’s break down why it works this way.
Why does GC need a separate coroutine? Because it must ensure the integrity of variable state and the stack for its operation. The same applies to Fibers.
And this does not make the code complex. The main source of complexity is destructors.
Destructors are invoked from GC, but at the same time they can STOP it, which is unacceptable. That’s why a concurrent iteration algorithm is used here:
When a destructor suspends the GC coroutine, the Scheduler runs and activates a microtask.
The microtask sees that GC execution was interrupted. It creates a new GC coroutine, where work continues from the next destructor, while the current coroutine is handed over to the destructor that suspended it. After the destructor finishes, the coroutine will be destroyed.
The variables are structured in such a way that repeated calls to GC work as if they were never interrupted. In other words, an illusion of sequential code is created.
Invoking each destructor in its own coroutine would be wasteful in terms of resources. Moreover, the situation where a destructor suspends a coroutine is rather rare in practice.
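A condensed, purely illustrative sketch of the mechanism described above; every helper name here is hypothetical and the real implementation lives in ext/async:

/* Hypothetical sketch of the re-spawn microtask: if a destructor suspended
 * the GC coroutine, start a new coroutine that re-enters
 * zend_gc_collect_cycles() and continues from the state stored in GC_G(). */
static void gc_respawn_microtask(void)
{
	zend_coroutine_t *gc_coro = GC_G(dtor_coroutine);

	if (gc_coro && coroutine_is_suspended(gc_coro)) {               /* hypothetical check */
		/* The old coroutine now belongs to the suspended destructor. */
		GC_G(dtor_coroutine) = spawn_coroutine(gc_collect_cycles_entry);  /* hypothetical */
	}
}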
I don't understand why you need to execute the entire GC in a coroutine, when only this part can call userland code and potentially suspend:
Lines 2072 to 2078 in 1cff181
/* Actually call destructors. */
zend_hrtime_t dtor_start_time = zend_hrtime();
if (EXPECTED(!EG(active_fiber))) {
	gc_call_destructors(GC_FIRST_ROOT, end, NULL);
} else {
	gc_call_destructors_in_fiber(end);
}
Invoking each destructor in its own coroutine would be wasteful in terms of resources.
The algorithm used for Fibers runs all destructors in the same Fiber as long as no destructor suspends. If a destructor suspends, we create a new Fiber and run the remaining destructors in it. I don't understand why it's not possible to apply the same logic for coroutines.
Thank you very much for the analysis and comments. They are very helpful and will be applied in the near future!
I don't understand why you need to execute the entire GC in a coroutine, when only this part can call userland code and potentially suspend:
So you mean that the code that processes nodes and the code that calls destructors can easily be split into two stages? And that these stages can be executed at completely different times?
The algorithm used for Fibers runs all destructors in the same Fiber as long as no destructor suspends. If a destructor suspends we create a new Fiber an run the remaining destructors in it. I don't understand why it's not possible to apply the same logic for coroutines.
Because you cannot transfer the state of the stack and the function's local variables into another coroutine's stack, and that is exactly what happens here. This is the first reason. The second reason is that a Fiber works differently: it is guaranteed to return to the point that invoked it, whereas a coroutine does not.
(I see that after the destructor has been called, there is a separate block of code that still has to run, and it seems the two cannot easily be separated.)
Therefore, when code calls a user-land destructor function that can suspend a coroutine, at that moment the GC must either run with the same context and variables in another coroutine, or else it has to WAIT until the destructor finishes.
Ok, actually there is a more elegant way. We can separately create a list of objects whose destructors need to be called, and then process this list asynchronously in a separate coroutine.
But all this means the GC would need to be almost completely reworked: its approach to collecting roots and calling destructors would have to be reconsidered. If that is done, the code will most likely (80% chance) be better than the current one. But for now, that would mean even more changes.
If it comes to completely reworking the GC, then it would be more logical to finally make it run in a separate thread and be 100% concurrent. But this is clearly not part of TrueAsync at the moment.
I don't understand why you need to execute the entire GC in a coroutine, when only this part can call userland code and potentially suspend:
So you mean that the code that processes nodes and the code that calls destructors can easily be split into two stages? And that these stages can be executed at completely different times?
It could almost be split, yes.
But what I wanted to emphasize is that only the code that calls destructors needs to be protected from suspension.
Because you cannot transfer the state of the stack and the function’s variables into another stack-coroutine.
Of course, but my point is you probably don't need to do this.
The second reason is that a Fiber works differently: it is guaranteed to return to the point that invoked it, whereas a coroutine does not.
I think what you need is an API that can execute a coroutine (in a way that blocks the caller), and that returns when the coroutine has switched. Exactly like execute_next_coroutine(), but with a specific coroutine. Then you can use the same algorithm as for Fibers (sketched below):
- Spawn a coroutine that does approximately this: while ($dtor = pop_dtor()) { $dtor(); }
- Execute it. Block until the coroutine switches or returns
- If there are destructors remaining, repeat
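A hedged C sketch of that algorithm, loosely modeled on gc_call_destructors_in_fiber(); the queue accessors and the blocking-execute call are hypothetical placeholders:

/* Hypothetical sketch: drain the queued destructors inside a coroutine; if a
 * destructor suspends, the outer loop notices that work remains and spawns a
 * fresh coroutine for the rest, exactly as the fiber-based variant does. */
static void gc_dtor_coroutine_entry(void)
{
	zend_object *obj;
	while ((obj = gc_pop_pending_dtor()) != NULL) {   /* hypothetical queue accessor */
		obj->handlers->dtor_obj(obj);
	}
}

static void gc_call_destructors_in_coroutine(void)
{
	while (gc_has_pending_dtors()) {                  /* hypothetical */
		zend_coroutine_t *coro = spawn_coroutine(gc_dtor_coroutine_entry);  /* hypothetical */
		/* Block the caller until the coroutine returns or suspends,
		 * like execute_next_coroutine() but for a specific coroutine. */
		execute_coroutine_blocking(coro);             /* hypothetical */
	}
}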
Of course, but my point is you probably don't need to do this.
This was the first idea that came to my mind. But there’s a nuance here.
/* Destroy zvals. The root buffer may be reallocated. */
GC_TRACE("Destroying zvals");
zend_hrtime_t free_start_time = zend_hrtime();
idx = GC_FIRST_ROOT;
while (idx != end) {
	current = GC_IDX2PTR(idx);
This code ultimately destroys the ZVALs themselves and runs after the destructors have been executed.
As we can see, it also iterates over the list with while (idx != end).
So, if we don't know for sure that all destructors have already been executed, we cannot use while (idx != end).
We need something else — for example, to have a list of ZVALs that can be freed.
And these are not all the actions that occur after the destructors are called. Therefore, it turns out that the destructor code is currently tied to the second half of the GC function.
To completely separate this code from each other, it seems necessary to fully rework the second part of the function so that it operates on a different ZVAL list. Or so that it can work with slices of this list. This is a difficult task for me because I need to fully dive into the GC algorithm. Therefore, I can honestly admit that I wrote bad code in order to avoid a full GC refactoring :)
I think what you need is an API that can execute a coroutine (in a way that blocks the caller), and that returns when the coroutine has switched. Exactly like execute_next_coroutine(), but with a specific coroutine. Then you can use the same algorithm as for Fibers:
But this breaks the overall logic and makes the code more complicated. It’s not worth it, because one way or another the GC needs to be completely reworked for concurrent logic. My point is simply not to do it right away.
I would prefer the GC to be reworked by someone more experienced than me, or by my future self. The solution in this PR is, on the one hand, not the best, but on the other hand, it’s as close as possible to the previous GC logic. That’s its advantage.
No, it's safe to execute this loop before all destructors have returned:
- It frees only GARBAGE nodes. This excludes DTOR_GARBAGE nodes (the queued destructors). [1]
- None of the GARBAGE nodes are reachable from destructors, so it is safe to free them before dtors have returned. [2]
Please reconsider the solution I'm proposing, as I feel that it's really simple and doesn't need a complete rework of the GC.
[1] You can see that nodes with destructors are marked as DTOR_GARBAGE here:
Lines 2034 to 2037 in 8cd085a
/* Mark all roots for which a dtor will be invoked as DTOR_GARBAGE. Additionally
 * color them purple. This serves a double purpose: First, they should be
 * considered new potential roots for the next GC run. Second, it will prevent
 * their removal from the root buffer by nested data removal. */
And we handle only those in gc_call_destructors():
Line 1880 in 8cd085a
if (GC_IS_DTOR_GARBAGE(current->ref)) {
[2] This is past the point where we have proven that GARBAGE nodes are unreachable from anywhere except other GARBAGE nodes, and we have removed from the buffer any nodes reachable from destructors here:
Line 2058 in 8cd085a
/* Remove nested data for objects on which a destructor will be called.
That doesn't sound bad, but I still don't understand how it would help.
Let’s say we have one destructor.
We call the GC code, and it calls this destructor.
The destructor stops the GC coroutine. We want to continue GC work in another coroutine.
How can we do that, while guaranteeing we don’t break the state of the stack or nodes?
Example:
if (obj->handlers->dtor_obj != zend_objects_destroy_object   /* <-- we stop here */
		|| obj->ce->destructor) {
	current->ref = GC_MAKE_DTOR_GARBAGE(obj);
Here’s what just happened:
- The destructor execution took away our coroutine along with the entire GC state.
- But we don’t want to wait for this destructor, since there are others as well.
We basically need to call the GC function again, but in such a way that it continues execution as if there had been no interruption.
Of course, there’s another way. We could run all destructors in a separate coroutine and wait until it finishes. This is cheaper than trying to run each destructor individually. I could actually try implementing this option.
Of course, there’s another way. We could run all destructors in a separate coroutine and wait until it finishes. This is cheaper than trying to run each destructor individually. I could actually try implementing this option.
This is exactly what I'm suggesting 👍 This is what we do for fibers currently (see gc_call_destructors_in_fiber)
Alright, I’ll try to implement this to compare. However, in this case running the GC may increase RAM usage by the size of the C stack for the Fiber (which is something over 2 MB if I’m not mistaken).
It cannot go into 8.5 in any case as the RFC vote has not started yet and the deadline for the start has already passed.
I don’t see any problem here. The code is not required to be permanent; this is explicitly stated in the RFC. The fact that core code is tied to userland is an inevitable reality, and of course major changes in userland will inevitably lead to changes in core. But this is not a design flaw, it’s something that cannot be avoided. Userland should change very rarely, while the binary API, on the contrary, can change very often, and that’s normal. These changes may come from optimizations, a better API, reworked structures, memory layout, and so on; the API should change in order to keep the code efficient. Of course, there will be different versions of the extension for different versions of PHP, but again I don’t see any problem with this, since it is normal practice to have separate extension versions for different PHP releases.
Co-authored-by: Arnaud Le Blanc <365207+arnaud-lb@users.noreply.github.com>
I agree that code can be changed, but my point is that changing code often, and in breaking ways, when it's in php-src, will be more painful for you and the users of the extension than if it was maintained in a separate repository. Every change will have to go through code review, for example.
I'm not saying it's a design flaw. I'm saying I don't see benefits, only drawbacks, in merging all of this in php-src now. When you suggested to merge some parts of Async a while back, I imagined you wanted to merge lower-level APIs only (e.g. changes in zend_fiber.c), that would allow the rest of the extension to plug into the engine where it needed it. But this PR proposes to merge the higher-level part of Async, the one that is tightly coupled to the userland API and will need to be changed. This part would be maintained with less friction in a separate repository while you iterate on the userland API.
Please note that we usually don't allow breaking changes in patch releases (8.6.x), and avoid them when possible in minor releases (8.x) as well to reduce extension churn. Of course we could make an exception for Async while the API is considered unstable. What bothers me is that I don't see any benefits in merging this now in core. This will not allow you to move faster, it will create more friction. This will not allow users to test more easily, as you could ship this code in the Async extension directly.
The benefit was very simple: PHP 8.5 could have been asynchronous thanks to this code. But now this issue is no longer relevant :)
TrueAsync engine API
The TrueAsync engine API defines a pluggable interface that lets extensions register different async backends while the core supplies standardized primitives.
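For orientation, a sketch of how a backend would plug in through the registration function quoted earlier in this thread; the libuv_* callback names are stand-ins, each one would have to match the corresponding zend_async_*_t typedef:

/* Hypothetical MINIT-time registration of a libuv-backed reactor. The
 * argument order follows the zend_async_reactor_register() declaration
 * quoted above; the callback implementations themselves are omitted. */
zend_async_reactor_register(
	"libuv-reactor", /* module name */ false, /* allow_override */
	libuv_reactor_startup, libuv_reactor_shutdown,
	libuv_reactor_execute, libuv_reactor_loop_alive,
	libuv_new_socket_event, libuv_new_poll_event,
	libuv_new_timer_event, libuv_new_signal_event,
	libuv_new_process_event, libuv_new_thread_event,
	libuv_new_filesystem_event,
	libuv_getnameinfo, libuv_getaddrinfo, libuv_freeaddrinfo,
	libuv_new_exec_event, libuv_exec,
	libuv_new_trigger_event);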
Key Components
PR for https://wiki.php.net/rfc/true_async_engine_api