-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: enable persistent CurlShareHandle
objects
#15603
feat: enable persistent CurlShareHandle
objects
#15603
Conversation
Thanks for the PR! However, persistent resources are broken by design for ZTS builds, since Additionally, the usage of resources is frowned upon (https://wiki.php.net/rfc/resource_to_object_conversion), so I don't think we should introduce any new resource types. |
Thanks!
I see, well I'm not tied to this particular implementation but as far as I can tell this is similar to how existing persistent handles work, like persistent streams, memcached connections, mysql connections, etc. If there's a better way to do this that is ZTS-compatible though I'm happy to make the change!
If I understand correctly, this is using a resource under the hood, but it is still exposed as a |
Right, and we have at least one bug report about mysqli about this.
It seems to me that the only proper way to handle persistent objects would be to keep them in shared memory. It probably makes no sense to use OPcache for that, since OPcache is about unmodifiable data, so we would need to introduce some other SHM cache (with all the glitches that may arise; e.g. users may not want to use, ASLR issues on Windows and maybe other platforms, platform dependend implementations, etc). Methinks this is not a small task, and would go far beyond this feature (persistent curl shared handles), and probably requires the RFC process.
Right, but we even want to get rid of these "hidden" resources, see e.g. #14282. |
If I understand correctly, the behavior in that bug report is exactly what we would expect. That is, we expect that each
I'm definitely open to creating an RFC! However the scope seems somewhat undefined to me - would we be aiming to achieve a thread-safe persistent resource replacement? Would suggesting a change to all existing persistent resources be in-scope? Is there already a plan for fixing all existing persistent resources that I could build on top of? |
I think so. Likely controversial, since as you've said, some may expect the current behavior, and others a different behavior. And given that SHM may not work, there would need to be fallback anyway (i.e. that shared cache would need to be optional somehow).
Not that I'm aware of. Maybe @kocsismate is, since he is driving forward the resource to object conversion (which is long overdue, since resources have a couple of issues). |
If we are really going to get rid of internal resources (and not just userspace ones), then I'd expect we'd either need some solution for things like If we agree that it should, might we consider merging this (barring any other code improvements) while the current pattern exists? I imagine that any solution we come to for refactoring the other cases of persistence would apply here as well, so migrating this to a hypothetical new pattern would be straightforward. I'm obviously biased, but I do think this is a valuable change, since it would provide a way of reusing work from earlier PHP requests which would reduce end-user latency. If alternatively we decide that PHP should no longer support persistence, that would surprise me since I'd expect it to be controversial. But if so, I'd understand not wanting to merge this as it would only move us further from that goal.
To clarify my understanding a bit, I understand why userspace resources have a number of issues, but I'm not sure I understand the downside of resources used internally, like in this case. Is it that people have differing expectations of how they interact with threads and processes? Architecturally it seems clear to me that these things would be per-thread or per-process, and not shared globally, so it feels like this may be more of an issue of expectations than a technical underpinning. Maybe there are other problems that I'm not aware of, however! |
From the RFC mentioned indirectly above:
While only the last point would be relevant for internal resources, there is additionally the problem that resources get a unique ID (32bit unsigned integer) which eventually leads to depletion especially with long running processes (which are rather important nowadays, consider Swoole or FrankenPHP etc.) There is pretty broad consensus that we want to finally rid resources, and it seems that according to that RFC persistent dba and odbc connections have already been replaced with objects (so basically behave the same as non-persistent connections). From skimming https://externals.io/message/121660 and https://externals.io/message/122116, persistent resources apparently have not been discussed at all. I don't know why. |
Right, I had read the RFC and it seemed to me like it didn't quite apply to internal resources. That said, your point about using 32 bit unsigned integers makes sense - though I'm surprised that there aren't other issues with long running processes. That is, I'd expect other types of PHP runtimes to do something similar to Apache's
Thanks for mentioning To clarify, I don't actually care about using a persistent resource here, I was mainly copying from other examples of persistence in PHP (like |
May make sense, but that's up to the SAPIs; the engine should still be able to work without such measures. And frankly, 32bit unsigned is a lot, but nowadays it might just not be enough.
Ah, thanks! I had a closer look and found that persistent resources are only freed in the global shutdown handler (i.e. when the process/thread terminates), but not in the request shutdown handler. So that it basically the same as persistent resource lists. Note, though, that
Then it should be fine to do it as it's done in the other PRs. |
Ah, I'm looking into this more and you're right, it seems like the actual key to persistence for those drivers is the Thanks for the help so far! |
@cmb69 upon further inspection, it seems like both the In Line 581 in 367f303
Lines 937 to 944 in 367f303
and in Line 2324 in 367f303
Lines 2344 to 2348 in 367f303
I'm not entirely sure why, since they have a persistent I'll continue trying to write this without a persistent resource, but maybe someone could provide some more information here? It seems to me like internal resources may still be the correct pattern for persistence for the time being, despite their drawbacks. |
Indeed. See #14239 (comment). Apparently noone followed up.
Right. As is, the implementation in dba.c (haven't checked odbc.c) seems to be broken; at least the resource would be unnecessary. I'll have a closer look as soon as possible.
Please do so. :) |
I planned some changes, and discussed them with Niels on PHPF slack (https://phpfoundation.slack.com/archives/CSSNEANGY/p1716414441035159?thread_ts=1716048807.178419&cid=CSSNEANGY). TLDR: he was against introducing a new API until we can remove the current one. Unfortunately, it won't be possible for quite a while we have to respect the current ABI for 3rd party extensions... So I guess it will be a PHP 10 thing. |
We |
Thanks for the info @kocsismate!
I cannot view this discussion, as it seems to require a PHP Foundation slack account, and I can't seem to find a way to join (though I'd be happy to!). Thanks for the TLDR though.
I'm close to finishing up a non-persistent resource approach for the curl share handle, so I will hopefully push that up soon. That said - and I realize I've made a case for sticking with the status quo before - your above comments give me pause in moving away from the persistent internal resource approach in this PR. Would we want |
7316e84
to
428e25c
Compare
@cmb69 (and @kocsismate, if you don't mind taking a look), I've updated this PR to use |
Okay, that makes some sense. Although I consider
From a quick look, this is generally okay, but you should add a test (or three); these likely will not work with ZTS, since you're not using Anyhow, this needs at least approval from @adoy as code owner, and one of the RMs (@ericmann, @NattyNarwhal, @SakiTakamachi), but given that it introduces a new concept to ext/curl, might require the RFC process. It's unfortunate that nobody replied to your mail to the internals list so it's hard to assess what people generally think of this feature. |
Yes, I agree that this requires an RFC. And regardless of the need for RFC, this cannot be incorporated into 8.4. Because this is clearly a new feature (that means it's not a bugfix) and not a change that passed an RFC before the feature freeze. Of course, we can discuss by targeting the next version from 8.4 :) |
I agree it's probably a bit too late for 8.4. I will also say persistent objects also tend to be a bit awkward, especially in the transition to using class objects instead of resources. One such case is closing a connection in the cases you may need to. You would manually close a persistent resource, but an object's destructor is hit when references go away, but the persistent list holds a reference. |
I'm not confident on how exactly to test this, so if this ends up needing an RFC I'll likely defer adding tests until I get a feel for which way the vote is going.
I don't believe I'm ever using Line 238 in d552fbd
Everywhere else I use
Agreed. I'll bump that email thread to see if I can get any other input, and if an RFC is required I'm open to doing that. |
I don't have a particular deadline in mind, so any release version is fine with me! We're (Etsy) comfortable with running our own patched version of PHP so just knowing that it will make it into any version is enough. |
This can be solved by having a |
Sounds like we should be doing this for PDO, it's something I've run into with clients. (Sorry to disrupt your thread Eric, I'll make a separate issue for this tangent.) |
Hi, can you clarify that this allows reusing the same CURL handle between FPM requests? (FPM wasn't clearly mentioned in the thread so I want to make sure) FYI this is something we discussed with Roman Pronskiy, also with @dunglas. If this is it, I think this could have HUGE benefits to applications out there. Switching from PHP-FPM to a runtime like FrankenPHP, Laravel Octane (Roadrunner/Swoole), or similar, usually gives the following benefits:
The second part is much less talked about, but very important in terms of performance. Opening an HTTPS request can sometimes take 100ms or more! I believe such a feature could be a performance game changer for the PHP ecosystem, without having to rewrite apps so that they work in long-lived processes (assuming libraries like Guzzle, etc. take advantage of it ofc). |
@mnapoli I believe this would work in any form of PHP, i.e. The thing to keep in mind is that this will only persist share handles between requests to the same SAPI worker, so if you had 100 I do think this would potentially have large performance benefits for certain PHP applications, even considering the above. |
Very clear, that makes sense! Thanks for clarifying! |
ext/curl/interface.c
Outdated
@@ -416,6 +422,19 @@ PHP_MINIT_FUNCTION(curl) | |||
} | |||
/* }}} */ | |||
|
|||
/* {{{ PHP_GINIT_FUNCTION */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please no more Vim fold markers. The PHP_GINIT_FUNCTION
is extraneous too.
/* {{{ PHP_GINIT_FUNCTION */ |
ext/curl/share.c
Outdated
CURLSHcode error; | ||
|
||
ZEND_PARSE_PARAMETERS_START(2, 2) | ||
Z_PARAM_STR_EX(id, 1, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are equivalent:
Z_PARAM_STR_EX(id, 1, 0) | |
Z_PARAM_STR_OR_NULL(id) |
ext/curl/share.c
Outdated
error = curl_share_setopt(sh->share, CURLSHOPT_SHARE, zval_get_long(entry)); | ||
|
||
if (error != CURLSHE_OK) { | ||
php_error_docref(NULL, E_WARNING, "could not construct persistent curl share: %s", curl_share_strerror(error)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warnings were canonized a few years ago to start with a capital letter (at least most of them):
php_error_docref(NULL, E_WARNING, "could not construct persistent curl share: %s", curl_share_strerror(error)); | |
php_error_docref(NULL, E_WARNING, "Could not construct persistent cURL share: %s", curl_share_strerror(error)); |
The curl spelling is not consistent with
Line 67 in 11accb5
zend_argument_value_error(2, "is not a valid cURL share option"); |
ext/curl/share.c
Outdated
ZEND_HASH_FOREACH_VAL(Z_ARRVAL_P(arr), entry) { | ||
ZVAL_DEREF(entry); | ||
|
||
error = curl_share_setopt(sh->share, CURLSHOPT_SHARE, zval_get_long(entry)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice not to use weak mode when retrieving the int
value from entry
. Instead, zval_get_long_ex(entry, true)
could be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but the is_strict
parameter of zval_get_long_ex()
must not be confused with strict typing.
ext/curl/curl.stub.php
Outdated
@@ -3687,6 +3687,9 @@ function curl_share_errno(CurlShareHandle $share_handle): int {} | |||
/** @refcount 1 */ | |||
function curl_share_init(): CurlShareHandle {} | |||
|
|||
/** @refcount 1 */ | |||
function curl_share_init_persistent(string $persistent_id, array $shares): CurlShareHandle|false {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a bit expressive name for the 2nd parameter. I'm thinking about $share_types
or something similar.
ext/curl/share.c
Outdated
error: | ||
zval_ptr_dtor(return_value); | ||
|
||
RETURN_FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think it would be possible to throw instead of returning false
?
/* {{{ PHP_GINIT_FUNCTION */ | ||
PHP_GINIT_FUNCTION(curl) | ||
{ | ||
zend_hash_init(&curl_globals->persistent_share_handles, 0, NULL, curl_share_free_persistent, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will need some discussion in the RFC whether we want to stop using the EG(persistent_list)
.
|
||
PHP_GSHUTDOWN_FUNCTION(curl) | ||
{ | ||
zend_hash_destroy(&curl_globals->persistent_share_handles); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it would make sense to explicitly close persistent shares with curl_share_close
? It's a no-op for regular shares, but it may be useful for persistent ones?
Thanks for the feedback @kocsismate. I've submitted an RFC (https://wiki.php.net/rfc/curl_share_persistence), so I'll address your changes and any others from the RFC process once we come to a consensus. |
This commit introduces a new function, curl_share_init_persistent, that creates a php_curlsh struct that can live beyond a single PHP request. Persisting a curl share handle would allow PHP userspace to cache things like DNS lookups, or even entire connections, between multiple PHP requests, thus reducing work for subsequent requests. I created a new function instead of reusing the existing curl_share_init function since it also takes an array of curl_share_setopt options to set when the persistent handle doesn't yet exist. It is noteworthy that calling curl_share_setopt on the persistent handle would affect future requests using the handle; we could consider preventing this. It is also noteworthy that changing the persistent share options would not take effect if the persistent share already existed; changing the persistent share ID would be sufficient to resolve that.
8596760
to
c772d79
Compare
- The signature of curl_share_init now takes two optional parameters, $share_options and $persistent_id. - I've removed vim markers, and made other small edits per suggestions from the pull request. - curl_share_close now closes persistent share handles. The php_curlsh struct needs to keep track of the persistent ID string in order to remove it from the module global hash table. - curl_share_init throws an exception if the share options are invalid. This should be backwards compatible as share options are optional, so it can only throw once they are specified incorrectly.
c772d79
to
e52ca6e
Compare
I opted to use the existing Caddy testing infrastructure since it supports keepalives, whereas it seems the PHP development server does not. Alternatively I could write just enough of a socket listener to confirm that there is only ever a single connection coming from curl, but I believe it is safe to rely on connect_time being zero when a connection is reused.
025efee
to
4de6a1d
Compare
Hey @kocsismate and @cmb69! I am working to get this PR into a mergeable state now that the vote has passed. I believe I've addressed all of the comments so far. I have also submitted an alternative PR in #16937 in order to attempt to address some of the concerns from the RFC vote discussion. Let me know what you think, and if there is any other feedback you may have! |
As https://wiki.php.net/rfc/curl_share_persistence_improvement has been accepted, I am closing this PR in favor of #16937. |
This PR introduces a new function,
curl_share_init_persistent
, that creates aphp_curlsh
struct that can live beyond a single PHP request.Persisting a curl share handle would allow PHP userspace to cache things like DNS lookups, or even entire connections, between multiple PHP requests, thus reducing work for subsequent requests.
I created a new function instead of reusing the existing
curl_share_init
function since it also takes an array ofcurl_share_setopt
options to set when the persistent handle doesn't yet exist.It is noteworthy that calling curl_share_setopt on the persistent handle would affect future requests using the handle; we could consider preventing this.
It is also noteworthy that changing the persistent share options would not take effect if the persistent share already existed; changing the persistent share ID would be sufficient to resolve that.
I'm open to any feedback whatsoever, including the name! I would also add documentation if people were amenable to this PR.