Optimize string allocation in uri_parser_rfc3986 when getting IPv6 hosts and url paths#21550
kocsismate merged 9 commits into php:master
If this case really needs to be optimized, I would prefer extending smart_str to support appending to the string without bounds checks, with debug assertions. But I'm not the code owner.
Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>
I think that is the better solution in principle, but I don't want to touch smart_str just because of this small patch.
…nto optimize-3
@LamentXU123 Thank you. Can you prepare another benchmark with the latest changes? Ideally using hyperfine.
Sure. The optimized build is 1.05x faster in this benchmark case:

<?php
$url = "http://example.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/segment8/segment9/segment10/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z";
$sum = 0;
for ($i = 0; $i < 100000; $i++) {
    $uri = Uri\Rfc3986\Uri::parse($url);
    $path = $uri->getPath();
    $sum += strlen($path);
}
echo $sum, PHP_EOL;

In cases where the path is less nested, like:

<?php
$url = "http://example.com/segment1";
$sum = 0;
for ($i = 0; $i < 1000000; $i++) {
    $uri = Uri\Rfc3986\Uri::parse($url);
    $path = $uri->getPath();
    $sum += strlen($path);
}
echo $sum, PHP_EOL;

the two builds run at similar speed; the optimized one is 1.01x faster.
TimWolla left a comment:
I don't love the additional complexity, but it seems worth it.
Currently, php_uri_parser_rfc3986_host_read and php_uri_parser_rfc3986_path_read use the smart_str API to build the host and path strings. I think smart_str_append* introduces unnecessary overhead here, since it performs repeated boundary checks and dynamic memory reallocations.

This PR optimizes string allocation in uri_parser_rfc3986 by replacing smart_str with a single zend_string_alloc of pre-calculated length. It focuses on two main functions, php_uri_parser_rfc3986_host_read() and php_uri_parser_rfc3986_path_read(), which affect getHost() and getrawHost().

for php_uri_parser_rfc3986_host_read()
This PR constructs the IPv6/IPFuture host directly using a fixed-length zend_string (formatted as [ + hostText + ] + \0). This replaces the previous smart_str appending process.
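The fixed-length construction described above can be sketched in plain C. This is only an illustration, not the code from the PR: bracket_host is a hypothetical name, and malloc stands in for zend_string_alloc so the snippet compiles without the Zend headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "[" + host_text + "]" in one allocation.
 * malloc() is a stand-in for zend_string_alloc() in this sketch. */
static char *bracket_host(const char *host_text, size_t host_len)
{
    assert(host_text != NULL);

    /* Two bracket bytes plus the host text; one extra byte for '\0'. */
    size_t total_len = host_len + 2;
    char *out = malloc(total_len + 1);
    if (out == NULL) {
        return NULL;
    }

    out[0] = '[';
    memcpy(out + 1, host_text, host_len);
    out[host_len + 1] = ']';
    out[total_len] = '\0';
    return out;
}
```

Because the final length is known before the allocation, every write lands at a fixed offset and no capacity check or reallocation is needed. The caller owns the returned buffer and releases it with free().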
for php_uri_parser_rfc3986_path_read()
This PR first traverses the segments to pre-calculate the total length (total_len), including the leading / and the segment delimiters. It then performs a single zend_string_alloc(), populates the path content and delimiters using memcpy, and finally appends the \0 terminator.
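A minimal sketch of this two-pass approach, under stated assumptions: the segment struct and join_path function are hypothetical names for illustration, the segments live in a simple array rather than the parser's own segment list, and malloc again stands in for zend_string_alloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical segment type; the real parser keeps its own segment list. */
typedef struct {
    const char *text;
    size_t len;
} segment;

/* Joins segments with '/' using one allocation sized in a first pass. */
static char *join_path(const segment *segs, size_t count, int leading_slash)
{
    /* Pass 1: optional leading '/', every segment, one '/' between neighbours. */
    size_t total_len = leading_slash ? 1 : 0;
    for (size_t i = 0; i < count; i++) {
        total_len += segs[i].len;
        if (i + 1 < count) {
            total_len += 1;
        }
    }

    char *out = malloc(total_len + 1); /* +1 for the '\0' terminator */
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: copy without any per-append capacity checks. */
    char *p = out;
    if (leading_slash) {
        *p++ = '/';
    }
    for (size_t i = 0; i < count; i++) {
        memcpy(p, segs[i].text, segs[i].len);
        p += segs[i].len;
        if (i + 1 < count) {
            *p++ = '/';
        }
    }

    /* Debug-only check that both passes agree on the length. */
    assert((size_t)(p - out) == total_len);
    *p = '\0';
    return out;
}
```

The trade-off is the one noted in the review: the segments are walked twice and the length arithmetic must match the copy loop exactly, which is why the sketch keeps an assertion between the two passes.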
Benchmark script: bench.php
for getHost()
for getrawHost()