Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctype_alnum 5 times slower in PHP 8.1 or greater #11997

Closed
joanhey opened this issue Aug 19, 2023 · 12 comments
Closed

ctype_alnum 5 times slower in PHP 8.1 or greater #11997

joanhey opened this issue Aug 19, 2023 · 12 comments

Comments

@joanhey
Copy link
Contributor

joanhey commented Aug 19, 2023

Description

The following code:

<?php

$foo = '112312312312323';

$start_time = microtime(true);

for ($i = 0; $i < 10000; $i++) {
    $foo = $foo.'1';
    preg_match('/^[a-zA-Z0-9]+$/', $foo);
}

$end_time = microtime(true);

echo 'preg_match Result: ' . ($end_time - $start_time).PHP_EOL;

$foo = '112312312312323';

$start_time = microtime(true);

for ($i = 0; $i < 10000; $i++) {
    $foo = $foo.'1';
    ctype_alnum($foo);
}

$end_time = microtime(true);

echo 'ctype Result: ' . ($end_time - $start_time).PHP_EOL;

Resulted in this output:
A lot slower in 8.1 and 8.2

But I expected this output instead:
The same that in 8.0 or smaller.

Here start the performance degradation:
image

https://3v4l.org/fjSaZ
https://onlinephp.io/code/e3b9ddc370fdb0f00ff40447eadc24def0dd1abe

PD: I didn't test other ctype_* functions

PHP Version

8.1

Operating System

No response

@nielsdos
Copy link
Member

nielsdos commented Aug 19, 2023

Strangely, it was already way slower in 8.0.30 on my system. In 8.1.0 there is additional slowdown that seems to come from the indirect call when the code was refactored from macro into function: dc80ea7
However, even after reverting that change, it's only a little bit faster, still almost a factor of 5 slower...
Maybe your environment is smart enough to create an inline version of isalnum.

If I apply the following patch, the ctype_isalnum becomes almost twice as fast as the regex solution. Probably because it avoid C library call overhead:

diff --git a/ext/ctype/ctype.c b/ext/ctype/ctype.c
index 939959bc09..a0a09f0cf1 100644
--- a/ext/ctype/ctype.c
+++ b/ext/ctype/ctype.c
@@ -61,7 +61,7 @@ static PHP_MINFO_FUNCTION(ctype)
 }
 /* }}} */
 
-static void ctype_impl(
+static zend_always_inline void ctype_impl(
 		INTERNAL_FUNCTION_PARAMETERS, int (*iswhat)(int), bool allow_digits, bool allow_minus) {
 	zval *c;
 
@@ -99,10 +99,15 @@ static void ctype_impl(
 	}
 }
 
+static zend_always_inline int php_isalnum(int c)
+{
+	return (c >= '0' && c <= '9') || ((unsigned int) c | 32) - 'a' < 26;
+}
+
 /* {{{ Checks for alphanumeric character(s) */
 PHP_FUNCTION(ctype_alnum)
 {
-	ctype_impl(INTERNAL_FUNCTION_PARAM_PASSTHRU, isalnum, 1, 0);
+	ctype_impl(INTERNAL_FUNCTION_PARAM_PASSTHRU, php_isalnum, 1, 0);
 }
 /* }}} */
 

Could even be further improved by moving the deprecated code section into a separate no-inline function such that the code cache pressure doesn't increase too much.

We could make our own version for each of the iswhatever functions, but idk if we want this? Maybe some other people can voice there opinion here :-) EDIT: this may be undesirable because they depend on the locale, even for the isalnum case...

@Girgias
Copy link
Member

Girgias commented Aug 19, 2023

Meh, I don't really mind.

However, a large part of the complexity of those functions is related to needing to deal with integers.

When this is removed and a proper string ZPP check can be used, it might make more sense to just inline everything instead of passing by a common generic implementation.

@nielsdos
Copy link
Member

However, a large part of the complexity of those functions is related to needing to deal with integers.

According to callgrind, the most time is spent in the isalnum call and the if check. The overhead of the C library call is quite high.

That being said, hand-rolling our own iswhatever functions would give the biggest speedup, but would also introduce complexity because we actually have to take into account the current C locale...

@joanhey If you compiled PHP yourself, are you able to test what the performance difference is on your system when you add zend_always_inline to ctype_impl (like I did in the patch, but without the other additions).

@Girgias
Copy link
Member

Girgias commented Aug 20, 2023

I maybe should go forward with actually deprecating setlocale() and forcing the C locale to be used. As this is a constant source of pain

@bwoebi
Copy link
Member

bwoebi commented Aug 21, 2023

@Girgias Probably we can just refer to the Locale class from intl. And rip out setlocale() support everywhere else.
However this also means every printf("%f") for example will be affected, which may not be desirable. I don't think the path to deprecation is as simple as it seems, as locales are still involved in core places of PHP.

@Girgias
Copy link
Member

Girgias commented Aug 21, 2023

I don't think it's simple, but is probably worthwhile. Also printf() for useland uses our own implementation, for C extensions I'm not sure this is something widely used within php-src (I do have a script somewhere to check) but even then they probably shoud use the custom version.

@morrisonlevi
Copy link
Contributor

morrisonlevi commented Sep 5, 2023

Note that on musl's libc, these functions don't take into account locale. You can see the sources here: https://git.musl-libc.org/cgit/musl/plain/src/ctype/ though I've copied relevant pieces here:

int isalnum(int c)
{
	return isalpha(c) || isdigit(c);
}

int isalpha(int c)
{
	return ((unsigned)c|32)-'a' < 26;
}

int isdigit(int c)
{
	return (unsigned)c-'0' < 10;
}

Of course, their locale support is... well I'll just quote them:

Locale support is very limited, and barely works.

Just adding this as more background information when making a decision.

@morrisonlevi
Copy link
Contributor

LLVM's libc implementation also does not support locales here, which is expected there since they don't support locales yet.

Honestly, I'm not certain locale support is even desirable for these newer libc implementations. The only system which is not totally bonkers broken is UTF-8, and ctypes are definitely not that, so...

@nielsdos
Copy link
Member

I took a look at this again.

Sample patch: https://gist.github.com/nielsdos/0faacfefb4844be0cf107d775c5d0ac7

It's worth noting that glibc & musl ctype implementation comes in two variants: function-based and macro-based. e.g. we have an isalnum macro and an isalnum function. Currently, the function-based version is used because ctype function are passed as a function pointer to ctype_impl.
When we use a macro to implement ctype_impl instead, the macro-based version of isalnum etc can be used. That's what my sample patch accomplishes.

OP's example code is around ~0.02s faster on my system (glibc-based) after this patch. So it's still much slower than OP claims it was or in comparison to pcre. Mainly we got rid of some call overhead and can do the check inline. Thinking about it, the benchmark code that compares pcre with ctype isn't even fair because the pcre code doesn't take into account the locale...

However, on Alpine there is indeed a much higher win and I get an enormous speedup, the time for ctype is about half the time for pcre. This is practically the same result as OP got for PHP 8.0, so I suspect OP is on Alpine or another musl-based system. On those systems the sample patch is more worthwhile.

Any thoughts?

@Girgias
Copy link
Member

Girgias commented Sep 25, 2023

So it's basically reverting dc80ea7, which would explain the difference in speed from 8.0 to 8.1.

I don't really care, even if I don't really like macros, but this case has always been pretty readable, even in macro form.

I wonder if it still doesn't make sense to change the behaviour to not depend on the C locale, as we have done with other functions recently (strtolower coming to mind).

nielsdos added a commit to nielsdos/php-src that referenced this issue Sep 25, 2023
Currently, a common function is used where a function pointer gets
passed to check the character class type. If we instead use a macro, the
macro version of these character class type checkers can be used. While
this gives only a minor speed-up for glibc-based systems, on Alpine this
gives a multi-facor speed-up

This is essentially a manual revert of dc80ea7.

Full discussion in phpGH-11997.
@nielsdos
Copy link
Member

So it's basically reverting dc80ea7, which would explain the difference in speed from 8.0 to 8.1.

Indeed.

I don't really care, even if I don't really like macros, but this case has always been pretty readable, even in macro form.

Yeah I'm quite neutral too, but I guess it doesn't hurt, and it does get rid of a regression.

I wonder if it still doesn't make sense to change the behaviour to not depend on the C locale, as we have done with other functions recently (strtolower coming to mind).

Good question. I don't know what the actual user expectation is of these functions. I think I'd expect them to behave like how they would in C (i.e. respecting the locale) because they borrowed the c header's name as a prefix.

@Girgias
Copy link
Member

Girgias commented Sep 26, 2023

I don't know what the user expectation is either.

Maybe something to add to php-task repo (as a function that depends on the locale, which might make sense to try and list everything that does to see what the scope of the impact would be about removing it)

nielsdos added a commit that referenced this issue Sep 26, 2023
* PHP-8.1:
  Fix GH-11997: ctype_alnum 5 times slower in PHP 8.1 or greater
  Fix GH-12297: PHP Startup: Invalid library (maybe not a PHP library) 'mysqlnd.so' in Unknown on line
nielsdos added a commit that referenced this issue Sep 26, 2023
* PHP-8.2:
  Fix GH-11997: ctype_alnum 5 times slower in PHP 8.1 or greater
  Fix GH-12297: PHP Startup: Invalid library (maybe not a PHP library) 'mysqlnd.so' in Unknown on line
nielsdos added a commit that referenced this issue Sep 26, 2023
* PHP-8.3:
  Fix GH-11997: ctype_alnum 5 times slower in PHP 8.1 or greater
  Fix GH-12297: PHP Startup: Invalid library (maybe not a PHP library) 'mysqlnd.so' in Unknown on line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants