Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[7.x] Optimize Str::startsWith() #32243

Merged
merged 2 commits into from Apr 6, 2020
Merged

[7.x] Optimize Str::startsWith() #32243

merged 2 commits into from Apr 6, 2020

Conversation

SjorsO
Copy link
Contributor

@SjorsO SjorsO commented Apr 5, 2020

This PR makes Str::startsWith around 10% faster by using strncmp.

There have been two PRs in the past that have tried to optimize this method using strpos: #30952 and #16761. Using strpos is not a good idea, this comment on a previous PR explains why.

Benchmark

Haystack Needle Time %
Rows below use the current substr implementation
testte1193
testno1250
str_repeat('test', 2000)str_repeat('test', 20)1368
str_repeat('test', 2000)str_repeat('nope', 20)1344
str_repeat('test', 2000000)test1244
str_repeat('test', 2000000)nope1260
teststr_repeat('test', 2000000)1198
Rows below use a strpos implementation
testte990+17.0% ✔️
testno964+22.9% ✔️
str_repeat('test', 2000)str_repeat('test', 20)2674-95.4% ❌
str_repeat('test', 2000)str_repeat('nope', 20)7007-421% ❌
str_repeat('test', 2000000)test1018+18.1% ✔️
str_repeat('test', 2000000)nope99999+-999% ❌
teststr_repeat('test', 2000000)1003+16.3% ✔️
Rows below use the strncmp implementation proposed in this PR
testte1039+12.9% ✔️
testno1082+13.4% ✔️
str_repeat('test', 2000)str_repeat('test', 20)1192+12.9% ✔️
str_repeat('test', 2000)str_repeat('nope', 20)1205+10.3% ✔️
str_repeat('test', 2000000)test1080+13.2% ✔️
str_repeat('test', 2000000)nope1108+12.1% ✔️
teststr_repeat('test', 2000000)1181+1.4% ✔️

Benchmark code

/** @test */
function benchmark_str_starts_with()
{
    $haystack = 'test';

    $needle = 'te';

    $speed = [];

    for ($times = 0; $times < 5; $times++) {
        $startedAt = microtime(true) * 1000;

        for ($i = 0; $i < 10000000; $i++) {
            Str::startsWith($haystack, $needle);
        }

        $speed[] = (microtime(true) * 1000) - $startedAt;
    }

    dump(
        'Haystack: '.Str::limit($haystack, 20),
        'Needle: '.Str::limit($needle, 20),
        $speed,
        'Average: '.(array_sum($speed) / count($speed))
    );
}

@taylorotwell
Copy link
Member

Does this work with UTF-8 strings?

@SjorsO
Copy link
Contributor Author

SjorsO commented Apr 5, 2020

As far as I know, yes.

In the same way the old code used substr instead of mb_substr, we can use a non-multibyte function because we are comparing the start of a string, not a specific offset.

I've added an additional test with some Chinese characters (that are definitely UTF-8).

@taylorotwell taylorotwell merged commit 13eea24 into laravel:7.x Apr 6, 2020
@vlakoff
Copy link
Contributor

vlakoff commented Jul 4, 2020

The endsWith() method could be optimized similarly, by using substr_compare() with a negative index argument:

substr_compare($haystack, $needle, -strlen($needle)) === 0

props Will Hudgins, I discovered this code in a RFC for PHP 8 he has authored.

@vlakoff
Copy link
Contributor

vlakoff commented Jul 4, 2020

I have run some benchmarks with the above code for endsWith(), and it isn't faster. It seems to be a bit slower actually.

According to the RFC for PHP 8, it should be more memory efficient, still, but I think the more important is execution speed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants