[7.x] Optimize Str::startsWith() #32243

SjorsO · 2020-04-05T09:47:51Z

This PR makes Str::startsWith around 10% faster by using strncmp.

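There have been two PRs in the past that tried to optimize this method using strpos: #30952 and #16761. Using strpos is not a good idea; a comment on a previous PR explains why. In short, strpos keeps searching the rest of the haystack when the needle is not at the start, which the benchmark below makes visible.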
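
For orientation, the three approaches compared in this PR can be sketched as standalone functions. This is a minimal sketch and not the framework's code: the function names are hypothetical, each takes a single string needle, and the empty-needle guard is an assumption added so the three variants behave the same.

```php
<?php

// Hypothetical standalone helpers. The names and the empty-needle guard
// are assumptions made for this illustration.

// substr approach: copies the first strlen($needle) bytes, then compares.
function starts_with_substr(string $haystack, string $needle): bool
{
    return $needle !== '' && substr($haystack, 0, strlen($needle)) === $needle;
}

// strpos approach: searches the haystack for the needle, so a miss on a
// long haystack scans all of it before giving up.
function starts_with_strpos(string $haystack, string $needle): bool
{
    return $needle !== '' && strpos($haystack, $needle) === 0;
}

// strncmp approach: compares at most strlen($needle) bytes in place,
// without building a temporary substring.
function starts_with_strncmp(string $haystack, string $needle): bool
{
    return $needle !== '' && strncmp($haystack, $needle, strlen($needle)) === 0;
}
```

All three return the same results; they differ only in how much work they do on a miss or on long inputs.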

Benchmark

| Haystack | Needle | Time (ms) | Change vs. `substr` |
| --- | --- | --- | --- |
| **Current `substr` implementation** | | | |
| test | te | 1193 | |
| test | no | 1250 | |
| `str_repeat('test', 2000)` | `str_repeat('test', 20)` | 1368 | |
| `str_repeat('test', 2000)` | `str_repeat('nope', 20)` | 1344 | |
| `str_repeat('test', 2000000)` | test | 1244 | |
| `str_repeat('test', 2000000)` | nope | 1260 | |
| test | `str_repeat('test', 2000000)` | 1198 | |
| **`strpos` implementation** | | | |
| test | te | 990 | +17.0% ✔️ |
| test | no | 964 | +22.9% ✔️ |
| `str_repeat('test', 2000)` | `str_repeat('test', 20)` | 2674 | -95.4% ❌ |
| `str_repeat('test', 2000)` | `str_repeat('nope', 20)` | 7007 | -421% ❌ |
| `str_repeat('test', 2000000)` | test | 1018 | +18.1% ✔️ |
| `str_repeat('test', 2000000)` | nope | 99999+ | -999% ❌ |
| test | `str_repeat('test', 2000000)` | 1003 | +16.3% ✔️ |
| **`strncmp` implementation proposed in this PR** | | | |
| test | te | 1039 | +12.9% ✔️ |
| test | no | 1082 | +13.4% ✔️ |
| `str_repeat('test', 2000)` | `str_repeat('test', 20)` | 1192 | +12.9% ✔️ |
| `str_repeat('test', 2000)` | `str_repeat('nope', 20)` | 1205 | +10.3% ✔️ |
| `str_repeat('test', 2000000)` | test | 1080 | +13.2% ✔️ |
| `str_repeat('test', 2000000)` | nope | 1108 | +12.1% ✔️ |
| test | `str_repeat('test', 2000000)` | 1181 | +1.4% ✔️ |

Benchmark code

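/** @test */
function benchmark_str_starts_with()
{
    // Inputs under test; swap these for each row of the benchmark table.
    $haystack = 'test';

    $needle = 'te';

    // Elapsed time of each round, in milliseconds.
    $speed = [];

    // Run 5 rounds of 10 million calls each and average them afterwards.
    for ($times = 0; $times < 5; $times++) {
        $startedAt = microtime(true) * 1000;

        for ($i = 0; $i < 10000000; $i++) {
            Str::startsWith($haystack, $needle);
        }

        $speed[] = (microtime(true) * 1000) - $startedAt;
    }

    // Truncate long inputs so the output stays readable.
    dump(
        'Haystack: '.Str::limit($haystack, 20),
        'Needle: '.Str::limit($needle, 20),
        $speed,
        'Average: '.(array_sum($speed) / count($speed))
    );
}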
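
To try a single comparison outside the framework's test suite, the same timing loop can be wrapped around any callable. The following is a rough sketch only: `benchmark_callable` is a hypothetical helper, the iteration count is reduced so it finishes quickly, and the closure call adds overhead of its own, so absolute numbers will not match the table above.

```php
<?php

// Hypothetical helper: runs $rounds rounds of $iterations calls each and
// returns the per-round times and their average, in milliseconds.
function benchmark_callable(callable $fn, int $iterations, int $rounds = 5): array
{
    $speed = [];

    for ($times = 0; $times < $rounds; $times++) {
        $startedAt = microtime(true) * 1000;

        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }

        $speed[] = (microtime(true) * 1000) - $startedAt;
    }

    return ['runs' => $speed, 'average' => array_sum($speed) / count($speed)];
}

$haystack = str_repeat('test', 2000);
$needle = str_repeat('nope', 20);

// Time the strncmp-based prefix check on a long haystack with a miss.
$result = benchmark_callable(function () use ($haystack, $needle) {
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}, 100000);

printf("Average: %.2f ms\n", $result['average']);
```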

taylorotwell · 2020-04-05T15:44:14Z

Does this work with UTF-8 strings?

SjorsO · 2020-04-05T16:33:54Z

As far as I know, yes.

In the same way the old code used substr instead of mb_substr, we can use a non-multibyte function because we are comparing the start of a string, not a specific offset.

I've added an additional test with some Chinese characters (that are definitely UTF-8).
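
To illustrate the point about multibyte input, here is a small sketch of a byte-wise prefix check on UTF-8 text. `starts_with_bytes` is a hypothetical helper, not the framework method, and the example assumes the source file is saved as UTF-8.

```php
<?php

// Hypothetical helper mirroring a byte-wise prefix check with strncmp.
function starts_with_bytes(string $haystack, string $needle): bool
{
    return $needle !== '' && strncmp($haystack, $needle, strlen($needle)) === 0;
}

// Four characters, three bytes each when encoded as UTF-8.
$haystack = '你好世界';

var_dump(strlen('你好'));                       // int(6): strlen counts bytes
var_dump(starts_with_bytes($haystack, '你好'));  // bool(true)
var_dump(starts_with_bytes($haystack, '好'));    // bool(false)
var_dump(starts_with_bytes($haystack, '你世'));  // bool(false)
```

The comparison works on bytes, so a needle that is a whole number of characters matches exactly when its bytes appear at the start of the haystack.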

vlakoff · 2020-07-04T03:27:37Z

The endsWith() method could be optimized similarly, by using substr_compare() with a negative index argument:

substr_compare($haystack, $needle, -strlen($needle)) === 0

Props to Will Hudgins: I found this code in an RFC for PHP 8 that he authored.
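
Wrapped in a function, the expression above might look like the following sketch. `ends_with_substr_compare` is a hypothetical name, and the two guards are assumptions added here so that an empty needle or a needle longer than the haystack simply returns false.

```php
<?php

// Hypothetical standalone helper. The guards are assumptions: they make an
// empty needle, or a needle longer than the haystack, return false.
function ends_with_substr_compare(string $haystack, string $needle): bool
{
    return $needle !== ''
        && strlen($needle) <= strlen($haystack)
        && substr_compare($haystack, $needle, -strlen($needle)) === 0;
}

var_dump(ends_with_substr_compare('filename.php', '.php')); // bool(true)
var_dump(ends_with_substr_compare('filename.php', '.txt')); // bool(false)
var_dump(ends_with_substr_compare('php', 'filename.php'));  // bool(false)
var_dump(ends_with_substr_compare('', ''));                 // bool(false)
```

The negative offset tells substr_compare to start comparing strlen($needle) bytes before the end of the haystack, so no temporary substring is created.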

vlakoff · 2020-07-04T05:02:28Z

I have run some benchmarks with the above code for endsWith(), and it isn't faster. It seems to be a bit slower actually.

According to the RFC for PHP 8, it should still be more memory efficient, but I think execution speed matters more.

optimize Str::startsWith by using strncmp

745cba9

add some tests with characters that are definitely utf8

6940bc9

taylorotwell merged commit 13eea24 into laravel:7.x Apr 6, 2020

vlakoff mentioned this pull request Jul 5, 2020

[7.x] Make Str::endsWith return false if both haystack and needle are empty strings #33434

Merged
