
[5.8] Arr::flatten performance #28614

Merged
merged 1 commit into laravel:5.8 on May 26, 2019

Conversation

7 participants
@bbsnly
Contributor

commented May 25, 2019

MacBook Pro (2017), i5, 16 GB

Before: 1.92s
After: 6.5ms

$arr = [];
for ($i = 0; $i < 20000; $i++) {
    $arr[] = ['test'];
}
Arr::flatten($arr);

Before: 6.33s
After: 87ms

$arr = [];
for ($i = 0; $i < 20000; $i++) {
    $arr[] = ['name' => 'John', 'email' => 'john@me.com'];
}
app('db')->table('users')->insert($arr);

Before: 5.7s
After: 88.93ms

$arr = [];
for ($i = 0; $i < 20000; $i++) {
    $arr[] = ['name' => 'John', 'email' => 'john@me.com'];
}
User::insert($arr);

@bbsnly bbsnly changed the title Arr::flatten performance [5.8] Arr::flatten performance May 25, 2019

@laurencei

Member

commented May 26, 2019

Those are some impressive performance increases. For my own curiosity, and to help Taylor when he comes to review this PR, are you able to explain what the change actually is? i.e. we can see the code that has been changed, but what is the "logic" that let you realise there was such a performance improvement here?

And how do you know this gives the same result in all situations?

@bbsnly

Contributor Author

commented May 26, 2019

@laurencei I was comparing insert performance between Eloquent and Doctrine. To do so, I prepared ten different ways to insert new data with Laravel (Eloquent, query builder, direct connection). At some point I saw that bulk insertion with thousands of items performed very poorly.
While debugging I found the bottleneck in Illuminate/Database/Query/Builder::insert, specifically its call to Arr::flatten. It worked perfectly with a small number of items, but with tens of thousands of items it was very slow.
The explanation of my solution is simple: array_merge is slow when merging into a big array because it inspects and compares keys, whereas appending is very fast. We are not interested in keys here (array_values is used), so by using array_merge we perform many operations we do not need.

Performance:
array_merge: 1 item - 3.1E-6s; 100k items - 63.06s
concatenation: 1 item - 2.1E-6s; 100k items - 3.53ms
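For reference, the append-based idea can be sketched as a standalone function. This is a simplified illustration of the approach described above, not the exact framework code (Laravel's version also unwraps Collection instances):

```php
<?php
// Simplified sketch of an append-based flatten (an illustration of the
// idea in this PR, not the exact Illuminate\Support\Arr code).
function flatten(array $array, $depth = INF): array
{
    $result = [];

    foreach ($array as $item) {
        if (! is_array($item)) {
            $result[] = $item;
        } else {
            $values = $depth === 1
                ? array_values($item)
                : flatten($item, $depth - 1);

            // Appending one element at a time avoids re-scanning and
            // re-building $result on every iteration, which is what
            // repeated array_merge() calls effectively do.
            foreach ($values as $value) {
                $result[] = $value;
            }
        }
    }

    return $result;
}

var_dump(flatten([['a', 'b'], ['c', ['d']]]) === ['a', 'b', 'c', 'd']); // bool(true)
```

The key point is that each `$result[] = $value` is amortized O(1), while `array_merge($result, $values)` in a loop copies the ever-growing `$result` on every call, giving quadratic behavior overall.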

@taylorotwell taylorotwell merged commit 670df8c into laravel:5.8 May 26, 2019

2 checks passed

continuous-integration/styleci/pr The analysis has passed
continuous-integration/travis-ci/pr The Travis CI build passed
@taylorotwell

Member

commented May 26, 2019

Nice, thanks!

@mfn

Contributor

commented May 27, 2019

Very nice improvements, thanks @bbsnly

array_merge is slow when merging to a big array

Can confirm: in my experience the 2nd worst bottleneck (the 1st being the database 😉) I've seen in practice (and one that is easily identifiable) is this function. Used in a loop over large datasets, it is almost always something to avoid.

@Korko


commented May 28, 2019

Nice one! I would have preferred array_push($result, ...$values) instead of a foreach loop, but the performance is almost the same between the two solutions: https://3v4l.org/AH1ZX

@mfn

Contributor

commented May 29, 2019

array_push($result, ...$values)

wow, that's a neat trick!

I wonder if there are downsides or limitations to argument-unpacking a large set for a function call?
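One concrete limitation (my own observation, not something raised in the PR): argument unpacking rejects string keys, so `array_push($result, ...$values)` only works when `$values` has integer keys, whereas the foreach append has no such restriction. In `Arr::flatten` itself the values come from `array_values()`, so this would not bite there, but it matters for the general pattern:

```php
<?php
// Assumed sample data for illustration.
$result = [];
$values = ['name' => 'John'];

try {
    // Throws an Error: string keys cannot be unpacked as positional
    // arguments (and array_push() accepts no named parameter "name").
    array_push($result, ...$values);
} catch (\Error $e) {
    echo "spread failed: " . $e->getMessage() . PHP_EOL;
}

// The foreach append works regardless of the keys.
foreach ($values as $value) {
    $result[] = $value;
}

var_dump($result === ['John']); // bool(true)
```

Unpacking also materializes every element as a call argument, so for very large arrays the foreach version is the more conservative choice.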

@solofeed


commented May 29, 2019

@mfn We can add a check for the size of the array and decide which to use.

@bbsnly What do you think about it? Maybe you can improve the speed even more?)

@mfn

Contributor

commented May 29, 2019

We can add a check for the size of the array and decide what to use

Yeah, but that already requires a conclusion that there is an issue with bigger sets. Is there? ;)

@bbashy

Contributor

commented May 29, 2019

@solofeed why the mention?

@solofeed


commented May 29, 2019

@solofeed why the mention?

Sry) It was a mistake)
