[5.6] Optimize query builder's pluck() method#23482
[5.6] Optimize query builder's pluck() method#23482taylorotwell merged 4 commits intolaravel:5.6from
Conversation
|
What's the status of fetch code? It seems as if it cannot be changed in full-stack Laravel, whereas it can be changed when using illuminate/database as a stand-alone package (via So please tell me which is the correct way forward:
Even if the second option is preferred, taking care of this change in the tests may be worthwhile as well, as this would make the mocks more realistic (because full-stack Laravel always returns objects). |
| * @param callable $callback | ||
| * @return mixed | ||
| */ | ||
| protected function onceWithColumns($columns, $callback) |
There was a problem hiding this comment.
Is this the correct location for the helper method? Or should it be placed below get()?
|
Thank you for your work on this. Much appreciated. Ah ah yes, fetch mode has been removed, see notably #17557. Still, users can modify the fetch mode, so currently your code would break with "array fetch modes". Off the top of my head:
|
|
Also, keep in mind the (as a reminder: at some time, laravel was using |
| foreach ($queryResult as $row) { | ||
| $itemValue = $row->$column; | ||
|
|
||
| if (is_null($key)) { |
There was a problem hiding this comment.
I would move the is_null($key) function call outside of the loop.
(replacing with $key === null would be fine too, but you know the laravel convention...)
|
Can you provide some benchmarks? I believe this kind of changes does not get merged without some kind of benchmarks. |
|
I will gladly do that once I get some official buy-in. ;) |
07201ac to
f6894c0
Compare
|
Okay, I refactored a bit, tests are working. BenchmarkTable with 10885 rows, 1000 iterations So, almost 2x speedup. |
| } else { | ||
| $results[$row->$key] = $row->$column; | ||
| } | ||
| } |
There was a problem hiding this comment.
I would move the is_null() function call outside of the loop:
if (is_null($key)) {
foreach ($queryResult as $row) {
$results[] = $row->$column;
}
} else {
foreach ($queryResult as $row) {
$results[$row->$key] = $row->$column;
}
}It's a bit longer, but saving thousands of function calls is worth it, and the code is still readable, maybe even a bit better.
| } else { | ||
| $results[$row[$key]] = $row[$column]; | ||
| } | ||
| } |
There was a problem hiding this comment.
Same is_null() unlooping as in pluckFromObjectColumn().
|
|
||
| return tap($callback(), function () use ($original) { | ||
| $this->columns = $original; | ||
| }); |
There was a problem hiding this comment.
I know this one is "Laravel style", but sorry it's just incredibly painful to understand, especially if you are not used to this voodoo helper. Can't we just stay with the regular and simple lines:
$results = $callback();
$this->columns = $original;
return $results;There was a problem hiding this comment.
Yeah, you're right. (I'm used to it from Ruby land, where it's far more readable.) I've advocated against the overuse of tap in Laravel in the past myself, whoops. Thanks for the feedback.
|
Strong +1 for this. Sure there is a bit of added code, but it's not that much, and it's still very understandable. The DB Also, would you rerun your benchmark with the unlooped |
This allows us to reimplement pluck() without calling get(), while avoiding the duplication of a lot of code. As a next step, pluck() can be re-implemented without relying on the (comparatively expensive) same method on the Collection class.
This prepares for the final step of removing (without replacement) the expensive parts of its implementation (from which, I suspect, the code will get shorter again).
Because we don't need all the features of data_get() - such as dot access - there is no need to use the function, which can become expensive when doing this with many, many rows in a loop.
f6894c0 to
5e444b7
Compare
|
@vlakoff I've taken care of your feedback, thank you! Updated BenchmarkTable with 10885 rows, 1000 iterations |
|
@franzliedke @vlakoff |
|
Comparing this bench with the previous one, the |
|
@carusogabriel Indeed. If it were up to me, I would definitely use this. But the Laravel codebase uses |
|
Well, since we're not doing it in a loop, this type of micro-optimization won't have any measurable effect anyway. |
|
@vlakoff Maybe is some topic we can put at @franzliedke Isn't micro, we reduce from 3 opcodes to 1, see:
|
|
The difference in number of opcodes is because of (what I think is called) constant folding. An optimization that is performed by PHP because
|
|
Totally like the PR. But shouldn't we be cautious and rather target master/5.7 with this? |
|
|
||
| $result = $callback(); | ||
|
|
||
| $this->columns = $original; |
There was a problem hiding this comment.
Shouldn't this be wrapped in a try/finally block? We want to make sure we always reset them.
There was a problem hiding this comment.
It wasn't before my refactoring, so I don't want to change this here...
|
So… it seems to be ready. Anything remaining to do? |
|
Replying to mfn:
The change is supposed to be transparent ;-) If ever it breaks something, postponing the change to 5.7 wouldn't have made it better. In such a scenario, we would have to fix it asap or revert the change if not quickly fixable. Would be the same for any branch. edit: Applying it to 5.7, it would have some "beta testing" phase, indeed. But as it's not changing the API, I still think it should be applied the soonest (and fixed the soonest, in worst case scenario). |
|
Optimizing pluck (on our own fork) on a project resulted in a previous ~2-minute process optimizing to around 20 seconds. Glad to see someone has also encountered this as well and raised attention to it. |
|
Thanks. Sorry for the delay 😄 |
Proposed by @vlakoff in laravel/ideas#98 and yesterday acknowledged by Taylor...
This refactors the
pluckmethod (and, to stay DRY, thegetmethod) ofIlluminate\Database\Query\Builderto no longer rely onobject_getinternally. As reported in the aforementioned issue, this can become a bottleneck when looping over a large number of results.I've inlined
Collection::pluckand thenArr::pluck, subsequently throwing away any overly generic code that is not necessary in this use case - because we know the data type of query results. Also, we don't need the other features ofobject_get, such as dot notation and default values.