Research pluck inefficiency #947

danielmewes · 2013-06-05T08:36:24Z

I was taking a brief look at YCSB and noticed that for point gets, we are doing a pluck. This is perfectly sane. Efficiently implemented, pluck should be a practically zero-cost operator.

But it isn't. Here's a quick experiment:

First the output:

Get: 875.38309583279qps
GetAll: 841.41265312264qps
Get pluck: 542.69162263385qps
GetAll pluck: 733.0771512848qps

The code:

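// Assumes the PHP-RQL driver is loaded and $conn is an open connection, e.g.:
//   require_once('rdb/rdb.php');
//   $conn = r\connect('localhost');
r\db('test')->tableCreate('t55')->run($conn);
// Integer id, so that the get(1) / getAll(1) lookups below match this document.
$doc = array('id' => 1, 'foo' => 'test');
r\table('t55')->insert($doc)->run($conn);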

// Experiment 1:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->get(1)->run($conn);
echo "Get: " . 10000 / (microtime(true) - $t) . "qps\n";

// Experiment 2:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->getAll(1)->run($conn);
echo "GetAll: " . 10000 / (microtime(true) - $t) . "qps\n";

// Experiment 3:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->get(1)->pluck(array('id', 'foo'))->run($conn);
echo "Get pluck: " . 10000 / (microtime(true) - $t) . "qps\n";

// Experiment 4:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->getAll(1)->pluck(array('id', 'foo'))->run($conn);
echo "GetAll pluck: " . 10000 / (microtime(true) - $t) . "qps\n";

RethinkDB CPU utilization while running the different parts:
Get: 60 %
GetAll: 60 %
Get + Pluck: 69 %
GetAll + Pluck: 55 %

Conclusion: Pluck makes gets slower by about 40 %. While I am using only a single client/connection, this cannot be attributed to a longer delay alone, as the CPU usage is actually higher with pluck, while processing fewer queries per second. Weirdly, pluck on sequences seems to be significantly faster than pluck on a single object.
We should find out what makes pluck so slow, and check if the same issue affects other operations as well.
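For reference, the relative slowdown can be worked out directly from the throughput figures above. This is only a small sketch in Python: it uses nothing but the numbers quoted in the experiment output, and the helper name `relative_drop` is made up for illustration.

```python
# Throughput figures (queries per second) quoted in the experiment output above.
qps = {
    "get": 875.38309583279,
    "get_all": 841.41265312264,
    "get_pluck": 542.69162263385,
    "get_all_pluck": 733.0771512848,
}


def relative_drop(baseline: float, measured: float) -> float:
    """Fraction of throughput lost relative to the baseline (illustrative helper)."""
    return (baseline - measured) / baseline


get_drop = relative_drop(qps["get"], qps["get_pluck"])
get_all_drop = relative_drop(qps["get_all"], qps["get_all_pluck"])

print(f"get + pluck:    {get_drop:.1%} fewer queries per second")
print(f"getAll + pluck: {get_all_drop:.1%} fewer queries per second")
```

This works out to roughly 38 % for get() and 13 % for getAll(), which matches the "about 40 %" figure and the observation that pluck on sequences is hit far less.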

danielmewes · 2013-06-05T09:23:46Z

To give a little more info on how I think we should tackle this:

1. If somebody has a spontaneous idea about what might cause this, we should verify that hypothesis. @mlucy: any suggestions?
2. If 1 doesn't lead to success relatively quickly, we should take the systematic approach and profile the whole thing. I've successfully used the profiler built into Solaris Studio (http://www.oracle.com/technetwork/server-storage/solarisstudio/downloads/index.html) on RethinkDB in the past (on Linux), and I was very satisfied with it (I recently had to use the Intel Parallel Studio one, and it was awful to use in comparison). @wmrowan: If not done already, I think it would make sense for you to set up a profiler. We will very likely need it for solving a number of issues soon anyway.

danielmewes · 2013-06-05T10:27:55Z

Some more results: I've used a varying number of chained without() to test the scalability of chaining this operation.
without() shows a similar performance penalty as pluck.

r\table('t55')->get(1)->run($conn);
r\table('t55')->get(1)->without([])->run($conn);
r\table('t55')->get(1)->without([])->without([])->run($conn);
...

Again, I've done this for both get() and getAll(). The plot shows the latency (~processing time) of a single such query.

For get(), chaining the withouts appears to scale quadratically, while for getAll it is scaling linearly.

jdoliner · 2013-06-05T16:53:24Z

I'm pretty sure this is caused by the fact that we make a copy for pluck. It would be pretty easy to fix. However, we should bear in mind that there are a huge number of inefficiencies like this in the query language, and we obviously don't want to take the time to fix them all right now. Ideally we want to fix only critical-path things like base write performance and base read performance, because otherwise performance becomes an intractable task. However, I guess this issue becomes critical path because YCSB uses it. Is there any way we could work around this and have YCSB not use it?

Also, I'm going to hack up a copy-free version of pluck so we can test the hypothesis.

danielmewes · 2013-06-05T17:13:50Z

@jdoliner: Yes, we can absolutely live without it for a little while. YCSB doesn't have to use pluck. I believe that there are actually other, more relevant issues with YCSB for now. However, query processing will matter soon enough.

What really smells are the quadratic costs. Also, copying should in no way account for a 40 % performance hit. I mean we are talking about 800 queries per second here. Not 8 million. What I want to say is: I have some hope that there's a deeper issue behind this, which is not specific to pluck. Solving it could have a positive impact on a lot of queries.
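The same point can be put in per-query terms. With a single client issuing queries one after another, mean latency is approximately the inverse of throughput, so the figures from the first comment translate into an extra cost per query. The sketch below is only that rough conversion; the helper name is illustrative, and the result includes client and network time, which is present in both measurements.

```python
# Throughput (queries per second) from the first comment, single connection.
GET_QPS = 875.38309583279
GET_PLUCK_QPS = 542.69162263385


def latency_ms(qps: float) -> float:
    """Approximate mean latency in milliseconds for a sequential client."""
    return 1000.0 / qps


get_ms = latency_ms(GET_QPS)
get_pluck_ms = latency_ms(GET_PLUCK_QPS)
extra_ms = get_pluck_ms - get_ms

print(f"get:         {get_ms:.2f} ms per query")
print(f"get + pluck: {get_pluck_ms:.2f} ms per query")
print(f"extra cost:  {extra_ms:.2f} ms per query")
```

The extra cost comes out at roughly 0.7 ms per query for a document with two short fields, which is hard to attribute to a single copy.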

mlucy · 2013-06-05T17:19:21Z

It appears to be quadratic because the code was written back when we cached values between arg calls. In the case where pluck and without are called on objects, arg(0) is evaluated twice; this means that if we have nested calls we'll be doing O(2^d) work where d is the depth. This is easy to fix.
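The O(2^d) behaviour can be illustrated with a toy model. The sketch below is not RethinkDB code: the classes `Get`, `WithoutNaive`, and `WithoutFixed` are hypothetical stand-ins, and the reason given for the second evaluation (a type check) is an assumption made for illustration. It only shows that evaluating the argument twice at every level of a chain, with no caching in between, doubles the work per level.

```python
class Get:
    """Stand-in for a leaf term such as get(); counts how often it is evaluated."""

    def __init__(self):
        self.evaluations = 0

    def eval(self):
        self.evaluations += 1
        return {"id": 1, "foo": "test"}


class WithoutNaive:
    """Hypothetical model of the bug: the argument term is evaluated twice."""

    def __init__(self, arg0):
        self.arg0 = arg0

    def eval(self):
        # First evaluation (assumed here to be a type check on the argument).
        is_object = isinstance(self.arg0.eval(), dict)
        # Second evaluation to actually use the value; nothing was cached.
        value = self.arg0.eval()
        return dict(value) if is_object else value


class WithoutFixed:
    """Same operator, but the argument is evaluated once and the result reused."""

    def __init__(self, arg0):
        self.arg0 = arg0

    def eval(self):
        value = self.arg0.eval()
        return dict(value) if isinstance(value, dict) else value


def leaf_evaluations(op_class, depth):
    """Build get().without()...without() with `depth` links; count leaf evaluations."""
    leaf = Get()
    term = leaf
    for _ in range(depth):
        term = op_class(term)
    term.eval()
    return leaf.evaluations


for depth in (0, 1, 2, 5, 10):
    naive = leaf_evaluations(WithoutNaive, depth)
    fixed = leaf_evaluations(WithoutFixed, depth)
    print(f"depth={depth:2d}  naive={naive:5d}  fixed={fixed}")
```

In this model, ten chained operators evaluate the leaf 1024 times in the naive version and once in the fixed one. It also suggests why a single pluck after get() would cost roughly one extra fetch of the document.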

mlucy · 2013-06-05T17:25:19Z

Daniel: I just pushed a branch pluck_without_fix to GitHub; could you check whether that fixes the problem?

mlucy · 2013-06-05T17:26:41Z

The fact that this was messed up in at least one place also makes #927 more urgent. I'm going to move it into 1.6.

danielmewes · 2013-06-05T18:06:17Z

Thanks @mlucy. The problem is gone.

Get: 1161.05924555qps
GetAll: 1108.7861070552qps
Get pluck: 1032.0003712373qps
GetAll pluck: 907.12853594443qps

And I can chain without() behind a get() as much as I want... :-)

@jdoliner: I consider this solved. I think it is not necessary to make pluck or other commands copy-free at this point, for the reasons that you've mentioned. We will get back to that in a couple months maybe...

mlucy · 2013-06-05T18:18:03Z

Cool. This is in code-review 607 by Marc.

coffeemug · 2013-06-05T19:49:25Z

> there are a huge number of inefficiencies like this in the query language and we obviously don't want to take the time to fix them all right now

Here is how I would like to structure this process. First, I think we should become competitive with MongoDB on standard predefined YCSB workloads (or, hopefully, do a lot better than them). We should fix all operations that are on the critical path. This will give us two benefits:

1. A standard set of benchmarks everyone else knows and uses that we can publish and point people to.
2. I suspect that the overall qualitative performance of the system will increase significantly by virtue of fixing all this low-hanging fruit (default YCSB workloads are really good at exposing low-hanging fruit).

After that, we should just follow our users and fix things they complain about workload by workload.

This gives us more than enough to work with, and narrows the playing field quite a bit.

jdoliner · 2013-06-05T19:58:25Z

@coffeemug: it's kind of moot in this case because we already solved it, but what you're saying doesn't really clarify issues like this. In this case pluck is on the critical path only because of how we chose to implement YCSB. We could have implemented it another way and completely ignored pluck. So we need to give a bit more thought to what's actually on the critical path and what merely appears to be.

wmrowan · 2013-06-05T20:12:12Z

There are only a few query language features that we use in YCSB. Beyond the basic operations that we have to have, like get, between, and update, we only use pluck and limit.

With both pluck and limit we could choose to implement the same functionality in the client but that involves transferring more data to the client than is strictly necessary. Ultimately it will be faster to ensure that these commands are fast than to try and work around the fact that they are slow by avoiding them. Given how easy it was to fix pluck (and I think limit is already doing the right thing) I don't think this is actually something we need to worry about anymore.

jdoliner · 2013-06-05T20:14:41Z

@wmrowan fair enough. This seems to be more of a philosophical question than a real one then.

coffeemug · 2013-06-07T01:11:53Z

Since this is already fixed I'm moving it to 1.6 and assigning to @mlucy. Please feel free to close when the fix makes it into next.

mlucy · 2013-06-07T17:57:05Z

This is in next, review 607.

ghost assigned mlucy Jun 7, 2013

mlucy closed this as completed Jun 7, 2013

mlucy mentioned this issue Jul 25, 2013

Investigate pluck performance #1200

Closed
