Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research pluck ineffiency #947

Closed
danielmewes opened this issue Jun 5, 2013 · 15 comments
Closed

Research pluck ineffiency #947

danielmewes opened this issue Jun 5, 2013 · 15 comments
Assignees
Milestone

Comments

@danielmewes
Copy link
Member

I was taking a brief look at YCSB and noticed that for point gets, we are doing a pluck. This is perfectly sane. Efficiently implemented, pluck should be a practically zero-cost operator.

But it isn't. Here's a quick experiment:

First the output:

Get: 875.38309583279qps
GetAll: 841.41265312264qps
Get pluck: 542.69162263385qps
GetAll pluck: 733.0771512848qps

The code:

r\db('test')->tableCreate('t55')->run($conn);
$doc = array('id' => "1", 'foo' => 'test');
    r\table('t55')->insert($doc)->run($conn);

// Experiment 1:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->get(1)->run($conn);
echo "Get: " . 10000 / (microtime(true) - $t) . "qps\n";

// Experiment 2:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->getAll(1)->run($conn);
echo "GetAll: " . 10000 / (microtime(true) - $t) . "qps\n";

// Experiment 3:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->get(1)->pluck(array('id', 'foo'))->run($conn);
echo "Get pluck: " . 10000 / (microtime(true) - $t) . "qps\n";

// Experiment 4:
$t = microtime(true);
for ($i = 0; $i < 10000; ++$i)
    r\table('t55')->getAll(1)->pluck(array('id', 'foo'))->run($conn);
echo "GetAll pluck: " . 10000 / (microtime(true) - $t) . "qps\n";

RethinkDB CPU utilization while running the different parts:
Get: 60 %
GetAll: 60 %
Get + Pluck: 69 %
GetAll + Pluck: 55 %

Conclusion: Pluck makes gets slower by about 40 %. While I am using only a single client/connection, this cannot be attributed to a longer delay alone, as the CPU usage is actually higher with pluck, while processing fewer queries per second. Weirdly, pluck on sequences seems to be significantly faster than pluck on a single object.
We should find out what makes pluck so slow, and check if the same issue affects other operations as well.

@danielmewes
Copy link
Member Author

To give a little more info on how I think we should tackle this:

  1. If somebody has a spontaneous idea on what might cause this, we should verify that hypothesis. @mlucy: any suggestions?
  2. If 1 doesn't lead to success relatively quickly, we should take the systematic approach and profile the whole thing. I've successfully used the profiler built into Solaris Studio http://www.oracle.com/technetwork/server-storage/solarisstudio/downloads/index.html on RethinkDB in the past (used it on Linux), and I was very satisfied with what it did (I recently had to use the Intel Parallel Studio one and it was awful to use in comparison). @wmrowan: If not done already, I think it would make sense for you to set up some profiler. We will very likely need it for solving a number of issues soon anyways.

@danielmewes
Copy link
Member Author

Some more results: I've used a varying number of chained without() to test the scalability of chaining this operation.
without() shows a similar performance penalty as pluck.

table('t55')->get(1)->run($conn);
table('t55')->get(1)->without([])->run($conn);
table('t55')->get(1)->without([])->without([])->run($conn);
...

Again, I've done this for both get() and getAll(). The plot shows the latency (~processing time) of a single such query.

without chaining plot

For get(), chaining the withouts appears to scale quadratically, while for getAll it is scaling linearly.

@jdoliner
Copy link
Contributor

jdoliner commented Jun 5, 2013

I'm pretty sure this is caused by the fact that we make a copy for pluck. It would be pretty easy to fix. However we should bear in mind, there are a huge number of inefficiencies like this in the query language and we obviously don't want to take the time to fix them all right now. Ideally we want to only fix critical path things like base write performance and base read performance because otherwise performance becomes an intractable task. However I guess this issue becomes critical path due to the fact that YCSB uses it. Is there anyway we could work around this and have YCSB not use it?

Also I'm going to hack up a copy free version of pluck so we can test the hypothesis.

@danielmewes
Copy link
Member Author

@jdoliner: Yes, we can absolutely live without it for a little while. YCSB doesn't have to use pluck. I believe that there actually are other more relevant issues with YCSB for now. However query processing will matter soon enough.

What really smells are the quadratic costs. Also, copying should in no way account for a 40 % performance hit. I mean we are talking about 800 queries per second here. Not 8 million. What I want to say is: I have some hope that there's a deeper issue behind this, which is not specific to pluck. Solving it could have a positive impact on a lot of queries.

@mlucy
Copy link
Member

mlucy commented Jun 5, 2013

It appears to be quadratic because the code was written back when we cached
values between arg calls. In the case where pluck and without are
called on objects, arg(0) is evaluated twice; this means that if we have
nested calls we'll be doing O(2^d) work where d is the depth. This is easy
to fix.

On Wed, Jun 5, 2013 at 9:53 AM, Joe Doliner notifications@github.comwrote:

I'm pretty sure this is caused by the fact that we make a copy for pluck.
It would be pretty easy to fix. However we should bear in mind, there are a
huge number of inefficiencies like this in the query language and we
obviously don't want to take the time to fix them all right now. Ideally we
want to only fix critical path things like base write performance and base
read performance because otherwise performance becomes an intractable task.
However I guess this issue becomes critical path due to the fact that YCSB
uses it. Is there anyway we could work around this and have YCSB not use it?

Also I'm going to hack up a copy free version of pluck so we can test the
hypothesis.


Reply to this email directly or view it on GitHubhttps://github.com//issues/947#issuecomment-18991322
.

@mlucy
Copy link
Member

mlucy commented Jun 5, 2013

Daniel: I just pushed a branch pluck_without_fix to github; could you
check whether that fixes the problem?

On Wed, Jun 5, 2013 at 10:18 AM, Michael Lucy mlucy@rethinkdb.com wrote:

It appears to be quadratic because the code was written back when we
cached values between arg calls. In the case where pluck and without
are called on objects, arg(0) is evaluated twice; this means that if we
have nested calls we'll be doing O(2^d) work where d is the depth. This is
easy to fix.

On Wed, Jun 5, 2013 at 9:53 AM, Joe Doliner notifications@github.comwrote:

I'm pretty sure this is caused by the fact that we make a copy for pluck.
It would be pretty easy to fix. However we should bear in mind, there are a
huge number of inefficiencies like this in the query language and we
obviously don't want to take the time to fix them all right now. Ideally we
want to only fix critical path things like base write performance and base
read performance because otherwise performance becomes an intractable task.
However I guess this issue becomes critical path due to the fact that YCSB
uses it. Is there anyway we could work around this and have YCSB not use it?

Also I'm going to hack up a copy free version of pluck so we can test
the hypothesis.


Reply to this email directly or view it on GitHubhttps://github.com//issues/947#issuecomment-18991322
.

@mlucy
Copy link
Member

mlucy commented Jun 5, 2013

The fact that this was messed up in at least one place also makes
#927 more urgent. I'm going
to move it into 1.6.

On Wed, Jun 5, 2013 at 10:24 AM, Michael Lucy mlucy@rethinkdb.com wrote:

Daniel: I just pushed a branch pluck_without_fix to github; could you
check whether that fixes the problem?

On Wed, Jun 5, 2013 at 10:18 AM, Michael Lucy mlucy@rethinkdb.com wrote:

It appears to be quadratic because the code was written back when we
cached values between arg calls. In the case where pluck and without
are called on objects, arg(0) is evaluated twice; this means that if we
have nested calls we'll be doing O(2^d) work where d is the depth. This is
easy to fix.

On Wed, Jun 5, 2013 at 9:53 AM, Joe Doliner notifications@github.comwrote:

I'm pretty sure this is caused by the fact that we make a copy for pluck.
It would be pretty easy to fix. However we should bear in mind, there are a
huge number of inefficiencies like this in the query language and we
obviously don't want to take the time to fix them all right now. Ideally we
want to only fix critical path things like base write performance and base
read performance because otherwise performance becomes an intractable task.
However I guess this issue becomes critical path due to the fact that YCSB
uses it. Is there anyway we could work around this and have YCSB not use it?

Also I'm going to hack up a copy free version of pluck so we can test
the hypothesis.


Reply to this email directly or view it on GitHubhttps://github.com//issues/947#issuecomment-18991322
.

@danielmewes
Copy link
Member Author

Thanks @mlucy. The problem is gone.

Get: 1161.05924555qps
GetAll: 1108.7861070552qps
Get pluck: 1032.0003712373qps
GetAll pluck: 907.12853594443qps

And I can chain without() behind a get() as much as I want... :-)

@jdoliner: I consider this solved. I think it is not necessary to make pluck or other commands copy-free at this point, for the reasons that you've mentioned. We will get back to that in a couple months maybe...

@mlucy
Copy link
Member

mlucy commented Jun 5, 2013

Cool. This is in code-review 607 by Marc.

On Wed, Jun 5, 2013 at 11:06 AM, Daniel Mewes notifications@github.comwrote:

Thanks @mlucy https://github.com/mlucy. The problem is gone.

Get: 1161.05924555qps
GetAll: 1108.7861070552qps
Get pluck: 1032.0003712373qps
GetAll pluck: 907.12853594443qps

And I can chain without() behind a get() as much as I want... :-)

@jdoliner https://github.com/jdoliner: I consider this solved. I think
it is not necessary to make pluck or other commands copy-free at this
point, for the reasons that you've mentioned. We will get back to that in a
couple months maybe...


Reply to this email directly or view it on GitHubhttps://github.com//issues/947#issuecomment-18995797
.

@coffeemug
Copy link
Contributor

there are a huge number of inefficiencies like this in the query language and we obviously don't want to take the time to fix them all right now

Here is how I would like to structure this process. First, I think we should become competitive with mongodb on standard predefined YCSB workloads (or, hopefully, do a lot better than them). We should fix all operations that are on the critical path. This will give us two benefits:

  • A standard set of benchmarks everyone else knows and uses that we can publish and point people to.
  • I suspect that the overall qualitative performance of the system will likely increase significantly by virtue of fixing all these low hanging fruit (default YCSB workloads are really good at fixing low-hanging fruit).

After that, we should just follow our users and fix things they complain about workload by workload.

This gives us more than enough to work with, and narrows the playing field quite a bit.

@jdoliner
Copy link
Contributor

jdoliner commented Jun 5, 2013

@coffeemug it's kind of moot in this case because we already solved it but what you're saying doesn't really clarify issues like this. In this case pluck is on the critical path only because of how we choose to implement YCSB. We could have implemented in another way and completely ignored pluck. So we need to give a bit more thought to what's actually on the critical path and what appears to be on the critical path.

@wmrowan
Copy link
Contributor

wmrowan commented Jun 5, 2013

There are only a few query languages features that we use in YCSB. Beyond the basic operations like get, between, and update, that we have to have, we only use pluck, and limit.

With both pluck and limit we could choose to implement the same functionality in the client but that involves transferring more data to the client than is strictly necessary. Ultimately it will be faster to ensure that these commands are fast than to try and work around the fact that they are slow by avoiding them. Given how easy it was to fix pluck (and I think limit is already doing the right thing) I don't think this is actually something we need to worry about anymore.

@jdoliner
Copy link
Contributor

jdoliner commented Jun 5, 2013

@wmrowan fair enough. This seems to be more of a philosophical question than a real one then.

@ghost ghost assigned mlucy Jun 7, 2013
@coffeemug
Copy link
Contributor

Since this is already fixed I'm moving it to 1.6 and assigning to @mlucy. Please feel free to close when the fix makes it into next.

@mlucy
Copy link
Member

mlucy commented Jun 7, 2013

This is in next, review 607.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants