Implement explain command #175

Closed
coffeemug opened this issue Dec 21, 2012 · 16 comments

@coffeemug (Contributor)

As discussed...

@coffeemug (Contributor, Author)

Not making it into 1.4, moving to 1.5.

@jdoliner (Contributor)

I'm getting close enough to a working implementation of this that I think we should discuss the API a bit. There are 2 components that make up this issue:

  • The backend which returns a profiling report (represented as a datum)
  • The frontend which renders it in the webUI

I'd like to get the backend part merged in without the frontend part for two reasons. First, I'd like to get this to the people doing profiling as soon as possible because I think it will make their jobs easier, and to avoid a nasty merge conflict the easiest thing to do is merge it into next. Second, this will help a lot in providing support. One of the most common questions I get from users is "why is this query slow?" When the query in question involves multiple tables I frequently need to ask how many documents are in each table, how big those documents are, and what their schema is before I can even make a guess as to what the problem is. It gets to be a pretty painful experience for everyone involved. If I could just tell the user to run with explain and send me the output, I could give them an answer much more quickly. Also, while the output isn't going to be as friendly as the web UI rendering will be, I'm sure users will still be able to gain some insight from it.

Here's what I'd propose for the API. First off, I don't think I like the name explain: in SQL, explain means "explain the plan for this query, but don't actually run it," which doesn't make a ton of sense in RethinkDB because we don't really have an optimizer; the query basically is the plan. What we can do is actually run the query and time various things as it's running, and I think that process is much better described by the verb profile. Here's my first proposal:

>>> query.run(profile=True)
{
    "value" : query.run(),
    "profile" : [
        { # single step
            "description": "Read table",
            "duration(ms)": 5,
            "sub_tasks" : [ ... ]
        },
        [ # tasks happening in parallel
            [
                {
                    "description": "Read from shard 1",
                    "duration(ms)": 5,
                    "sub_tasks" : [ ... ]
                },
                ...
            ],
            [
                {
                    "description": "Read from shard 2",
                    "duration(ms)": 5,
                    "sub_tasks" : [ ... ]
                },
                ...
            ]
        ],
        ...
    ]
}

@coffeemug (Contributor, Author)

Wow, that's awesome.

I strongly agree with merging this without even working on the front-end part. I think we should merge this in, use it for a bit (ourselves and with other people), learn how it works in practice and what's useful and what isn't, maybe tweak the API and the information a bit, and only add the front-end visualization later. The profile command and the visualization for it seem like two separate killer features, not one killer feature.

On the API:

  • I like the name profile and the optarg to run. It feels right.
  • What is a "subtask"? Is it basically a subquery or is there a broader definition?
  • What is a step? If I have N steps, does that mean that data went back to the machine in question N times?
  • I think the API should include a bit more metadata. At least a node id for each task, and table name. Without that it would be really hard to tell what's going on.
  • I think in addition to a description we should include a "type" for each task, to make checking this programmatically easier (I'd imagine it would be an identifier string like "READ"). Actually, the whole thing should be programmatically accessible in general -- I shouldn't have to parse the description to determine which shard a task is on. It should be metadata IMO. There's a rough sketch of what I mean right after this list.
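
To make the "programmatically accessible" point concrete, here's a sketch of the kind of consumer I have in mind, assuming the schema proposed above (task dicts with "description", "duration(ms)", and optional "sub_tasks"; lists for parallel groups). The "type" field is hypothetical, so this falls back to "description":

from collections import defaultdict

def total_by_type(node, totals=None):
    # Walk the profile tree, summing duration per task type.
    if totals is None:
        totals = defaultdict(float)
    if isinstance(node, list):
        # Sequential steps and parallel groups are both lists;
        # recurse into each element either way.
        for child in node:
            total_by_type(child, totals)
    elif isinstance(node, dict):
        key = node.get("type") or node.get("description", "unknown")
        totals[key] += node.get("duration(ms)", 0)
        total_by_type(node.get("sub_tasks", []), totals)
    return totals

# result = query.run(profile=True)
# for task_type, ms in sorted(total_by_type(result["profile"]).items(), key=lambda kv: -kv[1]):
#     print(task_type, ms)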

I think it would be really cool to get an internal MVP of this working and post real output from an example query. It would probably make feedback much more constructive, since it's a bit hard to go off a pseudocode example.

@coffeemug (Contributor, Author)

Also, I would love to run this on some of my queries, like yesterday. This is really cool.

@neumino (Member) commented Oct 10, 2013

I agree with splitting the back end and front end. I still have quite a lot of things to finish before working on the visualization part, and I would tend to think that building a good one is going to be tricky, so merging the back end without the front end is definitely a good idea.

profile makes more sense too.

About the API, a few questions:

  • Can we see which machine performs which task? (I suppose it's one machine per task, right?)
  • Do we give information about intra-cluster traffic? (just wondering)
  • Can we make sure that tasks executed in parallel are part of a meta task? That is to say, your example would become:
>>> query.run(profile=True)
{
    "value" : query.run(),
    "profile" : [
        { # single step
            "description": "Read table",
            "duration(ms)": 5,
            "sub_tasks" : [ ... ]
        },
        { # meta task wrapping the tasks happening in parallel
            "description": "Reading from shards",
            "duration(ms)": 5,
            "tasks_in_parallel" : [
                [
                    {
                        "description": "Read from shard 1",
                        "duration(ms)": 5,
                        "sub_tasks" : [ ... ]
                    },
                    ...
                ],
                [
                    {
                        "description": "Read from shard 2",
                        "duration(ms)": 5,
                        "sub_tasks" : [ ... ]
                    },
                    ...
                ]
            ]
        },
        ...
    ]
}

@jdoliner (Contributor)

I'm gonna split responses up over a bunch of posts here so I'm not trying to wrangle one huge response.

> What is a "subtask"? Is it basically a subquery or is there a broader definition?

A bit broader (although I'm not sure I have the most precise definition of a subquery). A subtask is a task whose duration is considered part of the parent task's duration. So for the simple query r.expr(1) + r.expr(1) you might get a profile like:

{
    "description" : "add",
    "duration(ms)": 1,
    "sub_tasks" : [
        {
            "description" : "expr",
            "duration(ms)" : 0.2
        },
        {
            "description" : "expr",
            "duration(ms)" : 0.1
        }
    ]
}

Think of it just like how most profiling tools annotate function calls, except that when things go over the network they automatically send back their profiling information.
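
For instance, the usual "self time" accounting those tools do falls straight out of the schema. A minimal sketch, assuming the proposed fields ("duration(ms)", "sub_tasks"), none of which are final:

def self_time(task):
    # Subtract time attributed to sub_tasks from the task's own duration.
    def leaf_tasks(node):
        # Parallel groups nest lists inside sub_tasks; flatten down to task dicts.
        if isinstance(node, list):
            for child in node:
                for t in leaf_tasks(child):
                    yield t
        elif isinstance(node, dict):
            yield node
    child_ms = sum(t.get("duration(ms)", 0) for t in leaf_tasks(task.get("sub_tasks", [])))
    # Tasks in a parallel group overlap in wall-clock time, so this is only
    # an approximation when sub_tasks contains parallel groups.
    return task.get("duration(ms)", 0) - child_ms

For the add profile above, that's 1 - (0.2 + 0.1) = 0.7 ms spent in add itself.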

@coffeemug (Contributor, Author)

I know this is just an example, but in case you didn't think of it, I think you should seriously consider grouping low constant factor commands into a single task. For example, breaking up expressions like this into subtasks seems like it would make the output a lot less useful.

@jdoliner (Contributor)

> What is a step? If I have N steps, does that mean that data went back to the machine in question N times?

Steps are just things that happen sequentially; they don't correspond to jumps between physical machines. So in the example above each of the expr evaluations is a separate step.

> I think the API should include a bit more metadata. At least a node id for each task, and table name. Without that it would be really hard to tell what's going on.

I definitely agree about this. I've engineered things such that it will be very easy to add fields to profile nodes: all you have to do is change the function that renders them as a datum to propagate that information back up. Machine and shard information are obvious things to include; thread might make sense as well. These will be easy to add as they're needed, I just haven't gotten around to adding anything more than a simple string.

The above applies to the TYPE suggestion as well. I'd have to think a bit more about what TYPE would actually be but we'll definitely have something of that ilk.

@mglukhovsky (Member)

I think this is wonderful. My only preference is to just call everything "tasks" instead of "subtasks" -- it seems obvious from the nesting of the output that it's part of a broader parent task.

@jdoliner (Contributor)

> Can we see which machine performs which task? (I suppose it's one machine per task, right?)

We'll have this.

> Do we give information about intra-cluster traffic? (just wondering)

This is a bit vague to me. The most we can say about intra-cluster traffic is, for example: "We were doing a read, and we measured the process to take 10 ms. The machine we did the read from measured its part of the process to take 7 ms. From this we surmise that 3 ms were spent waiting on the network." We could probably report when we send information over the network, how much we're sending, and to whom. However, we can't in general say we sent a message to this machine and it arrived 1.5 ms later.
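
Spelled out (with hypothetical field names), the inference is just the difference between the two measurements of the same logical operation:

def network_overhead_ms(local_task, remote_task):
    # Both sides time the same read; the gap is attributed to the network.
    return local_task["duration(ms)"] - remote_task["duration(ms)"]

print(network_overhead_ms({"duration(ms)": 10}, {"duration(ms)": 7}))  # 3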

> Can we make sure that tasks executed in parallel are part of a meta task? That is to say, your example would become:

This isn't explicitly enforced by the schema, but in practice it always will be. Consider these for reference:

{
    "value" : ...,
    "profile" : A
}

{
    "value" : ...,
    "profile" : [
        {
             "duration" : ...,
             "description" : ...,
             "sub_tasks" : B
        }
    ]
}

The schema here is recursive, such that A and B have an identical schema, but in practice the top level will never contain parallel tasks; they're always going to wind up inside the sub_tasks of nodes with names like "performing read".
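
A rough checker for that recursive schema, purely as illustration (field names follow the examples above and aren't final): a profile is a list whose elements are either task dicts or parallel groups, i.e. lists of profiles.

def is_profile(node):
    if not isinstance(node, list):
        return False
    return all(
        is_task(el) or (isinstance(el, list) and all(is_profile(p) for p in el))
        for el in node
    )

def is_task(node):
    # A task carries a description, a duration, and (recursively) sub_tasks.
    return (isinstance(node, dict)
            and "description" in node
            and "duration(ms)" in node
            and is_profile(node.get("sub_tasks", [])))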

@jdoliner (Contributor)

As for grouping tasks with low constant factors together, I think that might be being a bit smarter in the backend than we want to be. We're definitely going to need to do some grouping for things like map: mapping over 1,000,000 elements could be thought of as 1,000,000 sequential steps, but that would be impractical to represent, so instead we're going to need an aggregation node which basically says: "This operation took n ms in total, and it was performed 1,000,000 times." Plus something about the variance; a confidence interval seems like it might be what we want. How to represent the subtasks of such an operation might be a bit tricky, but I think we'll figure something out.
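
One possible shape for such an aggregation node (every field name here is made up for illustration):

{
    "description": "map",
    "duration(ms)": 1500,                         # total across all evaluations
    "n_evaluations": 1000000,
    "mean_duration(ms)": 0.0015,
    "confidence_interval(ms)": [0.0012, 0.0018],  # e.g. a 95% CI on the mean
    "sub_tasks": [ ... ]                          # open question: how to aggregate these
}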

I'm a bit dubious about doing it based on the content of the subtasks, though, since it's unclear exactly where we'd draw the line. For example, we definitely want the full profile for the arguments to add if the query is something like:

r.table("foo").map(...).reduce(...) + r.table("bar").map(...).reduce(...)

and similarly, if the arguments to expr are big enough they could become the bottleneck. It seems like it would take some time to come up with good heuristic rules for what to omit. The approach I normally see profiling tools taking is to return output that gives the full, unabridged story and then build tools that help you sift through the data, such as our visualizer. Heuristics are 100 times easier to make useful when you give them to users as tools and let them guide the process. Granted, we're not going to have that for a little while, but looking at the types of queries I see most users running, they normally don't have a lot of terms like r.expr(1) that would cause enough noise to make the output unusable.

@jdoliner (Contributor)

Also, I was going to put off mentioning this for a while since it only really becomes meaningful once we have a visualizer, but another piece of info we'll be including in each profile node is the backtrace info we use for error reporting. Once we have this we can present a view like the one below, which shows the relative expense of various parts of the query:

[attached screenshot: a query with each part annotated by its relative expense]

@coffeemug (Contributor, Author)

I think the Unix philosophy of "do one thing and do it well" breaks down for profilers in practice. The profilers end up dumping too much data, which is useful in theory (if it could be analyzed), but is too voluminous to analyze in practice. Second-order analysis tools tend to leave a lot to be desired, so people end up feeling like they have enough information, but no immediately obvious actions they can take based on it.

I can easily think of a heuristic that would probably do a good job: group low constant time deterministic operations. Basically all the math, all the operations on fields, etc., but not point reads, math with nondeterministic components, array ops, etc. This isn't ideal (random isn't deterministic but should be grouped anyway), but I think it's pretty good. We might need to tweak this, but I think we could arrive at good heuristics pretty quickly; it doesn't seem hard. A sketch of what I mean follows.
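
Roughly (the set of cheap terms and the grouped node's shape are placeholders, not real term names):

CHEAP_DETERMINISTIC = {"expr", "add", "sub", "mul", "div", "get_field"}

def group_cheap(profile):
    # Collapse runs of cheap, deterministic leaf terms into single nodes.
    grouped, run = [], []

    def flush():
        if run:
            grouped.append({
                "description": "(%d cheap deterministic ops)" % len(run),
                "duration(ms)": sum(t["duration(ms)"] for t in run),
            })
            del run[:]

    for step in profile:
        if (isinstance(step, dict)
                and step.get("description") in CHEAP_DETERMINISTIC
                and not step.get("sub_tasks")):
            run.append(step)  # cheap leaf term: fold into the current group
        else:
            flush()
            grouped.append(step)  # expensive or compound step: keep as-is
    flush()
    return grouped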

Anyway, this is a bit hypothetical. I think we'll be able to tell very quickly what's worth doing and what isn't by running it on actual queries and examining the results.

@jdoliner (Contributor) commented Nov 5, 2013

This is in next as of c6baa1f. It was reviewed by @mlucy in CR 991 and ships in 1.11.

@coffeemug (Contributor, Author)

I believe this calls for the following -- http://www.youtube.com/watch?v=xB5ceAruYrI.

@coffeemug (Contributor, Author)

Also, my GOD the OS X build takes a long time. The anticipation is killing me :-D
