Implement explain command #175

Closed
coffeemug opened this issue Dec 21, 2012 · 16 comments

@coffeemug (Contributor)

As discussed...

@coffeemug (Contributor, Author)

Not making it into 1.4, moving to 1.5.

@jdoliner (Contributor)

I'm getting close enough to a working implementation of this that I think we should discuss the API a bit. There are 2 components that make up this issue:

  • The backend which returns a profiling report (represented as a datum)
  • The frontend which renders it in the webUI

I'd like to get the backend part merged in without the frontend part for two reasons. First, I'd like to get this to the people doing profiling as soon as possible because I think it will make their jobs easier, and to avoid a nasty merge conflict the easiest thing to do is merge it into next. Second, this will help a lot in providing support. One of the most common questions I get from users is "why is this query slow?" When the query in question involves multiple tables I frequently need to ask how many documents are in each table, how big those documents are, and what their schema is before I can even make a guess as to what the problem is. It gets to be a pretty painful experience for everyone involved. If I could just tell the user to run with explain and send me the output, I could give them an answer much more quickly. Also, while the output isn't going to be as friendly as the web UI rendering will be, I'm sure users will still be able to gain some insight from it.

Here's what I'd propose for the API. First off, I don't think I like the name explain: in SQL, explain means "explain the plan for this query, but don't actually run it," which doesn't make a ton of sense in RethinkDB because we don't really have an optimizer; the query basically is the plan. What we can do is actually run the query and time various things as it's running, and I think that process is much better described by the verb profile. Here's my first proposal:

>>> query.run(profile=True)
{
    "value" : query.run(),
    "profile" : [
        { # single step
            "description": "Read table",
            "duration(ms)": 5,
            "sub_tasks" : [ ... ]
        },
        [ # tasks happening in parallel
            [
                {
                    "description": "Read from shard 1",
                    "duration(ms)": 5,
                    "sub_tasks" : [ ... ]
                },
                ...
            ],
            [
                {
                    "description": "Read from shard 2",
                    "duration(ms)": 5,
                    "sub_tasks" : [ ... ]
                },
                ...
            ]
        ],
        ...
    ]
}

@coffeemug (Contributor, Author)

Wow, that's awesome.

I strongly agree with merging this without even working on the front-end part. I think we should merge this in, use it for a bit (ourselves and with other people), learn how it works in practice and what's useful and what isn't, maybe tweak the API and the information a bit, and only add the front-end visualization later. The profile command and the visualization for it seem like two separate killer features, not one killer feature.

On the API:

  • I like the name profile and the optarg to run. It feels right.
  • What is a "subtask"? Is it basically a subquery or is there a broader definition?
  • What is a step? If I have N steps, does that mean that data went back to the machine in question N times?
  • I think the API should include a bit more metadata. At least a node id for each task, and table name. Without that it would be really hard to tell what's going on.
  • I think in addition to a description we should include a "type" for each task, to make checking this programmatically easier (I'd imagine it would be an identifier string like "READ"). Actually, the whole thing should be programmatically accessible in general -- I shouldn't have to parse the description to determine which shard a task is on. It should be metadata IMO. There's a rough sketch of what I mean right after this list.
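
To make the "programmatically accessible" point concrete, here's a sketch of the kind of consumer I have in mind, assuming the schema proposed above (task dicts with "description", "duration(ms)", and optional "sub_tasks"; lists for parallel groups). The "type" field is hypothetical, so this falls back to "description":

from collections import defaultdict

def total_by_type(node, totals=None):
    # Walk the profile tree, summing duration per task type.
    if totals is None:
        totals = defaultdict(float)
    if isinstance(node, list):
        # Sequential steps and parallel groups are both lists;
        # recurse into each element either way.
        for child in node:
            total_by_type(child, totals)
    elif isinstance(node, dict):
        key = node.get("type") or node.get("description", "unknown")
        totals[key] += node.get("duration(ms)", 0)
        total_by_type(node.get("sub_tasks", []), totals)
    return totals

# result = query.run(profile=True)
# for task_type, ms in sorted(total_by_type(result["profile"]).items(), key=lambda kv: -kv[1]):
#     print(task_type, ms)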

I think it would be really cool to get an internal MVP of this working and post real output from an example query. It would probably make feedback much more constructive, since it's a bit hard to go off a pseudocode example.

@coffeemug (Contributor, Author)

Also, I would love to run this on some of my queries, like yesterday. This is really cool.

@neumino (Member) commented Oct 10, 2013

I agree with splitting the back end and front end. I still have quite a lot of things to finish before working on the visualization part, and I would tend to think that building a good one is going to be tricky, so merging the back end without the front end is definitely a good idea.

profile makes more sense too.

About the API, a few questions:

  • Can we see which machine performs which task? (I suppose it's one machine per task, right?)
  • Do we give information about intra-cluster traffic? (just wondering)
  • Can we make sure that tasks executed in parallel are part of a meta task? That is to say, your example would become:
>>> query.run(profile=True)
{
    "value" : query.run(),
    "profile" : [
        { # single step
            "description": "Read table",
            "duration(ms)": 5,
            "sub_tasks" : [ ... ]
        },
        { # meta task wrapping the tasks happening in parallel
            "description": "Reading from shards",
            "duration(ms)": 5,
            "tasks_in_parallel" : [
                [
                    {
                        "description": "Read from shard 1",
                        "duration(ms)": 5,
                        "sub_tasks" : [ ... ]
                    },
                    ...
                ],
                [
                    {
                        "description": "Read from shard 2",
                        "duration(ms)": 5,
                        "sub_tasks" : [ ... ]
                    },
                    ...
                ]
            ]
        },
        ...
    ]
}

@jdoliner (Contributor)

I'm gonna split responses up over a bunch of posts here so I'm not trying to wrangle one huge response.

> What is a "subtask"? Is it basically a subquery or is there a broader definition?

A bit broader (although I'm not sure I have the most precise definition of a subquery). A subtask is a task whose duration is considered part of the parent task's duration. So for the simple query r.expr(1) + r.expr(1) you might get a profile like:

{
    "description" : "add",
    "duration(ms)": 1,
    "sub_tasks" : [
        {
            "description" : "expr",
            "duration(ms)" : 0.2
        },
        {
            "description" : "expr",
            "duration(ms)" : 0.1
        }
    ]
}

Think of it just like how most profiling tools annotate function calls, except that when things go over the network they automatically send back their profiling information.
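
For instance, the usual "self time" accounting those tools do falls straight out of the schema. A minimal sketch, assuming the proposed fields ("duration(ms)", "sub_tasks"), none of which are final:

def self_time(task):
    # Subtract time attributed to sub_tasks from the task's own duration.
    def leaf_tasks(node):
        # Parallel groups nest lists inside sub_tasks; flatten down to task dicts.
        if isinstance(node, list):
            for child in node:
                for t in leaf_tasks(child):
                    yield t
        elif isinstance(node, dict):
            yield node
    child_ms = sum(t.get("duration(ms)", 0) for t in leaf_tasks(task.get("sub_tasks", [])))
    # Tasks in a parallel group overlap in wall-clock time, so this is only
    # an approximation when sub_tasks contains parallel groups.
    return task.get("duration(ms)", 0) - child_ms

For the add profile above, that's 1 - (0.2 + 0.1) = 0.7 ms spent in add itself.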

@coffeemug (Contributor, Author)

I know this is just an example, but in case you didn't think of it, I think you should seriously consider grouping low constant factor commands into a single task. For example, breaking up expressions like this into subtasks seems like it would make the output a lot less useful.

@jdoliner (Contributor)

> What is a step? If I have N steps, does that mean that data went back to the machine in question N times?

Steps are just things that happen sequentially; they don't correspond to jumps between physical machines. So in the example above each of the expr evaluations is a separate step.

> I think the API should include a bit more metadata. At least a node id for each task, and table name. Without that it would be really hard to tell what's going on.

I definitely agree about this. I've engineered things such that it will be very easy to add fields to profile nodes: all you have to do is change the function that renders them as a datum to propagate that information back up. Machine and shard information are obvious things to include; thread might make sense as well. These will be easy to add as they're needed, I just haven't gotten around to adding anything more than a simple string.

The above applies to the TYPE suggestion as well. I'd have to think a bit more about what TYPE would actually be but we'll definitely have something of that ilk.

@mglukhovsky (Member)

I think this is wonderful. My only preference is to just call everything "tasks" instead of "subtasks" -- it seems obvious from the nesting of the output that it's part of a broader parent task.

@jdoliner (Contributor)

> Can we see which machine performs which task? (I suppose it's one machine per task, right?)

We'll have this.

> Do we give information about intra-cluster traffic? (just wondering)

This is a bit vague to me. The most we can say about intra-cluster traffic is, for example: "We were doing a read, and we measured the process to take 10 ms. The machine we did the read from measured its part of the process to take 7 ms. From this we surmise that 3 ms were spent waiting on the network." We could probably report when we send information over the network, how much we're sending, and to whom. However, we can't in general say we sent a message to this machine and it arrived 1.5 ms later.
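
Spelled out (with hypothetical field names), the inference is just the difference between the two measurements of the same logical operation:

def network_overhead_ms(local_task, remote_task):
    # Both sides time the same read; the gap is attributed to the network.
    return local_task["duration(ms)"] - remote_task["duration(ms)"]

print(network_overhead_ms({"duration(ms)": 10}, {"duration(ms)": 7}))  # 3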

> Can we make sure that tasks executed in parallel are part of a meta task? That is to say, your example would become:

This isn't explicitly enforced by the schema, but in practice it always will be. Consider these for reference:

{
    "value" : ...,
    "profile" : A
}

{
    "value" : ...,
    "profile" : [
        {
             "duration" : ...,
             "description" : ...,
             "sub_tasks" : B
        }
    ]
}

The schema here is recursive, such that A and B have an identical schema, but in practice the top level will never contain parallel tasks; they're always going to wind up inside the sub_tasks of nodes with names like "performing read".
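
A rough checker for that recursive schema, purely as illustration (field names follow the examples above and aren't final): a profile is a list whose elements are either task dicts or parallel groups, i.e. lists of profiles.

def is_profile(node):
    if not isinstance(node, list):
        return False
    return all(
        is_task(el) or (isinstance(el, list) and all(is_profile(p) for p in el))
        for el in node
    )

def is_task(node):
    # A task carries a description, a duration, and (recursively) sub_tasks.
    return (isinstance(node, dict)
            and "description" in node
            and "duration(ms)" in node
            and is_profile(node.get("sub_tasks", [])))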

@jdoliner (Contributor)

As for grouping tasks with low constant factors together, I think that might be being a bit smarter in the backend than we want to be. We're definitely going to need to do some grouping for things like map: mapping over 1,000,000 elements could be thought of as 1,000,000 sequential steps, but that would be impractical to represent, so instead we're going to need an aggregation node which basically says: "This operation took n ms in total, and it was performed 1,000,000 times." Plus something about the variance; a confidence interval seems like it might be what we want. How to represent the subtasks of such an operation might be a bit tricky, but I think we'll figure something out.
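
One possible shape for such an aggregation node (every field name here is made up for illustration):

{
    "description": "map",
    "duration(ms)": 1500,                         # total across all evaluations
    "n_evaluations": 1000000,
    "mean_duration(ms)": 0.0015,
    "confidence_interval(ms)": [0.0012, 0.0018],  # e.g. a 95% CI on the mean
    "sub_tasks": [ ... ]                          # open question: how to aggregate these
}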

I'm a bit dubious about doing it based on the content of the subtasks, though, since it's unclear exactly where we'd draw the line. For example, we definitely want the full profile for the arguments to add if the query is something like:

r.table("foo").map(...).reduce(...) + r.table("bar").map(...).reduce(...)

and similarly, if the arguments to expr are big enough they could become the bottleneck. It seems like it would take some time to come up with good heuristic rules for what to omit. The approach I normally see profiling tools taking is to return output that gives the full, unabridged story and then build tools that help you sift through the data, such as our visualizer. Heuristics are 100 times easier to make useful when you give them to users as tools and let them guide the process. Granted, we're not going to have that for a little while, but looking at the types of queries I see most users running, they normally don't have a lot of terms like r.expr(1) that would cause enough noise to make the output unusable.

@jdoliner (Contributor)

Also, I was going to put off mentioning this for a while since it only really becomes meaningful once we have a visualizer, but another piece of info we'll be including in each profile node is the backtrace info we use for error reporting. Once we have this we can present a view like the one below, which shows the relative expense of various parts of the query:

[attached screenshot: a query with each part annotated by its relative expense]

@coffeemug (Contributor, Author)

I think the Unix philosophy of "do one thing and do it well" breaks down for profilers in practice. The profilers end up dumping too much data, which is useful in theory (if it could be analyzed), but is too voluminous to analyze in practice. Second-order analysis tools tend to leave a lot to be desired, so people end up feeling like they have enough information, but no immediately obvious actions they can take based on it.

I can easily think of a heuristic that would probably do a good job: group low constant time deterministic operations. Basically all the math, all the operations on fields, etc., but not point reads, math with nondeterministic components, array ops, etc. This isn't ideal (random isn't deterministic but should be grouped anyway), but I think it's pretty good. We might need to tweak this, but I think we could arrive at good heuristics pretty quickly; it doesn't seem hard. A sketch of what I mean follows.
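
Roughly (the set of cheap terms and the grouped node's shape are placeholders, not real term names):

CHEAP_DETERMINISTIC = {"expr", "add", "sub", "mul", "div", "get_field"}

def group_cheap(profile):
    # Collapse runs of cheap, deterministic leaf terms into single nodes.
    grouped, run = [], []

    def flush():
        if run:
            grouped.append({
                "description": "(%d cheap deterministic ops)" % len(run),
                "duration(ms)": sum(t["duration(ms)"] for t in run),
            })
            del run[:]

    for step in profile:
        if (isinstance(step, dict)
                and step.get("description") in CHEAP_DETERMINISTIC
                and not step.get("sub_tasks")):
            run.append(step)  # cheap leaf term: fold into the current group
        else:
            flush()
            grouped.append(step)  # expensive or compound step: keep as-is
    flush()
    return grouped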

Anyway, this is a bit hypothetical. I think we'll be able to tell very quickly what's worth doing and what isn't by running it on actual queries and examining the results.

@jdoliner (Contributor) commented Nov 5, 2013

This is in next as of c6baa1f. It was reviewed by @mlucy in CR 991 and ships in 1.11.

@coffeemug (Contributor, Author)

I believe this calls for the following -- http://www.youtube.com/watch?v=xB5ceAruYrI.

@coffeemug (Contributor, Author)

Also, my GOD the OS X build takes a long time. The anticipation is killing me :-D
