
Search: Add query profiler #12974

Closed

polyfractal wants to merge 19 commits
Conversation

polyfractal
Contributor

Only about a year late, here is the follow-up to #6699. :)

This PR adds a query profiler to time the various components of a query. This PR differs from #6699 mainly in implementation details (and superficially, some of the response syntax).

How the old PR worked

The old method basically walked the query tree after it was processed and wrapped everything in a special ProfileQuery. This class then delegated to the wrapped query/filter and timed the execution.
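Roughly, that wrapping pattern looks like this (a minimal Python sketch for illustration; the real code is Java/Lucene, and only the name ProfileQuery comes from the PR, everything else here is hypothetical):

```python
import time

class Query:
    """Hypothetical stand-in for a query tree node."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def execute(self):
        for child in self.children:
            child.execute()

class ProfileQuery(Query):
    """Delegates to the wrapped query and times its execution."""
    def __init__(self, delegate):
        super().__init__(delegate.name, delegate.children)
        self.delegate = delegate
        self.local_nanos = 0

    def execute(self):
        start = time.perf_counter_ns()
        self.delegate.execute()
        self.local_nanos += time.perf_counter_ns() - start

def wrap_tree(query):
    """The 'query-walking dispatcher': recursively wrap every node."""
    query.children = [wrap_tree(child) for child in query.children]
    return ProfileQuery(query)
```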

This approach was problematic for one main reason: the query-walking dispatcher needed many special cases, since special-snowflake queries introduced edge cases that needed handling. This meant that any time queries were altered in ES/Lucene, the walker would likely need to be updated.

Another problem was how timings were stored: each ProfileQuery maintained its own "local" timing. When the query finished, timings had to be recursively merged upwards from the leaf nodes to find a total time, then merged back down to derive a relative time. This whole process required a second "profile walker" which would traverse the profiled query, calculate the timings and spit out a tree of Profiled components.

Finally, it made book-keeping very tricky due to rewrites. Some rewrites change the query structure, so you end up with "dangling" ProfileQueries that are no longer in the tree. Some optimizations in Lucene, such as collapsing multiple boolean queries into a single bool, could really mess up the tree.
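The two merge passes of that second "profile walker" can be sketched like this (illustrative Python, assuming each node holds only its own local timing in nanoseconds):

```python
class Node:
    """Hypothetical profiled node holding only its own local timing."""
    def __init__(self, local_nanos, children=()):
        self.local_nanos = local_nanos
        self.children = list(children)

def total_nanos(node):
    """Merge timings upward from the leaves to get a node's total time."""
    return node.local_nanos + sum(total_nanos(c) for c in node.children)

def relative_time(node, root):
    """Merge back down: a node's total as a fraction of the root total."""
    return total_nanos(node) / total_nanos(root)
```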

How the new PR works

The new method basically injects logic in the ContextIndexSearcher and overrides a few key methods (rewrite, createWeight, createNormalizedWeight). If profiling is enabled, weights are wrapped in a ProfileWeight, which then further wraps scorers in a ProfileScorer.
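The delegation chain looks roughly like this (a Python sketch for illustration; ProfileWeight and ProfileScorer are named in the PR, but these minimal stand-ins are hypothetical and not the actual Lucene API):

```python
import time

class Scorer:
    """Hypothetical stand-in for a Lucene scorer."""
    def score(self):
        return 1.0

class Weight:
    """Hypothetical stand-in for a Lucene weight."""
    def scorer(self):
        return Scorer()

class ProfileScorer:
    """Wraps a scorer and charges score() time to the breakdown."""
    def __init__(self, delegate, breakdown):
        self.delegate = delegate
        self.breakdown = breakdown

    def score(self):
        start = time.perf_counter_ns()
        try:
            return self.delegate.score()
        finally:
            self.breakdown["score"] += time.perf_counter_ns() - start

class ProfileWeight:
    """Wraps a weight, times scorer construction, wraps the scorer too."""
    def __init__(self, delegate, breakdown):
        self.delegate = delegate
        self.breakdown = breakdown

    def scorer(self):
        start = time.perf_counter_ns()
        inner = self.delegate.scorer()
        self.breakdown["build_scorer"] += time.perf_counter_ns() - start
        return ProfileScorer(inner, self.breakdown)
```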

Timings are then stored in a centralized, thread-local InternalProfiler, which also maintains a dependency graph. Conveniently, createWeight() is called once per node in the tree, so we can use that to maintain a stack of tree depth and generate the dependency graph on the fly, instead of pre-walking the entire thing.
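The stack trick works because createWeight() recurses depth-first, once per query node, so a stack of "open" nodes yields the dependency graph for free. A minimal sketch (illustrative Python; the real InternalProfiler is a thread-local Java class, and create_weight here only mimics the recursive call pattern):

```python
class InternalProfiler:
    def __init__(self):
        self.children = {}  # node -> list of direct children
        self._stack = []    # nodes whose createWeight() is still running

    def start(self, query):
        """Called on entry to createWeight() for this query node."""
        if self._stack:
            self.children[self._stack[-1]].append(query)
        self.children.setdefault(query, [])
        self._stack.append(query)

    def stop(self):
        """Called when createWeight() for the current node returns."""
        self._stack.pop()

def create_weight(profiler, query, subqueries=()):
    """Mimics recursive createWeight() calls, one per node in the tree."""
    profiler.start(query)
    for sub, subsubs in subqueries:
        create_weight(profiler, sub, subsubs)
    profiler.stop()
```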

This is generally less invasive and more tolerant of rewrite changes (weights are generated after the rewrite is finished, so all of our wrapped weights/scorers are done post-rewrite). The downside is that profiling logic is now baked into ContextIndexSearcher and toggled with a flag. We looked into wrapping the searcher with a ProfileIndexSearcher, but the current architecture won't allow that to work for technical reasons. So the current approach works, but isn't entirely non-invasive...definitely room for improvement.

Syntax

Sample query:

GET /test/test/_search
{
   "profile": true,
   "query": {
      "bool": {
         "should": [
            {
               "match": {
                  "abc": "xyz"
               }
            },
            {
               "match": {
                  "abc": "abc"
               }
            },
            {
               "match": {
                  "abc": "123"
               }
            }
         ]
      }
   }
}

And sample response (truncated):

{
   "hits": {...},
   "profile": {
      "query": {
         "shards": [
            {
               "shard_id": "o73tK_SGR9GDofdy9zg0Gw",
               "timings": [
                  {
                     "query_type": "BooleanQuery",
                     "lucene": "+(abc:xyz abc:abc abc:123) #ConstantScore(_type:test)",
                     "time": "40.78110900ms",
                     "relative_time": "100.0000000%",
                     "breakdown": {
                        "rewrite": 69067,
                        "weight": 20363221,
                        "score": 279524,
                        "cost": 0,
                        "normalize": 153166,
                        "build_scorer": 19916131
                     },
                     "children": [
                        {
                           "query_type": "BooleanQuery",
                           "lucene": "abc:xyz abc:abc abc:123",
                           "time": "30.67931900ms",
                           "relative_time": "75.22924156%",
                           "breakdown": {
                              "rewrite": 0,
                              "weight": 13726549,
                              "score": 92738,
                              "cost": 0,
                              "normalize": 58733,
                              "build_scorer": 16801299
                           },
                           "children": [
                              {
                                 "query_type": "TermQuery",
                                 "lucene": "abc:xyz",
                                 "time": "14.07877100ms",
                                 "relative_time": "34.52277622%",
                                 "breakdown": {
                                    "rewrite": 0,
                                    "weight": 11665755,
                                    "score": 0,
                                    "cost": 0,
                                    "normalize": 21939,
                                    "build_scorer": 2391077
                                 },
                                 "children": []
                              },
                              {
                                 "query_type": "TermQuery",
                                 "lucene": "abc:abc",
                                 "time": "3.957625000ms",
                                 "relative_time": "9.704554626%",
                                 "breakdown": {
                                    "rewrite": 0,
                                    "weight": 409593,
                                    "score": 45739,
                                    "cost": 40510,
                                    "normalize": 4256,
                                    "build_scorer": 3457527
                                 },
                                 "children": []
                              },
                              {
                                 "query_type": "TermQuery",
                                 "lucene": "abc:123",
                                 "time": "0.3979560000ms",
                                 "relative_time": "0.9758341785%",
                                 "breakdown": {
                                    "rewrite": 0,
                                    "weight": 197993,
                                    "score": 4931,
                                    "cost": 12881,
                                    "normalize": 3858,
                                    "build_scorer": 178293
                                 },
                                 "children": []
                              }
                           ]
                        },
           ...
}
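As a sanity check on the numbers above: the breakdown values are nanosecond counts whose sum is the node's reported time, and relative_time is a node's total divided by the root's. Checking the values copied from the response:

```python
# Breakdown of the root BooleanQuery, copied from the response above
# (all values in nanoseconds).
root = {"rewrite": 69067, "weight": 20363221, "score": 279524,
        "cost": 0, "normalize": 153166, "build_scorer": 19916131}
# Breakdown of the inner BooleanQuery child.
child = {"rewrite": 0, "weight": 13726549, "score": 92738,
         "cost": 0, "normalize": 58733, "build_scorer": 16801299}

root_total = sum(root.values())
child_total = sum(child.values())

print(root_total / 1e6)                # → 40.781109, the root's "time" in ms
print(child_total / 1e6)               # → 30.679319, the child's "time" in ms
print(100 * child_total / root_total)  # → ~75.229, the child's "relative_time"
```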

Known issues

  • I haven't run the full test suite yet, so this may break some stuff :s
  • Does not work with DFS_query_then_fetch, since this adds logic directly to ContextIndexSearcher instead of wrapping it
  • Untested against nested / parent-child.
  • A few locations where the profile results are serialized may need version checks? They are annotated with //nocommit
  • I wasn't entirely sure what utility methods to include for consumers of the Java API. Currently, the ProfileResults interface provides a way to get a map (Shard -> ProfileResults), an EntrySet, and a Collection. This was all I really needed to build the tests, but I'm open to suggestions for a more user-friendly API
  • The profiled times are in nanoseconds...should we change this to milliseconds?

/cc @jpountz


@polyfractal polyfractal added >feature :Search/Search Search-related issues that do not fall into other categories v2.1.0 labels Aug 18, 2015
@dadoonet
Member

Thank you so much Zach!

@jpountz jpountz self-assigned this Aug 19, 2015
@jpountz
Contributor

jpountz commented Aug 19, 2015

I think the way this PR does the wrapping is indeed more robust than the previous PR. Regarding the DFS issue, we have had a similar issue in other pull requests which boils down to the fact that IndexSearcher is hard (impossible?) to wrap correctly, so maybe it would need some refactoring...

We should try to explore profiling collectors, which are a common source of slowness, e.g. if you use heavy aggregations. Profiling the reduce phase would be nice too, but I don't think it's required for the first iteration?
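The collector idea follows the same wrapping pattern as the weights and scorers: delegate and charge the time spent in collect() to the profile. A sketch under those assumptions (illustrative Python; real Lucene collectors are Java, and these class names are hypothetical):

```python
import time

class TotalHitsCollector:
    """Hypothetical stand-in for any downstream collector."""
    def __init__(self):
        self.hits = 0

    def collect(self, doc_id):
        self.hits += 1

class ProfileCollector:
    """Wraps a collector and times each collect() call."""
    def __init__(self, delegate):
        self.delegate = delegate
        self.collect_nanos = 0

    def collect(self, doc_id):
        start = time.perf_counter_ns()
        self.delegate.collect(doc_id)
        self.collect_nanos += time.perf_counter_ns() - start
```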

Maybe we should move this PR to a public branch to make it easier to iterate on?

@polyfractal
Contributor Author

For those who want to follow along, a public branch has been pushed.

Closing this PR, we'll re-open a new one once the shared branch is ready to go (third time's a charm) :)
