Search: Add query profiler #12974
Conversation
polyfractal added some commits Mar 18, 2015
polyfractal added the >feature, :Search/Search, v2.1.0 labels Aug 18, 2015
Thank you so much Zach!
jpountz self-assigned this Aug 19, 2015
jpountz (Contributor) commented Aug 19, 2015
I think the way this PR does the wrapping is indeed more robust than the previous PR. Regarding the DFS issue, we have had a similar issue in other pull requests which boils down to the fact that IndexSearcher is hard (impossible?) to wrap correctly, so maybe it would need some refactoring...
We should try to explore profiling collectors, which are a common source of slowness, e.g. if you use heavy aggregations. Profiling the reduce phase would be nice too, but I don't think it's required for the first iteration?
Maybe we should move this PR to a public branch to make it easier to iterate on?
polyfractal (Member) commented Aug 21, 2015
For those that want to follow along, a public branch has been pushed.
Closing this PR, we'll re-open a new one once the shared branch is ready to go (third time's a charm) :)
polyfractal commented Aug 18, 2015
Only about a year late, here is the followup to #6699. :)
This PR adds a query profiler to time the various components of a query. This PR differs from #6699 mainly in implementation details (and superficially, some of the response syntax).
How the old PR worked
The old method basically walked the query tree after it was processed and wrapped everything in a special ProfileQuery. This class then delegated to the wrapped query/filter and timed the execution.

This approach was problematic for one main reason: the query-walking dispatcher needed many special cases, since special-snowflake queries introduced edge cases that needed handling. This meant that any time queries were altered in ES/Lucene, the walker would likely need to be updated.
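The wrap-and-delegate pattern described above can be sketched roughly like this. This is a minimal illustration, not the actual PR code: `Query` here is a hypothetical stand-in interface, not Lucene's `Query` class.

```java
// Sketch of the old wrap-and-delegate approach: every node in the query
// tree was replaced by a wrapper that times calls into the real query.
// These names are illustrative, not the actual Elasticsearch classes.
interface Query {
    long execute(); // stand-in for the real query/filter work
}

class ProfileQuery implements Query {
    private final Query delegate;
    private long elapsedNanos; // "local" timing kept per wrapper

    ProfileQuery(Query delegate) {
        this.delegate = delegate;
    }

    @Override
    public long execute() {
        long start = System.nanoTime();
        long result = delegate.execute();
        elapsedNanos += System.nanoTime() - start;
        return result;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }
}
```

Each wrapper only knows its own elapsed time, which is exactly what forces the second merging pass described below.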
Another problem was how timings were stored: each ProfileQuery maintained its own "local" timing. When the query was finished, timings had to be recursively merged upwards from the leaf nodes to find a total time, then merged back down to derive a relative time. This whole process required a second "profile walker" which would traverse the profiled query, calculate the timings and spit out a tree of Profiled components.

Finally, it made book-keeping very tricky due to rewrites. Some rewrites will change the query structure and you end up with "dangling" ProfileQueries that are no longer in the tree. Some optimizations in Lucene, such as collapsing multiple boolean queries into a single bool, could really mess up the tree.
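The two-pass merge the old "profile walker" had to perform can be illustrated as follows (hypothetical names and a toy tree; the real walker operated on the wrapped Lucene queries):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the two-pass timing merge: totals are summed upward
// from the leaves, then each node's share is derived against the root.
class ProfileTreeNode {
    final long localNanos; // timing recorded by this node's wrapper
    final List<ProfileTreeNode> children = new ArrayList<>();

    ProfileTreeNode(long localNanos) {
        this.localNanos = localNanos;
    }

    // Upward pass: a node's total includes all of its descendants.
    long totalNanos() {
        long total = localNanos;
        for (ProfileTreeNode child : children) {
            total += child.totalNanos();
        }
        return total;
    }

    // Downward pass: relative time as a fraction of the root's total.
    double relativeTime(long rootTotalNanos) {
        return (double) totalNanos() / rootTotalNanos;
    }
}
```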
How the new PR works
The new method basically injects logic into the ContextIndexSearcher and overrides a few key methods (rewrite, createWeight, createNormalizedWeight). If profiling is enabled, weights are wrapped in a ProfileWeight, which then further wraps scorers in a ProfileScorer.

Timings are then stored in a centralized, thread-local InternalProfiler, which also maintains a dependency graph. Conveniently, createWeight() is called once per node in the tree, so we can use that to maintain a stack of tree depth and generate the dependency graph on the fly, instead of pre-walking the entire thing.

This is generally less invasive and more tolerant to rewrite changes (weights are generated after the rewrite is finished, so all of our wrapped weights/scorers are created post-rewrite). The downside is that profiling logic is now baked into ContextIndexSearcher and toggled with a flag. We looked into wrapping the searcher with a ProfileIndexSearcher, but the current architecture won't allow that to work for technical reasons. So the current approach works, but isn't entirely non-invasive... definitely room for improvement.
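The stack trick can be sketched as follows: since createWeight() is entered for a parent before any of its children, a stack of "open" nodes is enough to recover the tree on the fly. Names below are illustrative, not the real InternalProfiler API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of building the dependency graph on the fly: createWeight() is
// called once per query node, parents before children, so a stack of
// currently-open nodes lets us wire up parent/child edges as we go.
class ProfileNode {
    final String query;
    final List<ProfileNode> children = new ArrayList<>();

    ProfileNode(String query) {
        this.query = query;
    }
}

class InternalProfiler {
    private final Deque<ProfileNode> stack = new ArrayDeque<>();
    ProfileNode root;

    // Invoked on entry to createWeight(query).
    void startNode(String query) {
        ProfileNode node = new ProfileNode(query);
        if (stack.isEmpty()) {
            root = node; // first node seen is the root of the tree
        } else {
            stack.peek().children.add(node);
        }
        stack.push(node);
    }

    // Invoked when createWeight(query) returns.
    void finishNode() {
        stack.pop();
    }
}
```

Simulating a bool query with two term clauses (startNode("bool"), then start/finish for each term, then finishNode()) yields a root with two children, with no pre-walk of the query tree required.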
Syntax
Sample query:
```
GET /test/test/_search
{
  "profile": true,
  "query": {
    "bool": {
      "should": [
        { "match": { "abc": "xyz" } },
        { "match": { "abc": "abc" } },
        { "match": { "abc": "123" } }
      ]
    }
  }
}
```

And sample response (truncated):
```
{
  "hits": {...},
  "profile": {
    "query": {
      "shards": [
        {
          "shard_id": "o73tK_SGR9GDofdy9zg0Gw",
          "timings": [
            {
              "query_type": "BooleanQuery",
              "lucene": "+(abc:xyz abc:abc abc:123) #ConstantScore(_type:test)",
              "time": "40.78110900ms",
              "relative_time": "100.0000000%",
              "breakdown": {
                "rewrite": 69067,
                "weight": 20363221,
                "score": 279524,
                "cost": 0,
                "normalize": 153166,
                "build_scorer": 19916131
              },
              "children": [
                {
                  "query_type": "BooleanQuery",
                  "lucene": "abc:xyz abc:abc abc:123",
                  "time": "30.67931900ms",
                  "relative_time": "75.22924156%",
                  "breakdown": {
                    "rewrite": 0,
                    "weight": 13726549,
                    "score": 92738,
                    "cost": 0,
                    "normalize": 58733,
                    "build_scorer": 16801299
                  },
                  "children": [
                    {
                      "query_type": "TermQuery",
                      "lucene": "abc:xyz",
                      "time": "14.07877100ms",
                      "relative_time": "34.52277622%",
                      "breakdown": {
                        "rewrite": 0,
                        "weight": 11665755,
                        "score": 0,
                        "cost": 0,
                        "normalize": 21939,
                        "build_scorer": 2391077
                      },
                      "children": []
                    },
                    {
                      "query_type": "TermQuery",
                      "lucene": "abc:abc",
                      "time": "3.957625000ms",
                      "relative_time": "9.704554626%",
                      "breakdown": {
                        "rewrite": 0,
                        "weight": 409593,
                        "score": 45739,
                        "cost": 40510,
                        "normalize": 4256,
                        "build_scorer": 3457527
                      },
                      "children": []
                    },
                    {
                      "query_type": "TermQuery",
                      "lucene": "abc:123",
                      "time": "0.3979560000ms",
                      "relative_time": "0.9758341785%",
                      "breakdown": {
                        "rewrite": 0,
                        "weight": 197993,
                        "score": 4931,
                        "cost": 12881,
                        "normalize": 3858,
                        "build_scorer": 178293
                      },
                      "children": []
                    }
                  ]
                },
  ...
}
```

Known issues
//nocommit

/cc @jpountz
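As a side note on the sample response above: relative_time is simply a node's total time as a percentage of the root query's time, which can be checked against the reported numbers directly (the values below are copied from the response, in nanoseconds):

```java
// Checks the relative_time arithmetic from the sample response above:
// a node's time divided by the root query's time, as a percentage.
public class RelativeTime {
    static double relativePercent(long nodeNanos, long rootNanos) {
        return 100.0 * nodeNanos / rootNanos;
    }

    public static void main(String[] args) {
        long rootNanos = 40_781_109L; // "40.78110900ms" (root BooleanQuery)
        long boolNanos = 30_679_319L; // "30.67931900ms" (inner BooleanQuery)
        // Prints 75.22924156%, matching the reported relative_time.
        System.out.printf("%.8f%%%n", relativePercent(boolNanos, rootNanos));
    }
}
```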