Search: Add query profiler #12974
Thank you so much Zach!
I think the way this PR does the wrapping is indeed more robust than the previous PR. Regarding the DFS issue, we have had a similar issue in other pull requests, which boils down to the fact that IndexSearcher is hard (impossible?) to wrap correctly, so maybe it would need some refactoring... We should try to explore profiling collectors, which are a common source of slowness, e.g. if you use heavy aggregations. Profiling the reduce phase would be nice too, but I don't think it's required for the first iteration? Maybe we should move this PR to a public branch to make it easier to iterate on?
For those that want to follow along, a public branch has been pushed. Closing this PR, we'll re-open a new one once the shared branch is ready to go (third time's a charm) :)
Only about a year late, here is the followup to #6699. :)
This PR adds a query profiler to time the various components of a query. This PR differs from #6699 mainly in implementation details (and superficially, some of the response syntax).
How the old PR worked
The old method basically walked the query tree after it was processed and wrapped everything in a special `ProfileQuery`. This class then delegated to the wrapped query/filter and timed the execution.
This approach was problematic for one main reason: the query-walking dispatcher needed many special cases, since special-snowflake queries introduced edge cases that needed handling. This meant that any time queries were altered in ES/Lucene, the walker would likely need to be updated.
Another problem was how timings were stored: each `ProfileQuery` maintained its own "local" timing. When the query was finished, timings had to be recursively merged upwards from the leaf nodes to find a total time, then merged back down to derive a relative time. This whole process required a second "profile walker" which would traverse the profiled query, calculate the timings and spit out a tree of Profiled components.
Finally, it made book-keeping very tricky due to rewrites. Some rewrites will change the query structure and you end up with "dangling" ProfileQueries that are no longer in the tree. Some optimizations in Lucene, such as collapsing multiple boolean queries into a single bool, could really mess up the tree.
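To make the two-pass bookkeeping concrete, here is a minimal sketch of the merge-up / merge-down walk described above. It is not the code from #6699: `TimedNode` and `ProfileWalker` are hypothetical names, and it assumes each node's "local" timing excludes the time spent in its children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a profiled query node; not the old PR's ProfileQuery.
final class TimedNode {
    final String name;
    final long selfNanos;   // "local" time recorded by this node only (assumed exclusive of children)
    final List<TimedNode> children = new ArrayList<>();
    long totalNanos;        // filled in by the upward pass
    double relative;        // filled in by the downward pass

    TimedNode(String name, long selfNanos) {
        this.name = name;
        this.selfNanos = selfNanos;
    }

    TimedNode add(TimedNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical "profile walker" that runs after the query has finished.
final class ProfileWalker {
    // Pass 1: merge timings upward from the leaves to get each subtree's total.
    static long mergeUp(TimedNode node) {
        long total = node.selfNanos;
        for (TimedNode child : node.children) {
            total += mergeUp(child);
        }
        node.totalNanos = total;
        return total;
    }

    // Pass 2: push the root total back down to derive each node's share of it.
    static void mergeDown(TimedNode node, long rootTotal) {
        node.relative = rootTotal == 0 ? 0.0 : (double) node.totalNanos / rootTotal;
        for (TimedNode child : node.children) {
            mergeDown(child, rootTotal);
        }
    }

    static void walk(TimedNode root) {
        mergeDown(root, mergeUp(root));
    }
}
```

Note that both passes depend on the tree still matching what actually executed, which is exactly what rewrites can break.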
How the new PR works
The new method basically injects logic in the ContextIndexSearcher and overrides a few key methods (rewrite, createWeight, createNormalizedWeight). If profiling is enabled, weights are wrapped in a `ProfileWeight`, which then further wraps scorers in a `ProfileScorer`.
Timings are then stored in a centralized, thread-local `InternalProfiler`, which also maintains a dependency graph. Conveniently, createWeight() is called once per node in the tree, so we can use that to maintain a stack of tree depth and generate the dependency graph on the fly, instead of pre-walking the entire thing.
This is generally less invasive and more tolerant to rewrite changes (weights are generated after the rewrite is finished, so all of our wrapped weights/scorers are created post-rewrite). The downside is that profiling logic is now baked into ContextIndexSearcher and toggled with a flag. We looked into wrapping the searcher with a ProfileIndexSearcher, but the current architecture won't allow that to work for technical reasons. So the current approach works, but isn't entirely non-invasive...definitely room for improvement.
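The stack-based graph building can be sketched roughly as follows. This is a simplified illustration, not the PR's `InternalProfiler`: `SketchProfiler`, `push`, `pop` and `time` are hypothetical names, and the sketch assumes a hook runs on entry to and exit from each createWeight() call. It deliberately avoids the Lucene API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified profiler; names and methods are illustrative only.
final class SketchProfiler {
    // One profiler per thread, mirroring the "centralized, thread-local" storage.
    static final ThreadLocal<SketchProfiler> CURRENT =
            ThreadLocal.withInitial(SketchProfiler::new);

    static final class Node {
        final String description;
        final List<Node> children = new ArrayList<>();
        long nanos;

        Node(String description) {
            this.description = description;
        }
    }

    private final Deque<Node> stack = new ArrayDeque<>();
    private final List<Node> roots = new ArrayList<>();

    // Assumed to be called on entry to createWeight() for one query node.
    // Whatever is on top of the stack is the parent, so the graph builds itself.
    Node push(String description) {
        Node node = new Node(description);
        Node parent = stack.peek();
        if (parent == null) {
            roots.add(node);
        } else {
            parent.children.add(node);
        }
        stack.push(node);
        return node;
    }

    // Assumed to be called when that createWeight() call returns.
    void pop() {
        stack.pop();
    }

    // Times a delegated call (e.g. a wrapped scorer method) and charges it to the node.
    <T> T time(Node node, Supplier<T> delegate) {
        long start = System.nanoTime();
        try {
            return delegate.get();
        } finally {
            node.nanos += System.nanoTime() - start;
        }
    }

    List<Node> roots() {
        return roots;
    }
}
```

For a bool query with a term and a range clause, the calls would be `push("bool")`, `push("term")`, `pop()`, `push("range")`, `pop()`, `pop()`, leaving a single root with two children and no separate tree walk.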
Syntax
Sample query:
And sample response (truncated):
Known issues
//nocommit
/cc @jpountz
Edit:
![itshappening](https://cloud.githubusercontent.com/assets/1224228/9344854/c62c54f8-45d9-11e5-8ee3-a597052a413f.gif)