Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add ability to profile queries #6699

Closed
wants to merge 76 commits into from

Conversation

Projects
None yet
@polyfractal
Copy link
Member

commented Jul 3, 2014

This PR adds the ability to profile queries and determine which portions of the query are the slowest. All timings are relative and displayed in microseconds.

Timings are not guaranteed to match the took time of the query...in fact, they will likely be larger. This is because timings are aggregated from all shards, often executing in parallel, so the profiled time is more akin to CPU time than wallclock time.

It is also important to understand that Profile times represetn relative costs, rather than absolute times. The act of profiling adds considerable overhead to the querying process since all components in the query must be wrapped and timed.

Enabling Profiling

Profling can be enabled with a query param, URI param or dedicated endpoint:

# Query param
curl -XGET localhost:9200/my_index/my_type/_search -d '{
  "profile": true,
  "query" : { ... }
}'

# URI param
curl -XGET localhost:9200/my_index/my_type/_search?profile=true

# Endpoint
curl -XGET localhost:9200/my_index/my_type/_profile -d '{
  "query" : { ... }
}'

Response output

When enabled, the response will contain an additional profile element. This will contain a tree of "profile components", where each component represents a portion of the query. Often, this tree of components will be larger than you expect - Elasticsearch and Lucene will expand and rewrite portions of the query, which introduces more pieces than you might expect.

Each component contains the following:

  • type - The name of the Elasticsearch/Lucene class for this part of the query, e.g. TermQuery
  • time - The total aggregated time at this level in the tree, which includes all children and all shards. Displayed as microseconds
  • relative - Relative contribution of this level in the tree. This metric is time / total_time. Displayed as a percentage
  • lucene - A rough approximation of the Lucene syntax for the component, which is often helpful in differentiating many similar components (e.g. differentiating term: { "field1": "fred" } from term: { "field1": "bob" }, since both are TermQuerys)
  • components - Zero or more additional components which are children to this level. For example, a Bool will have children components representing the various clauses.
{
   "took": 99,
   "timed_out": false,
   "_shards": { ... },
   "hits": { ... },
   "profile": [
      {
         "type": "XFilteredQuery",
         "time": 153996,
         "relative": "100.00%",
         "lucene": "filtered(filtered(my_field:the my_field:quick my_field:fox)->cache(input1:[1 TO 1]))->cache(_type:test)",
         "components": [
            {
               "type": "XFilteredQuery",
               "time": 108036,
               "relative": "70.155%",
               "lucene": "filtered(my_field:the my_field:quick my_field:fox)->cache(input1:[1 TO 1])",
               "components": [
                  {
                     "type": "BooleanQuery",
                     "time": 62053,
                     "relative": "40.295%",
                     "lucene": "my_field:the my_field:quick my_field:fox",
                     "components": [
                        {
                           "type": "TermQuery",
                           "time": 20366,
                           "relative": "13.225%",
                           "lucene": "my_field:the"
                        },
                        {
                           "type": "TermQuery",
                           "time": 1117,
                           "relative": "0.72534%",
                           "lucene": "my_field:quick"
                        },
                        {
                           "type": "TermQuery",
                           "time": 391,
                           "relative": "0.25390%",
                           "lucene": "my_field:fox"
                        }
                     ]
                  },
                  {
                     "type": "FilterCacheFilterWrapper",
                     "time": 3983,
                     "relative": "2.5864%",
                     "lucene": "cache(input1:[1 TO 1])"
                  }
               ]
            },
            {
               "type": "FilterCacheFilterWrapper",
               "time": 3835,
               "relative": "2.4903%",
               "lucene": "cache(_type:test)"
            }
         ]
      }
   ]
}

Todo / Known Issues

  • More tests! The current tests check for basic validity and overt failure, but I think we need a better way to check the contents of the profile response. + Unit tests for things like the ProfileVisitors
  • Shards can rewrite queries differently (e.g. if the term is not present on one shard), which means chunks of the output could potentially be skipped. Naively handling this by just ignoring components which don't match.
  • Potentially gross output, especially with numerics being expressed as byte strings (e.g. \0001)
  • Fairly abusive use of Visitor pattern. In particular, I'm not overly happy with the ProfileCollapsingVisitor which is used to collect timings from the query tree...but I have no idea how to better structure it. The biggest problem is that all query types need to be hardcoded, so this will be a maintenance tar pit.
  • Audit of the supported Query/Filter types...I imagine there are others that I'm missing which need to be added to both visitors
  • /_profile endpoint could use some love, ideally extending the search endpoint instead of c/p

polyfractal added some commits Feb 20, 2014

Add Simon's Visitor pattern code
Thanks Simon! <3
Rewrite should modify this instead of returning new object
If a new object is returned, the rewritten query is no longer registered
in the search context and timings are lost to the void
Reconfigure Visitor to return Arrays
Some queries/filters return several components, but it doesn't make sense
to create a Profile element on that level.

e.g. a Bool may have several queries, but the Bool is already wrapped in
 a ProfileQuery.  Creating a new Profile to hold the Bool's makes it appear
 as if the Bool is twice in the query
Add nested document support
Thanks Martijn!
ProfileQuery and ProfileFilter should maintain their own name/details
By moving the name/detail logic inside the query and filter, we can keep
a consistent name even when the underlying query has
been rewritten (e.g. FuzzyQuery instead of the rewritten BoolQuery)
Rewritten queries need to be fully walked
Previously, only the top-level of a rewritten query was wrapped.  The full
rewritten tree is now walked and wrapped
Fix NPE when shards rewrite differently
Because shards can rewrite their queries differently (depending on term
dictionaries), it isn't guaranteed that all components will be present in all shards.

Taking the naive approach and just ignoring components that don't exist.
This means we are potentially missing components, if the first shard happens
to be the one that is missing the component.
Collapse the local query, not the original parsed one
We need to collapse the local version of the query, since it may have
new components due to rewriting.  If we use the original parsed query,
we lose timing of all the rewritten components.
Update /_profile endpoint to support templates
Largely replaced the old endpoint code with the new version which supports
templates.  Again, mostly a c/p from the search endpoint...probably a better
way to do this
Merge remote-tracking branch 'upstream/1.x' into features/profile_query
Conflicts:
	src/main/java/org/elasticsearch/search/internal/InternalSearchResponse.java
Fix NPE when no query is present
A ConstantScoreQuery which only contains a filter (e.g. MatchAllDocsFilter)
will throw an NPE because there is no query to get.
Move aggregation of profiles into queryResults loop
Simplifies the code and removes an unescessary second loop over the
same set of queryResults

polyfractal added some commits Dec 10, 2014

Add method disambiguation to the Query Walker
Previously, the multiple dispatcher would spew if it encountered multiple potential method matches.
This adds the ability to specify a Disambiguator which can choose from amongst the possible
matches.

The default behavior, if a disambiguator is not specified, is to continue throwing exceptions.

The ProfileQueryVisitor will try to choose the more specific of methods (e.g. BooleanQuery instead of
Query), with potential to fallback to basic Query.
ProfileQuery rewrite must *not* return `this` if the query has been r…
…ewritten

Needs to return a new object (with old time) holding the rewritten query, otherwise Lucene
will stop the recursive rewrite process prematurely
Move random query generation to own class
The random query / filter generation deserves it's own class in the testing infrastructure, rather
than a utility method of the profile query tests.  Has more widespread utility.

Also tried to generify the methods a little, to make them easier to use outside profile query tests
Add more tests
- Dismax to random generator
- Dismax non-randomized test which was causing an error
- A test which validates that a profiled query hits are identical to a non-profiled query
@aaneja

This comment has been minimized.

Copy link

commented Mar 23, 2015

Ping. Any updates on this ?
Also +1 for @jprante request; it would be helpful to determine what is happening at the shard and node level.
It would also be great to have a hook to log profiling data to the slow log.

@polyfractal polyfractal removed the review label Mar 23, 2015

@polyfractal

This comment has been minimized.

Copy link
Member Author

commented Mar 23, 2015

@aaneja I actually started to revisit this just last week. The current branch went a bit stale because I was concerned about the maintainability of the approach. It basically works by wrapping each component in the query with a special ProfileQuery / ProfileFilter. To do this, it walks the query tree recursively using a double-dispatcher. It does a similar process after the query is done, to "unwrap" the profile timings and build a response tree.

The problem is that the double-dispatcher needs a lot of "special cases", because many queries are special snowflakes. E.g. need to call getQueries() on a BooleanQuery query, getQuery() and getFilter() on a ConstantScoreQuery, etc. Basically the walker becomes a maintenance tarpit, because it will need to be updated any time new queries are added/removed, or if existing queries change.

I started working on a new approach (here: https://github.com/polyfractal/elasticsearch/tree/features/pq2) which hooks into the IndexSearcher, which is basically the main class that Lucene uses to drive queries. A lot of recent changes in Lucene make this a lot cleaner.

The current strategy is to push some of the functionality back into Lucene (ProfileIndexSearcher, ProfileQuery, etc) and make Elasticsearch responsible for the rest.

Still a WIP, but I'm optimistic it'll be a better solution.

@clintongormley

This comment has been minimized.

Copy link
Member

commented Apr 5, 2015

Copied across @kamaradclimber's comments from #10348 (comment)

@polyfractal this is a very great step towards what I am looking for, I'd really like this to be merged.

Improvements I can think of once your patch is in master:

  • display if the query used a cache or not
  • display merge time (time to merge the results of subqueries/filters)
@tomryanx

This comment has been minimized.

Copy link

commented Jul 10, 2015

+1

@s1monw s1monw closed this Jul 10, 2015

@clintongormley

This comment has been minimized.

Copy link
Member

commented Jul 12, 2015

@polyfractal this was autoclosed when we deleted 1.x. Please could you reopen against master

@bakks

This comment has been minimized.

Copy link

commented Jul 27, 2015

+1 for this branch, this is a sorely needed feature!

@mpereira

This comment has been minimized.

Copy link
Contributor

commented Aug 27, 2015

+1

1 similar comment
@deimosfr

This comment has been minimized.

Copy link

commented Sep 2, 2015

+1

@polyfractal polyfractal referenced this pull request Nov 20, 2015

Closed

Add query profiler #14889

@makeyang

This comment has been minimized.

Copy link
Contributor

commented Nov 26, 2015

+1

@makeyang

This comment has been minimized.

Copy link
Contributor

commented Nov 26, 2015

when will this feature be released?

@polyfractal

This comment has been minimized.

Copy link
Member Author

commented Nov 26, 2015

See #14889 for current status. Probably 2.2 or 2.3, assuming the review process goes smoothly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.