Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve discover query #69049

Closed
nik9000 opened this issue Jun 12, 2020 · 6 comments
Closed

Improve discover query #69049

nik9000 opened this issue Jun 12, 2020 · 6 comments
Assignees
Labels
enhancement New value added to drive a business result Feature:Discover Discover Application Feature:Search Querying infrastructure in Kibana impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.

Comments

@nik9000
Copy link
Member

nik9000 commented Jun 12, 2020

I bumped into a _search generated by discover that had a few things in it that looked like they'd slow elasticsearch down. I'm wondering if we can do anything to speed this up:

curl -XPOST -HContent-Type:application/json ????????   -d'{
  "version": true,  <--- do we really need this?
  "size": 500,      <--- this is fairly large. too big to fit on the screen, right?
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean" <---- wat
      }
    }
  ],
  "aggs": {    <---- having an agg in the same query as `size` turns off agg caching and doesn't let the early terminate fetching the top hits
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30m",
        "time_zone": "America/Chicago",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [  <---- stored_fields should be pretty rare. I'd expect leaving this off would mostly produce all you need and adding it will fetch more than you need.
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [   <----- that is a lot of fields. doc_values are a column store so aren't going to be efficient to fetch. I guess you do this to get a formatted date. https://github.com/elastic/elasticsearch/issues/55363 will help with that.
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    .... 11 other date_time fields
  ],
  "_source": {   <----- this is confusing. I think it means "don't filter" but I'd have to look it up. It's way, way less confusing to leave this out if you don't need any filtering.
    "excludes": []
  },
  "query": {
    "bool": {   <----- This looks big but it is 100% ok. We'll rewrite it down to just the range query.
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-08T05:00:00.000Z",
              "lte": "2020-06-09T02:06:38.662Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}       <--- this is expensive. There are 100 fields in this index. In this particular case the search query may be too simple to highlight. I'm not sure. But I am certain that if the query *isn't* super simple then this will be expensive.
    },
    "fragment_size": 2147483647 <---- this is asking ES to OOM if there are large documents.
  }
}
@nik9000 nik9000 added Feature:Discover Discover Application Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Jun 12, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@nik9000
Copy link
Member Author

nik9000 commented Jun 15, 2020

The doc_values fields I pointed out duplicates #68672. But the other things are still their own unique "fun" bits of search.

@kertal
Copy link
Member

kertal commented Jun 15, 2020

About

"aggs": {    <---- having an agg in the same query as `size` turns off agg caching and doesn't let the early terminate fetching the top hits

Here's an issue to split one into 2 queries because of performance: #69134

"size": 500,      <--- this is fairly large. too big to fit on the screen, right?

True, but there are only parts rendered, of course this could be optimized fetching a smaller size and using e.g. search_after to get more on demand

"version": true,  <--- do we really need this?
"stored_fields": [  <---- stored_fields should be pretty rare. I'd expect leaving this off would mostly produce all you need and adding it will fetch more than you need.
    "*"
  ],

this and more is maintained by @elastic/kibana-app-arch, dear team could you provide feedback here?

tbc.

@lukasolson lukasolson added Team:AppArch Feature:Search Querying infrastructure in Kibana labels Jun 23, 2020
@lukasolson
Copy link
Member

I believe version is required in discover because if auto-refresh is on, and a document is updated in ES, we will only see the updated doc if we request the version (see #10385).

Regarding stored_fields, I think that has been included historically just in case there are fields that are stored in the mapping... We might be able to have an advanced option for users to specifically opt into this behavior and opt out by default.

@lukasolson lukasolson added the enhancement New value added to drive a business result label Jun 23, 2020
@jtibshirani
Copy link

jtibshirani commented Aug 1, 2020

"query": {
  "bool": {   <----- This looks big but it is 100% ok. We'll rewrite it down to just the range query.
    "must": [],
    "filter": [
      {
        "match_all": {}
      },
      ...

I'm not actually sure that we rewrite this correctly. I noticed that we only remove match_all clauses if there is at least one must clause. I opened a Lucene PR with a possible fix: apache/lucene-solr#1709.

I noticed that we weren't removing the match_all while using the search profiler to debug why discover took a long time to load on a cluster.

@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 2, 2021
@timroes timroes added Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Aug 31, 2021
@timroes
Copy link
Contributor

timroes commented Sep 15, 2021

Since (quote) "@nik9000 seemed to be happy around the Discover query the last time he looked at it", we're going to close this. Please feel free to reopen or create individual specific issues if there's more improvements left we can do.

@timroes timroes closed this as completed Sep 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New value added to drive a business result Feature:Discover Discover Application Feature:Search Querying infrastructure in Kibana impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.
Projects
None yet
Development

No branches or pull requests

6 participants