Terms aggregation speed is proportional to field cardinality #19780

Closed
bobrik opened this Issue Aug 3, 2016 · 12 comments

bobrik (Contributor) commented Aug 3, 2016

Elasticsearch version:

# elasticsearch --version
Version: 2.3.4, Build: e455fd0/2016-06-30T11:24:31Z, JVM: 1.8.0_91

JVM version:

# java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

OS version: Debian Jessie; Elasticsearch is running in a container from the official Docker image.

Description of the problem including expected versus actual behavior:

Steps to reproduce:

  1. Index ~1 billion documents (I used 1,241,584,977 over 12h), averaging ~1 KB per document.
  2. Run a top-N query on a low-cardinality field with zero matches -> instant response. Good!
  3. Run a top-N query on a high-cardinality field with zero matches -> slow response. Bad!

The index in question has 50 shards and all fields use doc_values.

Query for low cardinality field:

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "query_string": {
                  "query": "zoneName:foo.bar"
                }
              }
            },
            {
              "range": {
                "@timestamp": {
                  "gte": "2016-08-03T00:00:00Z",
                  "lte": "2016-08-03T12:00:00Z"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "topn": {
      "terms": {
        "field": "status",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

Response:

{
  "took": 148,
  "timed_out": false,
  "_shards": {
    "total": 50,
    "successful": 50,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "topn": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

Query for high cardinality field:

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "query_string": {
                  "query": "zoneName:foo.bar"
                }
              }
            },
            {
              "range": {
                "@timestamp": {
                  "gte": "2016-08-03T00:00:00Z",
                  "lte": "2016-08-03T12:00:00Z"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "topn": {
      "terms": {
        "field": "request",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

Response:

{
  "took": 10634,
  "timed_out": false,
  "_shards": {
    "total": 50,
    "successful": 50,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "topn": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

Field cardinality for the same timespan:

  • request: 168,154,206
  • status: 127
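
For reference, counts like these can be obtained with a cardinality aggregation over the same time range. A minimal sketch (the aggregation names are arbitrary, and cardinality is an approximate metric):

{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2016-08-03T00:00:00Z",
        "lte": "2016-08-03T12:00:00Z"
      }
    }
  },
  "aggs": {
    "request_cardinality": { "cardinality": { "field": "request" } },
    "status_cardinality": { "cardinality": { "field": "status" } }
  }
}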

The problem is that these 10 seconds come out of nowhere: there are no matching documents, so there is nothing to do. I'm afraid I have to pay this price for every request that aggregates on a field with high index-wide cardinality, even though within the query the field is not that diverse.

colings86 (Member) commented Aug 4, 2016

This is probably caused by loading global ordinals. The time taken to load them grows with the cardinality of the field. The aggregator loads global ordinals when it is initialised (before it starts collecting documents), at which point it does not yet know whether any documents will match the query. Moving this loading to the collect phase would not be a great solution IMO, as every collected document would then incur the cost of checking whether global ordinals still need to be loaded, hurting collection performance across the board.
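
If it helps to confirm this, the heap used by global ordinals is accounted under fielddata, so something like the following sketch (hypothetical index name) can show the per-field cost:

GET logs-2016.08.03/_stats/fielddata?fields=request,status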

bobrik (Contributor) commented Aug 4, 2016

It is also reasonably fast for cold indices. For yesterday's data:

  • 1 ms (lowest) for the low-cardinality field with no matches
  • 200-400 ms for the high-cardinality field with no matches

clintongormley (Member) commented Aug 4, 2016

@bobrik what do you mean by your last comment? What was different that time compared to the previous run that showed 10s?

bobrik (Contributor) commented Aug 4, 2016

The difference is the freshness of the data: 10s is for the constantly updated current index, 200-400ms is for a cold index with yesterday's data. Indices are rotated daily.

clintongormley (Member) commented Aug 4, 2016

Ah OK - so then yes, this sounds like the cost of building global ordinals. There's no way around this currently other than: (a) using eager global ordinals to build them as part of the refresh, and (b) increasing the refresh interval so that global ordinals are rebuilt as seldom as possible.

@martijnvg is looking at ways to reduce the cost of global ordinal loading with frequent refreshes in Lucene, but that's probably a long way off.

bobrik (Contributor) commented Aug 4, 2016

@clintongormley, thanks. I enabled eager loading for the most offending fields; at the cost of ~200% CPU per node for refreshes, searches are faster now. That's with a 30s refresh interval.
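
For reference, a sketch of the two changes under 2.x conventions, where eager global ordinals are configured through the fielddata loading setting (the index and type names are hypothetical; in 5.x+ the eager_global_ordinals mapping parameter replaces this):

PUT logs-2016.08.04/_mapping/logs
{
  "properties": {
    "request": {
      "type": "string",
      "index": "not_analyzed",
      "fielddata": { "loading": "eager_global_ordinals" }
    }
  }
}

PUT logs-2016.08.04/_settings
{
  "index": { "refresh_interval": "30s" }
}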

jateenjoshi commented Aug 29, 2017

@clintongormley - coming in quite late here. A couple of questions:

  1. Has the situation changed - has global ordinal loading in Lucene become faster in newer versions of Elasticsearch?

  2. I have had similar issues when using terms aggregations. I've tried wrapping the terms aggregation in a sampler aggregation and, separately, in a filter aggregation. Neither seems to have any effect on the slowness - is that expected?
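
For context, the wrapping described in (2) looks roughly like this (a sketch; the aggregation names and the shard_size value are arbitrary):

{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "topn": {
          "terms": { "field": "request", "size": 10 }
        }
      }
    }
  }
}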

colings86 (Member) commented Aug 30, 2017

@jateenjoshi There may well have been improvements to the speed of loading global ordinals in more recent versions of Elasticsearch, but for high-cardinality fields it will probably still dominate the time taken to run these aggregations. Wrapping the terms aggregation in a sampler aggregation or a filter aggregation will unfortunately not work here, since global ordinals depend not on the number of terms matching the query but on the number of terms of that field in the shard (Lucene index) overall. The global ordinals are built before we collect any matching documents, so the time taken to load them is fixed (given a static index) no matter what the query or parent aggregations are.

The global ordinals only need to be loaded once and can then be used for all subsequent requests, until the index changes (i.e. until a refresh happens). As @clintongormley says above, at the moment your only two options are to enable eager global ordinals, so that the cost of building them is incurred when the refresh runs rather than when the request runs, and/or to increase the refresh interval so that you incur this cost less often.

colings86 (Member) commented Aug 30, 2017

Further to my comment above, there is a change coming in Lucene 7.1 (https://issues.apache.org/jira/browse/LUCENE-7905) that will improve the speed of loading global ordinals. This should be available in Elasticsearch in one of the 6.x releases. However, while loading will become faster, global ordinals loading will still dominate these kinds of aggregation requests, so my advice above does not really change.

jateenjoshi commented Sep 13, 2017

Got it - thanks @colings86

jateenjoshi commented Sep 13, 2017

One thing we are doing is increasing the number of shards in our indices. It sounds like the dominating factor is the cardinality of the field within a single shard, so would increasing the number of shards (with more nodes) help bring down the latencies?

colings86 (Member) commented Sep 13, 2017

Only if you have the resources to handle the extra shards. Increasing the shard count has other effects on performance as well, so it may or may not help; you'd have to test your use case to find out. Also, if you increase the shard count but the shards are spread over the same nodes, you will be loading double the number of global ordinal sets (one per shard) on every refresh, so the total resource cost may well be the same, on top of the other costs of having more shards (like the term dictionary cache and duplication of the same term across shards).
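
For completeness, shard count is fixed at index creation, so for daily indices the change would go into an index template for new indices. A sketch under the 2.x template syntax (the template name, pattern, and shard count are hypothetical):

PUT _template/logs
{
  "template": "logs-*",
  "settings": {
    "index": { "number_of_shards": 100 }
  }
}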
