Term aggregations extraordinarily slow on Windows (ES 1.0.1) #5498

Closed
BrandesEric opened this issue Mar 23, 2014 · 7 comments

Comments

@BrandesEric

I'm working on an app that makes liberal use of term facets. I have a query right now that takes about 1.5 seconds using term facets. I switched to try out the new aggregations, and the same query using aggregations now takes over 360 seconds (6 minutes!). This is on a 3-node cluster running in a Windows environment. While the query is running, the node that received it becomes completely unresponsive, to the point where the cluster thinks it's gone. Sometimes it never returns, either, and I have to bounce the service on the box.

Here is the original query:

{
   "query": {
      "match_all": {}
   },
   "size": 0,
   "facets": {
      "url": {
         "terms": {
            "field": "url",
            "size": 20
         }
      }
   }
}

Here is the new query with aggregations:

{
   "query": {
      "match_all": {}
   },
   "size": 0,
   "aggs": {
      "url": {
         "terms": {
            "field": "url",
            "size": 20
         }
      }
   }
}

@jpountz
Contributor

jpountz commented Mar 24, 2014

Terms aggregations are not as optimized as terms facets (yet), especially on high-cardinality fields. I suppose this is the case of your url field? Could you check if you get better response times if you pass execution_hint: map to the terms aggregation?
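
For reference, the suggestion above amounts to adding one key to the terms aggregation from the original query. The following is a minimal sketch that only builds and prints the request body using the Python standard library; the helper name `terms_agg_body` is hypothetical and not part of any Elasticsearch client, and how the body is sent to the cluster is left out.

```python
import json


def terms_agg_body(field, size=20, execution_hint=None):
    """Build a match_all search body with a single terms aggregation.

    `terms_agg_body` is a hypothetical helper for illustration only.
    """
    terms = {"field": field, "size": size}
    if execution_hint is not None:
        # "map" is the execution hint suggested in this thread.
        terms["execution_hint"] = execution_hint
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {field: {"terms": terms}},
    }


body = terms_agg_body("url", size=20, execution_hint="map")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the search request body in place of the aggregation query posted above.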

@jpountz
Contributor

jpountz commented Mar 24, 2014

I just noticed that you mentioned that the query made the node completely unresponsive, which is a typical symptom of memory pressure, because the garbage collector keeps running stop-the-world collections during which no thread is allowed to run. So this slowness might be mostly due to the higher memory usage of terms aggregations compared to facets (in which case the map execution hint should still help a bit since it requires less memory).
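
One way to check for the memory pressure described above is to look at JVM heap usage per node. The sketch below assumes a response shaped like the output of the node stats API (for example `GET /_nodes/stats/jvm`), with `heap_used_in_bytes` and `heap_max_in_bytes` under each node's `jvm.mem` section; the `heap_report` helper and the sample values are made up for illustration and are not measurements from this cluster.

```python
def heap_report(node_stats):
    """Return {node_name: fraction of JVM heap in use}.

    Assumes a node-stats-style dict; `heap_report` is a hypothetical helper.
    """
    report = {}
    for node in node_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        report[node["name"]] = mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    return report


# Made-up sample input, not real cluster data.
sample = {
    "nodes": {
        "abc": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_in_bytes": 900, "heap_max_in_bytes": 1000}},
        }
    }
}

for name, used in heap_report(sample).items():
    print(f"{name}: {used:.0%} of heap in use")
```

A node that stays close to a full heap while the aggregation runs would be consistent with the garbage collection explanation.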

@BrandesEric
Author

Ah, that did help tremendously, thanks! Down to about 15 seconds from 6 minutes! I think you're right that it was a GC pause that caused the issue. It is most severe on the dev environment, which is a much less powerful cluster. For now I will likely stick with the older term facets, since their performance is still a bit better :)

One last question: my app uses ES primarily for faceting. Would it be advisable to make any changes to the default field data cache setting? Otherwise, is the general rule "the more memory, the better"? Some of the facets have cardinalities in the millions, so any way to optimize that would be helpful!

@jpountz
Contributor

jpountz commented Mar 24, 2014

Thanks for the feedback. Regarding the field data cache configuration, reloading a field data cache is very costly, so you should make sure that it is large enough to hold an entry for all segments and all fields that you need to facet/aggregate on (which is what the default configuration does).

I'm closing this issue for now, but be reassured that we are working on improving memory/speed for terms aggregations.

@jpountz jpountz closed this as completed Mar 24, 2014
@BrandesEric
Author

Sounds good, thanks for your help!

@haochun

haochun commented Oct 12, 2015

@BrandesEric Hello, I would like to know how you solved this problem. Thank you very much!

@BrandesEric
Author

@haochun Moved to Linux and upgraded to the 1.4 series :) (Of course, 1.4 is old these days, so something like the 1.7 series is likely even better)
