
Super slow Kibana UI on first load .. related to /_field_stats API call. #9386

Closed
diranged opened this issue Dec 6, 2016 · 17 comments · Fixed by #11114
Labels
bug · discuss · Team:Operations

Comments

diranged commented Dec 6, 2016

Kibana version: 4.6.2
Elasticsearch version: 2.4.1
Server OS version: Ubuntu 14.04.3
Browser version: Chrome 54
Browser OS version: OSX

Description of the problem including expected versus actual behavior:
Issue #4342 introduced a new way to dynamically find indices when searching against them. This works just fine for the actual searches, but it causes problems when Kibana has to figure out which indices to search: that change uses the /_field_stats API to determine which indices cover the selected time range.

We store 30 days' worth of indices named logs-YYYY-MM-DD and events-YYYY-MM-DD in our cluster. Each day we store between 1-2 TB of raw data (unreplicated; double it for replication). In any given month we're storing ~90-100 TB of replicated data and around 30-40 billion events.

We have two clusters ingesting identical data and running the same versions of Kibana and Elasticsearch. The only difference is that on the newer cluster we used the logs-* and events-* patterns rather than the [logs-]YYYY-MM-DD and [events-]YYYY-MM-DD patterns.

A quick bit of debugging showed that the Kibana instances configured with logs-* were issuing different API calls, in particular an extremely long-running /_field_stats call:

[admin@ops-logpipeline-es-frontend-XXX:~]$ time curl --verbose -XPOST "http://localhost:9200/logs-*/_field_stats?level=indices" -d '{"fields":["timestamp"],"index_constraints":{"timestamp":{"max_value":{"gte":1481001283234,"format":"epoch_millis"},"min_value":{"lte":1481002183234,"format":"epoch_millis"}}}}'
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9200 (#0)
> POST /logs-*/_field_stats?level=indices HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 176
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 176 out of 176 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 366
< 
* Connection #0 to host localhost left intact
{"_shards":{"total":420,"successful":420,"failed":0},"indices":{"logs-2016-12-06":{"fields":{"timestamp":{"max_doc":215701193,"doc_count":215701193,"density":100,"sum_doc_freq":862804772,"sum_total_term_freq":-1,"min_value":1480982400000,"min_value_as_string":"2016-12-06T00:00:00.000Z","max_value":1481002690210,"max_value_as_string":"2016-12-06T05:38:10.210Z"}}}}}
real	0m35.944s
user	0m0.007s
sys	0m0.000s

This call takes an average of 32-35 seconds on our cluster almost every single time, so the question I'm asking here is whether this is expected behavior. Is this a normal amount of time to load this data? Is it something that isn't really well tested on large clusters? Or should we simply go back to the old model, where the daily index strategy is listed explicitly in the index pattern?
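For reference, a rough sketch of what that old model looks like from the cluster's side (the index names, field name, and time range here are purely illustrative): with a [logs-]YYYY-MM-DD pattern, Kibana derives the matching daily indices from the time picker itself and queries them directly, so no cluster-wide /_field_stats call is needed.

# illustrative only: the old model searches an explicit list of daily indices
time curl -XPOST "http://localhost:9200/logs-2016-12-05,logs-2016-12-06/_search?ignore_unavailable=true" -d '{"size":0,"query":{"range":{"timestamp":{"gte":1481001283234,"lte":1481002183234,"format":"epoch_millis"}}}}'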

diranged commented Dec 6, 2016

Note: I just found this post from back in March where someone else was reporting a similar issue: https://discuss.elastic.co/t/kibana-field-stats-request-takes-20000-ms-if-number-of-indices-and-shards-are-more/43468/15

Bargs commented Dec 7, 2016

30 seconds is quite unusual. I'd suggest seeking some tuning advice in the Elasticsearch discuss forum. There might be some changes you can make to your ES setup to improve field stats performance.

@diranged

@Bargs,
I'm not sure what we can do to tune that query -- it's not something that's well documented. Switching away from the logs-* model and moving back to [logs-]YYYY-MM-DD has given us back the performance we expect. I would be surprised if we're the only people seeing this kind of impact... but was this change actually tested from a performance perspective on "large"-ish clusters?

From a performance perspective, our cluster looks something like this:

Masters

3 x c3.large master nodes that do nothing but cluster coordination -- they take no part in queries, load balancing, etc.

Hot data nodes

These ingest the daily log/event data and hold ~5 days' worth of indices (10 indices total: logs- and events-). This farm has 18 x m4.4xl instances right now, and they handle the majority of our Kibana queries because they have the most current data.

Warm data nodes

We roll data off after 5 days onto a farm of 10 x d2.2xl instances. These machines store 25 days' worth of indices (so 50 indices * 14 shards = 700 shards over 10 nodes). They have higher and more static heap sizes because they aren't being queried as much and we're not doing active indexing on them, but ultimately they sit at a pretty low load because they are rarely queried.

Daily Indexing Volume

Each day we index roughly 1-1.2 billion events; total storage size (with replication) is about 1.8 TB per day.

Bargs commented Dec 12, 2016

The main cause of slowness in field_stats I've seen is the number of shards. field_stats needs to run per shard, so reducing the number of shards per index might help. Beyond that I don't have a ton of tuning advice myself. Did you try posting in the Elasticsearch Discuss forum? Since the issue here is field_stats performance, we first need to diagnose why field_stats is slow in this particular case. If you post in Discuss and don't get any good advice, I can try to bug some ES team members.
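As a rough starting point for checking how much work a field_stats call has to do, something like the commands below works (the index pattern, template name, and shard count are illustrative, not recommendations). Since the shard count of an existing index is fixed, reducing it means changing the index template used for new daily indices.

# count the shards a logs-* field_stats call has to visit
curl -s "http://localhost:9200/_cat/shards/logs-*" | wc -l

# set a lower shard count for future daily indices via an index template (ES 2.x syntax)
curl -XPUT "http://localhost:9200/_template/logs_template" -d '{"template":"logs-*","settings":{"number_of_shards":4}}'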

@diranged

@Bargs,
So we have ~1700 shards in our cluster (primaries + replicas), and roughly 1400 of them are on these warm nodes. That said, I think it's a poor UI choice to explicitly recommend the <index>-* pattern when there is such an obvious performance issue on large clusters.

@Bargs added the Team:Operations and :Discovery labels and removed the :Discovery label on Dec 19, 2016
@ersushantsood

@diranged can you please share any input you have received on this issue? We are still facing it.

@ersushantsood

@diranged we are also using the logstash-* pattern for the field_stats API call.

s1monw commented Jan 9, 2017

@diranged @ersushantsood can you try to bisect the issue a bit and run field_stats against only a subset of the indices (see the sketch after this list)? What would be interesting is:

  • hot vs. cold indices (to see if the nodes make a difference)
  • if you reduce the number of indices, does the time taken go down too? i.e. is it related to the number of indices?
  • if you run the same request twice does it get faster on the second execution?
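A rough way to run that bisection with the same kind of call as in the original report (the index names, dates, and timing wrapper are just examples):

# hot data only (a single recent daily index)
time curl -s -XPOST "http://localhost:9200/logs-2016-12-06/_field_stats?level=indices" -d '{"fields":["timestamp"]}' > /dev/null

# warm data only (older daily indices)
time curl -s -XPOST "http://localhost:9200/logs-2016-11-*/_field_stats?level=indices" -d '{"fields":["timestamp"]}' > /dev/null

# run either command twice in a row to see whether the second execution is faster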

@cdahlqvist

@diranged Have you tried isolating the field stats lookup test to just hot and/or warm nodes by altering the pattern? That should prove whether it is indeed the warm nodes that are causing the slowdown or not.

laf-rge commented Feb 15, 2017

I am experiencing the same issue. Even when I look at just one index, the _field_stats query takes 8 seconds on a 5+1 shard, 800 GB index.

@alexfrancoeur

Work being done in #11014 should resolve this issue.

s1monw commented Apr 10, 2017

oh this makes my day! 👍

Bargs commented Apr 10, 2017

The new field_caps API won't help us on this issue. This issue is caused by the field_stats call that figures out which indices to query based on the selected time range. To fix this, we need to consider #6644.

s1monw commented Apr 10, 2017

@Bargs we will get rid of field_stats, so I wonder why you still need it?

Bargs commented Apr 10, 2017

I don't think we still need it; we can probably just remove our use of it in this scenario. I just wanted to point out that switching to the field_caps API to get searchable/aggregatable status won't affect this issue, because this is a separate problem.
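For context, the two calls look roughly like this (the index and field names are just examples). field_caps reports per-field searchable/aggregatable flags, while the index-expansion logic depends on the per-index min_value/max_value that only field_stats returns.

# field_caps (ES 5.4+): capabilities only, no min/max values
curl -XGET "http://localhost:9200/logs-*/_field_caps?fields=timestamp"

# field_stats: per-index stats, including the min_value/max_value used for time-range expansion
curl -XPOST "http://localhost:9200/logs-*/_field_stats?level=indices" -d '{"fields":["timestamp"]}'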

epixa commented May 18, 2017

The solution here is to not use the field_stats behavior for index patterns. Elasticsearch has since improved its performance when querying across time-based indices, so the "expand" behavior for index patterns that we added as the default early in 4.x, which leveraged the field_stats API, is no longer necessary and can even be harmful. Starting in 5.4.0 we changed the default for all new index patterns to not use the expanding behavior, and we will soon be deprecating that option entirely. I'd recommend that anyone on 5.4.0 or later uncheck the "Expand index pattern when searching" option for all of their index patterns, and this problem should go away.
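Concretely, with the expand option unchecked, Kibana can search the wildcard pattern directly with the time filter inside the query (the field name and timestamps below are illustrative), and Elasticsearch can cheaply skip shards whose data falls entirely outside the range:

curl -XPOST "http://localhost:9200/logs-*/_search?ignore_unavailable=true" -d '{"size":0,"query":{"range":{"timestamp":{"gte":1481001283234,"lte":1481002183234,"format":"epoch_millis"}}}}'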

epixa closed this as completed on May 18, 2017
s1monw commented May 18, 2017

w00t
