Creating Index painfully slow on cluster with large indices #18776
Comments
When you create an index, that causes a change to the routing table. A cluster state update task is submitted to the nodes in the cluster. When that cluster state update task arrives, each node must process the new routing table to see whether it needs to remove indices, delete shards, start shards, etc. Currently, applying deleted shards is O(number of indices * number of shards). I opened #18788 to address this. However, you're still going to be hurting here. Having 10000 indices on two nodes with one replica is asking for pain. This means that you have at a minimum 10000 shards on each node if you have one shard per index, and 50000 shards on each node if you're using the default number of shards per index (five primaries per index in 2.x). Either way, this is way too many shards. So #18788 is not meant to address your issue directly, just to improve performance for the general case. You'll still need to do something about how many indices and shards you have.
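To put numbers on the shard counts above, here is a small Python sketch. It assumes shards are spread evenly across nodes, and it reads "number of shards" in the O(number of indices * number of shards) cost as the shards held by one node. The helper names and that reading are assumptions for illustration, not Elasticsearch code.

```python
def shards_per_node(num_indices: int, primaries_per_index: int,
                    replicas: int, num_nodes: int) -> float:
    """Average shard copies per node, assuming an even spread (an assumption)."""
    # Each primary shard has `replicas` extra copies on other nodes.
    total_copies = num_indices * primaries_per_index * (1 + replicas)
    return total_copies / num_nodes


def rough_update_cost(num_indices: int, shards_on_node: float) -> float:
    """Order-of-magnitude work for one O(indices * shards) pass (constants ignored)."""
    return num_indices * shards_on_node


# Numbers from this issue: 10000 indices, one replica, two nodes.
one_primary = shards_per_node(10_000, 1, 1, 2)
five_primaries = shards_per_node(10_000, 5, 1, 2)  # default of five primaries in 2.x
print(one_primary)                               # → 10000.0
print(five_primaries)                            # → 50000.0
print(rough_update_cost(10_000, one_primary))    # → 100000000.0
```

Even with one primary per index, a quadratic pass lands on the order of 10^8 steps for every cluster state update, which is why reducing the index and shard count matters more than any single fix.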
@jasontedor I am suffering from this too. Is there any way to improve index creation speed?
@jilen Creating an index requires a cluster state update which can be a slow thing indeed. The issue here was about the degradation in index-creation speed as the number of indices increased; that's what #18788 addressed. I'd say that if you rely on index creation being fast, you probably have an architecture that needs to be reconsidered.
@jilen To quantify what @jasontedor said (which is very true): index creation is slow compared to data-level operations like indexing and search. You should expect it to complete within a couple of seconds. Also note that since 5.0 we wait for the primaries to be fully allocated before responding to the call.
@jasontedor @bleskes I am currently relying on automatic index creation (via the bulk or update API), and creating indices in parallel this way makes the cluster unresponsive. What do you suggest for my situation? Should I disable automatic index creation?
Don't have an index per user. (In reply to "jilen", Oct 10, 2016 10:22 PM.)
An alternative to an index per user is to put all customers in one index and use each customer's identifier as the Elasticsearch routing value (the _routing field, see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html). This works well in our system with millions of users and one index.
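A minimal sketch of why routing keeps one customer's data together: the routing documentation linked above describes shard selection as hash(_routing) % num_primary_shards. The Python below uses zlib.crc32 as a stand-in hash (Elasticsearch uses its own Murmur3-based hash, so real shard numbers will differ), and the shard count, customer IDs, and helper names are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # hypothetical: primaries of the single shared index


def shard_for(routing: str, num_primary_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard as hash(routing) % num_primary_shards.

    NOTE: zlib.crc32 is a stand-in, not Elasticsearch's Murmur3-based hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Hypothetical documents, each routed by the customer it belongs to.
docs = [
    {"customer_id": "customer-42", "msg": "a"},
    {"customer_id": "customer-7", "msg": "b"},
    {"customer_id": "customer-42", "msg": "c"},
]

# Record which shards each customer's documents end up on.
placement: dict[str, set[int]] = {}
for doc in docs:
    shard = shard_for(doc["customer_id"])
    placement.setdefault(doc["customer_id"], set()).add(shard)

# Every customer maps to exactly one shard, whatever that shard number is.
for customer, shards in placement.items():
    print(customer, shards)
```

Because every document for a given customer lands on the same shard, a search sent with that customer's routing value only needs to query one shard instead of all of them, and adding a customer never creates a new index.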
I have a cluster with 2 nodes and approximately 10000 indices. Each index has one replica. Creating an index with a preset mapping on this cluster is painfully slow (about 1 minute to create an index with 1 replica). There is sufficient memory, Java heap, CPU and disk when creating the index. Using the hot_threads API, I found that 95% of the time is spent running the following code on the master node:
Is this a bug? Can I avoid this through some configuration setting?
Elasticsearch version: 2.3.3
JVM version: 1.8
OS version: Debian 7