Creating Index painfully slow on cluster with large indices #18776

Closed
puuzll opened this issue Jun 8, 2016 · 7 comments

@puuzll

puuzll commented Jun 8, 2016

I have a cluster with 2 nodes and approximately 10000 indices. Each index has one replica. Creating an index with a preset mapping on this cluster is painfully slow (about 1 minute to create an index with 1 replica). There is sufficient memory, Java heap, CPU, and disk available while the index is being created. Using the hot_threads API, I found that 95% of the time is spent running the following code on the master node:

    at com.google.common.collect.Iterators$3.hasNext(Iterators.java:164)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyDeletedShards(IndicesClusterStateService.java:256)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:167)
    - locked <0x00000000f96ea8b0> (a java.lang.Object)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Is this a bug? Can I avoid this by setting any configuration?
Elasticsearch version: 2.3.3
JVM version: 1.8
OS version: Debian 7
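
For readers who want to collect the same kind of report, here is a minimal Python sketch that builds a request to the nodes hot threads endpoint (`/_nodes/hot_threads`). The base URL, the helper names, and the chosen `threads` and `interval` values are assumptions for illustration, not taken from the report above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, threads=3, interval="500ms"):
    """Build the URL for the nodes hot threads API.

    `threads` is how many of the busiest threads to report per node and
    `interval` is the sampling window; both values here are illustrative.
    """
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base_url, **params):
    """Return the plain-text report. Requires a reachable cluster."""
    with urlopen(hot_threads_url(base_url, **params), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_hot_threads("http://localhost:9200")` (an assumed address) while an index creation is in flight should return stack samples similar to the trace quoted above.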

@jasontedor
Member

jasontedor commented Jun 8, 2016

is this a bug?

When you create an index, that causes a change to the routing table. A cluster state update task is submitted to the nodes in the cluster. When that cluster state update task arrives, each node must process the new routing table to see whether it needs to remove indices, delete shards, start shards, etc. Currently, applying deleted shards is O(number of indices * number of shards). I opened #18788 to address this.
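
To illustrate why a pass that is O(number of indices * number of shards) hurts at this scale, here is a toy Python sketch. It is not the Elasticsearch code and does not claim to show what #18788 changes; the function names and the data layout (a dict from index name to a list of `(index, shard)` tuples) are hypothetical. It contrasts a nested scan with a set-based lookup that returns the same result.

```python
def find_removed_shards_nested(local_shards, routing_table):
    """For each local shard, scan every index's assigned-shard list.

    Cost grows with len(local_shards) * total assigned shards.
    """
    removed = []
    for shard in local_shards:
        still_assigned = False
        for assigned in routing_table.values():
            if shard in assigned:  # linear scan of a list
                still_assigned = True
                break
        if not still_assigned:
            removed.append(shard)
    return removed


def find_removed_shards_indexed(local_shards, routing_table):
    """Same result, but build a set once so each lookup is O(1) on average."""
    assigned = {shard for shards in routing_table.values() for shard in shards}
    return [shard for shard in local_shards if shard not in assigned]


# Hypothetical data: shard ("c", 0) is held locally but no longer assigned.
routing_table = {"a": [("a", 0), ("a", 1)], "b": [("b", 0)]}
local_shards = [("a", 0), ("b", 0), ("c", 0)]
```

With roughly 10000 indices the nested version repeats a full scan for every local shard, whereas the indexed version does one pass to build the set and one pass to check membership.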

However, you're still going to be hurting here. Having 10000 indices on two nodes with one replica is asking for pain. This means that you have at a minimum 10000 shards on each node if you have one shard per index, and maybe 50000 shards on each node if you're using the default number of shards per index. Either way, this is way too many shards. So #18788 is not meant to address your issue directly, just improve performance for the general case. You'll still need to do something about how many indices and shards you have.
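
The shard counts above follow from simple arithmetic, sketched here in Python. The helper name is hypothetical, and it assumes shard copies are spread evenly across nodes; the value of 5 primaries per index is the default in Elasticsearch 2.x.

```python
def shards_per_node(indices, primaries_per_index, replicas, nodes):
    """Average number of shard copies each node holds.

    Every primary has `replicas` additional copies, so each index
    contributes primaries_per_index * (1 + replicas) shards in total.
    """
    total_shards = indices * primaries_per_index * (1 + replicas)
    return total_shards / nodes


one_primary = shards_per_node(10000, 1, 1, 2)       # one shard per index
default_primaries = shards_per_node(10000, 5, 1, 2)  # 2.x default of 5 primaries
```

With one replica on two nodes, every node ends up holding one copy of every shard, which is why the per-node count equals indices times primaries.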

Can I avoid this by setting any configuration?

No.

@jilen

jilen commented Oct 8, 2016

@jasontedor I am suffering from this too. Is there any way to improve index creation speed?

@jasontedor
Member

@jilen Creating an index requires a cluster state update which can be a slow thing indeed. The issue here was about the degradation in index-creation speed as the number of indices increased; that's what #18788 addressed. I'd say that if you rely on index creation being fast, you probably have an architecture that needs to be reconsidered.

@bleskes
Contributor

bleskes commented Oct 10, 2016

@jilen to quantify what @jasontedor said (which is very true): index creation is slow when compared to data-level operations like indexing and search. You should expect it to run within a couple of seconds. Also note that we now wait (since 5.0) for the primaries to be fully allocated before responding to the call.

@jilen

jilen commented Oct 11, 2016

@jasontedor @bleskes I am currently using a one-index-per-user pattern, and there are actually more than 20k shards.

Parallel automatic index creation (via the bulk or update API) actually makes the cluster unresponsive.

What do you suggest for my situation? Should I disable automatic index creation?

@nik9000
Member

nik9000 commented Oct 11, 2016

Don't have an index per user.

@NelsonBurton

NelsonBurton commented Jul 29, 2021

An alternative to an index per user is to put all customers in one index and use each customer's identifier as the Elasticsearch routing value: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html. This works well in our system, with millions of users and 1 index.
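
A rough Python sketch of what the routing approach can look like at the request level. The index name `customers`, the `customer_id` field, and the helper names are assumptions for illustration; the helpers only build the method, path, and JSON body and do not send anything. Because routing only selects a shard and several customers can share one, the search still filters on the customer field.

```python
import json
from urllib.parse import quote, urlencode


def build_index_request(index, doc_id, customer_id, document):
    """Build a request that indexes a document routed by customer_id."""
    body = dict(document, customer_id=customer_id)  # store the owner on the doc
    path = (
        f"/{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}"
        f"?{urlencode({'routing': customer_id})}"
    )
    return "PUT", path, json.dumps(body)


def build_search_request(index, customer_id, query):
    """Build a search that hits only the customer's shard and filters by owner."""
    body = {
        "query": {
            "bool": {
                "must": [query],
                # Routing picks the shard; this filter keeps other customers out.
                "filter": [{"term": {"customer_id": customer_id}}],
            }
        }
    }
    path = f"/{quote(index, safe='')}/_search?{urlencode({'routing': customer_id})}"
    return "POST", path, json.dumps(body)
```

The same routing value has to be supplied when indexing, fetching, updating, and deleting a document; otherwise the request may be sent to a different shard than the one holding it.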
