Alias creation operation goes very slow when we have more than 100000 aliases #16853
Comments
Hmmmmmm Can you post your method and machine configuration please? |
Every alias change requires a cluster state update. Cluster state updates are published using cluster state diffs. The diff is calculated by checking every existing alias in the before cluster state, and whether or not it has been removed in the after cluster state. This is clearly linear, so the slowdown you observe makes perfect sense. The benefit of sending cluster state diffs vastly outweighs the advantages of not executing this linear time operation, especially since having 100000 aliases is an anti-pattern. What is more, even if this linear diff operation were removed, the published cluster state updates would still grow linearly in the number of aliases. |
In my snippet of code I make requests serially (one by one). The first request takes around 10 ms, and latency grows linearly to about 170 ms by the 40000th request. I need to create requests on demand, which is why I can't use the bulk format. Please note that for this simple test case I dedicated a machine with 16 GB of RAM and a fast SSD, without any cluster configuration or other load. So it seems that the problem is not related to clustering. In a real deployment it can be even worse. I have 2 clustered servers in production with more than 100000 aliases on them, and for some requests I see latency of up to 40 s. I really think that is not acceptable. |
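To make the linear cost concrete, here is a toy model of such a diff. It is only a sketch: `alias_diff` and the dictionary layout are invented for illustration and are not Elasticsearch's actual cluster state code. The point is that adding a single alias still means walking every alias that already exists.

```python
def alias_diff(before: dict, after: dict) -> dict:
    """Toy diff between two alias maps (alias name -> alias definition).

    Hypothetical helper, not Elasticsearch's implementation.
    """
    # One pass over every alias in the "before" state: O(len(before)).
    removed = [name for name in before if name not in after]
    # One pass over every alias in the "after" state: O(len(after)).
    upserted = {
        name: definition
        for name, definition in after.items()
        if before.get(name) != definition
    }
    return {"removed": removed, "upserted": upserted}


# 100000 existing aliases, then one more is added.
before = {f"user_{i}": {"routing": str(i)} for i in range(100_000)}
after = dict(before)
after["user_100000"] = {"routing": "100000"}

diff = alias_diff(before, after)
```

The resulting diff is tiny (one added alias), but computing it touched all 100000 existing entries, which matches the per-request slowdown described in this issue.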
On two nodes, for a single index? If my reading of this situation is correct, that strikes me as another anti-pattern.
I think that you should use filters directly on each request. With the information that you've given so far, I doubt that the routing is necessary. |
I have 1 index, 100 shards and a routing policy based on user ids like the following example: |
That is way too many shards for a single index on one or two machines. You'll be fine if you drop this to something more reasonable like two shards. As a bonus, you can almost surely drop the custom routing. I'm also sorry that the documentation has led you astray here. :(
I don't think that you do, but I could be wrong. Why do you think that you do?
I think that you should drop the number of shards to two, I think that you do not need to use custom routing, and I think that instead of using aliases you should use a filter. |
@malaki12003 we have long spoken about faking an index-per-user using aliases, but this phrasing was ill chosen as it makes people believe that aliases are free and will scale infinitely. Unfortunately the truth is more prosaic. As you have found, alias creation scales linearly. Frankly, I'm impressed that you got to 100k! In earlier versions we struggled to get over 10k.
This model works with small numbers of "users" (perhaps we should talk about index-per-tenant instead?) but at the scale you're talking about, you'll have to take a different approach. The problem with aliases is that they are held in memory all the time on every node. But you don't have all 500k users on your site at the same time. Instead, you could move this logic client side and use the user_id to add a routing value and filter to every request.
Regarding the number of shards you have, i.e. 100: think about this carefully. You're saying that you plan to grow to a cluster with 100 nodes, just for the primary shards, plus another 100-200 nodes for the replicas. Do you really plan on a cluster of 300 nodes? And you're sure that you will never reindex (e.g. to change your mappings) while growing from two nodes to 300? My suggestion is to start with a more realistic number of shards like 5 or 10. |
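A minimal sketch of that client-side approach, assuming documents carry a `user_id` field and that the routing value is simply the user id. The helper name `build_user_search` and the index name are hypothetical; only the `routing` query-string parameter and the `bool` query with a `filter` clause are standard Elasticsearch features.

```python
import json
from urllib.parse import urlencode


def build_user_search(index: str, user_id: str, query: dict) -> tuple[str, str]:
    """Build the path and JSON body for a search scoped to one user.

    Hypothetical helper: it does what a filtered, routed alias would do,
    but per request, so nothing is stored in the cluster state.
    """
    body = {
        "query": {
            "bool": {
                "must": query,
                # Assumes every document has a `user_id` field.
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }
    # Routing sends the request only to the shard holding this user's data.
    path = f"/{index}/_search?" + urlencode({"routing": user_id})
    return path, json.dumps(body)


path, body = build_user_search("myindex", "42", {"match": {"title": "hello"}})
```

The returned path and body can be sent with any HTTP client. Indexing requests would need the same `routing` value so that a user's documents land on the shard that searches are routed to.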
I've written a snippet of code that tries to create 100000 aliases on Elasticsearch. I found that as the number of aliases increases, the time it takes to create an alias rises as well. It seems the complexity of this operation is O(n), which makes no sense.
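The original snippet is not included in this thread. As a rough illustration only, a per-user alias workload of this kind might build one `_aliases` request per user, along the following lines. The alias naming scheme, the `user_id` field, and the helper names are assumptions, not the reporter's code; the `actions`/`add` request shape with `filter` and `routing` is the standard alias API.

```python
from typing import Iterable, Iterator


def alias_add_action(index: str, user_id: int) -> dict:
    """One `add` action for a filtered, routed per-user alias (hypothetical naming)."""
    return {
        "add": {
            "index": index,
            "alias": f"user_{user_id}",
            "filter": {"term": {"user_id": user_id}},
            "routing": str(user_id),
        }
    }


def alias_requests(index: str, user_ids: Iterable[int]) -> Iterator[dict]:
    """Yield one request body per user, to be POSTed to /_aliases one at a time."""
    for user_id in user_ids:
        yield {"actions": [alias_add_action(index, user_id)]}


requests_to_send = list(alias_requests("users", range(3)))
```

Each body triggers its own cluster state update when sent, which is why issuing them one at a time gets slower as the alias count grows.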