index.auto_expand_replicas and shard allocation settings don't play well together #2869

Open
synhershko opened this Issue Apr 7, 2013 · 8 comments


@synhershko

With an index configured with auto_expand_replicas="0-all", the cluster will try to allocate one primary and ({number of nodes} - 1) replicas for each shard, i.e. a copy of each shard on every node.

However, with "index.routing.allocation.include.zone"="zone1" the cluster is blocked from allocating a shard (either primary or replica) to any node that does not have its zone attribute set to "zone1", i.e. "node.zone: zone1" in the elasticsearch.yml config file. So if a cluster has 3 nodes in "zone1" and 3 nodes in "zone2", it will allocate three copies of each shard (one primary plus two replicas) to the "zone1" nodes and mark an additional three replicas as unassigned. I observed this using the elasticsearch-head utility.
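For reference, the combination described above corresponds roughly to the following index settings (an illustrative fragment, not copied from the report; the zone value matches the example):

```json
{
  "settings": {
    "index.auto_expand_replicas": "0-all",
    "index.routing.allocation.include.zone": "zone1"
  }
}
```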

So the allocation behaviour is as desired, i.e. replicas auto-expand to all nodes with a specific zone attribute, but the unassigned replicas leave the cluster in a yellow state.
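The arithmetic behind the unassigned replicas can be sketched as follows (a toy model, not Elasticsearch code; the function name is made up for illustration):

```python
# Toy model of the conflict: auto_expand_replicas="0-all" sizes the
# replica count from the total node count, while allocation filtering
# caps how many nodes may actually hold a copy of each shard.
def shard_copy_counts(total_nodes, filtered_nodes):
    """Return (desired, assignable, unassigned) copy counts per shard."""
    desired = 1 + (total_nodes - 1)            # primary + auto-expanded replicas
    assignable = min(desired, filtered_nodes)  # filter limits eligible nodes
    return desired, assignable, desired - assignable

# 6-node cluster, 3 nodes tagged zone1 and allowed by the filter:
print(shard_copy_counts(6, 3))  # (6, 3, 3) -> 3 unassigned replicas per shard
```

With the primary assigned and some replicas unassigned, the cluster reports yellow rather than red.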

See https://groups.google.com/forum/#!msg/elasticsearch/95hC-wGu7GE/BPPSWsfj8UkJ

@synhershko

Using the latest master, I should add.

@clintongormley
elastic member

I'm wondering if we should deprecate auto-expand... It just seems like the wrong solution

@synhershko

IMO no, auto-expand is a very useful feature and a way for clusters to grow automatically to accommodate peaks in read-heavy workloads. It's just somewhat broken when shard-allocation rules aren't trivial.

I admit, though, that auto-expand requires multicast to be enabled (or predefining IPs for the machines dynamically provisioned during peaks), so at this point it does seem more of an exotic feature.

@markharwood markharwood added the adoptme label Sep 5, 2014
@markharwood

Attached an "adoptme" tag, as this needs further investigation to establish whether it is an issue.
The auto-expand feature is used internally to keep a copy of the .scripts index (stored scripts) on each node, so this does need to work.

@markharwood markharwood removed the discuss label Sep 5, 2014
@clintongormley
elastic member

Having reread this issue, it seems that if we were able to limit the maximum number of auto-expand replicas to the number of nodes that would allow allocation (e.g. under awareness or filtering rules), then we'd avoid the yellow status.
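This suggestion can be sketched with a hypothetical helper (assuming something like the allocation deciders could report how many nodes would accept a copy; the function and its signature are made up for illustration):

```python
# Hypothetical fix: cap the auto-expanded replica count at the number of
# nodes that would actually accept a copy, instead of the total node
# count. "all" stands for the open upper bound of e.g. "0-all".
def capped_auto_expand(lower, upper, allocatable_nodes):
    max_replicas = allocatable_nodes - 1   # leave one node for the primary
    if upper != "all":
        max_replicas = min(max_replicas, upper)
    return max(lower, max_replicas)

# 6-node cluster, but only 3 nodes pass the zone filter:
print(capped_auto_expand(0, "all", 3))  # 2 replicas -> 3 copies, none unassigned
```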

Not sure how feasible this is.

@colings86
elastic member

@dakrone Do you know whether @clintongormley's suggestion above is feasible?

@dakrone
elastic member

@colings86 @clintongormley I looked at where the setting is updated: it's currently handled in MetaDataUpdateSettingsService, which submits a new cluster state update to change the number of replicas when it detects that the setting must be changed.

I think it would be possible to use the AllocationDeciders to adjust the number of replicas to the number of nodes where a shard can be allocated, but I don't think it's trivial.

Maybe it should be a separate setting? Instead of `0-all` it could be `0-available` or something?
