Delayed allocation causing partial allocation of shards on allocation awareness #14010
It is difficult to write out a full repro in words, so I recorded a video of the repro, which should help.
The test uses the latest 1.7.2 release.
In short: 6 nodes in the cluster, 1 index with 4 shards and 2 replicas (3 copies of each shard).
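For reference, the shard arithmetic behind this setup can be sketched as follows. This is only an illustration: the function names are hypothetical, and the capacity check assumes nothing more than the rule that two copies of the same shard are never placed on the same node. It ignores disk thresholds, forced awareness, and any other allocation deciders.

```python
import json


def index_settings(primaries=4, replicas=2):
    """Settings body for an index shaped like the one in this report."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


def total_copies(primaries, replicas):
    """Every primary plus each of its replicas counts as one shard copy."""
    return primaries * (1 + replicas)


def fits_on_distinct_nodes(replicas, live_nodes):
    """True if each shard's copies can all land on different nodes."""
    return (1 + replicas) <= live_nodes


if __name__ == "__main__":
    print(json.dumps(index_settings(), indent=2))
    print("total shard copies:", total_copies(4, 2))
    print("fits on 3 surviving nodes:", fits_on_distinct_nodes(2, 3))
```

With 3 copies per shard and 3 surviving nodes, every copy still has a node to go to, which is consistent with the full allocation seen once the cluster is prodded.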
When an updateDomain is killed (3 nodes gone), the cluster shows only partial allocation of shards until it is prodded, either by running a manual _cluster/reroute command (with no POST body) or by issuing any command that updates the cluster state (e.g. creating an index). Once a manual reroute (which by itself changes nothing) is run or the cluster state is updated, the remaining shards are immediately allocated successfully according to the awareness settings.
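The manual prod described above can be sketched like this, using only the Python standard library. The base URL `http://localhost:9200` and the helper names are assumptions for illustration; the request is a POST to `_cluster/reroute` with an empty body, so no reroute commands are supplied.

```python
import urllib.request


def build_reroute_request(base_url="http://localhost:9200"):
    """Build a POST to _cluster/reroute with an empty body (no commands)."""
    url = base_url.rstrip("/") + "/_cluster/reroute"
    return urllib.request.Request(url, data=b"", method="POST")


def send(request):
    """Send the request; needs a reachable node at the assumed address."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


if __name__ == "__main__":
    req = build_reroute_request()
    # Only print what would be sent; call send(req) against a live cluster.
    print(req.get_method(), req.full_url)
```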
If delayed allocation is turned off entirely, then everything works fine and there is no need to manually prod it to complete the rest of the allocation.
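One way to turn delayed allocation off is to set `index.unassigned.node_left.delayed_timeout` to `0` through the index settings API. The sketch below assumes that approach, a node at `http://localhost:9200`, and the `_all` index pattern; the report does not say which method was actually used in the test, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_disable_delay_request(base_url="http://localhost:9200", index="_all"):
    """Build a PUT that sets the delayed allocation timeout to 0."""
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": "0"}}
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_disable_delay_request()
    # Only print what would be sent; pass req to urllib.request.urlopen
    # to apply it to a live cluster.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```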
Note that with delayed allocation on, it sometimes does the right thing, but if you retest a few times, stopping and restarting the 3 nodes, you will see that it does not do so consistently.