Make index creation more user-friendly #9126

Closed
s1monw opened this Issue Jan 2, 2015 · 13 comments

@s1monw
Contributor
s1monw commented Jan 2, 2015 edited

Today, when we create an index, we return immediately after running sanity checks and adding the metadata to the cluster state. We don't wait for any allocation to happen, so an index can be created with more replicas than there are nodes in the cluster. Once such an index is closed it can't be reopened, since reopening an index requires a quorum of the replicas for each shard. If it is reopened anyway, the shards for which not enough copies are found in the cluster will simply not be allocated at all.

Unfortunately, not even waiting for yellow helps here: yellow only means that the primary has been allocated, which might not be enough when #replicas > 1.

There are a couple of things we can do here to improve the situation:

  • Add another wait to the cluster health to wait for quorum
  • By default wait for quorum for the index when an index is created
  • By default reject closing an index if less than the quorum of shards is allocated
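
To make the quorum problem concrete, here is a minimal sketch of the arithmetic. It is not Elasticsearch code: the function names are hypothetical, and it assumes that a quorum is a simple majority of a shard's copies that only applies when #replicas > 1, and that two copies of the same shard are never placed on the same node.

```python
def required_copies(number_of_replicas: int) -> int:
    """Copies of one shard needed to form a quorum.

    Assumption: a majority of (primary + replicas) is only required when
    there is more than one replica; otherwise a single copy is enough.
    """
    if number_of_replicas > 1:
        return (number_of_replicas + 1) // 2 + 1
    return 1


def max_assignable_copies(number_of_replicas: int, data_nodes: int) -> int:
    """Copies of one shard that can be allocated at all.

    Assumption: copies of the same shard never share a node, so the
    node count caps how many copies can be assigned.
    """
    return min(number_of_replicas + 1, data_nodes)


def can_reach_quorum(number_of_replicas: int, data_nodes: int) -> bool:
    """True if the cluster is large enough for every shard to reach quorum."""
    assignable = max_assignable_copies(number_of_replicas, data_nodes)
    return assignable >= required_copies(number_of_replicas)
```

Under these assumptions, an index with 4 replicas on a 2-node cluster can be created without complaint, but `can_reach_quorum(4, 2)` is false, which is the situation in which a closed index cannot be reopened.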
@jpountz
Contributor
jpountz commented Jan 2, 2015

+1 This will especially make integration testing less trappy!

@bsandvik

+1 this would help us out a lot

@Mpdreamz
Member
Mpdreamz commented Jul 7, 2015

+1

@synhershko
Contributor

Reject sealing an index if no quorum, too?

@nik9000
Contributor
nik9000 commented Jul 7, 2015

Reject sealing an index if no quorum, too?

I don't think that's a good idea. The worst thing that happens if you do a synced_flush when one of the shards isn't around is that it doesn't affect that shard, meaning that shard can't be restored quickly. There isn't really anything that can be done to get that shard to restore quickly regardless, because it's already offline. I wouldn't want to prevent speeding up recovery on the other copies of the shard.

And I think it's safe because if, by some nasty turn of events, one of the down shards ends up coming back and becoming the master shard, then the synced flush won't have any effect because it won't be on the master.

@s1monw
Contributor
s1monw commented Jul 9, 2015

@kimchy promised to work on this today

@clintongormley
Member

This wait-for-quorum should be extended to the open-index API; see e.g. #12987

@clintongormley clintongormley added v2.2.0 and removed v2.1.0 labels Nov 20, 2015
@spinscale spinscale added v2.3.0 and removed v2.2.0 labels Dec 23, 2015
@ywelsch
Contributor
ywelsch commented Feb 14, 2016

@s1monw In v3.0 we allocate primary shards based on allocation IDs (#14739). This means that reopening an index only requires 1 good copy (not a quorum anymore). With allocation IDs, is this just a usability issue now (and not resiliency-related anymore)?

@clintongormley clintongormley added v2.4.0 and removed v2.3.0 labels Mar 16, 2016
@bleskes bleskes added a commit that referenced this issue Apr 7, 2016
@bleskes bleskes Update resliency page
#14252 , #7572 , #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.
557a3d1
@bleskes bleskes added a commit that referenced this issue Apr 7, 2016
@bleskes bleskes Update resiliency page (#17586)
#14252 , #7572 , #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.
8eee28e
@bleskes
Member
bleskes commented May 12, 2016

@clintongormley and I discussed this again and came up with a plan.

There are two issues still left with index creation. The first is that index creation moves the cluster health status to RED even if everything is OK. The second is that when an index creation call returns successfully, there is no guarantee that a follow-up indexing operation will not have to wait (maybe not so bad), or that all operations on that index (for example _analyze, which needs a shard copy) will succeed.

  • Index creation puts status to RED:
    Currently RED means that a shard has a non-active primary. The idea is to change the semantics to exclude primary shards that were never successfully assigned and also didn't experience any shard failure during assignment. If the allocation deciders block the allocation of a primary (not throttle it), we will treat it as a failure and make the shard RED as well. In other cases the shard is YELLOW.
  • Index creation should wait for enough shard copies to reach started:
    The index creation call should add the index to the cluster metadata and wait for enough shard copies (typically only primaries, but this should be based on action.write_consistency) to be started. It will return immediately if the status of one of those shards becomes RED (allocation failure or it can't be assigned to any node), reporting the failure.
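
The proposed health semantics in the first bullet can be sketched as a small decision function. This is only an illustration of the rules as described in this comment; the `ShardState` fields and the `shard_health` function are hypothetical names, not the actual Elasticsearch implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass
class ShardState:
    """Hypothetical summary of one shard's allocation state."""
    primary_active: bool
    all_replicas_active: bool = False
    previously_assigned: bool = False   # primary was successfully assigned at some point
    allocation_failed: bool = False     # a shard failure occurred during assignment
    blocked_by_deciders: bool = False   # deciders said NO (not merely throttled)


def shard_health(shard: ShardState) -> Health:
    """Apply the proposed rules: an inactive primary is only RED if something went wrong."""
    if shard.primary_active:
        return Health.GREEN if shard.all_replicas_active else Health.YELLOW
    # Inactive primary: RED only if it was assigned before, failed, or is blocked.
    if shard.previously_assigned or shard.allocation_failed or shard.blocked_by_deciders:
        return Health.RED
    # Never assigned, no failure, not blocked: a freshly created shard.
    return Health.YELLOW
```

With these rules, a shard of a newly created index (`ShardState(primary_active=False)`) reports YELLOW rather than RED, while the same shard with `blocked_by_deciders=True` reports RED.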
@nik9000 nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 16, 2016
@nik9000 nik9000 Another wait_for_yellow
It'd be nice to have #9126!
1820423
@bleskes bleskes removed the v2.4.0 label May 19, 2016
@nik9000
Contributor
nik9000 commented May 19, 2016

So now that we're testing snippets in the documentation, this user-unfriendliness is leaking into the documentation, which makes the issue pretty obvious. So I'd be pretty excited to have this fixed, or to make time to do it myself.

@nik9000 nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 19, 2016
@nik9000 nik9000 Another wait_for_yellow
It'd be nice to have #9126!
4d157d1
@abeyad abeyad self-assigned this May 20, 2016
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jun 20, 2016
@abeyad abeyad Index creation does not cause the cluster health to go RED
Previously, index creation would momentarily cause the cluster health to
go RED, because the primaries were still being assigned and activated.
This commit ensures that when an index is created or an index is being
recovered during cluster recovery and it does not have any active
allocation ids, then the cluster health status will not go RED, but
instead be YELLOW.

Relates #9126
ef9e1e5
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jun 20, 2016
@abeyad abeyad Blocked allocations on primary causes RED health
If the allocation decision for a primary shard was NO, this should
cause the cluster health for the shard to go RED, even if the shard
belongs to a newly created index or is part of cluster recovery.

Relates #9126
ef715f7
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jun 22, 2016
@abeyad abeyad Index creation waits for write consistency shards
Before returning, index creation now waits for the write consistency
number of shards to be available. An index can not take any indexing or
other replication operations without the write consistency level of
shards being available anyway, so waiting on the index creation response
in order for this condition to be met makes sense, and allows API users
to not depend on cluster health checks before attempting indexing
operations on the newly created index.

Relates #9126
38dc05a
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jun 24, 2016
@abeyad abeyad Write consistency changed to wait for active shards
Changes the API for waiting on shards from using write_consistency
to wait_for_active_shards, where wait_for_active_shards can take
values from [1, numCopies], 0 means use the default from the new
WAIT_ON_ACTIVE_SHARDS_SETTING, and -1 means all shards.

WAIT_ON_ACTIVE_SHARDS_SETTING replaces WRITE_CONSISTENCY_LEVEL_SETTING

Relates #9126
b511a5c
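
The value encoding described in the commit message above can be sketched as follows. This is a hedged illustration of that message only, not the code from the commit: `resolve_active_shards`, its parameters, and the default of 1 for the setting are assumptions made for the example.

```python
ALL_SHARDS = -1   # per the commit message: -1 means all shard copies
USE_DEFAULT = 0   # per the commit message: 0 means use the configured default


def resolve_active_shards(requested: int, num_copies: int, default_setting: int = 1) -> int:
    """Turn a wait_for_active_shards value into a concrete number of copies.

    num_copies is primary + replicas. default_setting stands in for the
    value of WAIT_ON_ACTIVE_SHARDS_SETTING; 1 is an assumed default.
    """
    if requested == USE_DEFAULT:
        requested = default_setting
    if requested == ALL_SHARDS:
        return num_copies
    if 1 <= requested <= num_copies:
        return requested
    raise ValueError(
        f"wait_for_active_shards must be -1, 0, or in [1, {num_copies}], got {requested}"
    )
```

For an index with 2 replicas (3 copies per shard), `resolve_active_shards(0, 3)` falls back to the assumed default of 1, `resolve_active_shards(-1, 3)` waits for all 3 copies, and a value of 4 is rejected.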
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jul 4, 2016
@abeyad abeyad Index creation waits for active shards before returning
Before returning, index creation now waits for the configured number
of shard copies to be started. In the past, a client would create an
index and then potentially have to check the cluster health to wait
to execute write operations. With the cluster health semantics changing
so that index creation does not cause the cluster health to go RED,
this change enables waiting for the desired number of active shards
to be active before returning from index creation.

Relates #9126
9ef0dee
@abeyad abeyad added a commit that referenced this issue Jul 11, 2016
@abeyad abeyad Index creation does not cause the cluster health to go RED
Previously, index creation would momentarily cause the cluster health to
go RED, because the primaries were still being assigned and activated.
This commit ensures that when an index is created or an index is being
recovered during cluster recovery and it does not have any active
allocation ids, then the cluster health status will not go RED, but
instead be YELLOW.

Relates #9126
417bd0c
@abeyad abeyad added a commit that referenced this issue Jul 11, 2016
@abeyad abeyad Blocked allocations on primary causes RED health
If the allocation decision for a primary shard was NO, this should
cause the cluster health for the shard to go RED, even if the shard
belongs to a newly created index or is part of cluster recovery.

Relates #9126
0faf638
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jul 11, 2016
@abeyad abeyad Index creation waits for active shards before returning
Before returning, index creation now waits for the configured number
of shard copies to be started. In the past, a client would create an
index and then potentially have to check the cluster health to wait
to execute write operations. With the cluster health semantics changing
so that index creation does not cause the cluster health to go RED,
this change enables waiting for the desired number of active shards
to be active before returning from index creation.

Relates #9126
af48cb9
@abeyad abeyad added a commit to abeyad/elasticsearch that referenced this issue Jul 15, 2016
@abeyad abeyad Index creation waits for active shard copies before returning
Before returning, index creation now waits for the configured number
of shard copies to be started. In the past, a client would create an
index and then potentially have to check the cluster health to wait
to execute write operations. With the cluster health semantics changing
so that index creation does not cause the cluster health to go RED,
this change enables waiting for the desired number of active shards
to be active before returning from index creation.

Relates #9126
6184ba3
@abeyad abeyad added a commit that referenced this issue Jul 15, 2016
@abeyad abeyad Index creation waits for active shard copies before returning (#18985)
Before returning, index creation now waits for the configured number
of shard copies to be started. In the past, a client would create an
index and then potentially have to check the cluster health to wait
to execute write operations. With the cluster health semantics changing
so that index creation does not cause the cluster health to go RED,
this change enables waiting for the desired number of active shards
to be active before returning from index creation.

Relates #9126
d78f40f
@abeyad abeyad closed this in #19450 Jul 15, 2016
@abeyad
Member
abeyad commented Jul 15, 2016

Closed by #19450

@robinst
robinst commented Nov 23, 2016

Just to check, this was released with 5.0.0, right?

@ywelsch
Contributor
ywelsch commented Nov 23, 2016

Yes, the version label is on the linked PR #19450 that was used to close this issue.
