Make index creation more user-friendly #9126

Closed
s1monw opened this issue Jan 2, 2015 · 13 comments

@s1monw (Contributor) commented Jan 2, 2015

Today, when we create an index, we return immediately after running sanity checks and adding the metadata to the cluster state; we don't wait for any shard allocation. As a result, an index can be created with more replicas than there are nodes in the cluster, and once such an index is closed it can't be reopened, because reopening an index requires a quorum of the replicas for each shard. If such an index is reopened, any shard for which no quorum of copies can be found in the cluster will simply never be allocated.

Unfortunately, not even waiting for YELLOW helps here, since that only means waiting for the primary to be allocated, which is not enough when the number of replicas is greater than 1.

There are a couple of things we can do to improve the situation:

  • Add a wait-for-quorum option to the cluster health API.
  • Wait for a quorum by default when an index is created.
  • Reject closing an index by default if fewer than a quorum of its shards are allocated.
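To make the trap concrete, a rough sketch against a one-node cluster (the index name is hypothetical; "quorum" here means more than half of the copies of a shard, primary included):

```
# One-node cluster: index creation returns 200 OK immediately,
# even though 2 replicas per shard can never be allocated on 1 node.
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}

# Closing the index succeeds...
POST /my_index/_close

# ...but reopening requires a quorum of the 3 copies of each shard (2 of 3),
# and only the primary can ever exist here, so the shards stay unallocated.
POST /my_index/_open
```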
@jpountz (Contributor) commented Jan 2, 2015

+1 This will especially make integration testing less trappy!

@bsandvik commented Jan 20, 2015

+1 this would help us out a lot

@Mpdreamz (Member) commented Jul 7, 2015

+1

@synhershko (Contributor) commented Jul 7, 2015

Reject sealing an index if no quorum, too?

@nik9000 (Contributor) commented Jul 7, 2015

> Reject sealing an index if no quorum, too?

I don't think that's a good idea. The worst thing that happens if you do a synced_flush while one of the shard copies is offline is that the flush doesn't affect that copy, meaning it can't be restored quickly. There isn't really anything that can be done to make that copy recover quickly anyway, because it's already offline, and I wouldn't want to give up faster recovery on the other copies of the shard.

And I think it's safe, because if, by some nasty turn of events, one of the down shard copies ends up coming back and being promoted to primary, the synced flush won't have any effect, since its marker won't be on that copy.
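For reference, a synced flush (as the API looked at the time) is best-effort by design; a sketch against a hypothetical index:

```
# Writes a sync_id marker to whichever shard copies are currently available.
# Offline copies simply miss the marker and recover the slow way later;
# the call reports per-shard results rather than failing outright.
POST /my_index/_flush/synced
```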

@s1monw (Contributor, Author) commented Jul 9, 2015

@kimchy promised to work on this today

@clintongormley (Member) commented Aug 24, 2015

This wait-for-quorum should be extended to the open-index API, eg see #12987

@ywelsch (Contributor) commented Feb 14, 2016

@s1monw In v3.0 we allocate primary shards based on allocation IDs (#14739). This means that reopening an index only requires one good copy (not a quorum anymore). With allocation IDs, is this now a usability issue (and not resiliency-related anymore)?

@clintongormley clintongormley added v2.4.0 and removed v2.3.0 labels Mar 16, 2016

bleskes added a commit that referenced this issue Apr 7, 2016

Update resiliency page
#14252, #7572, #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.

bleskes added a commit that referenced this issue Apr 7, 2016

Update resiliency page (#17586)
#14252, #7572, #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.
@bleskes (Member) commented May 12, 2016

@clintongormley and I discussed this again and came up with a plan.

There are two issues still left with index creation. The first is that index creation moves the cluster health status to RED even if everything is OK. The second is that when index creation returns successfully, there is no guarantee that a follow-up indexing operation will not have to wait (maybe not so bad), nor that all operations on that index (for example _analyze, which needs a shard copy) will succeed.

  • Index creation puts the status to RED:
    Currently RED means that a shard has a non-active primary. The idea is to change the semantics to exclude primary shards that were never successfully assigned and also didn't experience any shard failure during assignment. If the allocation deciders block the allocation of a primary (rather than throttle it), we will treat it as a failure and make the shard RED as well. In other cases the shard is YELLOW.
  • Index creation should wait for enough shard copies to reach the started state:
    The index creation call should add the index to the cluster metadata and wait for enough shard copies (typically only primaries, but this should be based on action.write_consistency) to be started. It should return immediately, reporting the failure, if the status of one of those shards becomes RED (allocation failure, or it can't be assigned to any node).
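Until the second point is implemented, the usual client-side workaround is to poll cluster health after creating the index; a sketch using the existing health API (index name hypothetical):

```
PUT /my_index

# Block until the primaries are started (YELLOW). This covers the common case,
# but cannot wait for an arbitrary number of replica copies.
GET /_cluster/health/my_index?wait_for_status=yellow&timeout=30s
```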
@nik9000 (Contributor) commented May 19, 2016

So now that we're testing snippets in the documentation, this user-unfriendliness is leaking into the docs, which makes the issue pretty obvious. I'd be pretty excited to have this fixed, or to make time to do it myself.

@abeyad abeyad self-assigned this May 20, 2016

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Jun 20, 2016

Ali Beyad
Index creation does not cause the cluster health to go RED
Previously, index creation would momentarily cause the cluster health to
go RED, because the primaries were still being assigned and activated.
This commit ensures that when an index is created or an index is being
recovered during cluster recovery and it does not have any active
allocation ids, then the cluster health status will not go RED, but
instead be YELLOW.

Relates elastic#9126

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Jun 20, 2016

Ali Beyad
Blocked allocations on primary causes RED health
If the allocation decision for a primary shard was NO, this should
cause the cluster health for the shard to go RED, even if the shard
belongs to a newly created index or is part of cluster recovery.

Relates elastic#9126

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Jul 4, 2016

Ali Beyad
Index creation waits for active shards before returning
Before returning, index creation now waits for the configured number
of shard copies to be started. In the past, a client would create an
index and then potentially have to check the cluster health to wait
to execute write operations. With the cluster health semantics changing
so that index creation does not cause the cluster health to go RED,
this change enables waiting for the desired number of active shards
to be active before returning from index creation.

Relates elastic#9126

abeyad pushed a commit that referenced this issue Jul 11, 2016

Ali Beyad
Index creation does not cause the cluster health to go RED

Relates #9126

abeyad pushed a commit that referenced this issue Jul 11, 2016

Ali Beyad
Blocked allocations on primary causes RED health

Relates #9126

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Jul 11, 2016

Ali Beyad
Index creation waits for active shards before returning

Relates elastic#9126

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Jul 15, 2016

Ali Beyad
Index creation waits for active shard copies before returning

Relates elastic#9126

abeyad pushed a commit that referenced this issue Jul 15, 2016

Ali Beyad
Index creation waits for active shard copies before returning (#18985)

Relates #9126
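With the change merged, the creation call can block until enough copies are started; a minimal sketch (the `wait_for_active_shards` parameter name is from the 5.0 REST API, not from this thread):

```
# Wait until 2 of the 2 copies (primary + 1 replica) of each shard are active
# before the create call returns; defaults to 1, i.e. just the primary.
PUT /my_index?wait_for_active_shards=2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

# "all" waits for every copy; on timeout the response flags the shortfall
# rather than failing the index creation itself.
PUT /other_index?wait_for_active_shards=all&timeout=10s
```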
@abeyad (Contributor) commented Jul 15, 2016

Closed by #19450

@robinst (Contributor) commented Nov 23, 2016

Just to check, this was released with 5.0.0, right?

@ywelsch (Contributor) commented Nov 23, 2016

Yes, the version label is on the linked PR #19450 that was used to close this issue.
