Snapshot/Restore: snapshot during rolling restart of a 2 node cluster might get stuck #9924

Closed
imotov opened this Issue Feb 27, 2015 · 4 comments

3 participants
@imotov
Member

imotov commented Feb 27, 2015

The issue was originally reported in #7980 (comment). If the current master node contains all primary shards and is restarted in the middle of a snapshot operation, it might leave the snapshot hanging in the ABORTED state.

@cxxr

cxxr commented Mar 10, 2015

👍 ran into this a few times.

imotov added a commit to imotov/elasticsearch that referenced this issue Mar 12, 2015

imotov added a commit that referenced this issue Mar 12, 2015

imotov added a commit that referenced this issue Mar 12, 2015

@srgclr

Contributor

srgclr commented Jun 8, 2015

Ran into a similar issue and tried the snapshot cleanup utility. It didn't work as all shards were ignored:

Ignoring shard [[dev1_10_event.2015-03-15][4]] with state [ABORTED] on node [kyU3N9lpTIuTbdeUGp5ThQ] - node exists : [true]

What's the reason for ignoring shards when the node exists?

@imotov

Member

imotov commented Jun 10, 2015

@srgclr if a node exists and a shard is in the ABORTED state, it can mean one of two things: either we hit #11314, or the shard is stuck in an I/O operation and we need to wait until that operation is over or restart the node. It's impossible for the cleanup utility to determine which of the two cases we are in. Because of this, it takes the safer route: it assumes that we are stuck in an I/O operation and skips such shards.
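
As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the cleanup utility's actual code: the names `ShardSnapshotEntry`, `Action`, and `decide` are hypothetical, and the handling of every case other than "ABORTED shard on an existing node" is an assumption made only to keep the example complete.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SKIP = "skip"          # leave the shard entry untouched
    CLEAN_UP = "clean_up"  # treat the shard entry as safe to clean up


@dataclass(frozen=True)
class ShardSnapshotEntry:
    # Hypothetical record of one shard in a running snapshot.
    index: str
    shard: int
    state: str    # e.g. "ABORTED"
    node_id: str  # node the shard snapshot was assigned to


def decide(entry, live_node_ids):
    """Hypothetical decision rule; only the first branch comes from the thread."""
    node_exists = entry.node_id in live_node_ids
    if entry.state == "ABORTED" and node_exists:
        # Either #11314 or a shard blocked in I/O. The two cannot be told
        # apart from outside, so assume I/O is in progress and skip.
        return Action.SKIP
    if not node_exists:
        # Assumption: nothing can still be running on a node that is gone.
        return Action.CLEAN_UP
    # Assumption: any other state on a live node is left alone.
    return Action.SKIP
```

With the shard from the log line above, `decide(ShardSnapshotEntry("dev1_10_event.2015-03-15", 4, "ABORTED", "kyU3N9lpTIuTbdeUGp5ThQ"), {"kyU3N9lpTIuTbdeUGp5ThQ"})` returns `Action.SKIP`, which matches the "Ignoring shard" message.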

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

@imotov

Member

imotov commented Aug 20, 2015

This should be solved by #11450. Closing.

@imotov imotov closed this Aug 20, 2015
