Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidates the logic for cleaning up snapshots on master election #24894

Merged
merged 2 commits into from Jun 2, 2017

Conversation

Projects
None yet
4 participants
@abeyad
Copy link
Contributor

commented May 25, 2017

In #24605, logic was implemented to ensure that completed snapshots were
properly removed from the cluster state upon a change in master nodes.
This commit removes redundant logic that also attempted to clean up
completed snapshots from the cluster state on master election, but only
covered a limited case that was remedied in #24605.

This commit also adds a test to ensure cleaning up of completed
snapshots at the right moment in time when a master election happens
before finalizing a snapshot, as well as adds a check to handle the case
where the old master and new master could attempt to finalize the
snapshot and write the same blob to the repository simultaneously.

@imotov

imotov approved these changes May 25, 2017

Copy link
Member

left a comment

The change itself LGTM, but would be great to add the test as we discussed.

@abeyad abeyad force-pushed the abeyad:check_failed_snapshot_on_new_master branch May 28, 2017

@abeyad abeyad changed the title Ends a failed snapshot on a new master Consolidates the logic for cleaning up snapshots on master election May 28, 2017

@abeyad

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2017

@imotov I updated the PR based on the fact that we already take care of this case in #24605, so this PR adds a test for it and removes redundant duplicated logic because #24605 handles the general case of cleaning up completed snapshots.

core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java Outdated
// if another master was elected and took over finalizing the snapshot, it is possible
// that both nodes try to finalize the snapshot and write to the same blobs, so we just
// log a warning here and carry on
logger.warn("Blob already exists while finalizing snapshot, assume the snapshot has already been saved", ex);

This comment has been minimized.

Copy link
@imotov

imotov May 30, 2017

Member

I am not quite sure why this needs to be treated differently and it seems a bit dangerous to me. In both case we should remove the snapshot record form the cluster state as the next step.

This comment has been minimized.

Copy link
@abeyad

abeyad May 30, 2017

Author Contributor

@imotov whether this method throws an exception or not, we still proceed with removeSnapshotFromClusterState, so this change doesn't effect that. This change would only introduce a difference in the response reported back to the user - success (with this change) vs an exception. I can see your point in that it is not necessary however, in that even if the response comes back with a 500 or something, the user can always check the snapshot status to see if the snapshot itself failed or not.

This comment has been minimized.

Copy link
@imotov

imotov May 30, 2017

Member

@abeyad I think in this case 500 is probably fine since we are dealing with a fault condition anyway (old master didn't leave the cluster cleanly). I think it should be an error, because we don't really know if this was the result of a rogue master or a bug in our code, but we might want to improve the error message

This comment has been minimized.

Copy link
@abeyad

abeyad May 30, 2017

Author Contributor

@imotov ok, in that case I will change it to still throw the RepositoryException but provide a better exception message for the FileAlreadyExistsException case

Ali Beyad added some commits May 28, 2017

Ali Beyad
Consolidates the logic for cleaning up snapshots on master election
In #24605, logic was implemented to ensure that completed snapshots were
properly removed from the cluster state upon a change in master nodes.
This commit removes redundant logic that also attempted to clean up
completed snapshots from the cluster state on master election, but only
covered a limited case that was remedied in #24605.

This commit also adds a test to ensure cleaning up of completed
snapshots at the right moment in time when a master election happens
before finalizing a snapshot, as well as adds a check to handle the case
where the old master and new master could attempt to finalize the
snapshot and write the same blob to the repository simultaneously.
Ali Beyad

@abeyad abeyad force-pushed the abeyad:check_failed_snapshot_on_new_master branch to 8cb9ec2 Jun 1, 2017

@abeyad

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2017

@imotov I pushed 8cb9ec2 to address your feedback

@abeyad abeyad merged commit 3cb3074 into elastic:master Jun 2, 2017

2 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci Build finished.
Details

@abeyad abeyad deleted the abeyad:check_failed_snapshot_on_new_master branch Jun 2, 2017

abeyad pushed a commit that referenced this pull request Jun 2, 2017

Ali Beyad
Consolidates the logic for cleaning up snapshots on master election (#…
…24894)

In #24605, logic was implemented to ensure that completed snapshots were
properly removed from the cluster state upon a change in master nodes.
This commit removes redundant logic that also attempted to clean up
completed snapshots from the cluster state on master election, but only
covered a limited case that was remedied in #24605.

This commit also adds a test to ensure cleaning up of completed
snapshots at the right moment in time when a master election happens
before finalizing a snapshot, as well as adds a check to handle the case
where the old master and new master could attempt to finalize the
snapshot and write the same blob to the repository simultaneously.

@abeyad abeyad removed the v5.4.2 label Jun 2, 2017

@colings86 colings86 added v6.0.0-beta1 and removed v6.0.0 labels Jul 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.