Fix Inconsistent Shard Failure Count in Failed Snapshots #51416

Merged
merged 2 commits into from
Jan 24, 2020


original-brownbear
Member

@original-brownbear original-brownbear commented Jan 24, 2020

This fix was necessary to allow for the test enhancement below.
We were not adding shard failure entries to a failed snapshot for those
snapshot entries that were never attempted because the snapshot failed during
the init stage and wasn't partial. As a result, the never-attempted entries
were counted towards the successful shard count when the snapshot was finalized directly out of the INIT stage. That is inconsistent with what the snapshot status API returns, which shows such a shard as failed with the message "skipped". It also broke repository consistency tests.
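
To make the counting problem concrete, here is a minimal sketch in Python. It is not Elasticsearch code: `ShardState`, `ShardFailure`, and `finalize` are hypothetical names, and the sketch assumes the successful count is derived as total shards minus recorded failure entries. Under that assumption, a shard that was never attempted and has no failure entry is silently counted as successful.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class ShardState(Enum):
    """Hypothetical per-shard outcome at the time the snapshot is finalized."""
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"  # never attempted, e.g. the snapshot failed during init


@dataclass
class ShardFailure:
    """Hypothetical failure entry stored with the finalized snapshot."""
    shard_id: int
    reason: str


def finalize(shards: Dict[int, ShardState], record_skipped: bool) -> Tuple[int, int]:
    """Return (successful, failed) counts derived from the failure entries.

    Assumption: successful = total - number of failure entries.
    record_skipped=False models the behavior before the fix,
    record_skipped=True models the behavior after it.
    """
    failures: List[ShardFailure] = []
    for shard_id, state in shards.items():
        if state is ShardState.FAILED:
            failures.append(ShardFailure(shard_id, "failed"))
        elif state is ShardState.SKIPPED and record_skipped:
            failures.append(ShardFailure(shard_id, "skipped"))
    total = len(shards)
    return total - len(failures), len(failures)


# A snapshot that failed during init: no shard was ever attempted.
never_attempted = {0: ShardState.SKIPPED, 1: ShardState.SKIPPED, 2: ShardState.SKIPPED}
before_fix = finalize(never_attempted, record_skipped=False)
after_fix = finalize(never_attempted, record_skipped=True)
```

With `record_skipped=False` all three shards end up in the successful count even though none of them was snapshotted; with `record_skipped=True` they are reported as failed, which matches the "skipped" failures described above.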

Also, this change adjusts snapshot resiliency tests to run another snapshot
at the end of each test run to guarantee a correct `index.latest` blob exists
after each run.

Closes #47550

@original-brownbear original-brownbear added >bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.7.0 labels Jan 24, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Member

@tlrx tlrx left a comment


LGTM

@original-brownbear
Member Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit c9c60bc into elastic:master Jan 24, 2020
@original-brownbear original-brownbear deleted the fix-47550 branch January 24, 2020 15:29
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 24, 2020
original-brownbear added a commit that referenced this pull request Jan 24, 2020
…1426)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Mar 31, 2020
original-brownbear added a commit that referenced this pull request Mar 31, 2020
…4480)

@original-brownbear original-brownbear restored the fix-47550 branch August 6, 2020 18:34
Labels
>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v7.6.3 v7.7.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

[CI] Failure in org.elasticsearch.snapshots.SnapshotResiliencyTests.testSnapshotWithNodeDisconnects
4 participants