
Safer Wait for Snapshot Success in ClusterPrivilegeTests #40943

Conversation

original-brownbear
Member

@original-brownbear original-brownbear commented Apr 8, 2019

* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
  * We should fix this race in the transport API, but it's not trivial and will be part of the incoming big round of refactoring the repository interaction; this added check fixes the test for now
* closes elastic#38030
@original-brownbear original-brownbear added >test Issues or PRs that are addressing/adding tests :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.0.0 v7.2.0 labels Apr 8, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-security

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/2

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/1

assertBusy(() -> {
    // The status API may report SUCCESS before the entry is removed from the
    // cluster state, so also wait until no SnapshotsInProgress entries remain.
    SnapshotsInProgress response =
        client().admin().cluster().state(new ClusterStateRequest()).get().getState().custom(SnapshotsInProgress.TYPE);
    assertThat(response.entries(), Matchers.empty());
});
Contributor

@original-brownbear you've got my rubber-stamp LGTM, but shouldn't we fix the prepareSnapshotStatus(repo).setSnapshots(snapshot) API to not return SUCCESS if there is still a SnapshotsInProgress.TYPE entry?

Contributor

@albertzaharovits albertzaharovits left a comment

LGTM, but I think the snapshot status API needs a fix so that it returns SUCCESS consistently with respect to the SnapshotsInProgress.TYPE metadata entry. If, for some obscure reason, this is not possible, I believe a comment in the main code (similar to the one here) is required. If/when the change in main lands, I would remove this extra condition here.

@original-brownbear
Member Author

@albertzaharovits thanks for taking a look :) You know ... I just realized these tests run against a single node => we shouldn't have any races here with the state like that => I'm double checking the solution, sorry for the noise!

@original-brownbear
Member Author

Ok, this is a mess :) I was able to fix the API (at least for the single node case) via f7c4afb

We have two APIs for getting the current list of snapshots, org.elasticsearch.snapshots.SnapshotsService#currentSnapshots(java.lang.String) and org.elasticsearch.snapshots.SnapshotsService#currentSnapshots(java.lang.String, java.util.List<java.lang.String>).

  • One of them returns org.elasticsearch.snapshots.SnapshotInfo and correctly translates a SUCCESS state in the cluster state as still technically IN_PROGRESS and sets that state on the SnapshotInfo.
  • The other one (used by org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction, which is what this test uses) returns the raw SnapshotsInProgress.Entry list, so SUCCESS stays SUCCESS, when in actuality the existence of a SUCCESS entry means we may still have to finalize the snapshot in the repository. Because the calls to SnapshotsService are used in multiple places, I decided for now to just hack the fix of translating SUCCESS for ongoing snapshots to "in progress/started" into the transport API that isn't used by the snapshot process itself. This fixed the test broken here for me and also makes the REST API a little more useful :)
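The SUCCESS-to-STARTED translation described above can be modeled in isolation. Here is a minimal, self-contained sketch; the enum constants mirror `SnapshotsInProgress.State`, but the class and method names are illustrative, not the actual Elasticsearch types:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SnapshotStateTranslation {

    // Mirrors SnapshotsInProgress.State: an entry still present in the
    // cluster state may already be marked SUCCESS even though the snapshot
    // has not yet been finalized in the repository.
    enum State { INIT, STARTED, SUCCESS, FAILED, ABORTED }

    // The fix sketched in the discussion: any entry still in the cluster
    // state is, in effect, in progress, so report SUCCESS as STARTED.
    static State translateInProgress(State clusterStateEntry) {
        return clusterStateEntry == State.SUCCESS ? State.STARTED : clusterStateEntry;
    }

    // Applied over the equivalent of the raw SnapshotsInProgress.Entry list.
    static List<State> translateAll(List<State> entries) {
        return entries.stream()
            .map(SnapshotStateTranslation::translateInProgress)
            .collect(Collectors.toList());
    }
}
```

The point of the sketch is that the translation is purely state-local: only SUCCESS changes, every other in-progress state passes through untouched.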

@albertzaharovits
Contributor

@original-brownbear If I understand it correctly, every SnapshotsInProgress.Entry is a sign that the snapshot is "in-progress", i.e. SnapshotState.IN_PROGRESS, SnapshotsInProgress.State.STARTED, and SnapshotsInProgress.State.SUCCESS all mean the same thing from the "in-progress" perspective.

I would defer to others to lean on the API change. If you don't mind, I'll shirk away from the reviewers group.
If I were you, I would've done the state == SnapshotsInProgress.State.SUCCESS ? SnapshotsInProgress.State.STARTED : state hack on the client side, even though it's apparently harder. For example, if we go with this proposed API change I think we need an accompanying test of its own. And the PR label needs to be changed as well, and probably further backporting the change. Again, I will leave to others for authoritative suggestions.

But, can't the test(s) use TransportGetSnapshotsAction, for "in-progress" polling, which internally uses the "correctly" functioning SnapshotService#currentSnapshots ?

@albertzaharovits albertzaharovits removed their request for review April 16, 2019 12:05
@original-brownbear
Member Author

@albertzaharovits

I would defer to others to lean on the API change. If you don't mind, I'll shirk away from the reviewers group.

Sure no worries, thanks for the input so far!

But, can't the test(s) use TransportGetSnapshotsAction, for "in-progress" polling, which internally uses the "correctly" functioning SnapshotService#currentSnapshots ?

Yeah, but I figured we're seeing an actual bug in the API here, so why not fix it. We don't have other code dependent on this API, it seems, and it's just a three-line change to get rid of a mean inconsistency here :)

hack on the client side, even though it's apparently harder.

I don't think we can really do anything on the client side here. To the client, the reason for SUCCESS is not visible (it doesn't know if the snapshot state was read from the repo or cluster state) -> it can't translate SUCCESS conditionally.

@ywelsch
Contributor

ywelsch commented May 2, 2019

Reusing the same enum for snapshots that are in progress and those that have completed has created this odd behavior where users can't distinguish between an in-progress snapshot that's in its finalization step and one where finalization was successfully completed. The same applies to the FAILED state. Even worse, we reuse the same enum across shard snapshot status and global snapshot status, which has caused even more confusion. I think now is a good time to start separating these instead of adding a workaround. We should fix the test first, without making changes to ES, so that we can get test coverage back. Waiting for a successful snapshot can be done by having the test first check whether GET /_snapshot/my_backup/_current returns a SnapshotMissingException and then do the snapshot status check.
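The two-step wait proposed above (poll until the snapshot is gone from the cluster state, then check its status) can be sketched with a generic polling helper. This is a self-contained illustration under the assumption that the caller wraps the GET /_snapshot/my_backup/_current call so that a SnapshotMissingException maps to `true`; the names, attempt limit, and back-off are hypothetical, not Elasticsearch APIs:

```java
import java.util.function.BooleanSupplier;

public class WaitForSnapshotGone {

    // Polls until the probe reports the snapshot missing from the cluster
    // state. `isSnapshotMissing` stands in for a check such as "GET
    // /_snapshot/my_backup/_current threw SnapshotMissingException"; only
    // after it returns true is the snapshot status check meaningful.
    static boolean waitUntilMissing(BooleanSupplier isSnapshotMissing, int maxAttempts, long pollMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isSnapshotMissing.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis); // back off between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

In the actual test this role is played by `assertBusy`-style retry utilities; the sketch only shows the ordering: cluster-state check first, status check second.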

@original-brownbear
Member Author

@ywelsch alright, no hack it is :) I just reverted to my initial check here and simply made the test verify that the snapshot is gone from the CS like we do in other tests.

Can you take another look?

@original-brownbear
Member Author

@ywelsch ping :)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 8, 2019
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates elastic#40943 (comment)
* Moved the contents of the class around a little so fields,
constructors and nested classes/enums aren't all over the place
especially now that we have yet another nested enum here
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
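The `computeIfAbsent` shortening mentioned in the commit message is the standard `java.util.Map` idiom: one call replaces the check-then-put pattern when building map-of-list structures. A generic example (the keys and values here are illustrative, not the actual class fields):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentExample {

    // Before: explicit get / null-check / put for each key.
    static void addVerbose(Map<String, List<String>> byIndex, String index, String shard) {
        List<String> shards = byIndex.get(index);
        if (shards == null) {
            shards = new ArrayList<>();
            byIndex.put(index, shards);
        }
        shards.add(shard);
    }

    // After: computeIfAbsent creates and inserts the list only when absent.
    static void addCompact(Map<String, List<String>> byIndex, String index, String shard) {
        byIndex.computeIfAbsent(index, k -> new ArrayList<>()).add(shard);
    }
}
```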
Contributor

@ywelsch ywelsch left a comment

LGTM

@original-brownbear
Member Author

thanks @ywelsch !

@original-brownbear original-brownbear merged commit 753726c into elastic:master May 21, 2019
@original-brownbear original-brownbear deleted the more-aggressive-cs-handle-snapshot-service branch May 21, 2019 17:22
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 21, 2019
* master: (176 commits)
  Avoid unnecessary persistence of retention leases (elastic#42299)
  [ML][TEST] Fix limits in AutodetectMemoryLimitIT (elastic#42279)
  [ML Data Frame] Persist and restore checkpoint and position (elastic#41942)
  mute failing filerealm hash caching tests (elastic#42304)
  Safer Wait for Snapshot Success in ClusterPrivilegeTests (elastic#40943)
  Remove 7.0.2 (elastic#42282)
  Revert "Remove 7.0.2 (elastic#42282)"
  [DOCS] Copied note on slicing support to Slicing section. Closes 26114 (elastic#40426)
  Remove 7.0.2 (elastic#42282)
  Mute all ml_datafeed_crud rolling upgrade tests
  Move the FIPS configuration back to the build plugin (elastic#41989)
  Remove stray back tick that's messing up table format (elastic#41705)
  Add missing comma in code section (elastic#41678)
  add 7.1.1 and 6.8.1 versions (elastic#42253)
  Use spearate testkit dir for each run (elastic#42013)
  Add experimental and warnings to vector functions (elastic#42205)
  Fix version in tests since elastic#41906 was merged
  Bump version in BWC check after backport
  Prevent in-place downgrades and invalid upgrades (elastic#41731)
  Mute date_histo interval bwc test
  ...
original-brownbear added a commit that referenced this pull request May 22, 2019
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates #40943 (comment)
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 22, 2019
* master: (82 commits)
  Fix off-by-one error in an index shard test
  Cleanup Redundant BlobStoreFormat Class (elastic#42195)
  remove backcompat handling of 6.2.x versions (elastic#42044)
  Mute testDelayedOperationsBeforeAndAfterRelocated
  Execute actions under permit in primary mode only (elastic#42241)
  Mute another transforms_stats yaml test
  Deprecate support for chained multi-fields. (elastic#41926)
  Mute transforms_stats yaml test
  Make unwrapCorrupt Check Suppressed Ex. (elastic#41889)
  Remove Dead Code from Azure Repo Plugin (elastic#42178)
  Reorganize Painless doc structure (elastic#42303)
  Avoid unnecessary persistence of retention leases (elastic#42299)
  [ML][TEST] Fix limits in AutodetectMemoryLimitIT (elastic#42279)
  [ML Data Frame] Persist and restore checkpoint and position (elastic#41942)
  mute failing filerealm hash caching tests (elastic#42304)
  Safer Wait for Snapshot Success in ClusterPrivilegeTests (elastic#40943)
  Remove 7.0.2 (elastic#42282)
  Revert "Remove 7.0.2 (elastic#42282)"
  [DOCS] Copied note on slicing support to Slicing section. Closes 26114 (elastic#40426)
  Remove 7.0.2 (elastic#42282)
  ...
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 27, 2019
…ic#41940)

* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates elastic#40943 (comment)
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 27, 2019
* Safer Wait for Snapshot Success in ClusterPrivilegeTests

* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
  * We should fix this race in the transport API but it's not trivial and will be part of the incoming big round of refactoring the repository interaction, this added check fixes the test for now
* closes elastic#38030
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
* Safer Wait for Snapshot Success in ClusterPrivilegeTests

* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
  * We should fix this race in the transport API but it's not trivial and will be part of the incoming big round of refactoring the repository interaction, this added check fixes the test for now
* closes elastic#38030
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
…ic#41940)

* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates elastic#40943 (comment)
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
original-brownbear added a commit that referenced this pull request May 27, 2019
…2575)

* Safer Wait for Snapshot Success in ClusterPrivilegeTests

* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
  * We should fix this race in the transport API but it's not trivial and will be part of the incoming big round of refactoring the repository interaction, this added check fixes the test for now
* closes #38030
original-brownbear added a commit that referenced this pull request May 27, 2019
… (#42573)

* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates #40943 (comment)
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
Labels
:Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC >test Issues or PRs that are addressing/adding tests v7.3.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

ClusterPrivilegeTests.testThatSnapshotAndRestore failing in master
6 participants