Track Snapshot Version in RepositoryData #50930

original-brownbear
Member

Add tracking of snapshot versions to RepositoryData to make BwC logic more efficient.
Follow up to #50853

@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.6.0 labels Jan 13, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/1

// Step 2: Write new index-N blob to repository and update index.latest
setPendingStep.whenComplete(newGen -> threadPool().executor(ThreadPool.Names.SNAPSHOT).execute(ActionRunnable.wrap(listener, l -> {
// BwC logic: Load snapshot version information if any snapshot is missing a version in RepositoryData so that the new
// RepositoryData contains a version for every snapshot
final List<SnapshotId> snapshotIdsWithoutVersion = repositoryData.getSnapshotIds().stream().filter(
Member Author

It is somewhat annoying that we have to load every SnapshotInfo here as a one-off, but I think it's still the best solution available.
Ideally, I was hoping to just work from the assumption that a snapshot without a version must be from before 8.0/7.6. I gave up on that idea because it breaks as soon as some older-version cluster (for whatever reason) takes a snapshot and drops all the version fields when writing out a new index-N. Maybe that's the wrong trade-off though, glad to hear opinions on this :) You could argue that loading all the SnapshotInfo is too high a price to pay just so the repo can move to the new metadata version earlier, but IMO even on S3 etc. loading 100 snapshots or so won't take all that long or cost any meaningful amount of $$$.
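
For readers skimming the thread, the one-off backfill discussed in this comment can be sketched roughly as below. This is an illustrative, self-contained approximation and not the code from this PR: `VersionBackfill`, `idsWithoutVersion`, `backfill` and the `loadVersion` function (standing in for reading a snapshot's `SnapshotInfo` blob) are hypothetical names, and versions are modelled as plain strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: fill in missing snapshot versions so that every snapshot ends up with one.
final class VersionBackfill {

    // Ids of all snapshots that have no version recorded in the (assumed) id -> version map.
    static List<String> idsWithoutVersion(List<String> snapshotIds, Map<String, String> knownVersions) {
        return snapshotIds.stream()
            .filter(id -> knownVersions.containsKey(id) == false)
            .collect(Collectors.toList());
    }

    // Returns a copy of knownVersions with one entry added per snapshot that was missing a version.
    // loadVersion is only invoked for the snapshots that actually lack a version.
    static Map<String, String> backfill(List<String> snapshotIds,
                                        Map<String, String> knownVersions,
                                        Function<String, String> loadVersion) {
        Map<String, String> updated = new LinkedHashMap<>(knownVersions);
        for (String id : idsWithoutVersion(snapshotIds, knownVersions)) {
            updated.put(id, loadVersion.apply(id));
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> known = new HashMap<>();
        known.put("snap-1", "7.6.0");
        List<String> ids = Arrays.asList("snap-1", "snap-2");
        // Pretend every snapshot without a version was written by a 7.5.0 cluster.
        System.out.println(backfill(ids, known, id -> "7.5.0"));
    }
}
```

Once every snapshot has a version in the map, later writes no longer need the extra loads, which is the efficiency gain the PR description refers to.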

@@ -263,11 +271,24 @@ public void testHandlingMissingRootLevelSnapshotMetadata() throws Exception {
logger.info("--> delete root level snapshot metadata blob for snapshot [{}]", snapshotToCorrupt);
Files.delete(repo.resolve(String.format(Locale.ROOT, BlobStoreRepository.SNAPSHOT_NAME_FORMAT, snapshotToCorrupt.getUUID())));

logger.info("--> strip version information from index-N blob");
final RepositoryData withoutVersions = new RepositoryData(repositoryData.getGenId(),
Member Author

Now that we're writing the version field to RepositoryData, using shard generations to determine whether there are any pre-7.6 snapshots goes away, so we have to fake an old-style snapshot by stripping out the version fields manually.

new ActionListener<>() {
@Override
public void onResponse(Collection<Void> voids) {
logger.info("Successfully loaded all snapshot's version information for {} from snapshot metadata",
Member Author

Logged at info IMO, since it's a one-time thing and it would be nice to have some marker of the "upgrade" in the logs.

Contributor

Given that snapshotIdsWithoutVersion can be very long, I was wondering whether we should just display its size here in the info logging.

Member Author

I think it would be nice to have the concrete ids of the snapshots to help debug situations where users might be writing to the repo from various ES versions (that's my main motivation to have this). I figured the list isn't going to be so long that it would create real issues due to the log line length so worst case it's a bit of an annoyance right?

Contributor

> I think it would be nice to have the concrete ids of the snapshots to help debug situations where users might be writing to the repo from various ES versions (that's my main motivation to have this)

Why do the exact names and versions matter in that case? Isn't it sufficient to know that some snapshots got version-tagged that previously were not?

Member Author

I don't care much about the versions, and the exact names aren't important either, but it would be nice to see whether the list of snapshots changed if this gets logged repeatedly.
Especially on Cloud, having just the size, which might be effectively constant over time, seems like it could create an annoying situation when debugging :)

Contributor

Perhaps use firstListElementsToCommaDelimitedString then (see AllocationService)?

Member Author

Ah nice thanks => reused in 02e8c4e (can probably move that method to a more appropriate place in a follow up)
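
The compromise reached in this thread (log the concrete ids, but cap how many are printed) can be illustrated with a small helper like the one below. This is only a sketch in the spirit of the method named above, not its actual implementation or signature: `LogLists`, `firstElementsToString` and the `limit` parameter are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: render at most `limit` elements of a list for a log line,
// followed by the total count when the list had to be truncated.
final class LogLists {

    static String firstElementsToString(List<String> elements, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        String head = elements.stream().limit(limit).collect(Collectors.joining(", "));
        if (elements.size() <= limit) {
            return head;
        }
        return head + ", ... [" + elements.size() + " items in total]";
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("snap-a", "snap-b", "snap-c");
        System.out.println(firstElementsToString(ids, 2));
    }
}
```

With a cap like this, the log line still changes when the leading ids change, which keeps repeated occurrences distinguishable without printing an unbounded list.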

public void onFailure(Exception e) {
logger.warn("Failure when trying to load missing version information from snapshot metadata", e);
}
}, () -> filterRepositoryDataStep.onResponse(repositoryData.withVersions(updatedVersionMap))),
Member Author

Not great, but just like in last week's discussion of SnapshotsService, we're not introducing any new breakage here, so we take whatever version information we can get and run with it, even on exceptions.
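
A rough illustration of this "take what we can get" behaviour, using plain `CompletableFuture` rather than the listener types from the diff: each load may fail independently, failures are only logged, and the final step still runs with whatever was loaded. `BestEffortVersionLoad`, `loadBestEffort` and `loadVersion` are hypothetical names for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: load versions concurrently and tolerate individual failures.
final class BestEffortVersionLoad {

    static Map<String, String> loadBestEffort(List<String> ids, Function<String, String> loadVersion) {
        Map<String, String> loaded = new ConcurrentHashMap<>();
        CompletableFuture<?>[] tasks = ids.stream()
            .map(id -> CompletableFuture
                .supplyAsync(() -> loadVersion.apply(id))
                .thenAccept(version -> loaded.put(id, version))
                // A failed load is logged and swallowed so that it cannot fail the overall step.
                .exceptionally(e -> {
                    System.err.println("Failed to load version for [" + id + "]: " + e);
                    return null;
                }))
            .toArray(CompletableFuture[]::new);
        // Wait for every task; none of them completes exceptionally because of the handler above.
        CompletableFuture.allOf(tasks).join();
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, String> versions = loadBestEffort(Arrays.asList("snap-1", "snap-2"), id -> {
            if (id.equals("snap-2")) {
                throw new IllegalStateException("metadata blob missing");
            }
            return "7.5.0";
        });
        System.out.println(versions);
    }
}
```

The trade-off is the one stated in the comment: a snapshot whose metadata cannot be read simply stays without a version instead of blocking the repository update.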

@original-brownbear
Member Author

@ywelsch the test failure in a previous run here seems like it might be interesting to you:

  1> [2020-01-13T22:49:34,516][INFO ][o.e.s.DedicatedClusterSnapshotRestoreIT] [testMasterShutdownDuringSnapshot] -->  creating repository
  1> [2020-01-13T22:49:34,517][INFO ][o.e.r.RepositoriesService] [node_tm0] put repository [test-repo]
  1> [2020-01-13T22:49:34,528][ERROR][o.e.g.G.AsyncLucenePersistedState] [node_td2] Exception occurred when storing new meta data
  1> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.gateway.GatewayMetaState$AsyncLucenePersistedState$1@13966630 on EsThreadPoolExecutor[name = node_td2/AsyncLucenePersistedState#updateTask, queue capacity = 1, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@17a9f09a[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 0]]
  1> 	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355) ~[?:?]
  1> 	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:84) [main/:?]
  1> 	at org.elasticsearch.gateway.GatewayMetaState$AsyncLucenePersistedState.scheduleUpdate(GatewayMetaState.java:316) [main/:?]
  1> 	at org.elasticsearch.gateway.GatewayMetaState$AsyncLucenePersistedState.setLastAcceptedState(GatewayMetaState.java:308) [main/:?]
  1> 	at org.elasticsearch.cluster.coordination.CoordinationState.handlePublishRequest(CoordinationState.java:345) [main/:?]
  1> 	at org.elasticsearch.cluster.coordination.Coordinator.handlePublishRequest(Coordinator.java:316) [main/:?]
  1> 	at org.elasticsearch.cluster.coordination.PublicationTransportHandler.acceptState(PublicationTransportHandler.java:405) [main/:?]
  1> 	at org.elasticsearch.cluster.coordination.PublicationTransportHandler.handleIncomingPublishRequest(PublicationTransportHandler.java:385) [main/:?]
  1> 	at org.elasticsearch.cluster.coordination.PublicationTransportHandler.lambda$new$0(PublicationTransportHandler.java:96) [main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [main/:?]
  1> 	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:262) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:688) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 	at java.lang.Thread.run(Thread.java:834) [?:?]

Even though a snapshot test failed here, it looks like this is related to the new CS persistence layer rather than my changes.

Contributor

@ywelsch ywelsch left a comment

Left one comment, looking good o.w.

@original-brownbear
Member Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit e349c5e into elastic:master Jan 14, 2020
@original-brownbear original-brownbear deleted the store-snapshot-version-in-index-N branch January 14, 2020 16:15
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 14, 2020
Add tracking of snapshot versions to RepositoryData to make BwC logic more efficient.
Follow up to elastic#50853
original-brownbear added a commit that referenced this pull request Jan 14, 2020
* Track Snapshot Version in RepositoryData (#50930)

Add tracking of snapshot versions to RepositoryData to make BwC logic more efficient.
Follow up to #50853
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Add tracking of snapshot versions to RepositoryData to make BwC logic more efficient.
Follow up to elastic#50853
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v7.6.0 v8.0.0-alpha1