
Add peer recovery planners that take into account available snapshots #75840

Merged: 30 commits, Aug 9, 2021

Conversation

fcofdez
Contributor

@fcofdez fcofdez commented Jul 29, 2021

This commit adds a new set of classes that compute a peer
recovery plan based on source files, target files, and available
snapshots. When possible, it tries to maximize the number of
files reused from a snapshot. This commit doesn't wire these classes
in yet, to avoid adding additional noise.
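The planning idea described above can be sketched roughly as follows. This is an illustrative model only, not the actual Elasticsearch classes: `FileInfo`, `Plan`, and `computePlan` are hypothetical names, and files are matched by name plus checksum for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of peer-recovery planning: partition the source files into
// those already present on the target, those recoverable from a
// snapshot, and those that must be sent from the source node.
public class RecoveryPlanSketch {
    // A file identified by name and checksum; two files are "the same"
    // only if both match.
    record FileInfo(String name, String checksum) {}

    record Plan(List<FileInfo> reuseFromTarget,
                List<FileInfo> recoverFromSnapshot,
                List<FileInfo> sendFromSource) {}

    static Plan computePlan(List<FileInfo> sourceFiles,
                            List<FileInfo> targetFiles,
                            List<FileInfo> snapshotFiles) {
        Map<String, String> target = targetFiles.stream()
            .collect(Collectors.toMap(FileInfo::name, FileInfo::checksum));
        Map<String, String> snapshot = snapshotFiles.stream()
            .collect(Collectors.toMap(FileInfo::name, FileInfo::checksum));

        List<FileInfo> reuse = new ArrayList<>();
        List<FileInfo> fromSnapshot = new ArrayList<>();
        List<FileInfo> fromSource = new ArrayList<>();
        for (FileInfo f : sourceFiles) {
            if (f.checksum().equals(target.get(f.name()))) {
                reuse.add(f);        // already on the target node
            } else if (f.checksum().equals(snapshot.get(f.name()))) {
                fromSnapshot.add(f); // restore from the snapshot
            } else {
                fromSource.add(f);   // fall back to a network copy
            }
        }
        return new Plan(reuse, fromSnapshot, fromSource);
    }

    public static void main(String[] args) {
        var source = List.of(new FileInfo("_0.cfs", "aaa"),
                             new FileInfo("_1.cfs", "bbb"),
                             new FileInfo("_2.cfs", "ccc"));
        var target = List.of(new FileInfo("_0.cfs", "aaa"));
        var snap   = List.of(new FileInfo("_1.cfs", "bbb"));
        Plan plan = computePlan(source, target, snap);
        System.out.println(plan.reuseFromTarget().size() + " "
            + plan.recoverFromSnapshot().size() + " "
            + plan.sendFromSource().size()); // prints "1 1 1"
    }
}
```

The greedy partition reflects the stated goal: every file that can come from the snapshot is taken from it, so network copies from the source node are the last resort.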
@fcofdez
Contributor Author

fcofdez commented Jul 29, 2021

@elasticmachine run elasticsearch-ci/part-2

(it was a docker-compose problem)

@fcofdez fcofdez requested a review from DaveCTurner July 29, 2021 16:21
@fcofdez fcofdez added v7.15.0 :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement Team:Distributed Meta label for distributed team labels Jul 29, 2021
@fcofdez fcofdez marked this pull request as ready for review July 29, 2021 16:21
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez
Contributor Author

fcofdez commented Jul 30, 2021

@DaveCTurner I added the wiring + feature flag, could you take a look when you have the chance? Thanks!

Contributor

@DaveCTurner DaveCTurner left a comment


Thanks @fcofdez for wiring everything up. I think I understand how it all fits together now, but I left a few suggestions for simplifying things in areas where we aren't using all the generality we're introducing. I haven't gone right down to the lowest levels of detail yet because I think things will still move around quite a bit.

@fcofdez
Contributor Author

fcofdez commented Aug 4, 2021

We discussed offline the pros and cons of using a single repository specified as a node setting and we reached the conclusion that this might be limiting in terms of flexibility. We agreed on introducing a new repository setting that would enable using the latest available snapshot from that repository explicitly. We would implement this in a subsequent PR. cc @DaveCTurner @henningandersen

Contributor

@DaveCTurner DaveCTurner left a comment


Getting close, I left only tiny suggestions.

}

ShardSnapshot latestSnapshot = availableSnapshots.stream()
.max(Comparator.comparingLong(ShardSnapshot::getStartedAt))


In a follow-up I'd like us to push this .max calculation through TransportGetShardSnapshotAction and into IndexSnapshotsService: as it stands we might retrieve a SnapshotInfo from each repository, but if we checked them in reverse time order we could stop as soon as we've found the one we'll be using here.

If we can't do that before 7.15 then it's probably ok, it'll just leave a bit of BwC cruft.
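The suggested optimization, roughly: instead of fetching snapshot details everywhere and taking `.max` afterwards, walk the candidates newest-first and stop at the first usable one. This is a minimal sketch with hypothetical names (`ShardSnapshot`, `latestUsable`, the `usable` predicate), not the actual internals of TransportGetShardSnapshotAction or IndexSnapshotsService:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of the early-exit idea: order candidate snapshots newest-first
// and return the first one that passes the (potentially expensive)
// usability check, instead of checking all of them and taking max().
public class LatestUsableSnapshot {
    record ShardSnapshot(String name, long startedAt) {}

    static Optional<ShardSnapshot> latestUsable(List<ShardSnapshot> candidates,
                                                Predicate<ShardSnapshot> usable) {
        return candidates.stream()
            .sorted(Comparator.comparingLong(ShardSnapshot::startedAt).reversed())
            .filter(usable)   // runs lazily, newest-first...
            .findFirst();     // ...and stops at the first match
    }

    public static void main(String[] args) {
        var snaps = List.of(new ShardSnapshot("old", 1L),
                            new ShardSnapshot("new", 3L),
                            new ShardSnapshot("mid", 2L));
        System.out.println(latestUsable(snaps, s -> true)
            .orElseThrow().name()); // prints "new"
    }
}
```

Note the caveat: `sorted` still buffers the whole candidate list; the saving in the real change would come from not fetching per-repository SnapshotInfo for snapshots older than the first usable one.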

final long totalSize = shardRecoveryPlan.getTotalSize();
final long existingTotalSize = shardRecoveryPlan.getExistingSize();

if (logger.isTraceEnabled()) {


👍 nice work keeping this logging intact.

Contributor

@DaveCTurner DaveCTurner left a comment


Sorry, just a handful more small things that struck me while looking at #76114.

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM 👍

@fcofdez fcofdez merged commit 3c8b9a6 into elastic:master Aug 9, 2021
@fcofdez
Contributor Author

fcofdez commented Aug 9, 2021

Thanks David!

fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Aug 9, 2021
This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot. It uses repositories with the `use_for_peer_recovery`
setting set to true.

It adds a new recovery setting, `indices.recovery.use_snapshots`.

Relates elastic#73496
Backport of elastic#75840
fcofdez added a commit that referenced this pull request Aug 9, 2021
…#76239)

This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot. It uses repositories with the `use_for_peer_recovery`
setting set to true.

It adds a new recovery setting, `indices.recovery.use_snapshots`.

Relates #73496
Backport of #75840
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement Team:Distributed Meta label for distributed team v7.15.0 v8.0.0-alpha2
6 participants