heal: Avoid marking a bucket as done when remote drives are offline #19587
Community Contribution License
All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.
Description
When the node healing an erasure set suddenly disconnects from the
cluster, the disks slice returned by the line below is always empty:
disks, _ := er.getOnlineDisksWithHealing(false)
This is because getOnlineDisksWithHealing(false) does not include
healing drives, so the local drive is left out of the list, and the
remote drives are unreachable while the node is disconnected.
When this happens, the bucket is marked as done even though nothing was
healed, which is simply wrong.
This change requires at least N/2 non-healing drives to be online before
deciding to start healing.
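The guard described above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: the drive type, nonHealingOnline, and enoughDrivesToHeal are hypothetical names used only for illustration, and N is assumed to be the number of drives in the erasure set.

```go
package main

import "fmt"

// drive is a hypothetical stand-in for a drive in an erasure set.
type drive struct {
	name    string
	online  bool
	healing bool
}

// nonHealingOnline returns the drives that are online and not being healed.
// This mirrors the idea of excluding healing drives from the list; it is an
// illustration, not the real getOnlineDisksWithHealing.
func nonHealingOnline(drives []drive) []drive {
	var out []drive
	for _, d := range drives {
		if d.online && !d.healing {
			out = append(out, d)
		}
	}
	return out
}

// enoughDrivesToHeal reports whether at least half of the set (N/2) is
// online and not healing. The comparison is written as 2*count >= setSize
// to avoid integer-division truncation for odd set sizes.
func enoughDrivesToHeal(drives []drive, setSize int) bool {
	return 2*len(nonHealingOnline(drives)) >= setSize
}

func main() {
	// The healing node lost its connection: the local drive is healing and
	// every remote drive appears offline, so the usable list is empty.
	set := []drive{
		{name: "local", online: true, healing: true},
		{name: "remote-1", online: false},
		{name: "remote-2", online: false},
		{name: "remote-3", online: false},
	}

	if !enoughDrivesToHeal(set, len(set)) {
		// Skip the bucket for now instead of marking it as done.
		fmt.Println("not enough non-healing drives online; bucket left pending")
		return
	}
	fmt.Println("enough drives online; healing can start")
}
```

With a check of this shape, an empty list of usable drives leads to the bucket being retried later rather than recorded as healed.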
Motivation and Context
How to test this PR?
Types of changes
Checklist: