heal: Avoid marking a bucket as done when remote drives are offline #19587

Merged 3 commits on Apr 26, 2024

vadmeste
Member

@vadmeste vadmeste commented Apr 23, 2024

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

When the node healing an erasure set suddenly disconnects from the
cluster, the disks slice returned by the line below is always empty
(len(disks) == 0):

disks, _ := er.getOnlineDisksWithHealing(false)

This is because getOnlineDisksWithHealing(false) does not include
healing drives, so the local drive is left out of the list, and the
remote drives are offline.

When this happens, the bucket is incorrectly marked as done.

This change requires at least N/2 non-healing drives before deciding
to start healing.
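
The guard described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the drive type, errNotEnoughSources, countSources and healBucket are hypothetical names, and N is assumed to be the number of drives in the erasure set. The point it shows is that a bucket is reported as done only after a heal actually ran with enough source drives available.

```go
package main

import "errors"

// drive is a hypothetical stand-in for one drive of an erasure set.
type drive struct {
	online  bool // reachable from the node running the heal
	healing bool // being healed itself, so it cannot serve as a source
}

// errNotEnoughSources is a hypothetical sentinel error returned when the
// heal must be postponed instead of being reported as finished.
var errNotEnoughSources = errors.New("heal: fewer than N/2 online non-healing drives")

// countSources returns how many drives are online and not healing.
func countSources(set []drive) int {
	n := 0
	for _, d := range set {
		if d.online && !d.healing {
			n++
		}
	}
	return n
}

// healBucket calls heal only when at least half of the drives in the set
// (assumed meaning of N/2) are online and not healing. It returns
// done == true only after heal has succeeded, so an isolated node never
// marks the bucket as done.
func healBucket(set []drive, heal func() error) (done bool, err error) {
	if len(set) == 0 || countSources(set) < len(set)/2 {
		return false, errNotEnoughSources
	}
	if err := heal(); err != nil {
		return false, err
	}
	return true, nil
}
```

With a four-drive set in which only the local healing drive is reachable, healBucket returns errNotEnoughSources and never calls heal, so the caller can retry later rather than recording the bucket as healed.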

Motivation and Context

How to test this PR?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Unit tests added/updated
  • Internal documentation updated
  • Create a documentation update request here

@vadmeste vadmeste marked this pull request as ready for review April 24, 2024 15:57
@vadmeste vadmeste changed the title heal: Avoid marking a bucket as done when remote drives are offline [NO-MERGE-YET] heal: Avoid marking a bucket as done when remote drives are offline Apr 24, 2024
@vadmeste vadmeste changed the title [NO-MERGE-YET] heal: Avoid marking a bucket as done when remote drives are offline heal: Avoid marking a bucket as done when remote drives are offline Apr 25, 2024
Review comment on cmd/global-heal.go (outdated, resolved)
@harshavardhana harshavardhana merged commit 135874e into minio:master Apr 26, 2024
20 checks passed

2 participants