xl: Avoid multi-disk node exit when one disk fails #12423
Conversation
why do we need this? @vadmeste
Force-pushed from 14be7b3 to 7e5ca0e
There is one report about a node that is not starting because of an input/output error, which is not good when the node has multiple disks. So this PR ignores disks that return errors when trying to upgrade format.json and clean up temporary directories.
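A minimal sketch of the behavior change being described, with simplified names rather than the actual patch; migrateFormat and migrateLocalDisks are illustrative stand-ins:

```go
import (
	"log"
	"os"
	"path/filepath"
)

// migrateFormat is a simplified stand-in for MinIO's per-disk
// format.json upgrade; here it only checks the file is readable.
func migrateFormat(diskPath string) error {
	_, err := os.ReadFile(filepath.Join(diskPath, ".minio.sys", "format.json"))
	return err
}

// migrateLocalDisks tolerates per-disk failures instead of failing
// the whole node: an i/o error on one disk is logged and skipped.
func migrateLocalDisks(diskPaths []string) {
	for _, diskPath := range diskPaths {
		if err := migrateFormat(diskPath); err != nil {
			log.Printf("skipping disk %s: %v", diskPath, err)
			continue
		}
	}
}
```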
We already do that, @vadmeste, in registerStorage().
Well yes, but that was not enough; there are some golang os calls that try to upgrade format.json and clean tmp directories before newXLStorage() is called. Those functions are formatErasureMigrateLocalEndpoints(endpoints) and formatErasureCleanupTmpLocalEndpoints(endpoints). By the way, maybe it makes sense to move formatErasureMigrateLocalEndpoints and formatErasureCleanupTmpLocalEndpoints inside newXLStorage(); it looks like that would be an easier fix.
Yes, that can be done instead, @vadmeste; adding another variable into Endpoints looks unclean.
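A rough sketch of what that refactor could look like; the types and helpers below are simplified stand-ins for MinIO's Endpoint, xlStorage, formatErasureMigrateLocalEndpoints, and formatErasureCleanupTmpLocalEndpoints, not the merged code:

```go
// Simplified stand-ins for MinIO's Endpoint and xlStorage types.
type Endpoint struct{ Path string }
type xlStorage struct{ diskPath string }

// Stand-ins for the migration and tmp-cleanup helpers, scoped to a
// single endpoint instead of the whole endpoint list.
func migrateFormatJSON(ep Endpoint) error { return nil }
func cleanupTmpDir(ep Endpoint) error     { return nil }

// newXLStorage runs migration and cleanup per disk, so an error here
// marks only this disk as faulty instead of aborting the whole node.
func newXLStorage(ep Endpoint) (*xlStorage, error) {
	if err := migrateFormatJSON(ep); err != nil {
		return nil, err // caller skips just this disk
	}
	if err := cleanupTmpDir(ep); err != nil {
		return nil, err
	}
	return &xlStorage{diskPath: ep.Path}, nil
}
```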
Force-pushed from 7e5ca0e to 386466d
PTAL at the conflicts @vadmeste
Force-pushed from 386466d to e66a975
Fixed
```go
for _, diskPath := range diskPaths {
	j := &tierJournal{
		diskPath: diskPath,
	}
```
The tier deletion journal is intentionally done centrally, as it is overkill to have it run on all disk paths. @krisis PTAL
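For illustration, a hedged sketch of the centralized approach being defended here; initTierJournal is an assumed name and the selection logic is a guess at the intent, not MinIO's implementation:

```go
import (
	"errors"
	"os"
)

// tierJournal mirrors the struct in the diff above.
type tierJournal struct{ diskPath string }

// initTierJournal creates one journal on the first usable local disk
// rather than one journal per disk path.
func initTierJournal(diskPaths []string) (*tierJournal, error) {
	for _, diskPath := range diskPaths {
		// Skip disks that are currently unusable (e.g. i/o errors).
		if _, err := os.Stat(diskPath); err != nil {
			continue
		}
		return &tierJournal{diskPath: diskPath}, nil
	}
	return nil, errors.New("no usable local disk for the tier journal")
}
```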
@vadmeste selecting the first available local disk on a server works to an extent; it's definitely an improvement. Additionally, we need to act on journals that may be present on other local disks, because a different set of disks may have been available during previous server restarts.
E.g.:
- On the first startup, when all disks are online, this approach selects the first local disk on each server.
- Imagine the first disk on the first server goes temporarily offline. On restart, we would select the second disk on this server.
- Now the first disk is back online. On a subsequent restart, we would pick the first disk again.
At this point we would have a journal on the second disk that remains unattended. We need to address this too; see the sketch below.
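One possible way to handle those leftover journals, as a hedged sketch reusing the tierJournal stand-in from above; journalFile, processEntries, and recoverStaleJournals are assumed names, not MinIO's:

```go
import (
	"os"
	"path/filepath"
)

// processEntries is a stand-in for replaying a journal's pending
// tier deletion entries.
func (j *tierJournal) processEntries() { /* replay pending deletions */ }

// recoverStaleJournals drains journals left behind on disks other
// than the currently selected one, so pending deletions are not lost.
func recoverStaleJournals(diskPaths []string, selected string) {
	const journalFile = "tier-journal.bin" // assumed file name
	for _, diskPath := range diskPaths {
		if diskPath == selected {
			continue // the active journal is handled normally
		}
		if _, err := os.Stat(filepath.Join(diskPath, journalFile)); err != nil {
			continue // nothing left behind on this disk
		}
		j := &tierJournal{diskPath: diskPath}
		go j.processEntries() // drain the stale journal
	}
}
```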
@vadmeste we could address the issue of possibly unattended tier deletion journals on other local disks in a separate PR.
Yes, but I just thought this cannot be worse than what we have right now. We can revert this and address it more properly in another PR.
Force-pushed from e66a975 to ae8ae3a
LGTM
@poornas I reverted the journal change; let me know if you see any issue with it.
Description
It makes sense for a node that has multiple disks to start even when one disk fails, returning an i/o error for example. This commit makes this fault tolerance available in this specific use case.
Motivation and Context
The cluster should still start when a disk is broken.
How to test this PR?
Contact me.