Add normal/deep type of heal scanning #7251
0741d6d
to
f423fa1
Compare
We should also fix #7199 (comment)
We should heal only the part that needs to be healed.
519fe65
to
8207add
Compare
@harshavardhana can we do this in another PR? We used to prepare everything under the tmp/ directory and then do an atomic rename, but this looks like it needs to be done differently.
Why do we need to do that in another PR, @vadmeste? Is it harder to do in this one? Since we are bringing in a new feature such as deep and normal healing, I thought we would want to address the above situation as well.
@harshavardhana since we decided not to heal individual parts alone, and to heal the entire object instead, we can take this PR.
Yeah, done. @vadmeste is this PR ready for review?
8207add
to
d954781
Compare
This is ready for review. @krishnasrinivas I only rebased this commit against master, can you approve again?
@vadmeste travis is failing.
d954781
to
246d6a4
Compare
Codecov Report
@@            Coverage Diff            @@
##           master    #7251    +/-    ##
=========================================
+ Coverage   48.42%   48.43%   +<.01%
=========================================
  Files         297      297
  Lines       46451    46460       +9
=========================================
+ Hits        22496    22503       +7
- Misses      21884    21885       +1
- Partials     2071     2072       +1
Can you state the usefulness of normal/deep scan from the user's point of view? Let's say some files are corrupted with bit-rot: how does a user know that they have to run mc admin heal --scan deep instead of normal?
What additional information are we providing to the user in a normal scan that makes this decision clear? I see that normal scan is simply stat-ing the xl.json
Currently we short-circuit on the first error from the disk. Shouldn't we be continuing to the next disk?
dataErrs[i] = err
break
This question applies to both deep and normal scan. It also means we won't heal other disks which might have bitrot or a missing xl.json.
current implementation's spec was given by AB. On the
we stat xl.json and all parts too. (but no bitrot)
it will continue to the next disk; note that it only breaks the loop
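To make the control flow in question concrete, here is a minimal Go sketch. It is not the code from this PR: `scanDisks`, `checkPart` and `errFileNotFound` are hypothetical names, and the assumption is that the `break` quoted above sits in an inner loop over an object's parts, nested inside an outer loop over disks.

```go
package main

import (
	"errors"
	"fmt"
)

// errFileNotFound is a hypothetical sentinel error for a missing part.
var errFileNotFound = errors.New("file not found")

// checkPart is a hypothetical stand-in for whatever per-disk, per-part
// verification the scan performs.
type checkPart func(disk int, part string) error

// scanDisks records at most one error per disk. The break only leaves the
// inner loop over parts, so the outer loop still visits every disk.
func scanDisks(numDisks int, parts []string, check checkPart) []error {
	dataErrs := make([]error, numDisks)
	for i := 0; i < numDisks; i++ {
		for _, part := range parts {
			if err := check(i, part); err != nil {
				dataErrs[i] = err
				break // stop checking this disk, move on to the next one
			}
		}
	}
	return dataErrs
}

func main() {
	parts := []string{"part.1", "part.2"}
	// Assumed failure pattern: disk 1 lacks part.1, disk 2 lacks part.2.
	check := func(disk int, part string) error {
		if (disk == 1 && part == "part.1") || (disk == 2 && part == "part.2") {
			return errFileNotFound
		}
		return nil
	}
	for i, err := range scanDisks(4, parts, check) {
		fmt.Printf("disk %d: %v\n", i, err)
	}
}
```

Under this assumption every disk gets its own entry in `dataErrs`, so a failure on one disk does not hide failures on the others.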
The spec is incomplete without a user-level indication of what they should be doing. For me personally, explaining this to a customer will be a headache; here are the reasons why.
From the PR it's not evident that we can tell with clarity when deep scan should be run vs. normal scan. If most of our users are just using I would suggest normal scan should also check bit-rot, but not for the entire file; instead it should jump to a random block in the file and verify that block for bit-rot. The block is an erasure block. This way we are not taking away the essential functionality, but we are also not trying to read the entire file. Normal scan just stating
The above cases are handled in normal mode. The difference between normal and deep is just bitrot checking. We can probably discuss in person.
The answers to these are needed before we decide how to communicate this to our users and document it.
If we think about it, minio's metadata in xl.json is kind of not relevant to the user. When they manually run heal they are primarily concerned with their objects. Maybe there is some benefit in only scanning the metadata during healing when healing is automatic and run by the server, but it's not clear why it should be exposed to the user.
Correct @donatello - I guess this is the gist which should be clarified.
We discussed this with @abperiasamy - the
Now bit-rot errors are healed only when there is a deep scan; this would read and write all content if needed. The plan is to write a document providing recommendations to our users on when deep scan should be used and what does
5f12d57
to
0d9c43c
Compare
@harshavardhana so basically nothing to change in this PR. @donatello @krishnasrinivas can you review/approve again?
3c65851
to
90a0df6
Compare
Can you fix the conflicts, @vadmeste?
Healing scan used to read all object parts to verify their bitrot checksums. This commit adds a quicker healing scan that only checks whether the parts are actually present on the disks.
90a0df6
to
5d6ffb9
Compare
Mint Automation
7251-5d6ffb9/mint-large-bucket.sh.log:
7251-5d6ffb9/mint-dist-xl.sh.log:
@krishnasrinivas I only rebased this PR, can you review/approve again?
Description
Healing scan used to read all object parts to verify their bitrot
checksums. This commit adds a quicker healing scan that only
checks whether the parts are actually present on the disks.
Motivation and Context
Fixes #7510
Regression
No, this is a new feature
How Has This Been Tested?
Test normal heal scan:
4. rm /tmp/xl/1/testbucket/file/
5. mc admin heal -r myminio/testbucket/
6. ls -l /tmp/xl/1/testbucket/file/part.1
Test deep heal scan:
7. echo "foo" >> /tmp/xl/1/testbucket/file/part.1
8. mc admin heal -r --scan=deep myminio/testbucket/