Repair with hashes #2925
- Does the PR describe what changes are being made?
- Does the PR describe why the changes are being made?
- Does the code follow our style guide?
- Does the code follow our testing guide?
- Is the PR appropriately sized? (If it could be broken into smaller PRs it should be)
- Does the new code have enough tests? (every PR should have tests or justification otherwise. Bug-fix PRs especially)
- Does the new code have enough documentation that answers "how do I use it?" and "what does it do?"? (both source documentation and higher level, diagrams?)
- Does any documentation need updating?
- Do the database access patterns make sense?
looks good!
satellite/repair/repairer/ec.go
return successfulNodes, successfulHashes, nil
}

// copied from ecclient
are we keeping these comments long-term? I'm not sure if they're necessary
I think they can be removed. I was hoping we could get rid of duplicate code here and in ecclient, but I don't know of a good way.
Comments removed
maybe someone else has thoughts on this? maybe we could make an ec_helpers.go file or something?
test added
looks nice! I'd just like to know more about the latest changes (see comments)
@@ -163,7 +163,7 @@ func (checker *Checker) updateIrreparableSegmentStatus(ctx context.Context, poin

 // we repair when the number of healthy pieces is less than or equal to the repair threshold
 // except for the case when the repair and success thresholds are the same (a case usually seen during testing)
-if numHealthy > redundancy.MinReq && numHealthy <= redundancy.RepairThreshold && numHealthy < redundancy.SuccessThreshold {
+if numHealthy >= redundancy.MinReq && numHealthy <= redundancy.RepairThreshold && numHealthy < redundancy.SuccessThreshold {
why do we need this change, and the rest of the changes in this file?
Before repair with hashes, a segment was considered irreparable if it had MinReq or fewer pieces. This is because we needed at least MinReq+1 pieces for the Reed-Solomon error correction to work correctly. Now that we are verifying pieces with hashes before decoding into a segment, exactly the minimum number of pieces is sufficient to repair correctly.
The changes in this file ensure that we
- Add to repair queue if we have exactly the minimum healthy number of pieces
- Do not add to the irreparable db if we have exactly the minimum healthy number of pieces
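The resulting decision can be sketched roughly as follows. This is a minimal illustration and not the checker's actual code: `Redundancy`, `Action`, and `classify` are hypothetical names introduced here, and only the repair condition itself is taken from the diff above. The irreparable branch assumes that "fewer than MinReq healthy pieces" is the remaining irreparable case, as the two bullets imply.

```go
package main

// Redundancy is a hypothetical stand-in holding only the three
// thresholds referenced in the diff.
type Redundancy struct {
	MinReq           int32
	RepairThreshold  int32
	SuccessThreshold int32
}

// Action is a hypothetical result type for this sketch.
type Action int

const (
	Healthy     Action = iota // nothing to do
	Repair                    // add to the repair queue
	Irreparable               // add to the irreparable db
)

// classify applies the condition from the diff: a segment with exactly
// MinReq healthy pieces is now queued for repair instead of being
// treated as irreparable.
func classify(numHealthy int32, r Redundancy) Action {
	if numHealthy >= r.MinReq && numHealthy <= r.RepairThreshold && numHealthy < r.SuccessThreshold {
		return Repair
	}
	// Assumption: only segments below the minimum remain irreparable.
	if numHealthy < r.MinReq {
		return Irreparable
	}
	return Healthy
}
```

With illustrative thresholds `Redundancy{MinReq: 29, RepairThreshold: 35, SuccessThreshold: 80}`, `classify(29, r)` yields `Repair` where the old `>` comparison would not have, and `classify(28, r)` yields `Irreparable`.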
…o green/repair-with-hashes
What:
Create a specialized ecclient for the repairer that does additional verification:
(ecRepairer.Repair does the same thing as ecClient.Repair, which can be removed now)
Why:
Repair is expensive, so we want to download from the smallest possible number of storage nodes to reconstruct the segment. However, if we use the minimum RS number of pieces, we have no way of automatically detecting errors in pieces, and the reconstructed segment could be incorrect without our knowledge. By using hashes to check the validity of pieces before we decode the segment, we can be sure of the authenticity of each piece and use the minimum number of pieces to get the segment.
Unfortunately, this requires us to bring each piece entirely into memory since we need to download the entire piece before we can verify the hash.
https://storjlabs.atlassian.net/browse/V3-2396
https://storjlabs.atlassian.net/browse/V3-2395
https://storjlabs.atlassian.net/browse/V3-2394
Please describe the tests:
Please describe the performance impact:
Code Review Checklist (to be filled out by reviewer)