New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Snapshot] Snapshot recovery handler added #2074
Conversation
consensus/istanbul/backend/engine.go
Outdated
| header1, .. headerN | | ||
*/ | ||
func (sb *backend) regen(chain consensus.ChainReader, headers []*types.Header) { | ||
if !sb.isRestoringSnapshots.Load() && len(headers) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments related to the style.
- Reducing the call depth
if sb.isRestoringSnapshots.Load() || len(headers) < 1 {
return
}
BTW, is it okay to pass len(headers) == 1
case in the original code?
- More safe operation for the atomic value change
sb.isRestoringSnapshots.Load()
->sb.isRestoringSnapshots.CompareAndSwap(false, true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the good suggestions.
First, I'll take your first suggestion. Once the suggestion is updated, the residual task will be as follows:
Whether to change Store
to CompareAndSwap
.
sb.isRestoringSnapshots.Store(true) // sb.isRestoringSnapshots.CompareAndSwap(false, true)
defer func() {
sb.isRestoringSnapshots.Store(false) // sb.isRestoringSnapshots.CompareAndSwap(true, false)
}()
I think the existing code seems more intuitive and reliable.
Additionally, I couldn't understand what this implies.
BTW, is it okay to pass len(headers) == 1 case in the original code?
Could you give a detail? BTW, I installed a guard from proceeding when the length of header is one because it was already handled before regen
handler is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed before that load-store can have race conditions, but ignored because it's practically impossible to trigger.
But now that it's mentioned, safer design would be better.
compare-and-swap lock goes like this (it doesn't need load or store):
if sb.isRestoringSnapshots.CompareAndSwap(false, true) {
// enter critical section
}
Note that previous atomic load-store can have race conditions if two threads are running in parallel.
T1 - load and check
T2 - load and check
T1 - store
T2 - store
both threads enter critical section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly make sense. Thank you. Before making changes, let me ask one concern from your comment. Once the critical section executed, restoring true
to false
seems still required. Doesnt' it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ian0371 Updated. PTAL.
@aidan-kwon, I kept len(headers) <= 1
. PTAL.
consensus/istanbul/backend/engine.go
Outdated
sb.isRestoringSnapshots.Store(true) | ||
defer func() { | ||
sb.isRestoringSnapshots.Store(false) | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to print a log message informing the start of snapshot restoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering entry message should not be printed excessively, that message also should be printed under the condition. Putting the initiating message seems redundant and not effective under this condition.
(2). Early check if the snapshot is not in database
@hyunsooda Please fix for linter or rebase against the latest dev. |
Proposed changes
If the required snapshot data is absent in the cache or database due to a
sethead
operation or manual deletion, snapshot data may not be accessible. This PR introduces a snapshot recovery handler, which stores all snapshot data within the specified header range. The handler is activated under two conditions: (1) when the node is first loaded and (2) during API serving. If no corresponding snapshot is found in both cases, the handler is triggered.Types of changes
Please put an x in the boxes related to your change.
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
$ make test
)Related issues
Further comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...