Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Snapshot] Snapshot recovery handler added #2074

Merged
merged 11 commits into from Jan 19, 2024

Conversation

hyunsooda
Copy link
Contributor

Proposed changes

If the required snapshot data is absent in the cache or database due to a sethead operation or manual deletion, snapshot data may not be accessible. This PR introduces a snapshot recovery handler, which stores all snapshot data within the specified header range. The handler is activated under two conditions: (1) when the node is first loaded and (2) during API serving. If no corresponding snapshot is found in both cases, the handler is triggered.

Types of changes

Please put an x in the boxes related to your change.

  • Bugfix
  • New feature or enhancement
  • Others

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING GUIDELINES doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes ($ make test)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Related issues

  • Please leave the issue numbers or links related to this PR here.

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

consensus/istanbul/backend/backend.go Outdated Show resolved Hide resolved
consensus/istanbul/backend/engine.go Outdated Show resolved Hide resolved
consensus/istanbul/backend/engine.go Outdated Show resolved Hide resolved
consensus/istanbul/backend/engine.go Outdated Show resolved Hide resolved
consensus/istanbul/backend/engine.go Outdated Show resolved Hide resolved
ian0371
ian0371 previously approved these changes Dec 28, 2023
hyeonLewis
hyeonLewis previously approved these changes Dec 28, 2023
| header1, .. headerN |
*/
func (sb *backend) regen(chain consensus.ChainReader, headers []*types.Header) {
if !sb.isRestoringSnapshots.Load() && len(headers) > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments related to the style.

  1. Reducing the call depth
if sb.isRestoringSnapshots.Load() || len(headers) < 1 {  
    return 
}

BTW, is it okay to pass len(headers) == 1 case in the original code?

  1. More safe operation for the atomic value change
    sb.isRestoringSnapshots.Load() -> sb.isRestoringSnapshots.CompareAndSwap(false, true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the good suggestions.

First, I'll take your first suggestion. Once the suggestion is updated, the residual task will be as follows:
Whether to change Store to CompareAndSwap.

sb.isRestoringSnapshots.Store(true) // sb.isRestoringSnapshots.CompareAndSwap(false, true)
		defer func() {
			sb.isRestoringSnapshots.Store(false) // sb.isRestoringSnapshots.CompareAndSwap(true, false)
		}()

I think the existing code seems more intuitive and reliable.

Additionally, I couldn't understand what this implies.

BTW, is it okay to pass len(headers) == 1 case in the original code?

Could you give a detail? BTW, I installed a guard from proceeding when the length of header is one because it was already handled before regen handler is called.

Copy link
Contributor

@ian0371 ian0371 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed before that load-store can have race conditions, but ignored because it's practically impossible to trigger.
But now that it's mentioned, safer design would be better.

compare-and-swap lock goes like this (it doesn't need load or store):

if sb.isRestoringSnapshots.CompareAndSwap(false, true) {
  // enter critical section
}

Note that previous atomic load-store can have race conditions if two threads are running in parallel.

T1 - load and check
T2 - load and check
T1 - store
T2 - store
both threads enter critical section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly make sense. Thank you. Before making changes, let me ask one concern from your comment. Once the critical section executed, restoring true to false seems still required. Doesnt' it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ian0371 Updated. PTAL.
@aidan-kwon, I kept len(headers) <= 1. PTAL.

sb.isRestoringSnapshots.Store(true)
defer func() {
sb.isRestoringSnapshots.Store(false)
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to print a log message informing the start of snapshot restoring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering entry message should not be printed excessively, that message also should be printed under the condition. Putting the initiating message seems redundant and not effective under this condition.

(2). Early check if the snapshot is not in database
@hyunsooda hyunsooda dismissed stale reviews from hyeonLewis and ian0371 via 5a2aca9 January 10, 2024 03:44
@blukat29
Copy link
Contributor

@hyunsooda Please fix for linter or rebase against the latest dev.

@hyunsooda hyunsooda merged commit dd5a0bf into klaytn:dev Jan 19, 2024
11 checks passed
@blukat29 blukat29 mentioned this pull request Jan 22, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants