
Snapshot in latest storer fix #5381

Merged
merged 12 commits into rc/v1.6.0 from snapshot-in-latest-storer-fix on Aug 18, 2023

Conversation

BeniaminDrasovean (Contributor)

Reasoning behind the pull request

  • In some edge cases, the state snapshot would start before the storers changed to the new epoch.

Proposed changes

  • When starting a snapshot, wait for the storer to change to the new epoch, and only then continue with the snapshot process.

Testing procedure

  • Normal testing procedure

Pre-requisites

Based on the Contributing Guidelines, the PR author and the reviewers must check that the following requirements are met:

  • was the PR targeted to the correct branch?
  • if this is a larger feature that probably needs more than one PR, is there a feat branch created?
  • if this is a feat branch merging, do all satellite projects have a proper tag inside go.mod?

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 94.17% and no project coverage change.

Comparison is base (cf4fe2e) 80.09% compared to head (02d6610) 80.10%.
Report is 1 commit behind head on rc/v1.6.0.

❗ Current head 02d6610 differs from pull request most recent head 13e319a. Consider uploading reports for the commit 13e319a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           rc/v1.6.0    #5381   +/-   ##
==========================================
  Coverage      80.09%   80.10%           
==========================================
  Files            706      708    +2     
  Lines          93355    93408   +53     
==========================================
+ Hits           74770    74821   +51     
- Misses         13269    13275    +6     
+ Partials        5316     5312    -4     
Files Changed Coverage Δ
process/block/metablock.go 61.99% <0.00%> (ø)
process/block/shardblock.go 68.47% <0.00%> (-0.05%) ⬇️
state/errors.go 25.00% <ø> (ø)
state/stateMetrics/stateMetrics.go 85.18% <85.18%> (ø)
state/peerAccountsDB.go 96.00% <90.90%> (-4.00%) ⬇️
state/snapshotsManager.go 96.55% <96.55%> (ø)
...ocess/transactionEvaluator/simulationAccountsDB.go 61.62% <100.00%> (ø)
state/accountsDB.go 77.62% <100.00%> (-3.43%) ⬇️
state/accountsDBApi.go 91.45% <100.00%> (ø)
state/accountsDBApiWithHistory.go 89.21% <100.00%> (ø)
... and 1 more

... and 3 files with indirect coverage changes


@iulianpascalau iulianpascalau self-requested a review June 28, 2023 11:02
}

select {
case <-time.After(args.WaitTimeForSnapshotEpochCheck):
Contributor:

we should use time.Timer

Contributor Author:

Done.

return nil
}

if storageManager.IsClosed() {
Contributor:

I think this will be better moved on L1167, immediately after getting the storageManager pointer

Contributor Author:

Done.

@@ -1442,8 +1442,8 @@ func (mp *metaProcessor) updateState(lastMetaBlock data.MetaHeaderHandler, lastM
"rootHash", lastMetaBlock.GetRootHash(),
"prevRootHash", prevMetaBlock.GetRootHash(),
"validatorStatsRootHash", lastMetaBlock.GetValidatorStatsRootHash())
mp.accountsDB[state.UserAccountsState].SnapshotState(lastMetaBlock.GetRootHash())
mp.accountsDB[state.PeerAccountsState].SnapshotState(lastMetaBlock.GetValidatorStatsRootHash())
go mp.accountsDB[state.UserAccountsState].SnapshotState(lastMetaBlock.GetRootHash(), lastMetaBlock.GetEpoch())
Contributor:

this won't work in import-db mode. Please remove the 2 go calls + go call on shardblock.go L1261.
Let's refactor the solution.

Contributor Author:

Right, refactored.
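The reviewer's concern can be illustrated with a small sketch: in import-db mode blocks are replayed as fast as possible, so a snapshot started with a bare `go` call at the call site may still be running when later blocks are processed. Centralising the decision lets import-db run the snapshot synchronously. All names below are illustrative, not the actual mx-chain-go types.

```go
package main

import "fmt"

// processingMode mirrors the idea that the node behaves differently when
// replaying history (import-db) than when processing live blocks.
type processingMode int

const (
	normalProcessing processingMode = iota
	importDBProcessing
)

// triggerSnapshot dispatches the snapshot synchronously in import-db mode
// and asynchronously during live processing.
func triggerSnapshot(mode processingMode, snapshot func()) {
	if mode == importDBProcessing {
		snapshot() // block until the snapshot completes
		return
	}
	go snapshot() // live processing can continue meanwhile
}

func main() {
	triggerSnapshot(importDBProcessing, func() {
		fmt.Println("snapshotting state synchronously")
	})
}
```

This is why the bare `go` calls in metablock.go and shardblock.go were removed in favour of refactoring the decision into the snapshot path itself.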

…st-storer-fix

# Conflicts:
#	process/transactionEvaluator/simulationAccountsDB_test.go
#	process/txsimulator/wrappedAccountsDB.go
#	state/accountsDB.go
#	state/export_test.go
#	state/interface.go
mutex sync.RWMutex
}

// NewSnapshotsManager creates a new snapshots manager
Contributor:

Suggested change
// NewSnapshotsManager creates a new snapshots manager
// NewSnapshotsManager creates a new snapshots manager

Contributor Author:

Done.

Comment on lines 1108 to 1109
func (adb *AccountsDB) waitForStorageEpochChange(args StorageEpochChangeWaitArgs) error {
if args.SnapshotWaitTimeout < args.WaitTimeForSnapshotEpochCheck {
Contributor:

this func does not seem to be used, i think it was actually moved to snapshotsManager

Contributor Author:

Indeed, it was moved. Removed this and moved the tests.

Comment on lines 18 to 19
// StorageEpochChangeWaitArgs are the args needed for calling the WaitForStorageEpochChange function
type StorageEpochChangeWaitArgs struct {
Contributor:

if waitForStorageEpochChange will be removed from accountsDB as it seems that it's not being used, i think we can unexport this struct

Contributor Author:

Done.

Comment on lines 126 to 129
rootHash, epoch, err := sm.getSnapshotRootHashAndEpoch(trieStorageManager)
if err != nil {
log.Debug("could not retrieve snapshot info", "error", err)
return nil
Contributor:

so we are not starting the snapshot if there is an error getting the values here?

Contributor:

should we also consider import db processing mode, when starting snapshot after restart?

Contributor Author:

Yes, we do not start the snapshot, because we do not know which rootHash to snapshot or for which epoch. When the node reaches the next epoch, it will start the snapshot for that epoch.
Regarding import-db, the process is the same: if the import-db is stopped while a snapshot is in progress, that snapshot will restart when the node restarts.

Contributor Author:

Changed the log to WARN.
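The behaviour the author describes (skip the restart-time snapshot when the persisted info cannot be read, and let the next epoch change trigger a fresh one) can be sketched as follows. The storage interface and the "lastSnapshotRootHash" key are placeholders for illustration, not the real mx-chain-go names.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// storageManager models just the lookup used here; the real interface in
// mx-chain-go is richer.
type storageManager interface {
	GetFromCurrentEpoch(key []byte) ([]byte, error)
}

type stubStorage struct{ data map[string][]byte }

func (s *stubStorage) GetFromCurrentEpoch(key []byte) ([]byte, error) {
	v, ok := s.data[string(key)]
	if !ok {
		return nil, errors.New("key not found")
	}
	return v, nil
}

// startSnapshotAfterRestartIfNeeded mirrors the behaviour discussed above:
// if the persisted snapshot info cannot be read, log a warning and skip;
// the next epoch change will start a snapshot for that epoch.
func startSnapshotAfterRestartIfNeeded(tsm storageManager) error {
	rootHash, err := tsm.GetFromCurrentEpoch([]byte("lastSnapshotRootHash"))
	if err != nil {
		log.Printf("WARN: could not retrieve snapshot info: %v", err)
		return nil
	}
	fmt.Printf("restarting snapshot for rootHash %x\n", rootHash)
	return nil
}

func main() {
	// no persisted info: the snapshot is skipped, not treated as fatal
	_ = startSnapshotAfterRestartIfNeeded(&stubStorage{data: map[string][]byte{}})

	// persisted info present: the interrupted snapshot is restarted
	_ = startSnapshotAfterRestartIfNeeded(&stubStorage{
		data: map[string][]byte{"lastSnapshotRootHash": {0xbe, 0xef}},
	})
}
```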

Comment on lines 138 to 140
func (sm *snapshotsManager) getSnapshotRootHashAndEpoch(trieStorageManager common.StorageManager) ([]byte, uint32, error) {
sm.mutex.RLock()
defer sm.mutex.RUnlock()
Contributor:

do we need mutex lock here?

Contributor Author:

No, removed.

}

func (sm *snapshotsManager) setStateCheckpoint(rootHash []byte, trieStorageManager common.StorageManager) {
log.Trace("accountsDB.SetStateCheckpoint", "root hash", rootHash)
Contributor:

Suggested change
- log.Trace("accountsDB.SetStateCheckpoint", "root hash", rootHash)
+ log.Trace("snapshotsManager.SetStateCheckpoint", "root hash", rootHash)

Contributor Author:

Done.

@@ -4,6 +4,7 @@ import (
"bytes"
"errors"
"fmt"
"github.com/multiversx/mx-chain-core-go/core/atomic"
Contributor:

run goimports — the new import should go in a separate group

Contributor Author:

Done.

state/accountsDB.go (thread resolved)

select {
case <-timer.C:
continue
Contributor:

no need for this continue

Contributor Author:

Removed.

state/snapshotsManager.go (thread resolved)
sm.waitForCompletionIfAppropriate(stats)
}

func (sm *snapshotsManager) verifyAndStartSnapshot(rootHash []byte, epoch uint32, trieStorageManager common.StorageManager) (*snapshotStatistics, bool) {
Contributor:

is this a new function?

Contributor Author:

No, it is a rename of prepareSnapshot() with some small changes. Reverted the name, as prepareSnapshot is more accurate to what the func actually does.

state/snapshotsManager_test.go (thread resolved)
sm, _ := state.NewSnapshotsManager(getDefaultSnapshotManagerArgs())
assert.Equal(t, state.ErrNilTrieSyncer, sm.StartSnapshotAfterRestartIfNeeded(&storageManager.StorageManagerStub{}))
})

Contributor:

remove the line between sub-tests?

Contributor Author:

Done.

return isActiveDB(stsm)
}

func isActiveDB(stsm *snapshotTrieStorageManager) bool {
Contributor:

where is this logic moved?

Contributor Author:

This was removed. When the node starts, it opens as many dbs as necessary, and at epoch change it also manages the opening/closing of epoch storers, so these checks are redundant. As an added bonus, even if something happens and a certain snapshot cannot be completed fully from the node's own storage, the data will be synced from other nodes (this would not happen with these checks in place, because the snapshot would not start at all).

Contributor:

great! 👍

Collaborator @gabi-vuls left a comment:

@@ Log scanner @@

snapshot-in-latest-storer-fix

================================================================================

  • Known Warnings 13
  • New Warnings 7
  • Known Errors 0
  • New Errors 1
  • Panics 0
================================================================================
  • block hash does not match 22167
  • wrong nonce in block 7366
  • miniblocks does not match 0
  • num miniblocks does not match 0
  • miniblock hash does not match 0
  • block bodies does not match 0
  • receipts hash missmatch 1
================================================================================
  • No jailed nodes on the testnet
================================================================================

@BeniaminDrasovean BeniaminDrasovean merged commit 44c0f9c into rc/v1.6.0 Aug 18, 2023
6 checks passed
@BeniaminDrasovean BeniaminDrasovean deleted the snapshot-in-latest-storer-fix branch August 18, 2023 06:54