Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split validator databases #6048

Merged

Conversation

rkapka
Copy link
Contributor

@rkapka rkapka commented May 29, 2020

What type of PR is this?

Feature

What does this PR do? Why is it needed?
This PR introduces a new command: validator accounts split. Is takes a source directory and a target directory as inputs and creates one validator database for each public key in the source directory's validator database. Each database is saved in a subdirectory of the target directory. The subdirectory's name is the hex-encoded public key whose data has been split into the database. The encoding is the same that is used for naming validator private key files. This way it will be easy to identify which database belongs to which key.

Which issues(s) does this PR fix?
This PR addresses #5638, however it does not close it yet. There are two other operations to implement: importing and exporting data for selected keys.

@rkapka rkapka requested a review from a team as a code owner May 29, 2020 16:20
validator/accounts/account.go Outdated Show resolved Hide resolved
validator/accounts/account.go Outdated Show resolved Hide resolved
validator/db/manage.go Outdated Show resolved Hide resolved
return nil
}

func addAttestations(bucket *bolt.Bucket, attestations pubKeyAttestations) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to be assuming that it will called immediately after addProposals right? It doesn't feel right to make that assumption, as someone can use this function incorrectly for some other use case. Perhaps refactor to resolve that risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I do always add attestations after adding proposals, there is no direct dependency between them. One can add attestations before proposals, or even add one but not the other. Attestations and proposals live in separate buckets.

var allAttestations []pubKeyAttestations

for _, store := range stores {
if err := store.db.View(func(tx *bolt.Tx) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's risky to clog up the boltdb thread and do all these expensive operations within a single bolt transaction. Any chance we could do these as a separate bolt tx's?

Copy link
Contributor Author

@rkapka rkapka May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a way to do this, but it's not too elegant.

Firstly I open a transaction in which I extract all keys and assign them to a variable for later use. Then I open one transaction for every stored key and get proposals/attestations for that key.

There might be a better way to do this, but this is the best I could come up with.

validator/db/manage_test.go Outdated Show resolved Hide resolved
validator/db/manage_test.go Outdated Show resolved Hide resolved
validator/db/manage_test.go Outdated Show resolved Hide resolved
validator/db/manage_test.go Outdated Show resolved Hide resolved
@@ -54,6 +58,220 @@ func TestMerge(t *testing.T) {
assertMergedStore(t, mergedStore, firstStorePubKeys, secondStorePubKeys, history)
}

func TestSplit(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is extremely long. Any way it can be simplified? Perhaps the setup can be a somewhat general helper function, or perhaps can be split into several tests and not just one big TestSplit. For example, TestSplit_OK, TestSplit_CouldNotDoSomething, etc.

Copy link
Contributor Author

@rkapka rkapka May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I redesigned the tests, I made helper functions more granular so that they can be reused by tests for merging and splitting. I hope most of these changes will make it easier to write tests for future functionalities, e.g. importing.

validator/db/manage.go Outdated Show resolved Hide resolved
@rkapka
Copy link
Contributor Author

rkapka commented May 31, 2020

I had to rename the flags related with merging and splitting to avoid duplicate names. I find it a bit awkward that now the commands have redundancy in flag names, e.g. validator accounts merge --merge-source-dirs="x" --merge-target-dir="y". It's obvious from the command name what action is being invoked - there's no need to repeat it in flag names. I believe that having at most 4 flags: source-dir, source-dirs, target-dir and target-dirs should be sufficient for all validator operations. However this would require generic descriptions of the flags. Any thoughts on this?

@codecov
Copy link

codecov bot commented May 31, 2020

Codecov Report

Merging #6048 into master will decrease coverage by 0.01%.
The diff coverage is 51.29%.

@@            Coverage Diff             @@
##           master    #6048      +/-   ##
==========================================
- Coverage   59.51%   59.50%   -0.02%     
==========================================
  Files         320      320              
  Lines       27023    27129     +106     
==========================================
+ Hits        16083    16143      +60     
- Misses       8749     8777      +28     
- Partials     2191     2209      +18     

allAttestations []pubKeyAttestations) (err error) {

var storesToClose []*Store
defer (func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer (func() {
defer func() {

err = errors.New(errorMessage)
}
}
})()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
})()
}()

if err != nil {
err = errors.Wrapf(err, errorMessage)
} else {
err = errors.New(errorMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to do something about these errors in this deferred function. The function will not run until createSplitTargetStores returns, so this error will have nowhere to do. I would suggest creating an error log here instead. There's no way to control nor extract the outputs of deferred functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is:
#6027 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow awesome! Thanks for sharing, good thing to know :)

rauljordan
rauljordan previously approved these changes May 31, 2020
rauljordan
rauljordan previously approved these changes May 31, 2020
@@ -225,17 +225,26 @@ func HandleEmptyKeystoreFlags(cliCtx *cli.Context, confirmPassword bool) (string
func Merge(ctx context.Context, sourceDirectories []string, targetDirectory string) (err error) {
var sourceStores []*db.Store
defer func() {
errorMessage := "failed to close one or more source databases"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this error a package variable ?
ex:

errSource := errors.New("failed to close one or more source databases")

You can then resuse the error here and in other places instead of redefining it again.

@rauljordan rauljordan dismissed their stale review June 1, 2020 17:05

Making errors package variables would be helpful

rauljordan
rauljordan previously approved these changes Jun 1, 2020
@prylabs-bulldozer prylabs-bulldozer bot merged commit ac138ea into prysmaticlabs:master Jun 2, 2020
@rkapka rkapka deleted the split-validator-databases branch June 3, 2020 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants