Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add command neard validate-config #8485

Merged
merged 24 commits into from
Feb 22, 2023
Merged

feat: add command neard validate-config #8485

merged 24 commits into from
Feb 22, 2023

Conversation

ppca
Copy link
Contributor

@ppca ppca commented Feb 2, 2023

This command validates the config files including: config.json, genesis.json, node_key.json, validator_key.json.

To run the command:
./target/debug/neard --home ~/.near/localnet/node1 validate-config

Example output after modifying config.json to be invalid:
Screenshot 2023-02-01 at 4 53 15 PM

The changes in this PR are roughly the following:

  1. add a .validate_with_panic() method for Config.
    This function panics when any assertion fails. The assertions include the ones listed in https://pagodaplatform.atlassian.net/browse/ND-272?focusedCommentId=21738. Directly validating Config rather than ClientConfig is handy because Config has the same structure as config.json, while ClientConfig went thru some transformations, so directly validating Config can report more informative error messages for users to find the fields that are invalid.
    .validate_with_panic() is to be used in Config::load_file (replacing current .validate())and in validate-config command. The neard Run command calls the Config::load_file, we want the neard Run command to fail if any of the configs are invalid, thus panicking is ok here. The panicking behavior is also aligned with that of validate_genesis().

  2. add a .validate_configs(dir:&Path) method for Config.
    This method is to be used in validate-config command. This method panics on any error encountered and shows the corresponding error message.
    This method loads config.json to create Config, then loads validator_key.json to create SignerKey, then loads node_key.json to create network signer, then loads genesis.json to create Genesis. Any error including file not found, cannot open file, failed to match data type and semantic matching error in genesis.json (achieved by validate_genesis()) and semantic error in config.json, will lead to panic with a corresponding message.

  3. add a ValidateConfig value in enum NeardSubCommand, a ValidateConfigCommand struct and ValidateConfigCommand.run method.
    run method takes argument home path, which is supplied by neard_cmd.opts.

  4. change load_config() function: replace argument genesis_validation that enables validation for genesis only with argument config_validation that enables validation for all configs
    Create new enum ConfigValidationMode with Full and UnsafeFast to represent config_validation. ConfigValidationMode::Full would indicate GenesisValidationMode::Full.
    The existing methods in genesis_config.rs and genesis_validate.rs that takes in genesis_validation: GenesisValidationMode remain unchanged.

  5. add config_validation parameter to Config::from_file()
    We want to make sure user also has control over whether running the validation. Except in Config::load_config(), where when Config::from_file() is called, the config_validation param is supplied by user, all other places in code we enable validation.

@ppca ppca requested a review from nikurt February 2, 2023 00:54
@ppca ppca marked this pull request as ready for review February 2, 2023 01:19
@ppca ppca requested a review from a team as a code owner February 2, 2023 01:19
core/chain-configs/src/client_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/client_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/client_config.rs Outdated Show resolved Hide resolved
core/chain-configs/src/genesis_validate.rs Outdated Show resolved Hide resolved
nearcore/src/config.rs Outdated Show resolved Hide resolved
@ppca
Copy link
Contributor Author

ppca commented Feb 8, 2023

@nikurt Refactored the code so that we can now check multiple config files, only panic once to report all the failed checks.
I did this by creating a struct ValidationErrors(vec<ValidationError>) with push_errors() and panic_if_errors(). Basically I can instantiate a ValidationErrors and push all errors including file related errors in and only call panic_if_errors() at the end. Inspiration: https://near.zulipchat.com/#narrow/stream/300659-Rust-.F0.9F.A6.80/topic/inheritance.20in.20rust
Currently I've suffixed all functions that relates to loading config to create config objects with _no_panic or _panic_last to indicate the function's behavior was different. Only using these new functions for Run and ValidateConfig command now. Not sure if the other places should be changed tho if they were fine with the prev panic behavior.
I've tried Run and ValidateConfig commands, they work:
Screenshot 2023-02-07 at 6 36 48 PM

@ppca
Copy link
Contributor Author

ppca commented Feb 8, 2023

will fix the tests

core/chain-configs/src/genesis_config.rs Outdated Show resolved Hide resolved
genesis_validation: GenesisValidationMode,
validation_errors: &mut ValidationErrors,
) -> anyhow::Result<Self> {
let mut file = match File::open(path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more readable way would be to use .map_err(), for example

let env_filter = builder.finish().map_err(ReloadError::Parse)?;

core/chain-configs/src/genesis_validate.rs Outdated Show resolved Hide resolved
utils/config/src/lib.rs Show resolved Hide resolved
@ppca
Copy link
Contributor Author

ppca commented Feb 9, 2023

Refactored. Now all _no_panic() and _panic() are unified as functions with return type Result<T, ValidationError>. In places where we previously panicked, now we do .unwrap().

Output of validate-config as follows, run command has same output:
Screenshot 2023-02-08 at 9 27 05 PM

note: if config.json cannot be read or deserialized to Config, the program will directly panic since we would have no knowledge of the file location of node_key_file, genesis_file and validator_key_file and we cannot validate them; If config.json can be loaded and a Config object can be created, but there are semantic checks that failed, this error will be reported with all other errors in other config files in one panic.

This is the output when config.json is missing:
Screenshot 2023-02-08 at 9 32 10 PM

@ppca ppca requested a review from nikurt February 9, 2023 05:34
pub fn validate_genesis(genesis: &Genesis) -> Result<(), ValidationError> {
let mut validation_errors = ValidationErrors::new();
let mut genesis_validator = GenesisValidator::new(&genesis.config, &mut validation_errors);
println!("\nValidating Genesis config and records, extracted from genesis.json. This could take a few minutes...");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Records don't have to be in genesis.json, they can be in a separate records file, which is usually records.json.

Suggested change
println!("\nValidating Genesis config and records, extracted from genesis.json. This could take a few minutes...");
tracing::info!(target: "config", "Validating Genesis config and records. This could take a few minutes...");

/// If config file issues occur, a ValidationError::ConfigFileError will be returned;
/// If config semantic checks failed, a ValidationError::ConfigSemanticError will be returned
pub fn from_file(path: &Path) -> Result<Self, ValidationError> {
match Self::from_file_skip_validation(path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match Self::from_file_skip_validation(path) {
Self::from_file_skip_validation(path).map(|config| {
config.validate()?;
Ok(config)
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for and_then()

// if config.json has file issues, the program will directly panic
let config = Config::from_file_skip_validation(&dir.join(CONFIG_FILENAME))?;
// do config.json validation separately so that genesis_file, validator_file and genesis_file can be validated before program panic
match config.validate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work?

Suggested change
match config.validate() {
config.validate().map_err(|e|validation_errors.push_errors(e));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would complain the Err arm not handled, I either have let _ = , which is kinda ugly, or the following:
config.validate().map_or_else( |e|validation_errors.push_errors(e), |_| (), );

pub fn validate_config(config: &Config) -> Result<(), ValidationError> {
let mut validation_errors = ValidationErrors::new();
let mut config_validator = ConfigValidator::new(config, &mut validation_errors);
println!("\nValidating Config, extracted from config.json...");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid println!() as all output of neard normally goes to stderr, which is done via the tracing create.

Suggested change
println!("\nValidating Config, extracted from config.json...");
tracing::info!(target: "nearcore", "Validating Config, extracted from config.json...");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a rule as to what we put for target? Searching in nearcore, seems like it does not always match the name of the library the code sits in...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm putting config as target for all messages related to config validation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some guidelines https://github.com/near/nearcore/blob/master/docs/practices/style.md#tracing
Avoid creating unnecessary targets, because filtering them in RUST_LOG will be too much work.

}
}

#[cfg(test)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests! 👍

neard/src/cli.rs Outdated
@@ -146,7 +149,8 @@ struct NeardOpts {
/// Directory for config and data.
#[clap(long, parse(from_os_str), default_value_os = crate::DEFAULT_HOME.as_os_str())]
home: PathBuf,
/// Skips consistency checks of the 'genesis.json' file upon startup.
/// Skips consistency checks of the config files including
/// genesis.json, config.json, node_key.json and validator_key.json upon startup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at fn load_config() I see that config.json is always validated, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to change comments here. Done.

@ppca ppca requested a review from nikurt February 9, 2023 17:59
Copy link
Contributor

@nikurt nikurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you very much!

@ppca
Copy link
Contributor Author

ppca commented Feb 10, 2023

@nikurt Question: I'm thinking maybe the config validation should only be enforced for mainnet/betanet/testnet? I see we have a bunch of tests that simply use default configs to start node, and we sometimes start localnet nodes with special configs like tracked_shards=[], it makes more sense to allow those. wdyt?

@nikurt
Copy link
Contributor

nikurt commented Feb 10, 2023

Localnet and the tests should have valid configs.

tracked_shards is a special case that I missed, sorry.
It's valid to have tracked_shards=[] but tracked_accounts=[<list of your accounts of interest>].

@ppca
Copy link
Contributor Author

ppca commented Feb 10, 2023

Localnet and the tests should have valid configs.

tracked_shards is a special case that I missed, sorry. It's valid to have tracked_shards=[] but tracked_accounts=[<list of your accounts of interest>].

@nikurt So tracked_accounts should always be non-empty.
Verifying: for mainnet/betanet/testnet, tracked_shards still must be non-empty? They can be empty for other chain_ids?

I'm also seeing many runtime tests using default config where epoch_length = 0:

. epoch=0 is not allowed for any net? I will plant a epoch_length = 60 (same as localnet) for these tests.

@ppca ppca force-pushed the xiangyi/ND-272 branch 2 times, most recently from 9436529 to fa6b1ef Compare February 10, 2023 22:41
@@ -1254,7 +1253,8 @@ pub fn init_testnet_configs(

genesis.to_file(&node_dir.join(&configs[i].genesis_file));
configs[i].write_to_file(&node_dir.join(CONFIG_FILENAME)).expect("Error writing config");
info!(target: "near", "Generated node key, validator key, genesis file in {}", node_dir.display());
info!(target: "near", "create_testnet_configs_from_seeds: config.tracked_shards are {:?}", &configs[i].tracked_accounts);
// info!(target: "near", "Generated node key, validator key, genesis file in {}", node_dir.display());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed?

@@ -1078,7 +1085,8 @@ pub fn init_configs(
};
let genesis = Genesis::new(genesis_config, records.into());
genesis.to_file(&dir.join(config.genesis_file));
info!(target: "near", "Generated node key, validator key, genesis file in {}", dir.display());
//info!(target: "near", "Generated node key, validator key, genesis file in {}", dir.display());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed?

@ppca
Copy link
Contributor Author

ppca commented Feb 13, 2023 via email

@ppca ppca force-pushed the xiangyi/ND-272 branch 3 times, most recently from 4587f65 to 3427709 Compare February 14, 2023 22:12
@ppca
Copy link
Contributor Author

ppca commented Feb 14, 2023

@nikurt I'm removing the check for !config.tracked_accounts.is_empty() since this does not seem required in our current setup. Our mainnet genesis: https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json has empty tracked_accounts, and seems like only when config.tracked_shards.is_empty() we would be using tracked_accounts:

TrackedConfig::Accounts(config.tracked_accounts.clone())
.

With Wac and Marcelo's help, figured out that previously the db_migration.py, backward_compatible.py and upgradable.py tests were failing because the configs are initialized with the release version of the binary, and never re-initialized with my branch. But since we actually don't need tracked_accounts to be un-empty, I'll just delete the code relating to that rule and everything should be good.

@wacban
Copy link
Contributor

wacban commented Feb 15, 2023

Will there be any changes required for node owners during release when upgarding their running binary to a new version? For example would the node owner need to run the validate config command and ensure that config is correct for the new neard version before stopping the old neard and starting new neard?

I know that the upgradable test checked that config generated by the current stable is valid according to your logic now. What will happen if a node owner has a config that was generated way sooner than that? In other words we know that we're one version backwards compatible. Do we want to be infinitely backwards compatible? If not how do we ensure that node owners don't get an unpleasant surprise when upgrading to new neard?

For comparison our database is versioned and we automatically apply migrations when neard is restarted. Would we need a similar (automated or manual) process for config changes?

@nikurt
Copy link
Contributor

nikurt commented Feb 15, 2023

We don't have processes around config upgrades.
The best we can do is mention required config changes in the release notes.
We don't need to be infinitely backwards compatible. For example, see the recent migration_snapshot option introduction.

} else {
let error_message =
format!("validator key file does not exist at the path {}", validator_file.display());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to give an error here? It's valid not to have a validator key present, and in fact most people probably don't have one present, since there are only a handful validators compared to the total number of nodes in the network

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense to not throw error here, also read the master branch code again, the logic there also does not throw error in case file does not exist, simply use a None for signer value.

Some(Arc::new(signer) as Arc<dyn ValidatorSigner>)
match InMemoryValidatorSigner::from_file(&validator_file) {
Ok(signer) => Some(Arc::new(signer) as Arc<dyn ValidatorSigner>),
Err(_) => None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still want to give an error here the way things were before right? Because now with this change, if you mistakenly edit your validator_key.json in a way that results in invalid JSON, now neard will just silently run as a non-validator instead of warning you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

genesis
.validate(genesis_validation)
.map_or_else(|e| validation_errors.push_errors(e), |_| ());
if matches!(genesis.config.chain_id.as_ref(), "mainnet" | "testnet" | "betanet")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to revert #8509? If not, then makes sense to add in the if validator_signer.is_some() again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops I don't think so, must have been a mistake while rebasing master. Will modify

anyhow::ensure!(!config.tracked_shards.is_empty(),
"Validator must track all shards. Please change `tracked_shards` field in config.json to be any non-empty vector");
}
validation_errors.panic_if_errors();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just return an Err(anyhow::Error)? Could replace the panic_if_errors() function with a function that returns an error, maybe like:

    pub fn ok(&self) -> anyhow::Result<()> {
        match self.generate_error_message_per_type() {
            Some(e) => Err(anyhow::Error::msg(e)),
            None => Ok(()),
        }
    }

or called whatever makes more sense to you. I bring it up just because I think the codebase in general has a problem with panicking too much as an error handling method in situations where nothing unexpected has happened: #5485

neard/src/cli.rs Outdated
impl ValidateConfigCommand {
pub(super) fn run(&self, home_dir: &Path) {
let _ = nearcore::config::load_config(&home_dir, GenesisValidationMode::Full)
.unwrap_or_else(|e| panic!("Error loading config: {:#}", e));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing with panics here. It's probably cleaner to have this function return an anyhow::Result<()>, and just add a question mark to the place where this is called above. Giving a stacktrace here when there's a validatoion error doesn't feel great since nothing unexpected is happening. We can just print out the error to stderr normally and exit w/ nonzero code

let mut file = File::open(&path).map_err(|_| ValidationError::GenesisFileError {
error_message: format!(
"Could not open genesis config file at path {}.",
&path.as_ref().to_path_buf().display()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could just delete the to_path_buf()

self.config.consensus.max_block_wait_delay
);
self.validation_errors
.push_errors(ValidationError::ConfigSemanticsError { error_message: error_message })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels better to be consistent w/ either push_config_semantics_error() or push_errors(ConfigSemanticsError {}) in every case

…uding config.json, genesis.json, node_key.json and validator_key.json
… will report all check results from all config files
…le config files are involved and add tracked_accounts to create_testnet_configs_from_seeds()
…age and add one for tracked_shards in init_testnet_configs
… to return Result and some minor improvements
@ppca ppca merged commit 179b3ee into near:master Feb 22, 2023
@ppca ppca deleted the xiangyi/ND-272 branch February 22, 2023 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants