feat: add command neard validate-config #8485

ppca · 2023-02-02T00:51:59Z

This command validates the config files including: config.json, genesis.json, node_key.json, validator_key.json.

To run the command:
./target/debug/neard --home ~/.near/localnet/node1 validate-config

Example output after modifying config.json to be invalid:

The changes in this PR are roughly the following:

add a .validate_with_panic() method for Config.
This function panics when any assertion fails. The assertions include the ones listed in https://pagodaplatform.atlassian.net/browse/ND-272?focusedCommentId=21738. Validating Config directly, rather than ClientConfig, is handy because Config has the same structure as config.json, whereas ClientConfig has gone through some transformations. Validating Config can therefore report more informative error messages that help users find the invalid fields.
.validate_with_panic() is to be used in Config::load_file (replacing the current .validate()) and in the validate-config command. The neard Run command calls Config::load_file, and we want it to fail if any of the configs are invalid, so panicking is acceptable here. The panicking behavior is also aligned with that of validate_genesis().
add a .validate_configs(dir: &Path) method for Config.
This method is to be used in the validate-config command. It panics on any error encountered and shows the corresponding error message.
This method loads config.json to create Config, then validator_key.json to create SignerKey, then node_key.json to create the network signer, then genesis.json to create Genesis. Any error leads to a panic with a corresponding message: file not found, cannot open file, data type mismatch, semantic errors in genesis.json (detected by validate_genesis()), and semantic errors in config.json.
add a ValidateConfig value in enum NeardSubCommand, a ValidateConfigCommand struct and a ValidateConfigCommand.run method.
The run method takes the home path as its argument, which is supplied by neard_cmd.opts.
change the load_config() function: replace the argument genesis_validation, which enables validation for genesis only, with the argument config_validation, which enables validation for all configs.
Create a new enum ConfigValidationMode with Full and UnsafeFast to represent config_validation. ConfigValidationMode::Full implies GenesisValidationMode::Full.
The existing methods in genesis_config.rs and genesis_validate.rs that take genesis_validation: GenesisValidationMode remain unchanged.
add config_validation parameter to Config::from_file()
We want to make sure the user also has control over whether validation runs. The only place where the config_validation param passed to Config::from_file() is supplied by the user is Config::load_config(); everywhere else in the code we enable validation.
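The relationship between the two validation modes described above could look roughly like the following. This is only a sketch: the enum and variant names come from the description, but the From impl and the mapping of UnsafeFast to UnsafeFast are assumptions made for illustration, not code from this PR.

```rust
/// Genesis-only validation switch (name taken from the PR description).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GenesisValidationMode {
    Full,
    UnsafeFast,
}

/// Validation switch that covers all config files.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConfigValidationMode {
    Full,
    UnsafeFast,
}

// Assumed conversion: full config validation implies full genesis validation.
// Mapping UnsafeFast to UnsafeFast is a guess, the description does not say.
impl From<ConfigValidationMode> for GenesisValidationMode {
    fn from(mode: ConfigValidationMode) -> Self {
        match mode {
            ConfigValidationMode::Full => GenesisValidationMode::Full,
            ConfigValidationMode::UnsafeFast => GenesisValidationMode::UnsafeFast,
        }
    }
}
```

With a conversion like this, code that still takes GenesisValidationMode can stay unchanged and callers convert at the boundary.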

core/chain-configs/src/client_config.rs

core/chain-configs/src/genesis_validate.rs

nearcore/src/config.rs

ppca · 2023-02-08T02:37:12Z

@nikurt Refactored the code so that we can now check multiple config files and panic only once to report all the failed checks.
I did this by creating a struct ValidationErrors(Vec<ValidationError>) with push_errors() and panic_if_errors(). Basically I can instantiate a ValidationErrors, push all errors into it (including file-related errors), and only call panic_if_errors() at the end. Inspiration: https://near.zulipchat.com/#narrow/stream/300659-Rust-.F0.9F.A6.80/topic/inheritance.20in.20rust
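A minimal sketch of such an accumulator, for readers who want to see the shape of the idea. The type and method names follow the comment above; the enum variants, their fields, the helper len(), and the message format are assumptions for illustration.

```rust
/// Hypothetical error variants; the real enum in the PR may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ValidationError {
    ConfigFileError { error_message: String },
    ConfigSemanticsError { error_message: String },
}

/// Collects every failed check so that they can be reported together.
#[derive(Debug, Default)]
pub struct ValidationErrors(Vec<ValidationError>);

impl ValidationErrors {
    pub fn new() -> Self {
        Self(Vec::new())
    }

    /// Records one failed check without interrupting the remaining checks.
    pub fn push_errors(&mut self, error: ValidationError) {
        self.0.push(error);
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Panics once, listing all recorded errors; does nothing if there are none.
    pub fn panic_if_errors(&self) {
        if !self.0.is_empty() {
            let report: Vec<String> = self.0.iter().map(|e| format!("{e:?}")).collect();
            panic!("config validation failed:\n{}", report.join("\n"));
        }
    }
}
```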
Currently I've suffixed all functions that load config files to create config objects with _no_panic or _panic_last to indicate that their behavior is different. Only the Run and ValidateConfig commands use these new functions for now. I'm not sure whether the other call sites should be changed, though, if they were fine with the previous panic behavior.
I've tried Run and ValidateConfig commands, they work:

ppca · 2023-02-08T16:14:44Z

will fix the tests

core/chain-configs/src/genesis_config.rs

nikurt · 2023-02-08T16:48:27Z

core/chain-configs/src/genesis_config.rs

+        genesis_validation: GenesisValidationMode,
+        validation_errors: &mut ValidationErrors,
+    ) -> anyhow::Result<Self> {
+        let mut file = match File::open(path) {


A more readable way would be to use .map_err(), for example

nearcore/core/o11y/src/lib.rs

Line 488 in 67a527b

let env_filter = builder.finish().map_err(ReloadError::Parse)?;
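Applied to the File::open call from the diff above, the map_err style might look like this sketch. The GenesisFileError variant name appears later in this PR; the function name open_genesis and the exact message text are assumptions.

```rust
use std::fs::File;
use std::path::Path;

/// Hypothetical error type with a single variant, for illustration only.
#[derive(Debug)]
pub enum ValidationError {
    GenesisFileError { error_message: String },
}

/// Opens the genesis file, converting the io::Error into a domain error
/// in one expression instead of a match with two arms.
pub fn open_genesis(path: &Path) -> Result<File, ValidationError> {
    File::open(path).map_err(|e| ValidationError::GenesisFileError {
        error_message: format!(
            "Could not open genesis config file at path {}: {}",
            path.display(),
            e
        ),
    })
}
```

A caller can then propagate the failure with the ? operator.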

core/chain-configs/src/genesis_validate.rs

utils/config/src/lib.rs

ppca · 2023-02-09T05:33:51Z

Refactored. Now all _no_panic() and _panic() are unified as functions with return type Result<T, ValidationError>. In places where we previously panicked, now we do .unwrap().

Output of validate-config as follows, run command has same output:

Note: if config.json cannot be read or deserialized into Config, the program panics immediately, because without it we do not know the locations of node_key_file, genesis_file and validator_key_file and cannot validate them. If config.json can be loaded and a Config object created but some semantic checks fail, that error is reported together with all errors from the other config files in a single panic.

This is the output when config.json is missing:

nikurt · 2023-02-09T13:13:24Z

core/chain-configs/src/genesis_validate.rs

+pub fn validate_genesis(genesis: &Genesis) -> Result<(), ValidationError> {
+    let mut validation_errors = ValidationErrors::new();
+    let mut genesis_validator = GenesisValidator::new(&genesis.config, &mut validation_errors);
+    println!("\nValidating Genesis config and records, extracted from genesis.json. This could take a few minutes...");


Records don't have to be in genesis.json, they can be in a separate records file, which is usually records.json.

Suggested change

println!("\nValidating Genesis config and records, extracted from genesis.json. This could take a few minutes...");

tracing::info!(target: "config", "Validating Genesis config and records. This could take a few minutes...");

nikurt · 2023-02-09T13:16:12Z

nearcore/src/config.rs

+    /// If config file issues occur, a ValidationError::ConfigFileError will be returned;
+    /// If config semantic checks failed, a ValidationError::ConfigSemanticError will be returned
+    pub fn from_file(path: &Path) -> Result<Self, ValidationError> {
+        match Self::from_file_skip_validation(path) {


Suggested change

match Self::from_file_skip_validation(path) {

Self::from_file_skip_validation(path).map(|config| {

config.validate()?;

Ok(config)

})

I went for and_then()
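For context on why and_then() fits here: a closure that itself returns a Result, passed to .map(), produces a nested Result<Result<_, _>, _>, whereas .and_then() flattens it. The sketch below uses a hypothetical Config with a made-up max_peers field and String errors; only the method names from_..._skip_validation and validate echo the PR.

```rust
/// Hypothetical config with a single made-up field.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub max_peers: u32,
}

impl Config {
    /// Parses without running semantic checks (stand-in for reading config.json).
    fn from_str_skip_validation(raw: &str) -> Result<Self, String> {
        raw.trim()
            .parse::<u32>()
            .map(|max_peers| Config { max_peers })
            .map_err(|e| format!("parse error: {e}"))
    }

    /// Semantic check that can fail after a successful parse.
    fn validate(&self) -> Result<(), String> {
        if self.max_peers == 0 {
            Err("max_peers must be positive".to_string())
        } else {
            Ok(())
        }
    }

    /// Parse, then validate; and_then keeps a single, flat Result.
    pub fn from_str_checked(raw: &str) -> Result<Self, String> {
        Self::from_str_skip_validation(raw).and_then(|config| {
            config.validate()?;
            Ok(config)
        })
    }
}
```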

nikurt · 2023-02-09T13:17:10Z

nearcore/src/config.rs

+    // if config.json has file issues, the program will directly panic
+    let config = Config::from_file_skip_validation(&dir.join(CONFIG_FILENAME))?;
+    // do config.json validation separately so that genesis_file, validator_file and genesis_file can be validated before program panic
+    match config.validate() {


Would this work?

Suggested change

match config.validate() {

config.validate().map_err(|e|validation_errors.push_errors(e));

This would complain that the Err arm is not handled. I would either need let _ = , which is kind of ugly, or the following:
config.validate().map_or_else( |e|validation_errors.push_errors(e), |_| (), );
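Since Result is #[must_use], discarding the value returned by map_err triggers an unused-result warning. Two ways to record an error without that warning are sketched below with String errors and a hypothetical check_positive helper; the field names are only illustrative.

```rust
/// Hypothetical semantic check used for illustration.
fn check_positive(name: &str, value: i64) -> Result<(), String> {
    if value > 0 {
        Ok(())
    } else {
        Err(format!("{name} must be positive, got {value}"))
    }
}

/// Runs both checks and returns every failure instead of stopping at the first.
pub fn collect_errors(epoch_length: i64, max_block_wait_ms: i64) -> Vec<String> {
    let mut errors = Vec::new();

    // Option 1: map_or_else consumes the Result, so nothing is left unused.
    check_positive("epoch_length", epoch_length).map_or_else(|e| errors.push(e), |_| ());

    // Option 2: if let reads more plainly when only the Err side matters.
    if let Err(e) = check_positive("max_block_wait_ms", max_block_wait_ms) {
        errors.push(e);
    }

    errors
}
```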

nikurt · 2023-02-09T13:19:53Z

nearcore/src/config_validate.rs

+pub fn validate_config(config: &Config) -> Result<(), ValidationError> {
+    let mut validation_errors = ValidationErrors::new();
+    let mut config_validator = ConfigValidator::new(config, &mut validation_errors);
+    println!("\nValidating Config, extracted from config.json...");


Please avoid println!() as all output of neard normally goes to stderr, which is done via the tracing crate.

Suggested change

println!("\nValidating Config, extracted from config.json...");

tracing::info!(target: "nearcore", "Validating Config, extracted from config.json...");

is there a rule as to what we put for target? Searching in nearcore, seems like it does not always match the name of the library the code sits in...

I'm putting config as target for all messages related to config validation

We have some guidelines https://github.com/near/nearcore/blob/master/docs/practices/style.md#tracing
Avoid creating unnecessary targets, because filtering them in RUST_LOG will be too much work.

nikurt · 2023-02-09T13:20:56Z

nearcore/src/config_validate.rs

+    }
+}
+
+#[cfg(test)]


Nice tests! 👍

nikurt · 2023-02-09T13:23:11Z

neard/src/cli.rs

@@ -146,7 +149,8 @@ struct NeardOpts {
    /// Directory for config and data.
    #[clap(long, parse(from_os_str), default_value_os = crate::DEFAULT_HOME.as_os_str())]
    home: PathBuf,
-    /// Skips consistency checks of the 'genesis.json' file upon startup.
+    /// Skips consistency checks of the config files including
+    /// genesis.json, config.json, node_key.json and validator_key.json upon startup.


Looking at fn load_config() I see that config.json is always validated, isn't it?

forgot to change comments here. Done.

nikurt

Looks great, thank you very much!

ppca · 2023-02-10T16:39:59Z

@nikurt Question: I'm thinking maybe the config validation should only be enforced for mainnet/betanet/testnet? I see we have a bunch of tests that simply use default configs to start node, and we sometimes start localnet nodes with special configs like tracked_shards=[], it makes more sense to allow those. wdyt?

nikurt · 2023-02-10T17:00:45Z

Localnet and the tests should have valid configs.

tracked_shards is a special case that I missed, sorry.
It's valid to have tracked_shards=[] but tracked_accounts=[<list of your accounts of interest>].

ppca · 2023-02-10T17:14:29Z

Localnet and the tests should have valid configs.

tracked_shards is a special case that I missed, sorry. It's valid to have tracked_shards=[] but tracked_accounts=[<list of your accounts of interest>].

@nikurt So tracked_accounts should always be non-empty.
Verifying: for mainnet/betanet/testnet, tracked_shards still must be non-empty? They can be empty for other chain_ids?

I'm also seeing many runtime tests using default config where epoch_length = 0:

nearcore/runtime/runtime/tests/runtime_group_tools/mod.rs

Line 67 in c308df1

..Default::default()

Is epoch_length = 0 not allowed for any net? I will set epoch_length = 60 (same as localnet) for these tests.

nikurt · 2023-02-13T07:51:41Z

nearcore/src/config.rs

@@ -1254,7 +1253,8 @@ pub fn init_testnet_configs(

        genesis.to_file(&node_dir.join(&configs[i].genesis_file));
        configs[i].write_to_file(&node_dir.join(CONFIG_FILENAME)).expect("Error writing config");
-        info!(target: "near", "Generated node key, validator key, genesis file in {}", node_dir.display());
+        info!(target: "near", "create_testnet_configs_from_seeds: config.tracked_shards are {:?}", &configs[i].tracked_accounts);
+        // info!(target: "near", "Generated node key, validator key, genesis file in {}", node_dir.display());


nikurt · 2023-02-13T07:52:26Z

nearcore/src/config.rs

@@ -1078,7 +1085,8 @@ pub fn init_configs(
            };
            let genesis = Genesis::new(genesis_config, records.into());
            genesis.to_file(&dir.join(config.genesis_file));
-            info!(target: "near", "Generated node key, validator key, genesis file in {}", dir.display());
+            //info!(target: "near", "Generated node key, validator key, genesis file in {}", dir.display());


ppca · 2023-02-13T09:01:38Z

I was trying to fix the 3 tests that are still failing: db_migration, upgradable and backward_compatible. The tests report that tracked_accounts is empty and panic. I added logic to fix that, but it did not seem to alter the test results, so I simply changed the info!() message to verify whether Buildkite actually picks up my changes. Although I commented out the "Generated ..." message, it still shows up in the Buildkite test results, which makes me wonder whether something is wrong with Buildkite.

On Feb 12, 2023, at 11:52 PM, nikurt wrote:

In nearcore/src/config.rs, on the change in init_testnet_configs that comments out the "Generated node key, validator key, genesis file" message: needed?

In nearcore/src/config.rs, on the same change in init_configs: Needed?

ppca · 2023-02-14T22:50:45Z

@nikurt I'm removing the check for !config.tracked_accounts.is_empty() since it does not seem to be required in our current setup. Our mainnet config: https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json has empty tracked_accounts, and it seems that tracked_accounts is only used when config.tracked_shards.is_empty():

nearcore/nearcore/src/shard_tracker.rs

Line 21 in 161e3e3

TrackedConfig::Accounts(config.tracked_accounts.clone())

.

With Wac and Marcelo's help, I figured out that the db_migration.py, backward_compatible.py and upgradable.py tests were failing because the configs are initialized with the release version of the binary and never re-initialized with my branch. But since we don't actually need tracked_accounts to be non-empty, I'll just delete the code relating to that rule and everything should be good.

wacban · 2023-02-15T13:48:05Z

Will there be any changes required for node owners during release when upgrading their running binary to a new version? For example, would the node owner need to run the validate-config command and ensure that the config is correct for the new neard version before stopping the old neard and starting the new one?

I know that the upgradable test checked that config generated by the current stable is valid according to your logic now. What will happen if a node owner has a config that was generated way sooner than that? In other words we know that we're one version backwards compatible. Do we want to be infinitely backwards compatible? If not how do we ensure that node owners don't get an unpleasant surprise when upgrading to new neard?

For comparison our database is versioned and we automatically apply migrations when neard is restarted. Would we need a similar (automated or manual) process for config changes?

nikurt · 2023-02-15T14:00:07Z

We don't have processes around config upgrades.
The best we can do is mention required config changes in the release notes.
We don't need to be infinitely backwards compatible. For example, see the recent migration_snapshot option introduction.

marcelo-gonzalez · 2023-02-15T18:31:01Z

nearcore/src/config.rs

    } else {
+        let error_message =
+            format!("validator key file does not exist at the path {}", validator_file.display());


Do we want to give an error here? It's valid not to have a validator key present, and in fact most people probably don't have one present, since there are only a handful of validators compared to the total number of nodes in the network.

yeah makes sense to not throw error here, also read the master branch code again, the logic there also does not throw error in case file does not exist, simply use a None for signer value.

marcelo-gonzalez · 2023-02-15T18:33:25Z

nearcore/src/config.rs

-        Some(Arc::new(signer) as Arc<dyn ValidatorSigner>)
+        match InMemoryValidatorSigner::from_file(&validator_file) {
+            Ok(signer) => Some(Arc::new(signer) as Arc<dyn ValidatorSigner>),
+            Err(_) => None,


we still want to give an error here the way things were before right? Because now with this change, if you mistakenly edit your validator_key.json in a way that results in invalid JSON, now neard will just silently run as a non-validator instead of warning you

nice catch!
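The behavior agreed on in these two threads (a missing validator key file is fine, an unreadable or malformed one is an error) can be sketched as follows. The function name load_optional_key and the parse callback are hypothetical; the real code uses InMemoryValidatorSigner::from_file.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Returns Ok(None) when the file does not exist, Ok(Some(key)) when it
/// parses, and Err(..) when it exists but cannot be read or parsed.
pub fn load_optional_key<T>(
    path: &Path,
    parse: impl Fn(&str) -> Result<T, String>,
) -> Result<Option<T>, String> {
    match fs::read_to_string(path) {
        // The file is present: a parse failure must surface to the operator.
        Ok(contents) => parse(&contents)
            .map(Some)
            .map_err(|e| format!("invalid key file {}: {}", path.display(), e)),
        // No file at all: the node simply runs as a non-validator.
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
        // Any other I/O problem (permissions, etc.) is also reported.
        Err(e) => Err(format!("cannot read {}: {}", path.display(), e)),
    }
}
```

This avoids the silent fallback described above, where a typo in validator_key.json would make the node run as a non-validator without any warning.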

marcelo-gonzalez · 2023-02-15T18:54:42Z

nearcore/src/config.rs

+            genesis
+                .validate(genesis_validation)
+                .map_or_else(|e| validation_errors.push_errors(e), |_| ());
+            if matches!(genesis.config.chain_id.as_ref(), "mainnet" | "testnet" | "betanet")


did you mean to revert #8509? If not, then makes sense to add in the if validator_signer.is_some() again

oops I don't think so, must have been a mistake while rebasing master. Will modify

marcelo-gonzalez · 2023-02-15T19:04:47Z

nearcore/src/config.rs

-        anyhow::ensure!(!config.tracked_shards.is_empty(),
-                        "Validator must track all shards. Please change `tracked_shards` field in config.json to be any non-empty vector");
-    }
+    validation_errors.panic_if_errors();


why not just return an Err(anyhow::Error)? Could replace the panic_if_errors() function with a function that returns an error, maybe like:

pub fn ok(&self) -> anyhow::Result<()> {
    match self.generate_error_message_per_type() {
        Some(e) => Err(anyhow::Error::msg(e)),
        None => Ok(()),
    }
}

or called whatever makes more sense to you. I bring it up just because I think the codebase in general has a problem with panicking too much as an error handling method in situations where nothing unexpected has happened: #5485
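A dependency-free sketch of the same idea, using String in place of anyhow::Error so that it compiles on its own. The struct layout and the helper generate_error_message are assumptions; only the method name ok() follows the suggestion above.

```rust
/// Simplified collector that stores error messages as plain strings.
#[derive(Debug, Default)]
pub struct ValidationErrors(Vec<String>);

impl ValidationErrors {
    pub fn push_errors(&mut self, message: impl Into<String>) {
        self.0.push(message.into());
    }

    /// Joins all recorded messages, or returns None if nothing failed.
    fn generate_error_message(&self) -> Option<String> {
        if self.0.is_empty() {
            None
        } else {
            Some(self.0.join("\n"))
        }
    }

    /// Returns an error value instead of panicking, so the caller decides
    /// how to report it.
    pub fn ok(&self) -> Result<(), String> {
        match self.generate_error_message() {
            Some(e) => Err(e),
            None => Ok(()),
        }
    }
}
```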

marcelo-gonzalez · 2023-02-15T19:10:01Z

neard/src/cli.rs

+impl ValidateConfigCommand {
+    pub(super) fn run(&self, home_dir: &Path) {
+        let _ = nearcore::config::load_config(&home_dir, GenesisValidationMode::Full)
+            .unwrap_or_else(|e| panic!("Error loading config: {:#}", e));


same thing with panics here. It's probably cleaner to have this function return an anyhow::Result<()>, and just add a question mark to the place where this is called above. Giving a stacktrace here when there's a validation error doesn't feel great since nothing unexpected is happening. We can just print out the error to stderr normally and exit w/ nonzero code
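One way to get "print to stderr and exit with a nonzero code" without a panic is to translate the Result into a std::process::ExitCode at the top level. The sketch below is an assumption about how that could look; validate_config here is a made-up stand-in for load_config, and the check inside it is invented purely for the example.

```rust
use std::process::ExitCode;

/// Hypothetical stand-in for loading and validating the config files.
fn validate_config(home_dir: &str) -> Result<(), String> {
    if home_dir.is_empty() {
        Err("home directory is not set".to_string())
    } else {
        Ok(())
    }
}

/// Reports a validation failure on stderr and maps it to an exit code,
/// with no panic and therefore no stack trace.
pub fn run(home_dir: &str) -> ExitCode {
    match validate_config(home_dir) {
        Ok(()) => ExitCode::SUCCESS,
        Err(e) => {
            eprintln!("Error loading config: {e}");
            ExitCode::FAILURE
        }
    }
}
```

A binary's main function can return the ExitCode produced by run directly.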

marcelo-gonzalez · 2023-02-22T03:00:16Z

core/chain-configs/src/genesis_config.rs

+        let mut file = File::open(&path).map_err(|_| ValidationError::GenesisFileError {
+            error_message: format!(
+                "Could not open genesis config file at path {}.",
+                &path.as_ref().to_path_buf().display()


could just delete the to_path_buf()

marcelo-gonzalez · 2023-02-22T03:03:10Z

nearcore/src/config_validate.rs

+                self.config.consensus.max_block_wait_delay
+            );
+            self.validation_errors
+                .push_errors(ValidationError::ConfigSemanticsError { error_message: error_message })


feels better to be consistent w/ either push_config_semantics_error() or push_errors(ConfigSemanticsError {}) in every case

ppca requested a review from nikurt February 2, 2023 00:54

ppca marked this pull request as ready for review February 2, 2023 01:19

ppca requested a review from a team as a code owner February 2, 2023 01:19

nikurt reviewed Feb 2, 2023

ppca force-pushed the xiangyi/ND-272 branch from 81a482b to 34b6136 Compare February 8, 2023 02:24

nikurt reviewed Feb 8, 2023

ppca force-pushed the xiangyi/ND-272 branch from e4de278 to 7aa5052 Compare February 9, 2023 02:15

ppca requested a review from nikurt February 9, 2023 05:34

nikurt reviewed Feb 9, 2023

ppca requested a review from nikurt February 9, 2023 17:59

nikurt approved these changes Feb 9, 2023

ppca force-pushed the xiangyi/ND-272 branch 2 times, most recently from 9436529 to fa6b1ef Compare February 10, 2023 22:41

nikurt approved these changes Feb 13, 2023

ppca force-pushed the xiangyi/ND-272 branch 3 times, most recently from 4587f65 to 3427709 Compare February 14, 2023 22:12

marcelo-gonzalez reviewed Feb 15, 2023

ppca force-pushed the xiangyi/ND-272 branch from 3427709 to 634118f Compare February 17, 2023 15:21

ppca requested a review from marcelo-gonzalez February 17, 2023 15:35

ppca force-pushed the xiangyi/ND-272 branch from edb0f3d to 2fc63d5 Compare February 21, 2023 19:03

ppca requested a review from marcelo-gonzalez February 21, 2023 19:08

marcelo-gonzalez approved these changes Feb 22, 2023

ppca added 24 commits February 21, 2023 19:40

feat: add command neard validate-config to validate config files including config.json, genesis.json, node_key.json and validator_key.json

17ef59c

add config_validation parameter to Config::from_file()

2469396

refactor code so that when running validate-config command, one panic will report all check results from all config files

44d564d

remove unnecessary change

a21a1f3

correct typo

aa8051d

remove all _panic and _no_panic functions and unify them

06e4dd5

add load_file_skip_validation() to enable reading Config while keeping track of ConfigSemanticError

464a7f3

fix existing tests and add unit tests for config_validate

aa0a565

change println! to tracing::info

65e55dc

fix epoch_length for runtime tests and validate tracked_accounts rather than tracked_shards to be non-empty

849c414

add CrossFileSemanticError to represent issues that occur when multiple config files are involved and add tracked_accounts to create_testnet_configs_from_seeds()

8281402

modify init_configs() for case not(mainnet|betanet|testnet)

4971ea7

format

34d1708

add info!() to init_configs to debug config.tracked_accounts

076b4d6

add info! to create_testnet_configs_from_seeds

340f9af

switch to println

a3a9b0c

comment out the message in init_configs

5ee328a

comment out both Generated node key, validator key, genesis file message and add one for tracked_shards in init_testnet_configs

dbbf5c2

rebase

c796227

remove check for config.tracked_accounts

47f2d11

return anyhow::Result<()> rather than panic

ea393de

format

3c9e1f3

address comments and change Genesis.new() and Genesis.new_with_path() to return Result and some minor improvements

8b0ca8a

remove .as_path_buf() and keep push_xx_error consistent

845d33d

ppca force-pushed the xiangyi/ND-272 branch from 2fc63d5 to 845d33d Compare February 22, 2023 03:40

ppca merged commit 179b3ee into near:master Feb 22, 2023

ppca deleted the xiangyi/ND-272 branch February 22, 2023 04:08
