fix: properly handle genesis as part of stateless validation #10633

Merged: 4 commits merged into master from proper-stateless-validation-genesis on Feb 23, 2024

Contributor

@pugachAG pugachAG commented Feb 20, 2024

This PR removes the "approve anything" shortcut that was used when the previous chunk is part of genesis. The only difference in the genesis case is that we don't execute the main state transition; instead, we just check that the post state root matches the genesis state root for that shard.
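The rule described above can be sketched roughly as follows. This is a minimal illustration, not nearcore's implementation: `StateRoot`, `PrevChunk`, `ValidationError`, and `validate_post_state_root` are hypothetical names, and the `apply` callback merely stands in for the main state transition.

```rust
/// Hypothetical stand-in for a state root hash (not nearcore's actual type).
type StateRoot = [u8; 32];

/// Hypothetical view of the previous chunk from the validator's side.
enum PrevChunk {
    /// The previous chunk belongs to the genesis block.
    Genesis { genesis_state_root: StateRoot },
    /// Any other chunk: the main state transition must be executed.
    Regular { pre_state_root: StateRoot },
}

/// Hypothetical error type used only in this sketch.
#[derive(Debug, PartialEq)]
enum ValidationError {
    PostStateRootMismatch,
}

/// Checks the claimed post state root against the previous chunk.
/// `apply` is an assumed callback that stands in for the main state transition.
fn validate_post_state_root(
    prev: &PrevChunk,
    claimed_post_state_root: StateRoot,
    apply: impl Fn(StateRoot) -> StateRoot,
) -> Result<(), ValidationError> {
    let expected = match prev {
        // Genesis: nothing is executed, the expected root is the genesis state root.
        PrevChunk::Genesis { genesis_state_root } => *genesis_state_root,
        // Regular chunk: run the transition to obtain the expected root.
        PrevChunk::Regular { pre_state_root } => apply(*pre_state_root),
    };
    if expected == claimed_post_state_root {
        Ok(())
    } else {
        Err(ValidationError::PostStateRootMismatch)
    }
}
```

The point of the sketch is that the genesis case is still validated: a witness claiming a post state root other than the genesis state root is rejected rather than approved unconditionally.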

This also exposed some issues I had to fix to make tests work:

  • MockEpochManager::get_epoch_chunk_producers returns an empty Vec, which results in Chain::should_produce_state_witness_for_this_or_next_epoch returning false. Fixed by adding is_chunk_producer_for_epoch to EpochManagerAdapter so that MockEpochManager can override it.
  • The test_chunk_state_witness_bad_shard_id test started failing: this uncovered a real issue that could crash the chunk validator when a state witness contains an invalid shard id. Fixed in c5b2c5e.
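The first fix relies on a trait method that a mock can override. A minimal sketch of that pattern, under stated assumptions: `EpochManagerLike`, `RealManager`, and `MockManager` are hypothetical simplifications, and the default method body shown here is an assumption for illustration, not the code added in this PR.

```rust
/// Hypothetical, heavily simplified stand-in for the EpochManagerAdapter trait.
trait EpochManagerLike {
    /// Chunk producers for the epoch, represented here as account names.
    fn get_epoch_chunk_producers(&self) -> Vec<String>;

    /// Assumed default: derive the answer from the producer list.
    fn is_chunk_producer_for_epoch(&self, account_id: &str) -> bool {
        self.get_epoch_chunk_producers()
            .iter()
            .any(|producer| producer.as_str() == account_id)
    }
}

/// Hypothetical "real" manager that uses the default method.
struct RealManager {
    producers: Vec<String>,
}

impl EpochManagerLike for RealManager {
    fn get_epoch_chunk_producers(&self) -> Vec<String> {
        self.producers.clone()
    }
}

/// Hypothetical mock: its producer list is empty, so it overrides the check.
struct MockManager;

impl EpochManagerLike for MockManager {
    fn get_epoch_chunk_producers(&self) -> Vec<String> {
        Vec::new()
    }

    /// Overridden so that tests relying on the mock treat every account as a producer.
    fn is_chunk_producer_for_epoch(&self, _account_id: &str) -> bool {
        true
    }
}
```

With this shape, callers only ask `is_chunk_producer_for_epoch`, so the mock's empty producer list no longer forces the answer to be false.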

Closes #10502.

@pugachAG pugachAG added the A-stateless-validation Area: stateless validation label Feb 20, 2024
@pugachAG pugachAG force-pushed the proper-stateless-validation-genesis branch 7 times, most recently from a8575d1 to bddbc41 Compare February 21, 2024 13:39
@pugachAG pugachAG marked this pull request as ready for review February 21, 2024 13:44
@pugachAG pugachAG requested a review from a team as a code owner February 21, 2024 13:44
@pugachAG pugachAG force-pushed the proper-stateless-validation-genesis branch from bddbc41 to 6de4311 Compare February 21, 2024 14:15
@Longarithm
Member

@staffik @jancionear could you review please? If you LGTM it then I'll stamp an approval as well.

@pugachAG pugachAG force-pushed the proper-stateless-validation-genesis branch 2 times, most recently from 2d92a9d to 0a408eb Compare February 22, 2024 07:39
@pugachAG pugachAG force-pushed the proper-stateless-validation-genesis branch from 0a408eb to e974b63 Compare February 22, 2024 08:05
@staffik
Contributor

staffik commented Feb 22, 2024

@pugachAG please let me know when I should proceed with the review; I see quite a lot of changes are still being added.

@pugachAG pugachAG force-pushed the proper-stateless-validation-genesis branch from 53d8e08 to dbcfa97 Compare February 22, 2024 10:44
@pugachAG pugachAG force-pushed the proper-stateless-validation-genesis branch from dbcfa97 to c5b2c5e Compare February 22, 2024 10:45
@pugachAG pugachAG marked this pull request as draft February 22, 2024 10:46

codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 77.84431%, with 37 lines in your changes missing coverage. Please review.

Project coverage is 72.38%. Comparing base (6dfaa36) to head (5794736).
Report is 1 commit behind head on master.

Files                                                  Patch %  Lines
...client/src/stateless_validation/chunk_validator.rs  79.31%   8 Missing and 4 partials ⚠️
chain/chain/src/chain.rs                               60.00%   2 Missing and 8 partials ⚠️
core/primitives/src/epoch_manager.rs                   53.84%   2 Missing and 4 partials ⚠️
tools/state-viewer/src/epoch_info.rs                   0.00%    3 Missing ⚠️
...src/stateless_validation/state_witness_producer.rs  87.50%   0 Missing and 2 partials ⚠️
core/primitives/src/errors.rs                          50.00%   2 Missing ⚠️
chain/chain/src/test_utils/kv_runtime.rs               91.66%   0 Missing and 1 partial ⚠️
chain/epoch-manager/src/adapter.rs                     85.71%   0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10633      +/-   ##
==========================================
- Coverage   72.42%   72.38%   -0.05%     
==========================================
  Files         729      729              
  Lines      148996   149046      +50     
  Branches   148996   149046      +50     
==========================================
- Hits       107912   107884      -28     
- Misses      36207    36263      +56     
- Partials     4877     4899      +22     
Flag                    Coverage Δ
backward-compatibility  0.24% <0.00%> (-0.01%) ⬇️
db-migration            0.24% <0.00%> (-0.01%) ⬇️
genesis-check           1.43% <0.00%> (-0.01%) ⬇️
integration-tests       36.91% <76.04%> (-0.23%) ⬇️
linux                   71.27% <12.17%> (-0.03%) ⬇️
linux-nightly           71.80% <77.84%> (-0.09%) ⬇️
macos                   55.39% <12.17%> (-0.01%) ⬇️
pytests                 1.65% <0.00%> (-0.01%) ⬇️
sanity-checks           1.43% <0.00%> (-0.01%) ⬇️
unittests               68.29% <34.13%> (-0.02%) ⬇️
upgradability           0.29% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

@pugachAG
Contributor Author

@staffik @jancionear this PR is ready for review now

@pugachAG pugachAG marked this pull request as ready for review February 22, 2024 11:22
Contributor

@staffik staffik left a comment

LGTM

// would be skipped when using stateless validation.
this_block_should_be_skipped = height < 3;
// included in the block at all.
this_block_should_be_skipped = false;
Contributor

If uses_stateless_validation is set, it looks like we don't even need to run the containing loop.
What about `if height > 1 && !uses_stateless_validation {`, and moving the comment above it?

Contributor Author

It doesn't look like we can skip the loop, since it also updates invalid_chunks_in_this_block, which is used later.
I've updated this to avoid setting this_block_should_be_skipped = false here, since it is a no-op: 10a99ff

Contributor

@jancionear jancionear left a comment

Looks nice, left some comments

Comment on lines 295 to 299
/// This function is only needed to have a 'return true' implementation
/// as part of MockEpochManager to make tests work with stateless validation
/// enabled.
/// TODO(#10640): remove this after we get rid of MockEpochManager
fn is_chunk_producer_for_epoch(
Contributor

This function is also used in Chain::should_produce_state_witness_for_this_or_next_epoch, so I think the comment is wrong. It will still be needed after MockEpochManager is removed.

Contributor Author

Yeah, I see how it can be confusing. What I meant here is that it doesn't have to be part of the EpochManagerAdapter trait. I guess it is not that important, so I will just drop the comment and close the issue.

chain/client/Cargo.toml (resolved thread)
chain/client/src/client.rs (resolved thread)
Comment on lines 355 to 358
if block.header().is_genesis() {
assert_eq!(receipt_source_blocks.len(), 1);
break;
}
Contributor

I feel uneasy about having an assert in the validation code. I think it would be better to return an error; there's no reason to risk a panic.

Contributor

nit: Maybe it would be nicer to have an explicit condition: `if witness is for genesis { source receipts should be empty }`. I feel that it would be easier to reason about than the current solution.

Contributor Author

This assert is not part of the validation logic; it ensures that our chain is in a valid state (receipt_source_blocks is calculated using our own chain state, not the state witness). I will replace it with proper error handling so you feel better about it 😉

> Maybe it would be nicer to have an explicit condition

It is a bit more code, but it should indeed be easier to understand. Done.
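For illustration, here is one way the `assert_eq!` in the snippet above could be expressed as an error instead of a potential panic. This is a sketch under assumptions, not the code that landed in the PR: `ChainError`, its `UnexpectedReceiptSources` variant, and `reached_genesis` are hypothetical names.

```rust
/// Hypothetical error type; nearcore's real error enum differs.
#[derive(Debug, PartialEq)]
enum ChainError {
    UnexpectedReceiptSources { expected: usize, found: usize },
}

/// Decides whether the walk over receipt source blocks should stop at the current block.
/// Returns Ok(true) when genesis is reached with exactly one collected block,
/// Ok(false) for a non-genesis block, and an error where the original code asserted.
fn reached_genesis(
    is_genesis: bool,
    receipt_source_blocks_len: usize,
) -> Result<bool, ChainError> {
    if !is_genesis {
        return Ok(false);
    }
    if receipt_source_blocks_len != 1 {
        // Previously: assert_eq!(receipt_source_blocks.len(), 1);
        return Err(ChainError::UnexpectedReceiptSources {
            expected: 1,
            found: receipt_source_blocks_len,
        });
    }
    Ok(true)
}
```

A caller inside the loop could then write `if reached_genesis(is_genesis, len)? { break; }`, so an inconsistent chain state surfaces as an error the node can handle.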

core/primitives/src/block_header.rs (resolved thread)
@@ -280,7 +280,7 @@ impl ViewClientActor {
let cps: Vec<AccountId> = shard_ids
.iter()
.map(|&shard_id| {
let cp = epoch_info.sample_chunk_producer(block_height, shard_id);
Contributor

Is it safe to unwrap() here? The function returns an Error, so maybe it would be better to return an error.
(This also applies to the other unwrap() calls on the new sample_chunk_producer.)

Contributor Author

The shard id should be valid here, since it comes from `let shard_ids = self.epoch_manager.shard_ids(&epoch_id)?`.
Returning an error requires a bit of refactoring, since the call is inside a map and we cannot just add `?`, so it should probably be done in a separate PR.
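As a rough sketch of the refactoring mentioned above: a fallible call inside `map` can be propagated by collecting into a `Result`, which stops at the first error. The names below (`EpochError`, `sample_chunk_producer`, `chunk_producers_for_shards`) and the modulo-based sampling are hypothetical simplifications, not nearcore's signatures.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum EpochError {
    InvalidShardId(usize),
    NoProducers(usize),
}

/// Simplified stand-in for sampling a chunk producer; fails on an unknown shard id.
fn sample_chunk_producer(
    per_shard: &[Vec<String>],
    height: u64,
    shard_id: usize,
) -> Result<String, EpochError> {
    let producers = per_shard
        .get(shard_id)
        .ok_or(EpochError::InvalidShardId(shard_id))?;
    if producers.is_empty() {
        return Err(EpochError::NoProducers(shard_id));
    }
    let index = (height % (producers.len() as u64)) as usize;
    Ok(producers[index].clone())
}

/// Collecting an iterator of `Result`s into `Result<Vec<_>, _>` propagates
/// the first error without any `unwrap()` inside the closure.
fn chunk_producers_for_shards(
    per_shard: &[Vec<String>],
    height: u64,
    shard_ids: &[usize],
) -> Result<Vec<String>, EpochError> {
    shard_ids
        .iter()
        .map(|&shard_id| sample_chunk_producer(per_shard, height, shard_id))
        .collect()
}
```

The caller can then apply `?` to the collected result, which is what the closure itself cannot do.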

Comment on lines +1113 to 1117
let sample = v4.chunk_producers_sampler.get(shard_id)?.sample(seed);
v4.chunk_producers_settlement.get(shard_id)?.get(sample).copied()
}
}
}
Contributor

> test_chunk_state_witness_bad_shard_id test started failing: this actually uncovered a real issue which could result in crashing chunk validator when state witness contains invalid shard id, fixed in c5b2c5e

So IIUC we are now validating the shard_id of ChunkStateWitness by trying to sample a chunk producer for this shard_id. That feels a bit hacky; maybe we should have some function that prevalidates the witness header (or just the shard_id) before doing anything else. Sampling the producer could potentially work with invalid shard ids (e.g. `producers[(height + shard) % producers.len()]`), so it's not obvious that it can be done with untrusted data.
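To make the concern concrete, here is a small sketch. Both functions are hypothetical: `sample_modular` mirrors the modulo example from the comment above (it is not how nearcore samples producers), and `prevalidate_shard_id` is an assumed shape for the suggested up-front check.

```rust
/// Hypothetical modulo-based sampler: it returns a producer for any shard id,
/// valid or not, so it cannot double as shard id validation.
fn sample_modular<'a>(producers: &[&'a str], height: u64, shard_id: u64) -> Option<&'a str> {
    if producers.is_empty() {
        return None;
    }
    let index = (height.wrapping_add(shard_id) % (producers.len() as u64)) as usize;
    Some(producers[index])
}

/// Hypothetical explicit check, run before any other use of untrusted witness data.
fn prevalidate_shard_id(shard_id: u64, num_shards: u64) -> Result<(), String> {
    if shard_id < num_shards {
        Ok(())
    } else {
        Err(format!("invalid shard id {shard_id}, expected < {num_shards}"))
    }
}
```

With four shards, `sample_modular` still returns a producer for shard id 999, while `prevalidate_shard_id(999, 4)` rejects it. That is the argument for a dedicated check rather than relying on sampling to fail.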

Contributor

This is another case where an invalid shard_id could cause a crash (#10621 ...). I think it would be good to go through the code and just remove all usages of [shard_id]. It causes problems now, and it's going to be even worse with resharding :/ I'll make an issue.

Contributor

Opened #10648

Member

@Longarithm Longarithm left a comment

Approving to unblock

@pugachAG pugachAG added this pull request to the merge queue Feb 23, 2024
Merged via the queue into master with commit 0918295 Feb 23, 2024
26 of 28 checks passed
@pugachAG pugachAG deleted the proper-stateless-validation-genesis branch February 23, 2024 15:03
@@ -118,24 +118,6 @@ impl ChunkStateWitnessInner {
}
}

impl ChunkStateWitness {
Contributor

Yayy!

github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2024
Noticed the introduction of the `is_genesis` function in #10633.

This PR tries to use the `is_genesis` function in place of comparing
prev_hash with CryptoHash::default()
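
The follow-up commit message implies that `is_genesis` wraps the comparison it replaces. A minimal sketch of that idea, using hypothetical `Hash` and `Header` types in place of nearcore's `CryptoHash` and block header, and assuming the helper is just this comparison:

```rust
/// Hypothetical stand-in for CryptoHash; the default value is all zeroes.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Hash([u8; 32]);

/// Hypothetical, minimal block header.
struct Header {
    prev_hash: Hash,
}

impl Header {
    /// Assumed behavior: a header is genesis when it has no real predecessor,
    /// which is encoded as a default (all-zero) previous hash.
    fn is_genesis(&self) -> bool {
        self.prev_hash == Hash::default()
    }
}
```

Call sites then read `header.is_genesis()` instead of repeating the hash comparison, which keeps the encoding of "no predecessor" in one place.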
Labels
A-stateless-validation Area: stateless validation
Development

Successfully merging this pull request may close these issues.

[stateless_validation] Handle production of state witness for first chunk after genesis
5 participants