-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix gc for sharding upgrade #5040
Conversation
72d29eb
to
33f4d63
Compare
for item in trie_iterator { | ||
unwrap_or_err!(item, "Can't find ShardChunk {:?} in Trie", chunk_header); | ||
} | ||
// 2) Chunk Extra with `block_hash` and `shard_uid` should be available and match with the new root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to change this function for two reasons:
- During the epoch when we split shards,
TrieChanges
for shards in the next epoch will be stored. These trie changes do not have correspondingShardChunk
. - The block may not have a chunk that has
chunk.shard_id() == shard_id
. This is because for the first block in the new epoch after sharding is upgraded, if some chunks are missing, the block will include chunk from the last block, which has the shard id in the old shard layout. For example, if the chain is changing from 1 shard to 4 shards, and the first block in the new epoch does not include any new chunks, then all chunks in the block will have shard_id0
, because they are copied from the last block. Therefore, I changed this function to get chunks byblock.chunks.get(shard_id)
and only check the chunk content if the chunk is new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is a bit confusing to me. It does not seem to test that anything is actually garbage collected. We should probably check that after the state split is done, data for new shards can also be garbage collected
Good point, I just copied the code from an existing test |
core/primitives/src/shard_layout.rs
Outdated
@@ -158,6 +158,11 @@ impl ShardLayout { | |||
Self::V1(v1) => (v1.fixed_shards.len() + v1.boundary_accounts.len() + 1) as NumShards, | |||
} | |||
} | |||
|
|||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: don't think this inline helps here. For other functions here, #[inline]
is needed so that compiler can inline getters across the crates and completely replace function calls with just loads at offsets. Here, we are allocating a Vec
anyway, so there's going to be non-trivial non-inlined logic anyway, so inline doesn't make sense.
Once you have spare time, consider looking at https://matklad.github.io/2021/07/09/inline-in-rust.html -- #[inline]
semantics is subtle in Rust and is useful to know.
# all old data should be GCed | ||
blocks_count = 0 | ||
for height in range(1, 60): | ||
block0 = nodes[0].json_rpc('block', [height], timeout=15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bowenwang1996 I'm just checking block here, because store_validator
checks if a block is gc'ed, all information related to the block, including ChunkExtra
, TrieChanges
are also gc'ed
@mzhangmzz CI failed :( |
In #5040 we introduced a change that could try to access already garbage collected information in the call to `get_shard_uids_to_gc`. More specifically, in `get_next_epoch_id_from_prev_block` we would try to access the block info for the first block of the epoch, which is presumably already garbage collected at this point. The reason why we did not catch it in tests is because epoch manager has a cache of size 1024 and we do not clean up the cache properly during garbage collection since `EpochManager` is not part of `Chain`. Fixes #5074 Test plan ---------- `cargo test -p integration-tests --features no_cache test_gc_long_epoch`
resolve #4710 Also added a python test for gc with sharding upgrade
In #5040 we introduced a change that could try to access already garbage collected information in the call to `get_shard_uids_to_gc`. More specifically, in `get_next_epoch_id_from_prev_block` we would try to access the block info for the first block of the epoch, which is presumably already garbage collected at this point. The reason why we did not catch it in tests is because epoch manager has a cache of size 1024 and we do not clean up the cache properly during garbage collection since `EpochManager` is not part of `Chain`. Fixes #5074 Test plan ---------- `cargo test -p integration-tests --features no_cache test_gc_long_epoch`
resolve #4710
Also added a python test for gc with sharding upgrade