Add test_long_chain_with_restart_from_snapshot #10193

posvyatokum · 2023-11-17T00:49:42Z

The main motivation for the test is to see how garbage collection will behave without caches if we restart from the middle of an epoch.

… prev_epoch_id (#10195) We had an issue in the `clear_resharding_data` while fetching the `prev_epoch_id`. The implementation of `prev_epoch_id` relied on fetching the block_info of the last block of prev_epoch to get the epoch_id. This unfortunately failed for the case of GC as the block_info was already garbage collected. The new implementation here relies on using the block_header to get the epoch_id instead of block_info. This was unfortunately only caught in mocknet and not integration tests as having a small enough epoch_length lead to the block_info being cached in the epoch_manager (even though it was GC'd) Zulip post: https://near.zulipchat.com/#narrow/stream/295558-pagoda.2Fcore/topic/Master.20binary.20Can't.20clear.20old.20data/near/402240517 Pending: Figure out a way to include test from this PR: #10193 We would need to enable no_cache feature for the test.

posvyatokum · 2023-11-17T13:49:44Z

Adding test that should fail before the fix mentioned above. But the CI is green.
The fail is supposed to come from log_assert! here

nearcore/chain/client/src/client.rs

Line 1542 in 4eb4d0d

log_assert!(result.is_ok(), "Can't clear old data, {:?}", result);

log_assert! only panics in debug mode, and our CI pipelines use --cargo-profile quick-release
@Ekleog-NEAR do you have ideas? I really want this to fail per-commit. Also, this test can be much shorter, if I can enable no_cache feature in it.

Ekleog-NEAR · 2023-11-17T13:55:44Z

@nagisa I recall we discussed quick-release vs. dev for CI. It seems like quick-release is running without debug assertions? If so, I think we should revert and switch back to using the dev profile for CI, even though it might be slower, because it could miss intentionally-placed debug assertions :/

(Or we could add debug-assertions for quick-release, which might make sense considering it’s not for production usage, but then it’d probably be better to rename it for something like dev-release to be clear it’s not intended for prod)

Ekleog-NEAR · 2023-11-17T13:57:12Z

Also, this test can be much shorter, if I can enable no_cache feature in it.

I guess you mean the compile-time no_cache? It’s something that we unfortunately can’t enable in a single test, not without making it a real snowflake that will become hard to maintain (unless we have a lot of tests that require this).

Would it be hard to remove the compile-time no_cache feature, and replace it with a runtime configuration, that could be turned on/off from the test itself?

nagisa · 2023-11-17T14:05:49Z

(Or we could add debug-assertions for quick-release, which might make sense considering it’s not for production usage, but then it’d probably be better to rename it for something like dev-release to be clear it’s not intended for prod)

I’m in favour of making it dev-release. My memory is that unoptimized builds will start taking too long to run to fit within even the timeouts we have right now, especially on MacOS.

Ekleog-NEAR · 2023-11-17T14:29:07Z

I just opened #10205 :)

wacban

LGTM

wacban · 2023-11-20T09:59:59Z

integration-tests/src/tests/client/process_blocks.rs

+        .archive(false)
+        .build();
+
+    env1.clients[0].config.gc.gc_blocks_limit = 2;


nit: Can you add a small comment on why this is needed?

wacban · 2023-11-20T10:01:00Z

integration-tests/src/tests/client/process_blocks.rs

+    }
+
+    let mut env2 = TestEnv::builder(chain_genesis)
+        .stores(vec![env1.clients[0].chain.store().store().clone()])


Tested on updated #10193 `is_next_block_epoch_start` reads `BlockInfo` of the first block of the epoch, which is not correct in the case of garbage collection. The previous version of the test was passing, because that `BlockInfo` was still in the cache of the epoch manager (as we call `is_next_block_epoch_start` on every block) Implemented safer version of `is_next_block_epoch_start` -- `is_last_block_in_finished_epoch` (didn't think too much about the name, open to suggestions). `is_last_block_in_finished_epoch` works by relying on the fact that if we processed block, and it was the last block of an epoch according to `is_next_block_epoch_start`, then we wrote an `EpochInfo` with the hash of that block, and `EpochInfo` is not garbage collectible.

add test_long_chain

de14f33

shreyan-gupta mentioned this pull request Nov 17, 2023

[garbage_colection] Fix crash in clear_resharding_data while fetching prev_epoch_id #10195

Merged

avoid cache during gc

9864fc1

posvyatokum mentioned this pull request Nov 17, 2023

reintroduce clear_resharding_data #10211

Merged

posvyatokum changed the title ~~Add test_long_chain~~ Add test_long_chain_with_restart_from_snapshot Nov 17, 2023

posvyatokum requested a review from wacban November 17, 2023 23:15

Merge branch 'master' into long_chain_test

4ab66b0

posvyatokum marked this pull request as ready for review November 17, 2023 23:15

posvyatokum requested a review from a team as a code owner November 17, 2023 23:15

wacban approved these changes Nov 20, 2023

View reviewed changes

add comment

4b4bdb3

posvyatokum enabled auto-merge November 20, 2023 11:45

Merge branch 'master' into long_chain_test

8847752

posvyatokum added this pull request to the merge queue Nov 20, 2023

Merged via the queue into master with commit d0a771d Nov 20, 2023
16 checks passed

posvyatokum deleted the long_chain_test branch November 20, 2023 12:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add test_long_chain_with_restart_from_snapshot #10193

Add test_long_chain_with_restart_from_snapshot #10193

posvyatokum commented Nov 17, 2023 •

edited

posvyatokum commented Nov 17, 2023

Ekleog-NEAR commented Nov 17, 2023 •

edited

Ekleog-NEAR commented Nov 17, 2023

nagisa commented Nov 17, 2023

Ekleog-NEAR commented Nov 17, 2023

wacban left a comment

wacban Nov 20, 2023

wacban Nov 20, 2023

Add test_long_chain_with_restart_from_snapshot #10193

Add test_long_chain_with_restart_from_snapshot #10193

Conversation

posvyatokum commented Nov 17, 2023 • edited

posvyatokum commented Nov 17, 2023

Ekleog-NEAR commented Nov 17, 2023 • edited

Ekleog-NEAR commented Nov 17, 2023

nagisa commented Nov 17, 2023

Ekleog-NEAR commented Nov 17, 2023

wacban left a comment

Choose a reason for hiding this comment

wacban Nov 20, 2023

Choose a reason for hiding this comment

wacban Nov 20, 2023

Choose a reason for hiding this comment

posvyatokum commented Nov 17, 2023 •

edited

Ekleog-NEAR commented Nov 17, 2023 •

edited