subsystem-bench: add regression tests for availability read and write #3311

Merged
merged 56 commits into master from AndreiEres/subsystem-bench-lib
Mar 1, 2024

Conversation

@AndreiEres AndreiEres commented Feb 13, 2024

What's been done

  • subsystem-bench has been split into two parts: a cli benchmark runner and a library.
  • The cli runner is quite simple. It just allows us to run .yaml based test sequences. Now it should only be used to run benchmarks during development.
  • The library is used in the cli runner and in regression tests. Some code is changed to make the library independent of the runner.
  • Added first regression tests for availability read and write that replicate existing test sequences.

How we run regression tests

  • Regression tests are simply Rust integration tests compiled without the default test harness.
  • They should only be compiled under the subsystem-benchmarks feature to prevent them from running with other tests.
  • This doesn't work when running tests with nextest in CI, so additional filters have been added to the nextest runs.
  • Each benchmark run takes a different amount of time at the beginning, so we "warm up" the tests until the CPU usage of consecutive runs differs by only 1%.
  • After the warm-up, we run the benchmarks a few more times and compare the average with the expectation, within a given precision (see the sketch below).
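
A minimal sketch of that loop (the constants match the values discussed in the review below; the `run` closure and the helper itself are hypothetical stand-ins, not the actual test code):

```rust
const WARM_UP_PRECISION: f64 = 0.01; // consecutive runs must agree within 1% CPU usage
const WARM_UP_COUNT: usize = 20;     // upper bound on warm-up iterations
const BENCH_COUNT: usize = 3;        // runs that count towards the reported average

/// Warm the benchmark up, then return the average CPU usage of the measured runs.
/// `run` is a hypothetical closure executing one benchmark run and returning its CPU usage.
fn warm_up_and_measure(run: impl Fn() -> f64) -> f64 {
    let mut previous = run();
    for _ in 0..WARM_UP_COUNT {
        let current = run();
        if ((current - previous) / previous).abs() < WARM_UP_PRECISION {
            break;
        }
        previous = current;
    }
    // The caller compares this average against the expected value within a precision.
    (0..BENCH_COUNT).map(|_| run()).sum::<f64>() / BENCH_COUNT as f64
}
```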

What is still wrong?

  • I haven't managed to set up approval voting tests. The spread of their results is too large and can't be narrowed down in a reasonable amount of time in the warm-up phase.
  • The tests start an unconfigurable prometheus endpoint inside, which causes errors because they all use the same port 9999. I disable it with a flag, but I think it's better to move the endpoint launch outside the tests, as we already do with valgrind and pyroscope. However, we still use prometheus inside the tests.

Future work

  • #3528
  • #3529
  • #3530
  • #3531

@AndreiEres AndreiEres added R0-silent Changes should not be mentioned in any release notes T12-benchmarks This PR/Issue is related to benchmarking and weights. labels Feb 13, 2024
n_validators: 500
n_cores: 100
n_included_candidates: 100
Contributor Author

Unused parameter.

@@ -1,14 +1,14 @@
TestConfiguration:
# Test 1
- objective: !ApprovalVoting
last_considered_tranche: 89
Contributor Author

Just sorted to unify the order in all test cases.

let benchmark_name = format!("{} #{} {}", &self.path, index + 1, objective);
gum::info!(target: LOG_TARGET, "{}", format!("Step {}/{}", index + 1, num_steps).bright_purple(),);
gum::info!(target: LOG_TARGET, "[{}] {}", format!("objective = {:?}", objective).green(), test_config);
test_config.generate_pov_sizes();
Contributor Author

Better to do it automatically, but not in the into_vec() method as it was before.

@@ -841,15 +835,22 @@ fn build_overseer(
pub fn prepare_test(
config: TestConfiguration,
options: ApprovalsOptions,
with_prometheus_endpoint: bool,
@AndreiEres AndreiEres Feb 16, 2024

If we run a few tests, they all try to use the same port, so I decided to just turn off the endpoint initialization, because we don't need it in CI. I don't like how it's done (it's the easy way). I think I should start the endpoint outside the test, like we do with pyroscope. It can be done in the next PRs.
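
A sketch of the resulting call site, based on the signature shown in the hunk above (the return value is ignored here since it isn't part of this hunk):

```rust
// Sketch only: `prepare_test`, `TestConfiguration` and `ApprovalsOptions` are the
// names from the hunk above.
fn run_without_prometheus(config: TestConfiguration, options: ApprovalsOptions) {
    // `false` skips starting the in-test Prometheus endpoint, so parallel test
    // binaries no longer race for the same port 9999.
    let _ = prepare_test(config, options, false);
}
```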

@AndreiEres AndreiEres changed the title [WIP] subsystem-bench: split to a lib and a cli tool [WIP] subsystem-bench: add regression tests Feb 16, 2024
@AndreiEres AndreiEres changed the title [WIP] subsystem-bench: add regression tests subsystem-bench: add regression tests Feb 23, 2024
@alexggh alexggh left a comment

Great job, looks good to me.

@@ -68,7 +69,7 @@ test-linux-stable-runtime-benchmarks:
# but still want to have debug assertions.
RUSTFLAGS: "-Cdebug-assertions=y -Dwarnings"
script:
- time cargo nextest run --workspace --features runtime-benchmarks benchmark --locked --cargo-profile testnet
- time cargo nextest run --filter-expr 'not deps(/polkadot-subsystem-bench/)' --workspace --features runtime-benchmarks benchmark --locked --cargo-profile testnet
Contributor

Why isn't this required-features = ["subsystem-benchmarks"] enough?

Contributor Author

It's a good question, and I wish I knew the answer. Somehow it keeps running the regression tests, while cargo test excludes them based on the required feature.
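
For reference, the mechanism being discussed lives in the crate's Cargo.toml, roughly like this (the target name below is hypothetical):

```toml
# Cargo.toml of the crate carrying the regression tests:
# `cargo test` skips this target unless the feature is enabled,
# but the nextest invocation in CI still needs the extra filter shown above.
[[test]]
name = "availability-regression"            # hypothetical target name
required-features = ["subsystem-benchmarks"]
```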

};

const BENCH_COUNT: usize = 3;
const WARM_UP_COUNT: usize = 20;
Contributor

How many benches are needed in practice until WARM_UP_PRECISION is achieved?

Contributor Author

Based on previous runs you never know: it can be 3, it can be 13.

config.max_pov_size = 5120;
config.peer_bandwidth = 52428800;
config.bandwidth = 52428800;
config.connectivity = 75;
Contributor

Why 75?

Contributor Author

Copied from availability_write.yaml

config.n_cores = 100;
config.min_pov_size = 1120;
config.max_pov_size = 5120;
config.peer_bandwidth = 524288000000;
Contributor

Is this value correct as per the validator specs (500 Mbit/s = 62.5 MB/s)?

Contributor Author

@alexggh what do you think? I just copied it from your yaml.
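
For reference, a quick unit check, assuming `peer_bandwidth` is expressed in bytes per second (the unit isn't stated in the hunk):

```rust
// Validator spec: 500 Mbit/s converted to bytes per second.
const SPEC_BYTES_PER_S: u64 = 500 * 1_000_000 / 8; // 62_500_000 B/s ≈ 62.5 MB/s
// Values appearing in the tests:
const USED_ELSEWHERE: u64 = 52_428_800;        // 50 MiB/s, a bit below the spec figure
const QUESTIONED_VALUE: u64 = 524_288_000_000; // 500_000 MiB/s, roughly 8,400x the spec figure
```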

config.max_validators_per_core = 5;
config.min_pov_size = 5120;
config.max_pov_size = 5120;
config.peer_bandwidth = 52428800;
Contributor

This is different from the approval voting setup. It would be better to have the correct value set in the default impl of the config, and include a comment when we override it.

let usage = benchmark(test_case, options.clone());

messages.extend(usage.check_network_usage(&[
("Received from peers", 2950.0, 0.05),
Contributor

Have these been calibrated on reference hw?

Contributor Author

These tests themselves run on reference hw.

@AndreiEres AndreiEres changed the title subsystem-bench: add regression tests subsystem-bench: add regression tests for availability read and write Feb 29, 2024
@sandreim sandreim left a comment

Looks good overall! Nice work!

One thing I expect to see solved here is using a different test pipeline, for which I added a comment.

Otherwise, let's not block this one and continue with the follow-up PRs to achieve a properly parametrised and stable regression test.

@@ -25,6 +25,7 @@ test-linux-stable:
# "upgrade_version_checks_should_work" is currently failing
- |
time cargo nextest run \
--filter-expr 'not deps(/polkadot-subsystem-bench/)' \
Contributor

I'd move these to a separate pipeline that allows CI to provide the same hw specs as for validators and guarantees CPU/mem resources per pod.

Contributor

Ok, my bad, we aren't running the tests here 🙈. Let's address this properly in #3530.

Contributor

@AndreiEres could you please add a comment above the command with a brief description of why you added the filter, or a link to your PR.

let mut config = TestConfiguration::default();
config.latency = Some(PeerLatency { mean_latency_ms: 100, std_dev: 1.0 });
config.n_validators = 300;
config.n_cores = 20;
Contributor

This value should be set to the number of candidates validators are expected to check per relay chain block. This depends on the approval voting parameters. We should also factor in async backing, that is, we'd expect to have 1 included candidate at every block.

We can calibrate these in the follow-up PR.

@AndreiEres AndreiEres added this pull request to the merge queue Mar 1, 2024
Merged via the queue into master with commit f0e589d Mar 1, 2024
11 checks passed
@AndreiEres AndreiEres deleted the AndreiEres/subsystem-bench-lib branch March 1, 2024 15:31
ordian added a commit that referenced this pull request Mar 4, 2024
* master:
  Finish documenting `#[pallet::xxx]` macros  (#2638)
  Remove `as frame_system::DefaultConfig` from the required syntax in `derive_impl` (#3505)
  provisioner: allow multiple cores assigned to the same para (#3233)
  subsystem-bench: add regression tests for availability read and write (#3311)
  make SelfParaId a metadata constant (#3517)
  Fix crash of synced parachain node run with `--sync=warp` (#3523)
  [Backport] Node version and spec_version bumps and ordering of the prdoc files from 1.8.0 (#3508)
  Add `claim_assets` extrinsic to `pallet-xcm` (#3403)
ordian added a commit that referenced this pull request Mar 4, 2024
…data

* ao-collator-parent-head-data:
  add a comment (review)
  Finish documenting `#[pallet::xxx]` macros  (#2638)
  Remove `as frame_system::DefaultConfig` from the required syntax in `derive_impl` (#3505)
  provisioner: allow multiple cores assigned to the same para (#3233)
  subsystem-bench: add regression tests for availability read and write (#3311)
  make SelfParaId a metadata constant (#3517)
  Fix crash of synced parachain node run with `--sync=warp` (#3523)
  [Backport] Node version and spec_version bumps and ordering of the prdoc files from 1.8.0 (#3508)
  Add `claim_assets` extrinsic to `pallet-xcm` (#3403)
skunert pushed a commit to skunert/polkadot-sdk that referenced this pull request Mar 4, 2024
subsystem-bench: add regression tests for availability read and write (paritytech#3311)

Co-authored-by: Alexander Samusev <41779041+alvicsam@users.noreply.github.com>
Labels
R0-silent Changes should not be mentioned in any release notes T12-benchmarks This PR/Issue is related to benchmarking and weights.