chore(ci): introduce parallel test runs for our acceptance tests #5097

thisislawatts · 2024-03-08T15:44:16Z

Pull Request Submission

Please check the boxes once done.

The pull request must:

Reviewer Documentation
- follow CONTRIBUTING rules
- be accompanied by a detailed description of the changes
- contain a risk assessment of the change (Low | Medium | High) with regards to breaking existing functionality. A change e.g. of an underlying language plugin can completely break the functionality for that language, but appearing as only a version change in the dependencies.
- highlight breaking API if applicable
- contain a link to the automatic tests that cover the updated functionality.
- contain testing instructions in case that the reviewer wants to manual verify as well, to add to the manual testing done by the author.
- link to the link to the PR for the User-facing documentation
User facing Documentation
- update any relevant documentation in gitbook by submitting a gitbook PR, and including the PR link here
- ensure that the message of the final single commit is descriptive and prefixed with either feat: or fix: , others might be used in rare occasions as well, if there is no need to document the changes in the release notes. The changes or fixes should be described in detail in the commit message for the changelog & release notes.
Testing
- Changes, removals and additions to functionality must be covered by acceptance / integration tests or smoke tests - either already existing ones, or new ones, created by the author of the PR.

What does this PR do?

This PR implements CircleCI parallelism for all acceptance test suites, significantly reducing the duration for the acceptance-tests windows amd64 suite from 41 minutes (p95 last 90 days) to ~14 minutes.

Where should the reviewer start?

Reviewers should start by examining the changes made to the CircleCI configuration files to implement parallelism for acceptance test suites.
76fa2da Remove hard coded port from tests which introduced issues when parallelising them.
Making use of recently upgraded jest dep. and adopted --shard flag.
Note that parallelism has been introduced as a configurable parameter, this allows us to increase it further for windows whilst balancing against cost for the other platforms.

How should this be manually tested?

Review the pipelines associated with this PR.

Any background context you want to provide?

Previously, the acceptance-tests windows amd64 suite had a duration of 41 minutes. By adopting CircleCI parallelism for all acceptance test suites, we've managed to reduce the duration to 14 minutes, significantly improving our testing efficiency.

PeterSchafer · 2024-03-11T16:46:23Z

scripts/capture-used-ports.js

@@ -0,0 +1,35 @@
+'use strict';


Question: I appreciate the need of having a better solution for the ports used in tests. The question for me is if this needs to be combined with the parallelization of tests. As long as the tests in one shard run synchronously, the previous hard-coded port should be fine. The reason I ask, is that the solution implemented here seems quite complex and I wonder if we actually need this.

I can expand a bit. As far as I see, there is a timing aspect involved here. The port state at the point in time when the file is populated and the time when random values are being filtered based on the file. It is well possible that new ports get used in between these points in time, which is why I wonder if a random value with a retry wouldn't be a simpler and more robust solution.

https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use is the issue here. It's unlikely in a container but stranger things can happen over time that might confound future us.

Binding to port 0 is probably the best way to get a unique port, but this would be hard to implement in the existing codebase. Here's another idea though:

Reserve the port range in advance with sysctl, https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html#ip-variables, see ip_local_reserved_ports. This will prevent the ports from being automatically assigned.

This would be set up either in a VM with sudo sysctl or with the --sysctl argument to docker run.

It might require more changes to how the executor is set up in CircleCI.

The "random port + retry" approach @PeterSchafer suggests also SGTM. The potential payoff in doing this (either approach) I see, is that it could potentially save us CircleCI credits if we can push more of the concurrency into an executor, rather than only scaling by adding more executor instances.

Concurrence inside of an executor might have multiple issues to address besides port usage. There will occur issues in the usage of shared filesystem resources, like the config file. Some tests need to change the config file and others might start to fail randomly.
Parallelizing via the Executors is a first good approach and their isolation helps with the other topics of shared resources.

PeterSchafer · 2024-03-12T12:14:21Z

Very nice!

Windows is the slowest test run, a problem made worse by the time consuming build process that runs before it. Perhaps a short term workaround until we have time to optimise the build step is to increase the number of shards.

thisislawatts force-pushed the chore/ci-experiments-in-parallel branch 3 times, most recently from 78502bc to c678502 Compare March 11, 2024 12:58

thisislawatts marked this pull request as ready for review March 11, 2024 13:23

thisislawatts requested a review from a team as a code owner March 11, 2024 13:23

PeterSchafer reviewed Mar 11, 2024

View reviewed changes

thisislawatts force-pushed the chore/ci-experiments-in-parallel branch 2 times, most recently from c71629b to 8e60f31 Compare March 12, 2024 12:13

PeterSchafer approved these changes Mar 12, 2024

View reviewed changes

thisislawatts force-pushed the chore/ci-experiments-in-parallel branch 3 times, most recently from e41d3dc to fb8d7bc Compare March 12, 2024 15:17

chore(ci): experiments in parallelism

a0a358b

Windows is the slowest test run, a problem made worse by the time consuming build process that runs before it. Perhaps a short term workaround until we have time to optimise the build step is to increase the number of shards.

thisislawatts force-pushed the chore/ci-experiments-in-parallel branch from fb8d7bc to a0a358b Compare March 12, 2024 16:04

thisislawatts enabled auto-merge (squash) March 12, 2024 16:08

thisislawatts merged commit aaa1074 into main Mar 12, 2024
14 checks passed

thisislawatts deleted the chore/ci-experiments-in-parallel branch March 12, 2024 16:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(ci): introduce parallel test runs for our acceptance tests #5097

chore(ci): introduce parallel test runs for our acceptance tests #5097

thisislawatts commented Mar 8, 2024 •

edited

Loading

PeterSchafer Mar 11, 2024 •

edited

Loading

PeterSchafer Mar 11, 2024

cmars Mar 11, 2024 •

edited

Loading

cmars Mar 11, 2024 •

edited

Loading

PeterSchafer Mar 11, 2024

PeterSchafer commented Mar 12, 2024

chore(ci): introduce parallel test runs for our acceptance tests #5097

chore(ci): introduce parallel test runs for our acceptance tests #5097

Conversation

thisislawatts commented Mar 8, 2024 • edited Loading

Pull Request Submission

What does this PR do?

Where should the reviewer start?

How should this be manually tested?

Any background context you want to provide?

PeterSchafer Mar 11, 2024 • edited Loading

Choose a reason for hiding this comment

PeterSchafer Mar 11, 2024

Choose a reason for hiding this comment

cmars Mar 11, 2024 • edited Loading

Choose a reason for hiding this comment

cmars Mar 11, 2024 • edited Loading

Choose a reason for hiding this comment

PeterSchafer Mar 11, 2024

Choose a reason for hiding this comment

PeterSchafer commented Mar 12, 2024

thisislawatts commented Mar 8, 2024 •

edited

Loading

PeterSchafer Mar 11, 2024 •

edited

Loading

cmars Mar 11, 2024 •

edited

Loading

cmars Mar 11, 2024 •

edited

Loading