Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(ci): introduce parallel test runs for our acceptance tests #5097

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

thisislawatts
Copy link
Member

@thisislawatts thisislawatts commented Mar 8, 2024

Pull Request Submission

Please check the boxes once done.

The pull request must:

  • Reviewer Documentation
    • follow CONTRIBUTING rules
    • be accompanied by a detailed description of the changes
    • contain a risk assessment of the change (Low | Medium | High) with regards to breaking existing functionality. A change e.g. of an underlying language plugin can completely break the functionality for that language, but appearing as only a version change in the dependencies.
    • highlight breaking API if applicable
    • contain a link to the automatic tests that cover the updated functionality.
    • contain testing instructions in case that the reviewer wants to manual verify as well, to add to the manual testing done by the author.
    • link to the link to the PR for the User-facing documentation
  • User facing Documentation
    • update any relevant documentation in gitbook by submitting a gitbook PR, and including the PR link here
    • ensure that the message of the final single commit is descriptive and prefixed with either feat: or fix: , others might be used in rare occasions as well, if there is no need to document the changes in the release notes. The changes or fixes should be described in detail in the commit message for the changelog & release notes.
  • Testing
    • Changes, removals and additions to functionality must be covered by acceptance / integration tests or smoke tests - either already existing ones, or new ones, created by the author of the PR.

What does this PR do?

This PR implements CircleCI parallelism for all acceptance test suites, significantly reducing the duration for the acceptance-tests windows amd64 suite from 41 minutes (p95 last 90 days) to ~14 minutes.

Where should the reviewer start?

  • Reviewers should start by examining the changes made to the CircleCI configuration files to implement parallelism for acceptance test suites.
  • 76fa2da Remove hard coded port from tests which introduced issues when parallelising them.
  • Making use of recently upgraded jest dep. and adopted --shard flag.
  • Note that parallelism has been introduced as a configurable parameter, this allows us to increase it further for windows whilst balancing against cost for the other platforms.

How should this be manually tested?

Review the pipelines associated with this PR.

Any background context you want to provide?

Previously, the acceptance-tests windows amd64 suite had a duration of 41 minutes. By adopting CircleCI parallelism for all acceptance test suites, we've managed to reduce the duration to 14 minutes, significantly improving our testing efficiency.

@thisislawatts thisislawatts force-pushed the chore/ci-experiments-in-parallel branch 3 times, most recently from 78502bc to c678502 Compare March 11, 2024 12:58
@thisislawatts thisislawatts marked this pull request as ready for review March 11, 2024 13:23
@thisislawatts thisislawatts requested a review from a team as a code owner March 11, 2024 13:23
@@ -0,0 +1,35 @@
'use strict';
Copy link
Contributor

@PeterSchafer PeterSchafer Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I appreciate the need of having a better solution for the ports used in tests. The question for me is if this needs to be combined with the parallelization of tests. As long as the tests in one shard run synchronously, the previous hard-coded port should be fine. The reason I ask, is that the solution implemented here seems quite complex and I wonder if we actually need this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can expand a bit. As far as I see, there is a timing aspect involved here. The port state at the point in time when the file is populated and the time when random values are being filtered based on the file. It is well possible that new ports get used in between these points in time, which is why I wonder if a random value with a retry wouldn't be a simpler and more robust solution.

Copy link
Contributor

@cmars cmars Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use is the issue here. It's unlikely in a container but stranger things can happen over time that might confound future us.

Binding to port 0 is probably the best way to get a unique port, but this would be hard to implement in the existing codebase. Here's another idea though:

Reserve the port range in advance with sysctl, https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html#ip-variables, see ip_local_reserved_ports. This will prevent the ports from being automatically assigned.

This would be set up either in a VM with sudo sysctl or with the --sysctl argument to docker run.

It might require more changes to how the executor is set up in CircleCI.

Copy link
Contributor

@cmars cmars Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "random port + retry" approach @PeterSchafer suggests also SGTM. The potential payoff in doing this (either approach) I see, is that it could potentially save us CircleCI credits if we can push more of the concurrency into an executor, rather than only scaling by adding more executor instances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrence inside of an executor might have multiple issues to address besides port usage. There will occur issues in the usage of shared filesystem resources, like the config file. Some tests need to change the config file and others might start to fail randomly.
Parallelizing via the Executors is a first good approach and their isolation helps with the other topics of shared resources.

@thisislawatts thisislawatts force-pushed the chore/ci-experiments-in-parallel branch 2 times, most recently from c71629b to 8e60f31 Compare March 12, 2024 12:13
@PeterSchafer
Copy link
Contributor

Very nice!

@thisislawatts thisislawatts force-pushed the chore/ci-experiments-in-parallel branch 3 times, most recently from e41d3dc to fb8d7bc Compare March 12, 2024 15:17
Windows is the slowest test run, a problem made worse
by the time consuming build process that runs before it.

Perhaps a short term workaround until we have time to
optimise the build step is to increase the number of shards.
@thisislawatts thisislawatts force-pushed the chore/ci-experiments-in-parallel branch from fb8d7bc to a0a358b Compare March 12, 2024 16:04
@thisislawatts thisislawatts enabled auto-merge (squash) March 12, 2024 16:08
@thisislawatts thisislawatts merged commit aaa1074 into main Mar 12, 2024
14 checks passed
@thisislawatts thisislawatts deleted the chore/ci-experiments-in-parallel branch March 12, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants