chore(ci): introduce parallel test runs for our acceptance tests #5097
scripts/capture-used-ports.js (outdated)
@@ -0,0 +1,35 @@
'use strict';
Question: I appreciate the need for a better solution for the ports used in tests. My question is whether this needs to be combined with the parallelization of tests. As long as the tests in one shard run sequentially, the previous hard-coded port should be fine. I ask because the solution implemented here seems quite complex, and I wonder whether we actually need it.
I can expand a bit. As far as I can see, there is a timing aspect involved here: the port state can change between the point in time when the file is populated and the time when random values are filtered against that file. It is quite possible that new ports get used in between, which is why I wonder if a random value with a retry wouldn't be a simpler and more robust solution.
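For illustration, the "random value with a retry" idea could look roughly like the sketch below. This is not code from this PR: the helper name `listenOnRandomPort`, the default port range, and the retry count are all hypothetical. It only relies on Node's `net` module reporting `EADDRINUSE` when a port is already taken.

```javascript
'use strict';

const net = require('net');

// Hypothetical helper (not part of this PR): listen on a random port in
// [min, max] and retry with a different random port on EADDRINUSE.
// The default range and retry count are illustrative assumptions.
function listenOnRandomPort(server, { min = 20000, max = 40000, retries = 10 } = {}) {
  return new Promise((resolve, reject) => {
    let attemptsLeft = retries;

    const tryListen = () => {
      const port = min + Math.floor(Math.random() * (max - min + 1));

      const onError = (err) => {
        server.removeListener('listening', onListening);
        if (err.code === 'EADDRINUSE' && attemptsLeft > 0) {
          attemptsLeft -= 1;
          tryListen(); // pick another random port and try again
        } else {
          reject(err);
        }
      };
      const onListening = () => {
        server.removeListener('error', onError);
        resolve(server.address().port);
      };

      server.once('error', onError);
      server.once('listening', onListening);
      server.listen(port, '127.0.0.1');
    };

    tryListen();
  });
}
```

Because the check and the use are the same `listen()` call, there is no window between "port looks free" and "port gets bound" in which another process can take it.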
https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use is the issue here. It's unlikely in a container but stranger things can happen over time that might confound future us.
Binding to port 0 is probably the best way to get a unique port, but this would be hard to implement in the existing codebase. Here's another idea though:
Reserve the port range in advance with sysctl, https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html#ip-variables, see ip_local_reserved_ports. This will prevent the ports from being automatically assigned.
This would be set up either in a VM with sudo sysctl or with the --sysctl argument to docker run.
It might require more changes to how the executor is set up in CircleCI.
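As a point of comparison, the "bind to port 0" option mentioned above can be sketched with Node's `net` module as follows. The helper name `listenOnFreePort` is hypothetical, and this says nothing about how hard it would be to thread the resulting port through the existing test setup.

```javascript
'use strict';

const net = require('net');

// Hypothetical helper: ask the OS for a free port by listening on port 0,
// then read back the port that was actually assigned.
function listenOnFreePort(server, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen(0, host, () => {
      server.removeListener('error', reject);
      resolve(server.address().port);
    });
  });
}
```

The caller has to pass the returned port on to whatever needs to connect, which is the part that would touch the existing codebase.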
The "random port + retry" approach @PeterSchafer suggests also SGTM. The payoff I see in either approach is that it could save us CircleCI credits if we can push more of the concurrency into a single executor, rather than scaling only by adding more executor instances.
Concurrency inside an executor raises multiple issues besides port usage. Shared filesystem resources, such as the config file, will cause problems: some tests need to change the config file, and others might then start to fail randomly.
Parallelizing via the executors is a good first approach, and their isolation helps with the other shared-resource issues.
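If concurrency inside one executor were ever attempted, the shared config file would need per-test isolation. A minimal sketch of that idea follows, assuming the program under test resolves its config location from an environment variable. `XDG_CONFIG_HOME` is used here purely as an assumption, and the helper name and directory prefix are hypothetical.

```javascript
'use strict';

const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: build an environment for one test process that points
// at a private, freshly created config directory.
function isolatedConfigEnv(baseEnv = process.env) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'acceptance-config-'));
  return {
    dir,
    // Assumption: the program under test honours XDG_CONFIG_HOME.
    env: { ...baseEnv, XDG_CONFIG_HOME: dir },
    cleanup: () => fs.rmSync(dir, { recursive: true, force: true }),
  };
}
```

Each test would spawn its process with the returned `env` and call `cleanup()` afterwards. Executor-level isolation avoids needing any of this.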
Very nice!
Windows is the slowest test run, a problem made worse by the time-consuming build process that runs before it. Perhaps a short-term workaround, until we have time to optimise the build step, is to increase the number of shards.
Pull Request Submission
Please check the boxes once done.
The pull request must:
feat: or fix:, others might be used in rare occasions as well, if there is no need to document the changes in the release notes. The changes or fixes should be described in detail in the commit message for the changelog & release notes.
What does this PR do?
This PR implements CircleCI parallelism for all acceptance test suites, significantly reducing the duration for the acceptance-tests windows amd64 suite from 41 minutes (p95 last 90 days) to ~14 minutes.
Where should the reviewer start?
--shard flag.
How should this be manually tested?
Review the pipelines associated with this PR.
Any background context you want to provide?
Previously, the acceptance-tests windows amd64 suite had a duration of 41 minutes. By adopting CircleCI parallelism for all acceptance test suites, we've managed to reduce the duration to 14 minutes, significantly improving our testing efficiency.
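For readers unfamiliar with the mechanism, here is a rough sketch of how a CircleCI parallel node can be mapped to a shard argument. It assumes a Jest-style `--shard=<index>/<count>` flag with a one-based index, and uses CircleCI's `CIRCLE_NODE_INDEX` (zero-based) and `CIRCLE_NODE_TOTAL` environment variables. The helper name is hypothetical and this is not necessarily how the PR wires it up.

```javascript
'use strict';

// Hypothetical helper: translate CircleCI's zero-based node index into a
// one-based, Jest-style shard argument. Defaults to a single shard when the
// variables are absent (for example on a local run).
function shardArgFromEnv(env = process.env) {
  const total = Number.parseInt(env.CIRCLE_NODE_TOTAL || '1', 10);
  const index = Number.parseInt(env.CIRCLE_NODE_INDEX || '0', 10);
  const valid =
    Number.isInteger(total) && total >= 1 &&
    Number.isInteger(index) && index >= 0 && index < total;
  if (!valid) {
    throw new Error(
      `invalid shard settings: index=${env.CIRCLE_NODE_INDEX}, total=${env.CIRCLE_NODE_TOTAL}`
    );
  }
  return `--shard=${index + 1}/${total}`;
}
```

With `parallelism: 4` set on the job, the four nodes would produce `--shard=1/4` through `--shard=4/4`.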