-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8287906: Rewrite of GitHub Actions (GHA) sanity tests #9063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back ihse! A progress list of the required criteria for merging this PR into |
Webrevs
|
Le me clarify a bit a regarding switching from Cygwin to MSYS2. First of all, Cygwin as a supported environment is not going away. We use Cygwin for Windows builds in the Oracle internal CI system, and will continue to do so for the foreseeable future. Most individual JDK developers (I assume) will use Cygwin when working on Windows. We will never stop supporting Cygwin as a way of building on Windows. This change however changes how GHA tests are run. In a way, this could be considered to be a good way of actually getting more testing even of the MSYS2 code, which has hitherto had no automatic testing. But the main point is this: without changing to MSYS2, this entire PR falls. There is no way that we can get this unified unix-like behaviour needed to run the new YAML code, our configure, build and test scripts on GHA Windows hosts with Cygwin. Note that the old script did not use Cygwin. It used Powershell. All YAML rules were written in Powershell, and the only use of Cygwin was that the path to Cygwin utils were passed to configure and jtreg. I spent literally days and days trying to wrangle Cygwin into behaving on GHA, but in the end the tooling provided to us by Github turned out to be insufficient to achieve this. So it's MSYS2, or no unification at all. I don't think that's very controversial -- Cygwin was previously chosen just since it was the most well-known and officially best supported environment, not because of some conscious deliberation between Cygwin and MSYS2. It has in fact turned out to be highly problematic, just go search JBS for how many Cygwin-related bugs on GHA there have been... And finally, what I meant by my note about recommending MSYS2 is just that the current build README heavily favors Cygwin, and discredits the alternatives (WSL and MSYS2). I think the wordings there might be soften up, and MSYS2 presented as a fully viable alternative to Cygin; and perhaps even presented as the simplest choice for new developers. |
…ref to PR that fixes build on msys2. (This will be updated once the fix is in an official jtreg release.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks very nice. I have a few inline comments.
The general one is that we need to make sure:
- jtreg needs to be built from its mainline at a given tag;
- tests need to pass (I assume you are waiting for test fixes to land first);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jtreg changes to build with MSYS2 should now be available in the main jtreg repo.
@shipilev JDK-8287902 and JDK-8287895 are now fixed, and with that, all msys2-related test problems should be resolved. |
I'm restoring the old "functionality" of removing the newly built bundles from the set of artifacts. Even though the GHA runs in the users' personal forks, and the artifacts expire in 24 h, some of my co-workers felt this could be misconstrued as a binary distribution from the JDK project, which we are not in the business of doing. (If anyone disagrees with this interpretation, please save that discussion for a separate time. It is easy to restore the functionality if need be.) |
@magicus This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 15 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting logic into smaller actions, scripts and workflow definitions is great!
jtreg
-related changes look good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, let's go!
/integrate |
Going to push as commit 1a65332.
Your commit was automatically rebased without conflicts. |
elif [[ $GITHUB_EVENT_NAME == push ]]; then | ||
input='${{ secrets.JDK_SUBMIT_PLATFORMS }}' | ||
else | ||
echo 'Internal error in GHA' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magicus isn't this check too strict? The Corretto team runs these steps for all our PRs to make sure we will not break the OpenJDK later if our fix will be proposed upstream. But this line prevents us to run it by the "pull_request". I can apply the local patch for sure, but probably this check can be relaxed?
Something like:
elif [[ $GITHUB_EVENT_NAME == pull_request ]]; then
input='${{ secrets.JDK_SUBMIT_PLATFORMS }}'
else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrserb possibly related: JDK-8291444
With project Skara, the ability to run a set of sanity build and test jobs on selected platforms was added. This functionality was driven by
.github/workflows/submit.yml
. This file unfortunately lacks any real structure, and contains a lot of code duplication and redundancy. This has made it hard to add functionality, and new platforms to test, and it has made it even harder to debug issues. (This is hard enough as it is, since we have no direct access to the platforms that GHA runs on.)Since the GHA tests are important for a large subset of the community, we need to do better.
GitHub Actions framework rewrite
This is a complete overhaul of the GHA testing framework. I started out trying to just tease the old
submit.yml
apart, trying to de-duplicate code, but I soon realized a much more thorough rework was needed.Design description
The principle for the new design was to avoid code duplication, and to improve readability of the code. The latter is extra important since the GHA "language" is very limited, needs a lot of quirks and workarounds, and is probably not well known by many OpenJDK developers. I've strived to find useful layers of abstraction to make the expressions as clear as possible.
Unfortunately, the Workflow/Action YAML language is quite limited. There are two ways to avoid duplication, "local composite actions" and "callable workflows". They both have several limitations:
"Callable workflows" can only be used in a single redirection. They are (apparently) inlined into the "calling workflow" at run time, and as such, they are present without having to check out the source code. (Which is a lengthy process.)
"Local composite actions" can use other actions, but you must start by checking out the repo.
To use the strength of both kinds of sub-modules, I'm using "callable workflows" from
main.yml
to callbuild-<platform>.yml
andtest.yml
. It is not allowed to mix "strategies" (that is, the method of automatically creating a test matrix) when calling callable workflows, so I needed to have some amount of duplication inmain.yml
that could have been avoided otherwise.All the callable workflows need to check out the source code anyway, so there is no real additional cost of using "local composite actions" for abstraction of these workflows. (A bit of a lucky break.) I've created "high level" actions, corresponding to something like a function call. The goal here was both to avoid duplication, and to improve readability of the workflows.
The four
build-<platform>.yml
files are very similar. But in the end of the day, only like 50% of the source code is shared, and the platform specific changes permeate the files. So I decided to keep them separately, since mixing them all into one would have made a mess, due to the lack of proper abstraction mechanisms. But that also mean that if we change platform independent code in building, we need to remember to update it in all four places.In the strictest sense, this is a "refactoring" in that the functionality should be equal to the old
submit.yml
. The same platforms should build, with the same arguments, and the same tests should run. When I look at the code now, I see lots of potential for improvement here, by rethinking what we do run. But let's save that discussion for the next PR.There is one major change, though. Windows is no longer running on Cygwin, but on MSYS2. This was not really triggered by the recurring build issues on Cygwin (though that certainly did help me in thinking I made the right choice), but the sheer impossibility of getting Cygwin to behave as a normal unix shell on GHA Windows hosts. I spent countless hours trying to work out limitations, by setting
SHELLOPTS=igncr
, by runningset +x posix
to turn of the POSIX compliance mode that kept turning on by itself and made bash choke on several of our scripts, by playing tricks with thePATH
, but in the end to no avail. There were no single combination of hacks and workarounds that could get us past the entire chain from configure, to build, to testing. (The old solution user PowerShell instead to get around these limitations.) I'm happy to report that I have had absolutely zero issues with MSYS2 since I made the switch (and understood how to set the PATH properly), and I'm seriously considering switching stance to recommend using MSYS2 instead of Cygwin as the primary winenv for building the JDK.Example run
A good example on how a run looks like with the new GHA system is the run for this PR.
New features
While the primary focus was to convert the old system to a new framework, more accommodating to development, and to wait with further enhancements for the future, I have made a few additional features already in this PR. Most of them are related to needs that arose during development of this PR.
A build failure summary, similar to the recently added test failure summary, is added when the build step fails
The test reporting has been extended to all platforms, including Windows
Test reporting has been improved slightly, and gotten multiple bug fixes
All artifacts are now available for individual download. This includes:
The build bundles have a retention period of 24 h, but the rest uses GitHub's default retention period (currently 90 days). The idea is that you can use GHA to download builds for platforms you might not have access to, but after that, conserving the builds does not make sense. GitHub currently provides free, unlimited storage (within the retention period) for artifacts, so we can afford this.
The GHA process starts up much faster, which mean that e.g. a build failure on an exotic platform will show up earlier. This will not really affect the overall run time though, since it is bounded by variables such as queuing for workers, and waiting on tests with somewhat arbitrarily run times to finish.
Additional changes outside GHA
I also needed to make a few tweaks to the build system to play nice with the new GHA code.
The build failure summary is now stored in build/$BUILD/make-support/failure-summary.log
The configure summary now indicates what devkit or sysroot is used, if any
The --with-sysroot argument is now properly normalized
Test failures
A handful of tests, which relies on shell behavior, turned out to fail on Windows when running under MSYS2. I have filed separate bugs, and submitted PRs, to get these fixed:
https://bugs.openjdk.org/browse/JDK-8287902
https://bugs.openjdk.org/browse/JDK-8287895
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9063/head:pull/9063
$ git checkout pull/9063
Update a local copy of the PR:
$ git checkout pull/9063
$ git pull https://git.openjdk.org/jdk pull/9063/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9063
View PR using the GUI difftool:
$ git pr show -t 9063
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9063.diff