Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8287906: Rewrite of GitHub Actions (GHA) sanity tests #9063

Closed
wants to merge 13 commits into from

Conversation

magicus
Copy link
Member

@magicus magicus commented Jun 7, 2022

With project Skara, the ability to run a set of sanity build and test jobs on selected platforms was added. This functionality was driven by .github/workflows/submit.yml. This file unfortunately lacks any real structure, and contains a lot of code duplication and redundancy. This has made it hard to add functionality, and new platforms to test, and it has made it even harder to debug issues. (This is hard enough as it is, since we have no direct access to the platforms that GHA runs on.)

Since the GHA tests are important for a large subset of the community, we need to do better.

GitHub Actions framework rewrite

This is a complete overhaul of the GHA testing framework. I started out trying to just tease the old submit.yml apart, trying to de-duplicate code, but I soon realized a much more thorough rework was needed.

Design description

The principle for the new design was to avoid code duplication, and to improve readability of the code. The latter is extra important since the GHA "language" is very limited, needs a lot of quirks and workarounds, and is probably not well known by many OpenJDK developers. I've strived to find useful layers of abstraction to make the expressions as clear as possible.

Unfortunately, the Workflow/Action YAML language is quite limited. There are two ways to avoid duplication, "local composite actions" and "callable workflows". They both have several limitations:

  • "Callable workflows" can only be used in a single redirection. They are (apparently) inlined into the "calling workflow" at run time, and as such, they are present without having to check out the source code. (Which is a lengthy process.)

  • "Local composite actions" can use other actions, but you must start by checking out the repo.

To use the strength of both kinds of sub-modules, I'm using "callable workflows" from main.yml to call build-<platform>.yml and test.yml. It is not allowed to mix "strategies" (that is, the method of automatically creating a test matrix) when calling callable workflows, so I needed to have some amount of duplication in main.yml that could have been avoided otherwise.

All the callable workflows need to check out the source code anyway, so there is no real additional cost of using "local composite actions" for abstraction of these workflows. (A bit of a lucky break.) I've created "high level" actions, corresponding to something like a function call. The goal here was both to avoid duplication, and to improve readability of the workflows.

The four build-<platform>.yml files are very similar. But in the end of the day, only like 50% of the source code is shared, and the platform specific changes permeate the files. So I decided to keep them separately, since mixing them all into one would have made a mess, due to the lack of proper abstraction mechanisms. But that also mean that if we change platform independent code in building, we need to remember to update it in all four places.

In the strictest sense, this is a "refactoring" in that the functionality should be equal to the old submit.yml. The same platforms should build, with the same arguments, and the same tests should run. When I look at the code now, I see lots of potential for improvement here, by rethinking what we do run. But let's save that discussion for the next PR.

There is one major change, though. Windows is no longer running on Cygwin, but on MSYS2. This was not really triggered by the recurring build issues on Cygwin (though that certainly did help me in thinking I made the right choice), but the sheer impossibility of getting Cygwin to behave as a normal unix shell on GHA Windows hosts. I spent countless hours trying to work out limitations, by setting SHELLOPTS=igncr, by running set +x posix to turn of the POSIX compliance mode that kept turning on by itself and made bash choke on several of our scripts, by playing tricks with the PATH, but in the end to no avail. There were no single combination of hacks and workarounds that could get us past the entire chain from configure, to build, to testing. (The old solution user PowerShell instead to get around these limitations.) I'm happy to report that I have had absolutely zero issues with MSYS2 since I made the switch (and understood how to set the PATH properly), and I'm seriously considering switching stance to recommend using MSYS2 instead of Cygwin as the primary winenv for building the JDK.

Example run

A good example on how a run looks like with the new GHA system is the run for this PR.

New features

While the primary focus was to convert the old system to a new framework, more accommodating to development, and to wait with further enhancements for the future, I have made a few additional features already in this PR. Most of them are related to needs that arose during development of this PR.

  • A build failure summary, similar to the recently added test failure summary, is added when the build step fails

  • The test reporting has been extended to all platforms, including Windows

  • Test reporting has been improved slightly, and gotten multiple bug fixes

  • All artifacts are now available for individual download. This includes:

    • The build bundles, per platform
    • The test results, per platform and test suite
    • Build failure logs, in case of build failure

    The build bundles have a retention period of 24 h, but the rest uses GitHub's default retention period (currently 90 days). The idea is that you can use GHA to download builds for platforms you might not have access to, but after that, conserving the builds does not make sense. GitHub currently provides free, unlimited storage (within the retention period) for artifacts, so we can afford this.

  • The GHA process starts up much faster, which mean that e.g. a build failure on an exotic platform will show up earlier. This will not really affect the overall run time though, since it is bounded by variables such as queuing for workers, and waiting on tests with somewhat arbitrarily run times to finish.

Additional changes outside GHA

I also needed to make a few tweaks to the build system to play nice with the new GHA code.

  • The build failure summary is now stored in build/$BUILD/make-support/failure-summary.log

  • The configure summary now indicates what devkit or sysroot is used, if any

  • The --with-sysroot argument is now properly normalized

Test failures

A handful of tests, which relies on shell behavior, turned out to fail on Windows when running under MSYS2. I have filed separate bugs, and submitted PRs, to get these fixed:


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8287906: Rewrite of GitHub Actions (GHA) sanity tests

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9063/head:pull/9063
$ git checkout pull/9063

Update a local copy of the PR:
$ git checkout pull/9063
$ git pull https://git.openjdk.org/jdk pull/9063/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9063

View PR using the GUI difftool:
$ git pr show -t 9063

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9063.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 7, 2022

👋 Welcome back ihse! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 7, 2022
@openjdk
Copy link

openjdk bot commented Jun 7, 2022

@magicus The following label will be automatically applied to this pull request:

  • build

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the build build-dev@openjdk.org label Jun 7, 2022
@mlbridge
Copy link

mlbridge bot commented Jun 7, 2022

@magicus
Copy link
Member Author

magicus commented Jun 9, 2022

Le me clarify a bit a regarding switching from Cygwin to MSYS2.

First of all, Cygwin as a supported environment is not going away. We use Cygwin for Windows builds in the Oracle internal CI system, and will continue to do so for the foreseeable future. Most individual JDK developers (I assume) will use Cygwin when working on Windows. We will never stop supporting Cygwin as a way of building on Windows.

This change however changes how GHA tests are run. In a way, this could be considered to be a good way of actually getting more testing even of the MSYS2 code, which has hitherto had no automatic testing. But the main point is this: without changing to MSYS2, this entire PR falls. There is no way that we can get this unified unix-like behaviour needed to run the new YAML code, our configure, build and test scripts on GHA Windows hosts with Cygwin. Note that the old script did not use Cygwin. It used Powershell. All YAML rules were written in Powershell, and the only use of Cygwin was that the path to Cygwin utils were passed to configure and jtreg. I spent literally days and days trying to wrangle Cygwin into behaving on GHA, but in the end the tooling provided to us by Github turned out to be insufficient to achieve this. So it's MSYS2, or no unification at all. I don't think that's very controversial -- Cygwin was previously chosen just since it was the most well-known and officially best supported environment, not because of some conscious deliberation between Cygwin and MSYS2. It has in fact turned out to be highly problematic, just go search JBS for how many Cygwin-related bugs on GHA there have been...

And finally, what I meant by my note about recommending MSYS2 is just that the current build README heavily favors Cygwin, and discredits the alternatives (WSL and MSYS2). I think the wordings there might be soften up, and MSYS2 presented as a fully viable alternative to Cygin; and perhaps even presented as the simplest choice for new developers.

…ref to PR that fixes build on msys2. (This will be updated once the fix is in an official jtreg release.)
Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks very nice. I have a few inline comments.

The general one is that we need to make sure:

  • jtreg needs to be built from its mainline at a given tag;
  • tests need to pass (I assume you are waiting for test fixes to land first);

.github/actions/get-jtreg/action.yml Outdated Show resolved Hide resolved
.github/actions/get-bundles/action.yml Show resolved Hide resolved
.github/actions/get-bootjdk/action.yml Show resolved Hide resolved
.github/actions/get-msys/action.yml Outdated Show resolved Hide resolved
.github/workflows/build-cross-compile.yml Outdated Show resolved Hide resolved
.github/workflows/build-cross-compile.yml Outdated Show resolved Hide resolved
.github/workflows/main.yml Outdated Show resolved Hide resolved
make/InitSupport.gmk Outdated Show resolved Hide resolved
Copy link
Contributor

@jonathan-gibbons jonathan-gibbons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jtreg changes to build with MSYS2 should now be available in the main jtreg repo.

@magicus
Copy link
Member Author

magicus commented Jun 10, 2022

@shipilev JDK-8287902 and JDK-8287895 are now fixed, and with that, all msys2-related test problems should be resolved.

@magicus
Copy link
Member Author

magicus commented Jun 13, 2022

I'm restoring the old "functionality" of removing the newly built bundles from the set of artifacts. Even though the GHA runs in the users' personal forks, and the artifacts expire in 24 h, some of my co-workers felt this could be misconstrued as a binary distribution from the JDK project, which we are not in the business of doing. (If anyone disagrees with this interpretation, please save that discussion for a separate time. It is easy to restore the functionality if need be.)

@openjdk
Copy link

openjdk bot commented Jun 13, 2022

@magicus This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8287906: Rewrite of GitHub Actions (GHA) sanity tests

Reviewed-by: shade, erikj, cstein

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 15 new commits pushed to the master branch:

  • 8f400b9: 8286779: javax.crypto.CryptoPolicyParser#isConsistent always returns 'true'
  • e0baf01: 8287007: [cgroups] Consistently use stringStream throughout parsing code
  • 1769596: 8285263: Minor cleanup could be done in java.security
  • b97a4f6: 8288114: Update JIRA link in vcs.xml
  • 2adef6a: 8267860: Off-by-one bug when searching arrays in AlpnGreaseTest
  • 4aede33: 8288282: Zero-release build is broken after JDK-8279047 due to UseHeavyMonitors is read-only
  • 0207d76: 8287926: AArch64: intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long
  • 33ed036: 8283775: better dump: VM support for graph querying in debugger with BFS traversal and node filtering
  • ac28be7: 8283612: IGV: Remove Graal module
  • 0cb0ecf: 8209935: Test to cover CodeSource.getCodeSigners()
  • ... and 5 more: https://git.openjdk.org/jdk/compare/f2e10dce786a01768436f32e233d72cb4257fbcf...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 13, 2022
Copy link
Member

@sormuras sormuras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting logic into smaller actions, scripts and workflow definitions is great!
jtreg-related changes look good to me.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, let's go!

@magicus
Copy link
Member Author

magicus commented Jun 14, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Jun 14, 2022

Going to push as commit 1a65332.
Since your change was applied there have been 28 commits pushed to the master branch:

  • c2ccf4c: 8288003: log events for os::dll_unload
  • 03dca56: 8287525: Extend IR annotation with new options to test specific target feature.
  • 86c9241: 8287028: AArch64: [vectorapi] Backend implementation of VectorMask.fromLong with SVE2
  • fbe9266: 8288378: [BACKOUT] DST not applying properly with zone id offset set with TZ env variable
  • 1904353: Merge
  • 7aafc69: 8288105: [PPC64] Problems with -XX:+VerifyStack
  • f4b05a1: 8288173: JDK-8202449 fix causes conformance test failure : api/java_util/Random/RandomGenerator/NextFloat.html
  • d9c1364: 8288101: False build warning-as-error with GCC 9 after JDK-8214976
  • a9c2ab6: 8288080: (fc) FileChannel::map for MemorySegments should state it always throws UOE
  • e90b579: 8288332: Tier1 validate-source fails after 8279614
  • ... and 18 more: https://git.openjdk.org/jdk/compare/f2e10dce786a01768436f32e233d72cb4257fbcf...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 14, 2022
@openjdk openjdk bot closed this Jun 14, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 14, 2022
@openjdk
Copy link

openjdk bot commented Jun 14, 2022

@magicus Pushed as commit 1a65332.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@magicus magicus deleted the gha-refactor branch June 14, 2022 07:50
elif [[ $GITHUB_EVENT_NAME == push ]]; then
input='${{ secrets.JDK_SUBMIT_PLATFORMS }}'
else
echo 'Internal error in GHA'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magicus isn't this check too strict? The Corretto team runs these steps for all our PRs to make sure we will not break the OpenJDK later if our fix will be proposed upstream. But this line prevents us to run it by the "pull_request". I can apply the local patch for sure, but probably this check can be relaxed?
Something like:

            elif [[ $GITHUB_EVENT_NAME == pull_request ]]; then
              input='${{ secrets.JDK_SUBMIT_PLATFORMS }}'
            else

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrserb possibly related: JDK-8291444

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build build-dev@openjdk.org integrated Pull request has been integrated
7 participants