8256127: Add cross-compiled foreign architectures builds to submit workflow #1147

shipilev · 2020-11-10T19:05:31Z

It is possible to efficiently cross-compile to foreign architectures on current GH actions that are driven by Ubuntu. I have been using this method for years to produce binaries at builds.shipilev.net.

These cross-compilation targets frequently find arch-specific build bugs. Hotspot is rather well insulated from the fundamental problems in arch-support, so the overwhelming majority of those arch-specific bugs are simple omissions of #include-s, inconsistent #ifdef-ing, missed method renames / signature changes, or incorrect removals of methods that are used by arch-specific code. Those are straightforward to fix, if the contributor knows about them. My experience says the common attitude to these fixes is: "Oh, I would have fixed it in the original change if I knew about this."

This improvement adds several foreign architectures to GH actions workflow to provide the automatic safety net to give early warning for these cases. In fact, many Hotspot contributors already build AArch64, ARM, PPC64 targets as pre-integration sanity checks, and this does the important checks automatically for them.

To optimize workflow costs, the change does the following:

The build is only done for hotspot-debug-no-pch, as the most frequent place where arch-specific build bugs show up.
There are no tests, and in fact the arch-specific binary does not even run. The build is purely about the build.
Foreign architectures reuse the Linux x86_64 release build as build JDK. This avoids building "host" build JDK as it would otherwise happen with cross-compiling.
The created sysroot is cached and keyed on submit.yml hash. So it would regenerate only if the workflow itself changes. This saves about 10 minutes per arch.

Space-wise, sysroots in GH cache take about 580M uncompressed, and 270M zstd-compressed bundle. In the end, this adds 4x270 = 1080M into local cache.

Time-wise, ball-parking the current workflow budget looks like this:

Linux x86_64:
- builds: 120m
- tests: 200m
Linux x86_32:
- builds: 75m
- tests: 235m
Windows x86_64
- builds: 120m
- tests: 290m
MacOS x86_64
- builds: 75m
- tests: 165m
Linux aarch64:
- builds: 25m (+10m to create uncached sysroot)
Linux arm:
- builds: 20m (+10m to create uncached sysroot)
Linux ppc64le:
- builds: 20m (+15m to create uncached sysroot)
Linux s390x:
- builds: 20m (+10m to create uncached sysroot)

In other words, new workflow takes about 715 Linux-host-minutes, 410 Windows-host-minutes, 240 Mac-host-minutes. Out of which new jobs take about 85 Linux-host-minutes. So the cost of new jobs is roughly:

11.9% of Linux-host-minutes
6.2% of all-host-minutes
2.2% of runner-minutes, if you weigh in the proportional cost of non-Linux runners

In other words, the additional runner costs are pale in comparison with what is already done in build and test jobs.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Testing

	Linux aarch64	Linux arm	Linux ppc64le	Linux s390x	Linux x64	Linux x86	Windows x64	macOS x64
Build	✔️ (1/1 passed)	✔️ (1/1 passed)	✔️ (1/1 passed)	✔️ (1/1 passed)	✔️ (5/5 passed)	✔️ (2/2 passed)	✔️ (2/2 passed)	✔️ (2/2 passed)
Test (tier1)					⏳ (5/9 running)	⏳ (3/9 running)	⏳ (8/9 running)	⏳ (1/9 running)

Issue

JDK-8256127: Add cross-compiled foreign architectures builds to submit workflow

Reviewers

Magnus Ihse Bursie (@magicus - Reviewer) ⚠️ Review applies to 81e07fe
Robin Westberg (@rwestberg - Committer) ⚠️ Review applies to 81e07fe

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1147/head:pull/1147
$ git checkout pull/1147

bridgekeeper · 2020-11-10T19:06:06Z

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2020-11-10T19:06:52Z

@shipilev The following label will be automatically applied to this pull request:

build

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

…rkflow

mlbridge · 2020-11-11T13:52:01Z

Webrevs

TheRealMDoerr · 2020-11-11T15:17:44Z

Awesome! Thanks a lot for doing this!

magicus

Aleksey,

I'm not at all sure I want to accept these changes. I understand that you care about these exotic platforms, and so do I, but adding them to the submit workflow for all changes seem like the wrong way to proceed.

My concern vary all the way from increased build time, to increased usage of the Github CPU quota (which, as I understand it, is fairly large but not unlimited), to the elephant in the room: the question on who the burden lays to fix bugs in these exotic platforms. It is not at all obvious that these builds should be tested before commit.

I think this change warrants a much wider and deeper discussion about the purpose of the Github commit hook actions, how the free Github CPU quota should be spent, and on whom the responsibility lays to either make sure no. commit is pushed that breaks an exotic platform, or to rectify the problem with a follow-up issue.

I recommend you withdraw this PR and instead initiate a wider discussion on a suitable mailing list, e.g. jdk-dev.

TheRealMDoerr · 2020-11-11T21:57:47Z

Maybe it would be possible to build hotspot only as this is the part where things often break? This would save a lot of CPU time.
E.g. I think it'd be valueable to have at least one Big Endian platform included.
I noticed that some Oracle people use cross builds to check their changes on these platforms. It would save a bit of their time as well.

shipilev · 2020-11-12T09:34:51Z

I'm not at all sure I want to accept these changes. I understand that you care about these exotic platforms, and so do I, but adding them to the submit workflow for all changes seem like the wrong way to proceed.

I understand the first reaction, but please read the updated description. It includes the discussion what bugs it tries to catch, and updated cost estimates.

Submit workflow is a convenience thing for better testing coverage. Having the early warnings for fixable issues in foreign architectures helps to avoid follow-up churn without increasing the costs all too much. Do note that it does not require contributors to fix the runtime issues in foreign architectures, it only asks for passing a rather low bar of not breaking the builds for them, which is, in my experience, an easily achievable goal. It is easily achievable because -- I am speaking from experience of fixing a lot of them over the years! -- the overwhelming majority of arch-specific build breakages are about silly things that are obvious from the build logs.

So I am leaving this PR open, letting others to chime in. If there is really a need for a wider discussion here, I could start a thread at jdk-dev@, but so far it does not look as a good use of our collective time. Please read through the description, and see if that soothes your concerns.

Aside: the issues about costs were raised before, but I think all of them resolve as "enjoy your free tier". If we target to optimize the GH action costs, IMO we should instead consider to avoid triggering them on every push, as I hinted before, and which I think requires better Skara integration (i.e. for /test command). Meanwhile, this PR gives us extended build coverage at fraction of the additional cost.

simonis · 2020-11-12T10:08:39Z

I think this is a really useful feature and I'm a little surprised that it gets rejected. Please find my comments inline.

Aleksey,

I'm not at all sure I want to accept these changes. I understand that you care about these exotic platforms, and so do I, but adding them to the submit workflow for all changes seem like the wrong way to proceed.

My concern vary all the way from increased build time, to increased usage of the Github CPU quota (which, as I understand it, is fairly large but not unlimited), to the elephant in the room: the question on who the burden lays to fix bugs in these exotic platforms. It is not at all obvious that these builds should be tested before commit.

I think "on who's the burden to fix issues" on exotic platforms and "testing changes on all platforms" at commit time are two totally distinct questions. While I agree that problems on a exotic platform shouldn't block reviewed changes for an unreasonable amount of time, I also think it is very valuable for both, authors of changes and maintainers of exotic platforms to at least get early notifications of problems on other platforms.

My experience as a maintainer of the AIX7PowerPC/s390 ports has always been very positive with regards to the collaboration with authors of changes which impact these ports. Authors are anxious to not break other platforms but usually don't have the possibility to test on these platforms. Aleksey's change comes in very handy here. It will help authors to fix trivial build problems right in their initial submission which I'm sure they'll appreciate. For more complex problems it gives Authors the possibility to notify port maintainers early, before a change gets pushed. I'm sure this alerting could even be automatized in the future if builds/tests fail on certain platforms.

Of course such problems shouldn't block changes which are otherwise reviewed and ready to push for an unreasonable amount of time, so I have nothing against making the test on such platform optional submit requirements.

But overall I think Aleksey's enhancement are a great means to improve the development experience and to keep the OpenJDK repos clean of trivial follow-up fixes.

PS: I specially emphasized exotic platforms because it seems to me that your comment implies that all not Oracle-supported platforms are considered exotic for you. I think in an open source project it should be decided by the community which platforms are considered main (or primary) and which are considered secondary platforms.

I think this change warrants a much wider and deeper discussion about the purpose of the Github commit hook actions, how the free Github CPU quota should be spent, and on whom the responsibility lays to either make sure no. commit is pushed that breaks an exotic platform, or to rectify the problem with a follow-up issue.

I agree that we should we frugal with resources, especially if these resources are being offered to us at no charge. But from my understanding, the resources for GitHub actions are accounted on a personal account base for every developer and not to the upstream project. So every developer who clones OpenJDK and submits changes to his fork will use his personal "GitHub Actions" budget. According to the documentation and @rwestberg's explanations on skara-dev this budget contains 20 parallel jobs (with up to 6 hours runtime) 24/7. Even with Aleksey's changes I think we're well in this budget.

Are you afraid tha GitHub/Microsoft will cut down this free-tier in general or for for OpenJDK users? If that's the case we should probably try to get an offical statement from GitHub/Microsoft on this topic before unnecessarily restricting ourselves.

As Aleksey explained, the additional builds should still finish before the corresponding Windows/MacOS builds and tests terminate, so there shouldn't be any usability issue either.

I recommend you withdraw this PR and instead initiate a wider discussion on a suitable mailing list, e.g. jdk-dev.

I really like this change and I'd appreciate if we could get it merged. I'm of course always open to discussion on details (e.g. if the new biuilds/tests should be mandatory or. optional, how to notify port maintainers, etc...).

magicus · 2020-11-13T09:44:43Z

The change to only build hotspot is good. It alleviates some of my concern over this patch.

I still think there is a wide difference in how different contributors view these Github Actions submit hooks. Some would never consider to push anything that is not green everywhere, while you think it could be okay to push with broken platform if it takes an unreasonable amount of time to fix.

I do think we need to align ourself as a community on how to relate to these submit hook tests. Perhaps getting some ground rules down in the new Developer's Guide, if we can agree on such ground rules. (It's not like there's an old history of oral tradition on how to do things that needs to be written down like the rest of the Guide, it's more like we decide now on how to deal with them.)

Finally, I'm worried about reproducibility and debuggability of the Github Actions. We've seen that Github Actions can be updated right under our feet, and have already had trouble with that. And if the logs are not enough, there is no simple way to reproduce the build environment locally do examine the situation. But these are general concerns about the Github Actions themselves, and not specifically aimed at this patch. And I've initiated an off-list discussion on how we can try to address some of these issues.

openjdk · 2020-11-13T09:45:21Z

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8256127: Add cross-compiled foreign architectures builds to submit workflow

Reviewed-by: ihse, rwestberg

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

c8dd0b5: 8256320: ZGC: Update zDebug to support UseCompressedClassPointers
5973e91: 8253005: Add @throws IOException in javadoc for HttpEchange.sendResponseHeaders
8c31bd2: 8256275: Optimized build is broken
b0c28fa: 8256011: Shenandoah: Don't resurrect finalizably reachable objects
41139e3: 8255964: Add all details to jstack log in jtreg timeout handler

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

rwestberg

Looks good! I did a similar attempt at cross building a while ago, but never got around to finishing it, so it's nice to see it materializing! I do have a general comment on reducing the amount of duplicated content though. Since all these cross-build platforms share the same prerequisites, they can be expressed as matrix builds. Here's what I did: https://github.com/rwestberg/jdk/blob/947c934621c3013c055152356615e0120382cedf/.github/workflows/submit.yml#L102

You'd have to adjust the details obviously, but I think it could help with future maintainability.

Another minor comment is that it may be faster to use http://debian-archive.trafficmanager.net/debian/ instead of http://httpredir.debian.org/debian/ (the former is Azure-specific but I don't think it's part of the list that the latter uses). But since it will be cached after first use it probably doesn't matter much.

shipilev · 2020-11-13T10:54:01Z

Looks good! I did a similar attempt at cross building a while ago, but never got around to finishing it, so it's nice to see it materializing! I do have a general comment on reducing the amount of duplicated content though. Since all these cross-build platforms share the same prerequisites, they can be expressed as matrix builds. Here's what I did: https://github.com/rwestberg/jdk/blob/947c934621c3013c055152356615e0120382cedf/.github/workflows/submit.yml#L102

Right. AFAIU your code, it bootstraps the chroot and builds x86_64 build JDK for every config and every run, something this PR is able to avoid. We'd need to figure that out. I think that is pretty doable, but it would require a few days worth of pipeline testing to work out the kinks. So, would you mind we do that in the follow-ups?

rwestberg · 2020-11-13T11:04:38Z

Yeah, it would certainly need a bit of adaptation to capture the differences properly, so perfectly fine to look into later!

shipilev · 2020-11-13T11:14:42Z

Thanks folks! I'll merge the master and see if it is still green, and if it is, I'll integrate.

shipilev · 2020-11-13T12:36:13Z

Cross-compiled builds still look green. Integrating.

/integrate

openjdk · 2020-11-13T12:37:23Z

@shipilev Since your change was applied there have been 5 commits pushed to the master branch:

c8dd0b5: 8256320: ZGC: Update zDebug to support UseCompressedClassPointers
5973e91: 8253005: Add @throws IOException in javadoc for HttpEchange.sendResponseHeaders
8c31bd2: 8256275: Optimized build is broken
b0c28fa: 8256011: Shenandoah: Don't resurrect finalizably reachable objects
41139e3: 8255964: Add all details to jstack log in jtreg timeout handler

Your commit was automatically rebased without conflicts.

Pushed as commit e9956fe.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

…rkflow Reviewed-by: ihse, rwestberg

This adds the cross-compiled build only, as no Windows+Arm64 machines are available on GitHub Action that we could use to run the tests. Due to cross-compilation a build JDK is required. Initially I added EA builds to be downloaded from https://jdk.java.net/16/ and used for that, but then I saw how @shipiliv attempted it for the linux cross-compilation builds in openjdk#1147. That is using the JDK image produced by the x64 variant. This however add more stress to the "critical path", as now two more jobs depend on the x64 build first. Let's see how it works out in the long-run. A Windows+AArch64 build takes 40-50min.

openjdk bot added the build build-dev@openjdk.org label Nov 10, 2020

shipilev force-pushed the JDK-8256127-foreign-build branch from c8b17b8 to fe71198 Compare November 11, 2020 07:51

8256127: Add cross-compiled foreign architectures builds to submit wo…

bed8d47

…rkflow

shipilev force-pushed the JDK-8256127-foreign-build branch from 5016df7 to bed8d47 Compare November 11, 2020 10:49

Remove special directories from chroot to make it archivable

98cbf2f

shipilev marked this pull request as ready for review November 11, 2020 13:48

openjdk bot added the rfr Pull request is ready for review label Nov 11, 2020

shipilev added 4 commits November 11, 2020 17:11

Merge branch 'master' into JDK-8256127-foreign-build

c69b529

Incomplete fix: remove special dirs

eb6b10a

Merge branch 'master' into JDK-8256127-foreign-build

f0170df

Merge branch 'master' into JDK-8256127-foreign-build

8509843

magicus suggested changes Nov 11, 2020

View reviewed changes

Build only hotspot reduced build

81e07fe

magicus approved these changes Nov 13, 2020

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Nov 13, 2020

rwestberg approved these changes Nov 13, 2020

View reviewed changes

Merge branch 'master' into JDK-8256127-foreign-build

b0dfab5

openjdk bot closed this Nov 13, 2020

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 13, 2020

openjdk-notifier bot referenced this pull request Nov 13, 2020

8256127: Add cross-compiled foreign architectures builds to submit wo…

e9956fe

…rkflow Reviewed-by: ihse, rwestberg

shipilev deleted the JDK-8256127-foreign-build branch November 13, 2020 12:56

lewurm mentioned this pull request Nov 23, 2020

8256657: Add cross-compiled build for Windows+Arm64 to submit workflow #1379

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8256127: Add cross-compiled foreign architectures builds to submit workflow #1147

8256127: Add cross-compiled foreign architectures builds to submit workflow #1147

shipilev commented Nov 10, 2020 •

edited by openjdk bot

Loading

bridgekeeper bot commented Nov 10, 2020

openjdk bot commented Nov 10, 2020

mlbridge bot commented Nov 11, 2020 •

edited

Loading

TheRealMDoerr commented Nov 11, 2020

magicus left a comment

TheRealMDoerr commented Nov 11, 2020

shipilev commented Nov 12, 2020

simonis commented Nov 12, 2020

magicus commented Nov 13, 2020

openjdk bot commented Nov 13, 2020 •

edited

Loading

rwestberg left a comment

shipilev commented Nov 13, 2020

rwestberg commented Nov 13, 2020

shipilev commented Nov 13, 2020

shipilev commented Nov 13, 2020

openjdk bot commented Nov 13, 2020

8256127: Add cross-compiled foreign architectures builds to submit workflow #1147

8256127: Add cross-compiled foreign architectures builds to submit workflow #1147

Conversation

shipilev commented Nov 10, 2020 • edited by openjdk bot Loading

Progress

Testing

Issue

Reviewers

Download

bridgekeeper bot commented Nov 10, 2020

openjdk bot commented Nov 10, 2020

mlbridge bot commented Nov 11, 2020 • edited Loading

Webrevs

TheRealMDoerr commented Nov 11, 2020

magicus left a comment

Choose a reason for hiding this comment

TheRealMDoerr commented Nov 11, 2020

shipilev commented Nov 12, 2020

simonis commented Nov 12, 2020

magicus commented Nov 13, 2020

openjdk bot commented Nov 13, 2020 • edited Loading

rwestberg left a comment

Choose a reason for hiding this comment

shipilev commented Nov 13, 2020

rwestberg commented Nov 13, 2020

shipilev commented Nov 13, 2020

shipilev commented Nov 13, 2020

openjdk bot commented Nov 13, 2020

shipilev commented Nov 10, 2020 •

edited by openjdk bot

Loading

mlbridge bot commented Nov 11, 2020 •

edited

Loading

openjdk bot commented Nov 13, 2020 •

edited

Loading