Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8256127: Add cross-compiled foreign architectures builds to submit workflow #1147

Closed
wants to merge 8 commits into from

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Nov 10, 2020

It is possible to efficiently cross-compile to foreign architectures on current GH actions that are driven by Ubuntu. I have been using this method for years to produce binaries at builds.shipilev.net.

These cross-compilation targets frequently find arch-specific build bugs. Hotspot is rather well insulated from the fundamental problems in arch-support, so the overwhelming majority of those arch-specific bugs are simple omissions of #include-s, inconsistent #ifdef-ing, missed method renames / signature changes, or incorrect removals of methods that are used by arch-specific code. Those are straightforward to fix, if the contributor knows about them. My experience says the common attitude to these fixes is: "Oh, I would have fixed it in the original change if I knew about this."

This improvement adds several foreign architectures to GH actions workflow to provide the automatic safety net to give early warning for these cases. In fact, many Hotspot contributors already build AArch64, ARM, PPC64 targets as pre-integration sanity checks, and this does the important checks automatically for them.

To optimize workflow costs, the change does the following:

  • The build is only done for hotspot-debug-no-pch, as the most frequent place where arch-specific build bugs show up.
  • There are no tests, and in fact the arch-specific binary does not even run. The build is purely about the build.
  • Foreign architectures reuse the Linux x86_64 release build as build JDK. This avoids building "host" build JDK as it would otherwise happen with cross-compiling.
  • The created sysroot is cached and keyed on submit.yml hash. So it would regenerate only if the workflow itself changes. This saves about 10 minutes per arch.

Space-wise, sysroots in GH cache take about 580M uncompressed, and 270M zstd-compressed bundle. In the end, this adds 4x270 = 1080M into local cache.

Time-wise, ball-parking the current workflow budget looks like this:

  • Linux x86_64:
    • builds: 120m
    • tests: 200m
  • Linux x86_32:
    • builds: 75m
    • tests: 235m
  • Windows x86_64
    • builds: 120m
    • tests: 290m
  • MacOS x86_64
    • builds: 75m
    • tests: 165m
  • Linux aarch64:
    • builds: 25m (+10m to create uncached sysroot)
  • Linux arm:
    • builds: 20m (+10m to create uncached sysroot)
  • Linux ppc64le:
    • builds: 20m (+15m to create uncached sysroot)
  • Linux s390x:
    • builds: 20m (+10m to create uncached sysroot)

In other words, new workflow takes about 715 Linux-host-minutes, 410 Windows-host-minutes, 240 Mac-host-minutes. Out of which new jobs take about 85 Linux-host-minutes. So the cost of new jobs is roughly:

In other words, the additional runner costs are pale in comparison with what is already done in build and test jobs.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux aarch64 Linux arm Linux ppc64le Linux s390x Linux x64 Linux x86 Windows x64 macOS x64
Build ✔️ (1/1 passed) ✔️ (1/1 passed) ✔️ (1/1 passed) ✔️ (1/1 passed) ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ⏳ (5/9 running) ⏳ (3/9 running) ⏳ (8/9 running) ⏳ (1/9 running)

Issue

  • JDK-8256127: Add cross-compiled foreign architectures builds to submit workflow

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1147/head:pull/1147
$ git checkout pull/1147

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 10, 2020

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 10, 2020

@shipilev The following label will be automatically applied to this pull request:

  • build

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the build build-dev@openjdk.org label Nov 10, 2020
@shipilev shipilev marked this pull request as ready for review November 11, 2020 13:48
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 11, 2020
@mlbridge
Copy link

mlbridge bot commented Nov 11, 2020

Webrevs

@TheRealMDoerr
Copy link
Contributor

Awesome! Thanks a lot for doing this!

Copy link
Member

@magicus magicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aleksey,

I'm not at all sure I want to accept these changes. I understand that you care about these exotic platforms, and so do I, but adding them to the submit workflow for all changes seem like the wrong way to proceed.

My concern vary all the way from increased build time, to increased usage of the Github CPU quota (which, as I understand it, is fairly large but not unlimited), to the elephant in the room: the question on who the burden lays to fix bugs in these exotic platforms. It is not at all obvious that these builds should be tested before commit.

I think this change warrants a much wider and deeper discussion about the purpose of the Github commit hook actions, how the free Github CPU quota should be spent, and on whom the responsibility lays to either make sure no. commit is pushed that breaks an exotic platform, or to rectify the problem with a follow-up issue.

I recommend you withdraw this PR and instead initiate a wider discussion on a suitable mailing list, e.g. jdk-dev.

@TheRealMDoerr
Copy link
Contributor

Maybe it would be possible to build hotspot only as this is the part where things often break? This would save a lot of CPU time.
E.g. I think it'd be valueable to have at least one Big Endian platform included.
I noticed that some Oracle people use cross builds to check their changes on these platforms. It would save a bit of their time as well.

@shipilev
Copy link
Member Author

I'm not at all sure I want to accept these changes. I understand that you care about these exotic platforms, and so do I, but adding them to the submit workflow for all changes seem like the wrong way to proceed.

I understand the first reaction, but please read the updated description. It includes the discussion what bugs it tries to catch, and updated cost estimates.

Submit workflow is a convenience thing for better testing coverage. Having the early warnings for fixable issues in foreign architectures helps to avoid follow-up churn without increasing the costs all too much. Do note that it does not require contributors to fix the runtime issues in foreign architectures, it only asks for passing a rather low bar of not breaking the builds for them, which is, in my experience, an easily achievable goal. It is easily achievable because -- I am speaking from experience of fixing a lot of them over the years! -- the overwhelming majority of arch-specific build breakages are about silly things that are obvious from the build logs.

So I am leaving this PR open, letting others to chime in. If there is really a need for a wider discussion here, I could start a thread at jdk-dev@, but so far it does not look as a good use of our collective time. Please read through the description, and see if that soothes your concerns.

Aside: the issues about costs were raised before, but I think all of them resolve as "enjoy your free tier". If we target to optimize the GH action costs, IMO we should instead consider to avoid triggering them on every push, as I hinted before, and which I think requires better Skara integration (i.e. for /test command). Meanwhile, this PR gives us extended build coverage at fraction of the additional cost.

@simonis
Copy link
Member

simonis commented Nov 12, 2020

I think this is a really useful feature and I'm a little surprised that it gets rejected. Please find my comments inline.

Aleksey,

I'm not at all sure I want to accept these changes. I understand that you care about these exotic platforms, and so do I, but adding them to the submit workflow for all changes seem like the wrong way to proceed.

My concern vary all the way from increased build time, to increased usage of the Github CPU quota (which, as I understand it, is fairly large but not unlimited), to the elephant in the room: the question on who the burden lays to fix bugs in these exotic platforms. It is not at all obvious that these builds should be tested before commit.

I think "on who's the burden to fix issues" on exotic platforms and "testing changes on all platforms" at commit time are two totally distinct questions. While I agree that problems on a exotic platform shouldn't block reviewed changes for an unreasonable amount of time, I also think it is very valuable for both, authors of changes and maintainers of exotic platforms to at least get early notifications of problems on other platforms.

My experience as a maintainer of the AIX7PowerPC/s390 ports has always been very positive with regards to the collaboration with authors of changes which impact these ports. Authors are anxious to not break other platforms but usually don't have the possibility to test on these platforms. Aleksey's change comes in very handy here. It will help authors to fix trivial build problems right in their initial submission which I'm sure they'll appreciate. For more complex problems it gives Authors the possibility to notify port maintainers early, before a change gets pushed. I'm sure this alerting could even be automatized in the future if builds/tests fail on certain platforms.

Of course such problems shouldn't block changes which are otherwise reviewed and ready to push for an unreasonable amount of time, so I have nothing against making the test on such platform optional submit requirements.

But overall I think Aleksey's enhancement are a great means to improve the development experience and to keep the OpenJDK repos clean of trivial follow-up fixes.

PS: I specially emphasized exotic platforms because it seems to me that your comment implies that all not Oracle-supported platforms are considered exotic for you. I think in an open source project it should be decided by the community which platforms are considered main (or primary) and which are considered secondary platforms.

I think this change warrants a much wider and deeper discussion about the purpose of the Github commit hook actions, how the free Github CPU quota should be spent, and on whom the responsibility lays to either make sure no. commit is pushed that breaks an exotic platform, or to rectify the problem with a follow-up issue.

I agree that we should we frugal with resources, especially if these resources are being offered to us at no charge. But from my understanding, the resources for GitHub actions are accounted on a personal account base for every developer and not to the upstream project. So every developer who clones OpenJDK and submits changes to his fork will use his personal "GitHub Actions" budget. According to the documentation and @rwestberg's explanations on skara-dev this budget contains 20 parallel jobs (with up to 6 hours runtime) 24/7. Even with Aleksey's changes I think we're well in this budget.

Are you afraid tha GitHub/Microsoft will cut down this free-tier in general or for for OpenJDK users? If that's the case we should probably try to get an offical statement from GitHub/Microsoft on this topic before unnecessarily restricting ourselves.

As Aleksey explained, the additional builds should still finish before the corresponding Windows/MacOS builds and tests terminate, so there shouldn't be any usability issue either.

I recommend you withdraw this PR and instead initiate a wider discussion on a suitable mailing list, e.g. jdk-dev.

I really like this change and I'd appreciate if we could get it merged. I'm of course always open to discussion on details (e.g. if the new biuilds/tests should be mandatory or. optional, how to notify port maintainers, etc...).

@magicus
Copy link
Member

magicus commented Nov 13, 2020

The change to only build hotspot is good. It alleviates some of my concern over this patch.

I still think there is a wide difference in how different contributors view these Github Actions submit hooks. Some would never consider to push anything that is not green everywhere, while you think it could be okay to push with broken platform if it takes an unreasonable amount of time to fix.

I do think we need to align ourself as a community on how to relate to these submit hook tests. Perhaps getting some ground rules down in the new Developer's Guide, if we can agree on such ground rules. (It's not like there's an old history of oral tradition on how to do things that needs to be written down like the rest of the Guide, it's more like we decide now on how to deal with them.)

Finally, I'm worried about reproducibility and debuggability of the Github Actions. We've seen that Github Actions can be updated right under our feet, and have already had trouble with that. And if the logs are not enough, there is no simple way to reproduce the build environment locally do examine the situation. But these are general concerns about the Github Actions themselves, and not specifically aimed at this patch. And I've initiated an off-list discussion on how we can try to address some of these issues.

@openjdk
Copy link

openjdk bot commented Nov 13, 2020

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8256127: Add cross-compiled foreign architectures builds to submit workflow

Reviewed-by: ihse, rwestberg

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

  • c8dd0b5: 8256320: ZGC: Update zDebug to support UseCompressedClassPointers
  • 5973e91: 8253005: Add @throws IOException in javadoc for HttpEchange.sendResponseHeaders
  • 8c31bd2: 8256275: Optimized build is broken
  • b0c28fa: 8256011: Shenandoah: Don't resurrect finalizably reachable objects
  • 41139e3: 8255964: Add all details to jstack log in jtreg timeout handler

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 13, 2020
Copy link
Member

@rwestberg rwestberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I did a similar attempt at cross building a while ago, but never got around to finishing it, so it's nice to see it materializing! I do have a general comment on reducing the amount of duplicated content though. Since all these cross-build platforms share the same prerequisites, they can be expressed as matrix builds. Here's what I did: https://github.com/rwestberg/jdk/blob/947c934621c3013c055152356615e0120382cedf/.github/workflows/submit.yml#L102

You'd have to adjust the details obviously, but I think it could help with future maintainability.

Another minor comment is that it may be faster to use http://debian-archive.trafficmanager.net/debian/ instead of http://httpredir.debian.org/debian/ (the former is Azure-specific but I don't think it's part of the list that the latter uses). But since it will be cached after first use it probably doesn't matter much.

@shipilev
Copy link
Member Author

Looks good! I did a similar attempt at cross building a while ago, but never got around to finishing it, so it's nice to see it materializing! I do have a general comment on reducing the amount of duplicated content though. Since all these cross-build platforms share the same prerequisites, they can be expressed as matrix builds. Here's what I did: https://github.com/rwestberg/jdk/blob/947c934621c3013c055152356615e0120382cedf/.github/workflows/submit.yml#L102

Right. AFAIU your code, it bootstraps the chroot and builds x86_64 build JDK for every config and every run, something this PR is able to avoid. We'd need to figure that out. I think that is pretty doable, but it would require a few days worth of pipeline testing to work out the kinks. So, would you mind we do that in the follow-ups?

@rwestberg
Copy link
Member

Yeah, it would certainly need a bit of adaptation to capture the differences properly, so perfectly fine to look into later!

@shipilev
Copy link
Member Author

Thanks folks! I'll merge the master and see if it is still green, and if it is, I'll integrate.

@shipilev
Copy link
Member Author

Cross-compiled builds still look green. Integrating.

/integrate

@openjdk openjdk bot closed this Nov 13, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 13, 2020
@openjdk
Copy link

openjdk bot commented Nov 13, 2020

@shipilev Since your change was applied there have been 5 commits pushed to the master branch:

  • c8dd0b5: 8256320: ZGC: Update zDebug to support UseCompressedClassPointers
  • 5973e91: 8253005: Add @throws IOException in javadoc for HttpEchange.sendResponseHeaders
  • 8c31bd2: 8256275: Optimized build is broken
  • b0c28fa: 8256011: Shenandoah: Don't resurrect finalizably reachable objects
  • 41139e3: 8255964: Add all details to jstack log in jtreg timeout handler

Your commit was automatically rebased without conflicts.

Pushed as commit e9956fe.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this pull request Nov 13, 2020
@shipilev shipilev deleted the JDK-8256127-foreign-build branch November 13, 2020 12:56
lewurm added a commit to lewurm/openjdk that referenced this pull request Nov 23, 2020
This adds the cross-compiled build only, as no Windows+Arm64 machines are available on GitHub Action that we could use to run the tests.

Due to cross-compilation a build JDK is required. Initially I added EA builds to be downloaded from https://jdk.java.net/16/ and used for that, but then I saw how @shipiliv attempted it for the linux cross-compilation builds in openjdk#1147.  That is using the JDK image produced by the x64 variant. This however add more stress to the "critical path", as now two more jobs depend on the x64 build first.

Let's see how it works out in the long-run. A Windows+AArch64 build takes 40-50min.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build build-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants