[CI] Engine CI Rework, Part 1 #9295

mwu-tow · 2024-03-05T20:33:16Z

Pull Request Description

I have created PR with the first set of changes for the Engine CI. The changes are small and effectively consist of:

Spltting the verifyLicensePackages. It is now run only on Linux. There are hardly any time benefits, as the actual job cost is dominated by the overhead of spinning a new job — but it is not expensive in the big picture.
Splitting the Scala Tests into separate job. This is probably the biggest "atomic" piece of work we have.
Splitting the Standard Library Tests into a separate job.

The time is nicely split across the jobs now. The last run has:

27 min for Scala tests;
25 min for Standard Library tests;
24 min for the "rest": the old job containing everything that has not been split.

While total CPU time has increased (as jobs are not effectively reusing the same build context), the wall time has decreased significantly. Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup.

The now-slowest Scala tests job is currently comparable with the native Rust tests (and they should improve when the old gui is gone) — which are the slowest job across all CI checks.

The PR is pretty minimal. Several future improvements can be made:

Reorganizing and splitting other "heavy" jobs, like the native image generation.
Reusing the built Engine distribution. However, this is probably a lower priority than I initially thought.
- Building package takes several minutes, so duplicating this job is not that expensive.
- The package is OS-specific.
- Scala tests don't really benefit from it, they'd need way more compilation artifacts.
  It'd make sense to reuse the distribution if we, for example, decided to split more jobs that actually benefit from it, like Standard Library tests.
Reusing the Rust build script binary.
- As our self-hosted runners reuse environment, we effectively get this for free. Especially when Rust part of codebase is less frequently changed.
- This is however significant cost for the GitHub-hosted runners, affecting our macOS runners. Reusing the binary does not save wall time for jobs that are run in parallel (as we have enough runners), but if we introduce job dependencies that'd force sequential execution of jobs on macOS, this would be a significant need.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed, the GUI was tested when built using ./run ide build.

…seCheck

JaroslavTulach · 2024-03-06T05:28:11Z

Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup.

+1

I see Mac native image failure:

ERROR main_internal: enso_build_cli: error=Failed to get output of the command: Command:
	ENSO_DATA_DIRECTORY="/Users/runner/work/enso/enso/built-distribution/enso-engine-2024.1.1-dev-macos-amd64/enso-2024.1.1-dev" "/Users/runner/work/enso/enso/runner" "--run" "/Users/runner/work/enso/enso/engine/runner/src/test/resources/Factorial.enso" "6"


[WARN] [2024-03-05T21:12:22Z] [org.enso.editions.updater.EditionUpdater] Failed to fetch editions from [https://editions.release.enso.org/enso]: {}
       java.lang.UnsupportedOperationException: Preview Features not enabled, need to run with --enable-preview
       	at java.base@21.0.2/jdk.internal.misc.PreviewFeatures.ensureEnabled(PreviewFeatures.java:49)
       	at java.base@21.0.2/java.lang.Thread.ofVirtualWithoutLoom(Thread.java:680)
       	at java.base@21.0.2/java.util.concurrent.Executors.newVirtualThreadPerTaskExecutor(Executors.java:269)
       	at org.enso.downloader.http.HTTPDownload$.asyncDownload(HTTPDownload.scala:82)
       	at org.enso.downloader.http.HTTPDownload$.fetchString(HTTPDownload.scala:53)

  Exception in thread "main" org.graalvm.polyglot.PolyglotException: org.enso.editions.EditionResolutionError$CannotLoadEdition: Cannot load edition [2024.1.1-dev]: The edition was not found.
       	at org.enso.editions.EditionResolver.$anonfun$resolveParent$1(EditionResolver.scala:157)
       	at scala.util.Either$LeftProjection.map(Either.scala:614)
       	at org.enso.editions.EditionResolver.resolveParent(EditionResolver.scala:157)
       	at org.enso.editions.EditionResolver.resolveEdition(EditionResolver.scala:39)
       	at org.enso.editions.EditionResolver.resolve(EditionResolver.scala:29)
       	at org.enso.editions.updater.EditionManager.resolveEdition(EditionManager.scala:26)

Similar action on Linux is passing without any issues. CCing @Akirathan. I managed to reproduce the problem on my Mac. I am not yet sure why the failure happens, but I am assuming it is some mismatch in installed versions that triggers download which doesn't happen on Linux.

Btw. why is Mac CI only "optional"? It should be passing regularly, should it not?

JaroslavTulach · 2024-03-06T05:29:36Z

Shouldn't Engine CI / Scala Tests (linux, x86_64) and similar jobs be required to pass?

JaroslavTulach

More jobs (especially newly added) should in my opinion be required to pass.

radeusgd · 2024-03-06T11:19:50Z

build/build/src/engine/context.rs

@@ -108,6 +108,7 @@ impl RunContext {

    /// Check that required programs are present (if not, installs them, if supported). Set
    /// environment variables for the build to follow.
+    #[instrument(skip(self))]


what does this mean?

what does this mean?

This introduces a span coupled with the function execution.
Essentialy it logs function start and end, and groups logs from it under that span.

radeusgd · 2024-03-06T11:20:21Z

build/cli/src/arg/backend.rs

+    // /// Run an SBT command.
+    // Run {
+    //     #[clap(last = true)]
+    //     args: Vec<String>,
+    // },


Why is this commented out? Can we just remove that?

Yes, missed that one. Will remove.

build/cli/src/lib.rs

radeusgd · 2024-03-06T11:23:40Z

.github/workflows/scala-new.yml

+  enso-build-ci-gen-job-ci-check-backend-windows-x86_64:
+    name: Engine (windows, x86_64)
+    runs-on:
+      - self-hosted
+      - Windows


It feels like we could have a single job for all platforms and use the matrix to schedule runs on various configurations. But I assume this approach was easier to implement in the Rust generator? If so, that seems totally OK too.

It feels like we could have a single job for all platforms and use the matrix to schedule runs on various configurations. But I assume this approach was easier to implement in the Rust generator? If so, that seems totally OK too.

IIRC it was needed when the generator was first implemented and while we stopped requiring this, there was not enough motivation to get rid of it.

As it is right now, it could probably be replaced with matrix. Having this manually generated gives us more flexibility though, as it allows us e.g. to introduce cross-os dependencies between jobs. E.g. if we wanted Standard Library tests on MacOS to use Linux-built Engine distribution; or package IDE with Linux-generated WASM (while still building it on all supported platforms).

Also, this allows the generator to customize steps depending on os. While I avoid doing this (as I want CI stuff to be as cross-platform as possible), having one more tool in the box is nice.

radeusgd · 2024-03-06T11:24:19Z

.github/workflows/scala-new.yml

+      - self-hosted
+      - Windows
+    steps:
+      - if: startsWith(runner.name, 'GitHub Actions') || startsWith(runner.name, 'Hosted Agent')


btw. when is this condition not true? Sounds like it would be both for GH and self-hosted runners, so when not? Or am I misunderstanding something?

This is true only for Github-hosted runners. Our self-hosted runners follow different naming convention (like "enso-org-engine-demeter-5").

radeusgd

Looks great, I can't wait to be able to see this in action ❤️

As @JaroslavTulach mentioned, I think we have to make sure the newly split-off engine workflows: Scala Tests and Standard Library Tests are made required (excluding macOS, at least until we fix #9216). Otherwise we will start merging PRs with broken tests and that will be bad.

I added some general questions trying to understand the overall CI codebase in the context of these changes.

mwu-tow · 2024-03-06T12:07:10Z

@JaroslavTulach @radeusgd
Good point about adjusting required checks. I will do this after merging this, as this is repository-wide configuration setting. Applying it earlier, would block all other PRs (that are obviously missing the new checks).

radeusgd · 2024-03-06T12:40:54Z

@JaroslavTulach @radeusgd Good point about adjusting required checks. I will do this after merging this, as this is repository-wide configuration setting. Applying it earlier, would block all other PRs (that are obviously missing the new checks).

But won't that mean that for a while we will be able to merge (even by accident) PRs that are not passing the tests (i.e. as soon as I merge in these changes)?

Won't these changes only be in place once the PR with such changes lands? Thus they will start once this PR is merged. Then other pending PRs will just have to merge develop in to get the latest checks. That seems fine.

I don't see reason to do this in two parts, it feels much safer to do 'atomically'.

mwu-tow · 2024-03-06T12:45:12Z

But won't that mean that for a while we will be able to merge (even by accident) PRs that are not passing the tests (i.e. as soon as I merge in these changes)?

Won't these changes only be in place once the PR with such changes lands? Thus they will start once this PR is merged. Then other pending PRs will just have to merge develop in to get the latest checks. That seems fine.

I don't see reason to do this in two parts, it feels much safer to do 'atomically'.

Yes, yes. By "after" I meant "immediately after". I might as well do "immediately before".

…seCheck

I have created PR with the first set of changes for the Engine CI. The changes are small and effectively consist of: 1. Spltting the `verifyLicensePackages`. It is now run only on Linux. There are hardly any time benefits, as the actual job cost is dominated by the overhead of spinning a new job — but it is not expensive in the big picture. 2. Splitting the Scala Tests into separate job. This is probably the biggest "atomic" piece of work we have. 3. Splitting the Standard Library Tests into a separate job. The time is nicely split across the jobs now. The last run has: * 27 min for Scala tests; * 25 min for Standard Library tests; * 24 min for the "rest": the old job containing everything that has not been split. While total CPU time has increased (as jobs are not effectively reusing the same build context), the wall time has decreased significantly. Previously we had ~1 hour of wall time for the old monolithic job, so we are getting more than 2x speedup. The now-slowest Scala tests job is currently comparable with the native Rust tests (and they should improve when the old gui is gone) — which are the slowest job across all CI checks. The PR is pretty minimal. Several future improvements can be made: * Reorganizing and splitting other "heavy" jobs, like the native image generation. * Reusing the built Engine distribution. However, this is probably a lower priority than I initially thought. * Building package takes several minutes, so duplicating this job is not that expensive. * The package is OS-specific. * Scala tests don't really benefit from it, they'd need way more compilation artifacts. It'd make sense to reuse the distribution if we, for example, decided to split more jobs that actually benefit from it, like Standard Library tests. * Reusing the Rust build script binary. * As our self-hosted runners reuse environment, we effectively get this for free. Especially when Rust part of codebase is less frequently changed. * This is however significant cost for the GitHub-hosted runners, affecting our macOS runners. Reusing the binary does not save wall time for jobs that are run in parallel (as we have enough runners), but if we introduce job dependencies that'd force sequential execution of jobs on macOS, this would be a significant need.

mwu-tow added 8 commits February 28, 2024 22:17

experiments

6998b85

experiments

3b2cae0

experiments - splitting test

863d798

Merge remote-tracking branch 'origin/develop' into wip/mwu/splitLicen…

7a72b31

…seCheck

moving test reporters

8537509

check license only on linux

ef469c1

Merge remote-tracking branch 'origin/develop' into wip/mwu/splitLicen…

38dd0fd

…seCheck

post-merge refresh

2b229e7

mwu-tow self-assigned this Mar 5, 2024

mwu-tow added the CI: No changelog needed Do not require a changelog entry for this PR. label Mar 5, 2024

mwu-tow marked this pull request as ready for review March 5, 2024 21:40

mwu-tow requested review from hubertp, radeusgd, farmaazon and JaroslavTulach March 5, 2024 21:41

JaroslavTulach approved these changes Mar 6, 2024

View reviewed changes

radeusgd reviewed Mar 6, 2024

View reviewed changes

build/cli/src/lib.rs Show resolved Hide resolved

radeusgd reviewed Mar 6, 2024

View reviewed changes

radeusgd approved these changes Mar 6, 2024

View reviewed changes

farmaazon approved these changes Mar 6, 2024

View reviewed changes

mwu-tow added 2 commits March 6, 2024 17:43

Native runner requires engine distribution.

26339fe

Remove commented-out code.

fe5d244

Merge remote-tracking branch 'origin/develop' into wip/mwu/splitLicen…

19961e8

…seCheck

mwu-tow added the CI: Ready to merge This PR is eligible for automatic merge label Mar 6, 2024

mergify bot merged commit e930738 into develop Mar 6, 2024
33 of 36 checks passed

mergify bot deleted the wip/mwu/splitLicenseCheck branch March 6, 2024 18:56

enso-bot bot mentioned this pull request Mar 7, 2024

Prefix autoscoped constructors with .. #9275

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] Engine CI Rework, Part 1 #9295

[CI] Engine CI Rework, Part 1 #9295

mwu-tow commented Mar 5, 2024 •

edited

Loading

JaroslavTulach commented Mar 6, 2024 •

edited

Loading

JaroslavTulach commented Mar 6, 2024 •

edited

Loading

JaroslavTulach left a comment

radeusgd Mar 6, 2024

mwu-tow Mar 6, 2024

radeusgd Mar 6, 2024

mwu-tow Mar 6, 2024

radeusgd Mar 6, 2024

mwu-tow Mar 6, 2024

radeusgd Mar 6, 2024

mwu-tow Mar 6, 2024 •

edited

Loading

radeusgd left a comment

mwu-tow commented Mar 6, 2024

radeusgd commented Mar 6, 2024

mwu-tow commented Mar 6, 2024 •

edited

Loading

[CI] Engine CI Rework, Part 1 #9295

[CI] Engine CI Rework, Part 1 #9295

Conversation

mwu-tow commented Mar 5, 2024 • edited Loading

Pull Request Description

Important Notes

Checklist

JaroslavTulach commented Mar 6, 2024 • edited Loading

JaroslavTulach commented Mar 6, 2024 • edited Loading

JaroslavTulach left a comment

Choose a reason for hiding this comment

radeusgd Mar 6, 2024

Choose a reason for hiding this comment

mwu-tow Mar 6, 2024

Choose a reason for hiding this comment

radeusgd Mar 6, 2024

Choose a reason for hiding this comment

mwu-tow Mar 6, 2024

Choose a reason for hiding this comment

radeusgd Mar 6, 2024

Choose a reason for hiding this comment

mwu-tow Mar 6, 2024

Choose a reason for hiding this comment

radeusgd Mar 6, 2024

Choose a reason for hiding this comment

mwu-tow Mar 6, 2024 • edited Loading

Choose a reason for hiding this comment

radeusgd left a comment

Choose a reason for hiding this comment

mwu-tow commented Mar 6, 2024

radeusgd commented Mar 6, 2024

mwu-tow commented Mar 6, 2024 • edited Loading

mwu-tow commented Mar 5, 2024 •

edited

Loading

JaroslavTulach commented Mar 6, 2024 •

edited

Loading

JaroslavTulach commented Mar 6, 2024 •

edited

Loading

mwu-tow Mar 6, 2024 •

edited

Loading

mwu-tow commented Mar 6, 2024 •

edited

Loading