-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor GitHub actions #11183
Refactor GitHub actions #11183
Conversation
Thanks for grasping this nettle, @Engil! Extra commits pushed to:
A review of the other parts to follow 🙂 |
|
@dra27 thank you for your additions, I will review the changes soon. :) |
1b38305
to
e54bd38
Compare
The credential store won't have been configured when the artefact is checked out, so disable the header override before uploading it.
This reverts commit bbbe9e5.
Especially now that it's in a separate build stage.
Interferes with the clean working directory other checks.
e54bd38
to
4cf63f0
Compare
As the four most recent commits testify, we really ought to get this merged 🙈 @Engil - the notes I'd had were:
However, this PR is most definitely a step-wise improvement over the current pipelines so I suggest that if you're happy with the commits I've pushed and as I'm happy with your part of the PR that we go ahead and merge and iterate in another PR? |
Your latest additions LGTM, I think this is a long overdue improvement and we need to build upon this. |
Ta - I'll merge it when CI returns (🤞) and cherry-pick to 5.0 |
Refactor GitHub actions (cherry picked from commit 3ad1567)
Am I reading correctly that the MacOs job is running without flambda, and thus we have at least one job without flambda enabled? |
Yes, that's correct - the Windows build is also done without flambda (as on 4.14) |
Thanks for the confirmation! |
I'd rather have most CI jobs without flambda and one job with flambda, if only because non-flambda builds take less time.
If |
It should be easy to add a |
Yes, we have lost a little bit of time because the debug-runtime testsuite is now being run with an flambda compiler. We could remove the @ctk21 has been a fan of the |
Speaking of the manual build: I'm now getting failures with the "normal" job when it tries to build the manual, see e.g. |
@xavierleroy - I'm 99% certain that what happened here was that the PR was merged before the full pipeline had finished, as the timestamp on the job start for normal and both the testsuite runs is after the PR's merging. That meant that when the "normal" job tried to fetch the additional commits needed to analyse the repo, the I'll open a PR to move that analysis to the "build" job and then have it communicate with output variables, but I don't think it's that testing the manual's build is itself fundamentally broken. |
Would it be possible to just skip the test in this case, rather than marking it as a failure? It's not that something wrong was found in the manual, just that the test could not be run because of circumstances. Whatever solution you choose, I would appreciate no longer receiving "PR run failed: Build" e-mails from Github, which I have to investigate every time, just to see that the problem is with the test of the manual. Thank you for your understanding. |
Refactoring and cleaning up Github Actions
This PR is a follow up to #10980
The Multicore merge greatly altered the workflow of Github Actions on ocaml/ocaml.
Multicore OCaml diverged greatly from it on multiple occasion during its lifespan.
This PR goals are manyfold:
full-flambda
run, as well as theother-checks
runner pass.)ocaml/ocaml
are worthwhile.This PR itself is a bit lengthy, a lost of meaning was lost in the history tree, and it ended up being squashed.
I think the better way to review it would be to compare this tree to the
4.14
equivalent. @dra27 may be interested by this PR.Cleanup and refactor
This PR introduces a few simplification:
super
testsuite run: I have never been convinced of the usefulness of this run and the code added torunner.sh
specifically to support this usecase is quite ugly.Historically, this run was introduced to run multiple time the
parallel
testsuite and catch possibly intermittent failures, in practice I do not think this is valuable as if such failures exists they will be catched eventually in another run on either CI system.TestLoop
was simplified toTestPrefix
inrunner.sh
.TestLoop
is not needed in its current form sincesuper
is gone.TestPrefix
also opts to run tests in parallel (to follow suit withTest
).build.yml
has now four main jobs:make install
, runsother-checks
, check for changes in the manual.macos
: Compile OCaml and run the full testsuite on MacOSlinux-O0
: Compile OCaml withCFLAGS='-O0'
and run a selection of test directories.debug
: Does a full run of the testsuite in with the debug runtime.debug-s4096
: Runs a selection of test directories with the debug runtime and a minor heap of 4096 words.normal
,debug
,debug-s4096
"jobs" all share the same compiler build.This build is compiled during the initial
build
job. The freshly built directory is then uploaded as a build artifact to be reused by the aforementioned steps.To be noted: the
macos
andlinux-O0
sub-jobs cannot reuse the compiler built during thebuild
job, because they either run on a different OS, or rely specific on differentconfigure
parameters when building the OCaml distribution.Bring back the features lost after the merge
The
normal
job is a mirror copy of what the GHA runs do on the 4.14 branch.One major difference from previous Multicore runs: as for
4.X
, the "main" testsuite run is now a full flambda run.The same applies for every subsequent jobs reusing the compiler artifact built during the
build
job.One omission from this, is that I did not reimport the
i386-static
run in the workflow: is it something desireable?Discussions
The current draft is an attempt at merging what used to be done in trunk before the merge and the various tweaks introduced by the Multicore team.
The both
debug
runs proved to be especially useful to the team when developping the Multicore runtime.We tried to strike a nice balance in running time by only running a few select directories (that historically proved useful for us to insist on.) when adequate.
I think the removal of the
super
job (which aimed to re-run select directories three times without further adjustements) is fine and spare us from burning more CPU cycles.Two runs I think are worth discussing about:
taskset-c0
(which was left out of this PR) was used to catch some synchronization issues within the Multicore runtime. It does not involve a full build of the compiler (as it can reuse the cached build of thenormal
testsuite run.), but proves to be problematic in some instances.parallel/pingpong.ml
is for instance very very slow in this run.(which, looking at the testcase makes sense to me). Do we want to bring it back?
linux-O0
(which was left in this PR) caught a few times some issues in the Multicore runtime, catching some problems when optimizations were removed. I could not trace back such scenarios. It involves a full, separate compiler build, as well as running some parts of the testsuite (a full run would be prohibitively slow.)One more thing worth pondering upon, the Github Actions runner does not use
OCAML_TEST_SIZE
to provide hints to the runners on the number of cores available: Should this be added as well?