Implement build system using self-hosted aarch64 runners, GitHub needs jobs feature and reusable workflows #1703

Merged on Jul 5, 2022 (137 commits)

mathbunnyru
Member

@mathbunnyru mathbunnyru commented May 12, 2022

I've decided to implement our build system from scratch.

Ideas that came to mind:

  • each job in a workflow does exactly one essential thing, for one platform only
  • so we will have lots of jobs, like build-x86-minimal-notebook, but we can easily declare dependencies between them
  • share everything as GitHub artifacts between these jobs
  • amd64 and aarch64 are almost independent
  • do not even try to merge tags between amd64 and aarch64, let's add the aarch64- prefix for all arm builds
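
A minimal sketch of the prefixing rule proposed in the last item, in Python. The function name `platform_tag` and the example tags are hypothetical, and leaving amd64 tags unprefixed is an assumption read from the list above, not the project's actual tagger code.

```python
def platform_tag(tag: str, platform: str) -> str:
    """Return the tag to publish for one platform.

    Assumption: amd64 tags stay as they are, arm tags get the
    "aarch64-" prefix. Tags of the two platforms are never merged.
    """
    if platform == "aarch64":
        return f"aarch64-{tag}"
    if platform in ("x86_64", "amd64"):
        return tag
    raise ValueError(f"unsupported platform: {platform}")


# Example tags chosen for illustration only.
tags = ["latest", "python-3.10"]
print([platform_tag(t, "aarch64") for t in tags])
print([platform_tag(t, "amd64") for t in tags])
```

Keeping the rule in one small function means the manifest step and the taggers could share it instead of each re-deriving the prefix.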

Implementation details:

  • I rely heavily on reusable workflows. This way there is almost zero code duplication
  • The duplication is only needed in the main docker.yml workflow. This file can be seen as a simple config file of build dependencies
  • No build system is actually needed because we will be relying on the needs feature of GitHub
  • Rely heavily on the well-made GitHub actions actions/upload-artifact and actions/download-artifact to pass the image between jobs
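
To illustrate why `needs` can stand in for a build system, here is a small Python model of the idea: each job lists the jobs it needs, and a topological sort yields the groups of jobs that may run at the same time. The image names and the `NEEDS` mapping are illustrative assumptions, not a copy of the real docker.yml, and this is only a model of the scheduling, not how GitHub implements it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each job to the set of jobs it `needs` (hypothetical example graph).
NEEDS = {
    "base-notebook": set(),
    "minimal-notebook": {"base-notebook"},
    "scipy-notebook": {"minimal-notebook"},
    "r-notebook": {"minimal-notebook"},
    "datascience-notebook": {"scipy-notebook"},
}


def parallel_waves(needs):
    """Group jobs into waves; all jobs within one wave can run concurrently."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose needs are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(NEEDS), start=1):
    print(number, wave)
```

In this example graph, scipy-notebook and r-notebook land in the same wave, which is the parallelism the job summary graph would show.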

TODO:

  • aarch64: self-hosted runners
  • self-hosted runners: easy setup docs
  • adapt taggers to support aarch64 prefix
  • adapt manifest creation for the new build system
  • delete buildx Makefile parts (or even the Makefile itself)
  • add all images to the main workflow
  • fix documentation where it mentions aarch64 or make commands
  • test that only fresh images are used: add an empty file to the image and check that it persists in every image built

Fix: #1407
Fix: #1530
Fix: #1402
Fix: #1401
Fix: #1203
Supersedes: #1631
Fix is no longer needed for: #1539

Upsides:

  • (all the issues above)
  • no need to use Makefile
  • no sudo rm -rf to maximise space, because we will have a small number of images in the local cache (hopefully)
  • the summary of jobs shows a nice dependency graph
  • aarch64 images are first-class now, there are no special hacks for building them (except for native runners and sharing state in VM)
  • adding new platforms is theoretically possible now, but we will have to have native runners for them as well (in practice - do not expect new platforms)
  • no need for a separate amd64 workflow
  • easily adaptable for GitHub native aarch64 runners once they become available
  • no need to manually create sections to make GitHub steps look good - I think it will be easier to jump right into the error if one occurs
  • QEMU aarch64 bug fix removal (no longer needed), which will allow faster mamba execution on multi-CPU machines

Downsides:

  • Actions on self-hosted runners run in a VM environment: this means worse security and some work to clean up the environment.
  • In theory the build should be extremely fast, because everything that can run in parallel will. The overhead of using a clean environment is small, but docker save/load and uploading/downloading artifacts are quite slow.
  • maintenance of self-hosted GitHub runners
  • we will have an aarch64- prefix for every arm tag
  • local experience will be a bit different from the remote one, because I don't want to support docker buildx and multi-platform builds in the Makefile
  • to add a new core image, more lines of changes will be required. But since we haven't added new images for many years, this is not a problem.
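
The save/load cost mentioned above comes from serialising each image to a tar archive so it can travel as an artifact. A rough Python sketch of the two commands involved follows; the helper names and the archive path are hypothetical, and how the project's workflows actually invoke docker is not shown in this thread. The commands are only built and printed, not executed.

```python
import shlex


def save_command(image: str, archive: str) -> list[str]:
    """Command that writes an image to a tar archive (producer job)."""
    return ["docker", "save", "--output", archive, image]


def load_command(archive: str) -> list[str]:
    """Command that restores the image from the archive (consumer job)."""
    return ["docker", "load", "--input", archive]


# Hypothetical image name and path, for illustration only.
archive = "/tmp/minimal-notebook-aarch64.tar"
print(shlex.join(save_command("minimal-notebook", archive)))
print(shlex.join(load_command(archive)))
```

Every dependent job pays for one upload, one download and one load, which is why a deep image hierarchy makes this overhead noticeable.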

@mathbunnyru mathbunnyru marked this pull request as draft May 12, 2022 10:57
@mathbunnyru mathbunnyru marked this pull request as ready for review May 12, 2022 13:01
@mathbunnyru mathbunnyru marked this pull request as draft May 12, 2022 13:01
@mathbunnyru
Member Author

@manics I forgot to mention you initially.
Please, if you have some time, review this PR :)

@mathbunnyru
Member Author

I gave it a bit more thought and I'm going to reconfigure my runners: I'm going to have two runners with 2 CPUs each.
Some things run faster when there are several cores: extraction in mamba, running tests, (maybe) docker save.
So, we should see some speed improvement for building aarch64 images.

@mathbunnyru mathbunnyru closed this Jul 5, 2022
@mathbunnyru mathbunnyru reopened this Jul 5, 2022
@mathbunnyru
Member Author

I changed this option to require approval for all outside collaborators.
This is a simple measure to make self-hosted runners less vulnerable to arbitrary code execution.

[Screenshot 2022-07-05 at 11:12:00]

@mathbunnyru
Member Author

Before, with 3 runners with 1 CPU each: 1h 38m 19s
Now, with 2 runners with 2 CPUs each: 1h 21m 29s

The best we can get with this scheme is around 1h 11m.
To do this, we will have to have 6 self-hosted aarch64 runners.
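
For a sense of scale, the measurements above can be compared with a few lines of Python. The durations are the ones quoted in this comment; the variable names and the `reduction` helper are made up for the example, and the 1h 11m figure is the estimate given above, not a measurement.

```python
from datetime import timedelta

three_runners_one_cpu = timedelta(hours=1, minutes=38, seconds=19)
two_runners_two_cpus = timedelta(hours=1, minutes=21, seconds=29)
estimated_best = timedelta(hours=1, minutes=11)  # estimate, needs 6 runners


def reduction(before: timedelta, after: timedelta) -> float:
    """Fraction of the build time saved when going from before to after."""
    return (before - after) / before


saved = three_runners_one_cpu - two_runners_two_cpus
print(saved)  # 0:16:50
print(f"{reduction(three_runners_one_cpu, two_runners_two_cpus):.1%}")
print(f"{reduction(two_runners_two_cpus, estimated_best):.1%}")
```

The reconfiguration saved about 17 minutes per build, and going to 6 runners would save roughly another 10 minutes at best.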

This is actually something we might want, because on the master branch pushing to the registry also takes some time, and with only 2 runners we're not building as fast as we could.

@mathbunnyru mathbunnyru changed the title Implement build system using GitHub needs jobs feature and reusable workflows Implement build system using self-hosted aarch64 runners, GitHub needs jobs feature and reusable workflows Jul 5, 2022
@benz0li
Contributor

benz0li commented Jul 5, 2022

@mathbunnyru Have you thought of running them on a Mac Mini (M1, 2020)?

I am hosting GitLab Runners in a Debian VM on a Mac mini (M1, 2020) to build my linux/arm64/v8 images.
ℹ️ Even the first generation Mx machines outperform everything else out there.

@mathbunnyru
Member Author

mathbunnyru commented Jul 5, 2022

@mathbunnyru Have you thought of running them on a Mac Mini (M1, 2020)?

I am hosting GitLab Runners in a Debian VM on a Mac mini (M1, 2020) to build my linux/arm64/v8 images. ℹ️ Even the first generation Mx machines outperform everything else out there.

I thought about running aarch64 builds on M1 Mac, but there is no free M1 Cloud Mac, as far as I know.
I might be wrong here.
You're using your personal Mac Mini, right?

Currently, 2 Oracle VMs with 2 CPUs each are free for me.

@benz0li
Contributor

benz0li commented Jul 5, 2022

I thought about running aarch64 builds on M1 Mac, but there is no free M1 Cloud Mac, as far as I know.
I might be wrong here.

No, you are not.

You're using your personal Mac Mini, right?

Yes. It was a $900 investment.

@mathbunnyru
Member Author

mathbunnyru commented Jul 5, 2022

To be honest, I don't think it would be a good idea to use personal equipment in a very popular open-source project.
I mean, it's totally fine for personal projects and I like your setup.

But if something happens to the person who owns the hardware or he decides to stop providing it as a service or the hardware breaks, it would be quite difficult to make this project work again.
In my case, I added a complete guide to set up a new runner. If I decide to stop using Oracle cloud, then someone has to follow the instructions and it should work just fine. I did nothing to my VMs, except the things I documented. Moreover, I recreated everything from scratch after the last change, so it should be fine.
Another possibility is that Oracle stops providing these runners, but I hope it's not gonna happen and there are other alternatives (Amazon, for example, but not as good in a free tier).

@mathbunnyru
Member Author

There were some reviews in the past, so I'm going to merge this when this passes.
Now I have some time to fix bugs in the master branch, if there are any.

@mathbunnyru mathbunnyru merged commit 9e3fd1c into jupyter:master Jul 5, 2022
@mathbunnyru
Member Author

AFAIK, everything finally works as expected 🎉

@mathbunnyru
Member Author

mathbunnyru commented Jul 6, 2022

I was able to enable building datascience-notebook under aarch64.
And here comes the best part:
even in master with datascience-notebook and pushing to DockerHub build time is 1h 23m 22s

Of course, this only holds if the self-hosted runners are working on a single build at a time (because there are only 2 of them right now).

@trallard
Member

trallard commented Jul 6, 2022

Wow @mathbunnyru, I have been away as I am stretched with work, but kudos on this major refactoring. You've done incredible work 🙂 thank you

Labels
type:Maintenance A proposed enhancement to how we maintain this project
Projects
None yet
5 participants