Multi-arch next steps #1139
Thanks a lot for writing this issue! About the ARM instances, is there something blocking us from using the current dynamic agent allocation in ci.jenkins.io (e.g. adding a pipeline parallel branch to execute the …)?

About your questions:
Thanks for this huge and awesome work @timja
Another food for thought if we want to get started with s390x and ppc64: WDYT about adding, right now:
For CI it's fine to run the build/test on each architecture, but when publishing we want to use docker buildx builders, which means running the command from one machine that has SSH access to all the others.
👍, is it possible to get more machines, @MarkEWaite / @slide?
There doesn't seem to be at the moment; I can access them from my machine.
The main question was whether I can put them into DNS to make them easier to manage and reason about, or, if they stay IP-only, whether it's OK for the IPs to be public or whether they need to be loaded from a credential.
The ARM capability for trusted.ci is only a configuration away. Can you add this to today's infra meeting? I'll take care of that this week to open up the possibility here and not risk any blocking :)
I can't see how to do that, and I also can't attend (the meeting time doesn't work so well for me these days).
@timja let me handle it, no problem on this (and many thanks for managing this!)
(I meant the agenda, btw: the move to HackMD has made that harder than the Google Doc was, and I can't see a published agenda.)
I think we should reuse the machines but create separate accounts on the machine for those use cases. I've been using a separate account on the machines for my test cluster without any negative impact that I've detected. I propose to create the following accounts:
I'm open to either. I suspect that the operating system is more important than the cloud provider. @olblak and I have been using arm64 machines with Ubuntu 20.04 on Oracle Cloud with good results. I've also run arm64 on Oracle Cloud with Oracle Linux, but it is much less familiar to me than the Ubuntu environment. Oracle Cloud has offered us membership in their Arm accelerator program and a $3000 credit. AWS has donated $60k to the Jenkins project. My initial leaning is towards Oracle Arm just because there are so many other ways that we will use the capacity that AWS is donating.
I assume in Pipeline, though I'm OK with either.
The IP addresses are not considered sensitive. DNS entries seem like a very good idea.
That sounds good to me.
https://github.com/jenkins-infra/jenkins-infra/blob/staging/hieradata/common.yaml#L80
Fine with me. How can we get the machine set up?
I'll create the machine and provide you an account on the machine with sudo. Are you OK with the idea that I proposed to have a
Fine from my POV; @olblak or @dduportal may have different opinions. The machines we already have are quite powerful.
I've created a timja account on the s390x and ppc64le machines with the public key that you provided.
I updated the Ubuntu packages on ppc64le and rebooted (it had 100+ outdated packages, including Java versions). The machine has restarted and is working.
After the parallel changes are merged, I can look at enabling this on ci, at least in a PR. Do we want the full test suite run on every platform, or just smoke tests?
FYI, after the git-lfs update in our Dockerfile I tried running via QEMU on our agents (the AWS one), and it looks like I hit this, which may point to a misconfigured QEMU?
Setting up QEMU got it further. It now fails on:

Not clear why.
Manual test:

```
export ARCH=arm64
make build-debian_jdk11
docker run --rm -t docker.io/jenkins/jenkins:2.300-jdk11 uname -m
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
aarch64
```

=> I assume the error when downloading the war file was a network issue: to be double-checked, of course :)
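The manual check above relies on `uname -m` inside the emulated container. A minimal sketch of the same smoke test as a reusable helper, assuming bash and that the image can run `uname`; the function names are hypothetical. Passing `--platform` explicitly also avoids the platform-mismatch warning shown above.

```shell
#!/usr/bin/env bash
# Map a Docker platform string to the value `uname -m` reports inside
# a container running on (or emulated for) that platform.
expected_machine() {
  case "$1" in
    linux/amd64)   echo x86_64 ;;
    linux/arm64)   echo aarch64 ;;
    linux/s390x)   echo s390x ;;
    linux/ppc64le) echo ppc64le ;;
    *) return 1 ;;
  esac
}

# Run `uname -m` in the image for one platform and compare with the expectation.
smoke_test_platform() {
  local image=$1 platform=$2 expected actual
  expected=$(expected_machine "$platform") || {
    echo "unknown platform: $platform" >&2
    return 2
  }
  actual=$(docker run --rm --platform "$platform" "$image" uname -m) || return 1
  if [ "$actual" = "$expected" ]; then
    echo "$platform: OK ($actual)"
  else
    echo "$platform: expected $expected, got $actual" >&2
    return 1
  fi
}
```

For example, `smoke_test_platform docker.io/jenkins/jenkins:2.300-jdk11 linux/arm64` would repeat the manual test above; looping over several platforms could serve as the "just smoke tests" option for non-amd64 builds.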
@dduportal I can reproduce on the Ubuntu 20 machine; just run:
@timja when you say that you can reproduce, do you mean the error? Because the command you provided is successful for me on both the Ubuntu 20.04 machine with QEMU installed (and enabled) and my macOS Intel machine (with Docker for Mac).
I got the error last night on the Ubuntu 20 machine using the command above; I've just retriggered it.
@timja thanks for clearing it up, I asked because I was not sure if I understood correctly. It means the outcome of the build is not always the same: there is something weird :|
It's passing now.
It takes 19m44.652s with no cache, though. Maybe we can remove some images we don't need multi-arch for?
I built the full set twice more via QEMU (with --no-cache). One got stuck on s390x and I cancelled it after 30 minutes; the other completed in 10 minutes. This is on the reduced-platform branch. I also ran it twice on my M1, building with remote (non-emulated) builders. The 1st run failed with:

The 2nd run failed with:

Signal 4 is SIGILL (illegal instruction), which means the CPU hit machine code it doesn't recognize =/
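When a build step dies from a signal rather than exiting normally, shells (and most CI tools that wrap them) report exit status 128 + N, so SIGILL usually surfaces as 132 rather than 4. A small, hypothetical helper to decode such statuses, assuming bash:

```shell
#!/usr/bin/env bash
# Decode an exit status: values above 128 mean "killed by signal (status - 128)".
describe_exit() {
  local status=$1 sig
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # `kill -l N` prints the signal name without the SIG prefix, e.g. ILL for 4.
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with code $status"
  fi
}
```

For example, `describe_exit 132` prints `killed by signal 4 (SIGILL)`, while `describe_exit 1` prints `exited with code 1`.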
I ran the build ~4 times on the s390x machine and all runs failed on the git-lfs install. When I removed the debian buster 11 image from the list, it passed the first time, so I've removed it in #1156. FTR, I tried manually on Ubuntu s390x and git-lfs installs fine.
@dduportal do you think we should continue trying with QEMU, or build natively on each architecture?
In past sessions of the platform SIG, Alex Earl mentioned that there were specific feature issues with QEMU (see the Jan 15, 2021 platform SIG notes). Unfortunately, I didn't capture any details in the notes. You can hear the description from @slide at https://youtu.be/MzpL2IEkJ3E?t=530 There was a comment in the meeting Jan 15, 2020 as well that Jim Crowley was investigating QEMU. https://docs.google.com/document/d/1q5A72xnoJVPZRKXZhyNnYCSCTuG02LiFcKQkH5rdwXc/edit#heading=h.2ye0o1azc72i
So @timja was able to determine why the publication was failing (on trusted.ci) while the usual pipeline was working (on ci.jenkins.io): the Azure VM required QEMU's binfmt handlers to be enabled before each QEMU build, while the AWS EC2 agent (on ci.jenkins.io) only required binfmt to be loaded in the AMI. #1169 enabled multi-arch again and also includes the fix; @timja's tests seem OK (custom images published on DockerHub) \o/
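Since the Azure agents needed binfmt re-enabled before each QEMU build, a pre-build check can make that state visible. This is a hedged sketch, not necessarily what #1169 does: it assumes an amd64 host, the standard `/proc/sys/fs/binfmt_misc` mount, and the `qemu-<arch>` handler names that `tonistiigi/binfmt` and `qemu-user-static` register. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Report whether QEMU binfmt_misc handlers for the foreign architectures
# are registered and enabled. Returns non-zero if any handler is missing.
check_binfmt() {
  local dir=${1:-/proc/sys/fs/binfmt_misc}
  local missing=0 arch entry
  for arch in aarch64 s390x ppc64le; do
    entry="$dir/qemu-$arch"
    # The first line of a registered handler file is "enabled" or "disabled".
    if [ -f "$entry" ] && [ "$(head -n 1 "$entry")" = "enabled" ]; then
      echo "qemu-$arch: enabled"
    else
      echo "qemu-$arch: not enabled"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_binfmt || docker run --privileged --rm tonistiigi/binfmt --install all` would (re)install the handlers only when one is missing or disabled, before starting the multi-arch build.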
Images are now being published 🎉 |
@timja I can see multi-arch images published with the rhel-ubi8 tag, but not for all tags/versions. Checking on https://hub.docker.com/r/jenkins/jenkins
@Nayana-ibm We do have some plans to build and publish images for s390x, as we currently have access to s390x infrastructure from IBM, but there is no ETA at the moment as far as I know.
Is there anything specific you're after? Adding it to each tag makes the build take longer, so we would prefer to do it based on user need. It's no problem to enable it for another one, though.
@Nayana-ibm this adds it to the default image; is that enough? #1183
@timja Thank you for considering s390x for default images. I can see images are now published with the latest and jdk11 tags.

Line 195 in 744ce8f

Am I missing anything here?
Next LTS release is scheduled for Wed 25th August |
Great! Thank you |
`make` targets have been added in this PR to show how that will work.
- `s390x`, `ppc64le` static agents
- `--set '*.platform=linux/amd64'`
`docker buildx` config and ssh config:

```shell
docker buildx create --name remote --use
docker buildx create --name remote \
  --append ssh://jenkins-agent-ppc64le
docker buildx create --name remote \
  --append ssh://jenkins-agent-s390x
# I'm running on arm so didn't actually do this, but for completeness
docker buildx create --name remote \
  --append ssh://jenkins-agent-arm64
```
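The builder setup above can be wrapped in a small script so the publishing machine creates the same multi-node builder every time. A minimal sketch, assuming bash and that the `jenkins-agent-*` host aliases are defined in `~/.ssh/config`; the function name and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Create one buildx builder with the local engine plus one node per SSH host.
# With DRY_RUN=1 the docker commands are printed instead of executed.
setup_remote_builder() {
  local name=$1; shift
  local runner=() host
  if [ "${DRY_RUN:-0}" = 1 ]; then
    runner=(echo)
  fi
  # First node: the local Docker engine, selected as the current builder.
  "${runner[@]}" docker buildx create --name "$name" --use
  # Remaining nodes: one per remote host, appended to the same builder.
  for host in "$@"; do
    "${runner[@]}" docker buildx create --name "$name" --append "ssh://$host"
  done
  # Start every node and list the platforms each one reports.
  "${runner[@]}" docker buildx inspect --bootstrap "$name"
}
```

For example, `setup_remote_builder remote jenkins-agent-ppc64le jenkins-agent-s390x` mirrors the commands above; running it first with `DRY_RUN=1` shows exactly what will be executed.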
Few questions:
- Can the `s390x` and `ppc64le` agents be shared across ci.jenkins.io and trusted-ci, or do we need to get another one for each?
- Where should the `arm64` machine be hosted? As far as I know we have 2 choices, AWS or Oracle Cloud; any preference?
- Are the `s390x` and `ppc64le` IPs (which are IP-only currently) considered sensitive? Or can I create DNS entries for them, e.g. (ppc64le-agent.jenkins.io)?

Any help would be hugely appreciated ❤️
cc @olblak @MarkEWaite @slide @dduportal