Multi-architecture plan for Kubernetes #38067

Closed
luxas opened this issue Dec 4, 2016 · 40 comments · Fixed by #49457
Labels
area/build-release area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@luxas
Member

luxas commented Dec 4, 2016

Background: I've implemented most of the multi-architecture support Kubernetes has today, and wrote the proposal here: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/multi-platform.md

Now it's time to continue improving the multi-arch experience.
Tasks to do:

  • Deprecate armel in favor of armhf images, and use GOARM=7 instead of GOARM=6
    • Motivation:
      • The only GOARM=6 board Go will support in go1.8 is the Raspberry Pi 1, which is just too slow to run newer Kubernetes versions.
      • Small performance improvements when using GOARM=7
      • The armel (http://hub.docker.com/u/armel) images are not updated as often as the armhf (http://hub.docker.com/u/armhf) images are.
  • Use go1.8 as soon as possible for arm and ppc64le (and of course generally as well, but that will require the "real" release)
    • Motivation:
      • Brings us a lot of mandatory fixes for arm and ppc64le
      • Brings us the SSA backend which is ~30% faster
      • We can remove the patched golang for arm and start building ppc64le binaries by default again
      • arm hyperkube will start working again
      • Even if it's beta, it's probably better than a self-patched version of go1.7
    • Proposal:
  • Re-enable ppc64le builds by using the go1.8 betas until the stable version of go1.8 is released, and release v1.6 of Kubernetes for ppc64le built with go1.8
  • Evaluate s390x as a new platform
  • Convert the essential images that are named registry/binary:version to manifest lists
    • TODO: Investigate rkt support for manifest lists: Support Docker manifest lists appc/docker2aci#193
    • Wait for gcr.io to roll out a v2 schema 2 registry that supports manifest lists. @aronchick told me it's gonna happen mid-December.
    • Start building manifest lists when releasing Kubernetes (kube-apiserver, kube-scheduler, etc.)
    • Basically, all images will be named registry/binary-arch:version as most of them are now, but the image without the -arch suffix will be a manifest list that points to the right -arch image depending on which arch Docker runs on (see the sketch after this list).
    • Convert all other essential images to manifest lists, namely:
  • Convert the ingress images to multiple architectures
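
For reference, here is a rough sketch (an assumed shape for illustration, not the actual release tooling) of what such a manifest list would look like: one descriptor per registry/binary-ARCH:version image, each carrying its platform, published under the plain registry/binary:version tag. The digests and sizes below are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
}

type manifestDescriptor struct {
	MediaType string   `json:"mediaType"`
	Digest    string   `json:"digest"`
	Size      int64    `json:"size"`
	Platform  platform `json:"platform"`
}

type manifestList struct {
	SchemaVersion int                  `json:"schemaVersion"`
	MediaType     string               `json:"mediaType"`
	Manifests     []manifestDescriptor `json:"manifests"`
}

func main() {
	// A manifest list groups several per-arch image manifests under one tag.
	list := manifestList{
		SchemaVersion: 2,
		MediaType:     "application/vnd.docker.distribution.manifest.list.v2+json",
		Manifests: []manifestDescriptor{
			{
				MediaType: "application/vnd.docker.distribution.manifest.v2+json",
				Digest:    "sha256:aaaaaaaa...", // hypothetical digest of e.g. kube-apiserver-amd64:v1.6.0
				Size:      1573,
				Platform:  platform{Architecture: "amd64", OS: "linux"},
			},
			{
				MediaType: "application/vnd.docker.distribution.manifest.v2+json",
				Digest:    "sha256:bbbbbbbb...", // hypothetical digest of e.g. kube-apiserver-arm:v1.6.0
				Size:      1573,
				Platform:  platform{Architecture: "arm", OS: "linux"},
			},
		},
	}
	out, _ := json.MarshalIndent(list, "", "  ")
	fmt.Println(string(out))
}
```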

cc-ing involved people here:
@kubernetes/sig-testing @david-mcmahon @saad-ali @pwittrock @Pensu @ixdy @jessfraz @thockin @vishh @gajju26 @brendandburns

@luxas luxas self-assigned this Dec 4, 2016
@luxas luxas added this to the v1.6 milestone Dec 4, 2016
@luxas luxas added area/build-release area/release-eng Issues or PRs related to the Release Engineering subproject labels Dec 4, 2016
@thockin
Member

thockin commented Dec 5, 2016 via email

@xiaolou86
Contributor

mark, to learn later

@vielmetti

@thockin - you might sync with @tianon who is doing Alpine builds for aarch64 - we sorted out a few issues and then alpine edge started to work on that platform. Cf. https://hub.docker.com/r/aarch64/alpine/

What I don't know is the time schedule for the next release of Alpine, whether we'll have Alpine 3.5 with aarch64 support before or after Kubernetes 1.6.

@luxas
Member Author

luxas commented Dec 18, 2016

@vielmetti That's great!
To date we've only used busybox everywhere, and we have to do that until all architectures support alpine (s390x and ppc64le don't yet)

The more important thing, though, is getting the manifest lists working properly in core, and making manifest lists for the official images available from Docker Hub. cc @tianon

@tianon

tianon commented Dec 19, 2016

docker-library/official-images#2289 is the relevant discussion for supporting manifest lists in official images

@luxas
Member Author

luxas commented Jan 13, 2017

Now that go1.8rc1 has been released, things are happening with this:

Regarding s390x, @ixdy said this in #36050 (comment):

FYI I'd forgotten to push images for s390x, so looking into that now.
Pushed:

  • gcr.io/google_containers/debian-iptables-s390x:v5
  • gcr.io/google_containers/pause-s390x:3.0
  • gcr.io/google-containers/kube-addon-manager-s390x:v6.1

Issues:

  • kubedns: @bowei can you handle this, since it moved to a different repo? also it looks like dnsmasq (https://github.com/kubernetes/dns/blob/master/images/dnsmasq/Makefile) has no s390x support yet.
  • cluster/images/etcd uses go1.6.3, so we can't currently build gcr.io/google_containers/etcd-s390x
  • google_containers/hyperkube-s390x: I suspect this will automatically be released with 1.6. nothing there yet, though.
  • cluster/images/kube-discovery has weird versioning - its tag is just "1.0", but it depends on whatever's been built in the tree, so I'm not sure what commit to push for s390x. @dgoodwin can maybe help here.
  • gcr.io/google_containers/serve_hostname - it looks like nobody has pushed any non-amd64 images for this one yet.

@bowei and I will look into the DNS images soon, I guess.
@ixdy will be able to push the etcd image when #38926 is merged.
hyperkube will be released with the first 1.6 alpha automatically
Yes, kube-discovery has a bit of weird versioning, but it was a one-time addition and hasn't been updated since the initial add and push, so it's safe to just build and push it now @ixdy

We can look into the test images later.

I have #38926 ready for review. Highlights there:

  • Removes the golang 1.7 patch
  • Instead of the patched golang, go1.8rc1 is put in the GOROOT that arm and ppc64le now use
  • Reenables ppc64le because it's using go1.8rc1 which has all the changes it needs
  • armel => armhf
  • GOARM 6 => GOARM 7
  • Bumps the QEMU version to v2.7.0

After this, I'm gonna start looking into making manifest lists for the official images so we can avoid -ARCH suffixes and instead just let docker pull the right layers for the right arch.

@estesp

estesp commented Jan 25, 2017

Just FYI: IBM is doing some work to hopefully get alpine supported on ppc64le and s390x (or supported more completely in cases where work has already been underway).

@luxas
Member Author

luxas commented Jan 25, 2017

@estesp That's great! See the conversation about that here please: #40248 (comment)

k8s-github-robot pushed a commit that referenced this issue Jan 28, 2017
Automatic merge from submit-queue

Improve the multiarch situation; armel => armhf; reenable ppc64le; remove the patched golang

**What this PR does / why we need it**: 
 - Improves the multiarch situation as described in #38067 
 - Tries to bump to go1.8 for arm (and later enable ppc64le)
 - GOARM 6 => GOARM 7
 - Remove the golang 1.7 patch
 - armel => armhf
 - Bump QEMU version to v2.7.0

**Release note**:

```release-note
Improve the ARM builds and make hyperkube on ARM work again by upgrading the Go version for ARM to go1.8beta2
```

@kubernetes/sig-testing-misc @jessfraz @ixdy @jbeda @david-mcmahon @pwittrock
@xnox

xnox commented Feb 7, 2017

Hello. About s390x support: I see, for example, k8s-dns-kube-dns-ppc64le and flannel-ppc64le, but no s390x variants of those. Will gcr.io/google_containers/flannel-s390x and gcr.io/google_containers/k8s-dns-kube-dns-s390x be built and published soon? Is anything outstanding to make that happen?

@cwsolee

cwsolee commented Feb 7, 2017

Both gcr.io/google_containers/flannel-s390x and gcr.io/google_containers/k8s-dns-kube-dns-s390x are work in progress. We just finished the flannel port and the PR is almost done. Next is enabling the build script to create the image in gcr.io; it might be a while though.

@luxas
Member Author

luxas commented Feb 7, 2017

cc @bowei

Publishing the gcr.io/google_containers/k8s-dns-kube-dns-s390x image is just an extra command for the Googler who's releasing the image.

For flannel, you have to coordinate with them; the gcr.io/google_containers/flannel image doesn't exist anymore because the flannel team and I ported it "upstream" to flannel.

Overall, there have been k8s binaries for s390x since v1.6.0-alpha.1, and adding debs for it + kubeadm is in progress as well.

@cwsolee

cwsolee commented Feb 8, 2017

I see. Yes, my team did the enablement for the K8s binaries a while back, and it got picked up in 1.6 alpha 1. We'll look at the latest flannel to see what we need to do. To provide gcr.io/google_containers/k8s-dns-kube-dns-s390x, do you know who we can talk to? Or we'll go through our usual community route.

@bowei
Member

bowei commented Feb 8, 2017

@cwsolee -- send a pull request to the repository enabling the architecture

@ethernetdan
Contributor

@luxas can this be closed or moved to v1.7?

@luxas
Member Author

luxas commented Mar 14, 2017

This can be moved to v1.7; it couldn't be fully implemented in v1.6 because gcr.io doesn't support manifest lists yet

Moving milestone...

@luxas luxas modified the milestones: v1.7, v1.6 Mar 14, 2017
@vielmetti

Thanks @luxas. Is there a spot where gcr.io support for manifest lists is being discussed or addressed?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2018
@mkumatag
Member

mkumatag commented Apr 8, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2018
@mschrupp

mschrupp commented Jul 3, 2018

@luxas or @mkumatag
just to make sure what is meant here with manifest lists -
will container images such as kube-proxy soon be multi-arch using manifest lists (for images without explicit architecture)?

Background: I currently have the problem that kubeadm generates a DaemonSet
with gcr.io/google_containers/kube-proxy-amd64:v1.11.0, which doesn't allow kubeadm join from arm devices (e.g. RPi).

It should rather use gcr.io/google_containers/kube-proxy:v1.11.0, but this image also only contains amd64 as of now, with no manifest list.

As a workaround I'm putting a nodeSelector for beta.kubernetes.io/arch: amd64
in the kube-system/kube-proxy DaemonSet and duplicating it for
beta.kubernetes.io/arch: arm with gcr.io/google_containers/kube-proxy-arm:v1.11.0
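
For illustration, a minimal sketch of what that workaround amounts to: it prints the strategic-merge patches that pin the stock kube-proxy DaemonSet to amd64 nodes and give the duplicated DaemonSet the matching arm selector. The label key beta.kubernetes.io/arch and the image names are taken from the comment above; the helper name, container name and patch layout are just assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// archPatch returns a strategic-merge patch that pins a kube-proxy DaemonSet
// to one architecture via nodeSelector and points it at the matching -ARCH
// image. The container name "kube-proxy" is assumed here for illustration.
func archPatch(arch, image string) string {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"spec": map[string]interface{}{
					"nodeSelector": map[string]string{
						"beta.kubernetes.io/arch": arch,
					},
					"containers": []map[string]string{
						{"name": "kube-proxy", "image": image},
					},
				},
			},
		},
	}
	out, _ := json.MarshalIndent(patch, "", "  ")
	return string(out)
}

func main() {
	// Patch for the stock DaemonSet (amd64 nodes only)...
	fmt.Println(archPatch("amd64", "gcr.io/google_containers/kube-proxy-amd64:v1.11.0"))
	// ...and for a duplicated DaemonSet that targets the arm nodes.
	fmt.Println(archPatch("arm", "gcr.io/google_containers/kube-proxy-arm:v1.11.0"))
}
```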

@vielmetti

@JesusOfSuburbia - that sounds like a problem that should be raised as an issue of its own!

@mkumatag
Member

mkumatag commented Jul 4, 2018

Background: I currently have the problem that kubeadm generates a DaemonSet
with gcr.io/google_containers/kube-proxy-amd64:v1.11.0, which doesn't allow kubeadm join from arm devices (e.g. RPi).

It should rather use gcr.io/google_containers/kube-proxy:v1.11.0, but this image also only contains amd64 as of now, no manifest list

@JesusOfSuburbia do you have a mixed cluster? Because in kubeadm we already have code to take care of the arch, and additional code was also added recently to restrict the target node: #64696
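
For context, a minimal sketch of the kind of arch handling referred to here (the real kubeadm helpers live in the kubeadm codebase and may differ): the -ARCH image suffix is derived from the architecture the binary was built for.

```go
package main

import (
	"fmt"
	"runtime"
)

// archImage is a hypothetical helper (not the actual kubeadm function) that
// builds the per-architecture image name the way this thread describes:
// registry/binary-ARCH:version, with ARCH taken from the running binary.
func archImage(registry, binary, version string) string {
	return fmt.Sprintf("%s/%s-%s:%s", registry, binary, runtime.GOARCH, version)
}

func main() {
	// On an amd64 control-plane node this prints
	// gcr.io/google_containers/kube-proxy-amd64:v1.11.0;
	// on a Raspberry Pi (arm) it would print the -arm variant instead.
	fmt.Println(archImage("gcr.io/google_containers", "kube-proxy", "v1.11.0"))
}
```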

@mschrupp

mschrupp commented Jul 4, 2018

@mkumatag oh wow, thanks for the link. Yes, I have a mixed cluster! It looks like the restriction takes the same approach I'm doing manually at the moment. Still waiting for the manifest lists though, but at least it's documented for now. Thanks again, also for your work!

@dims
Member

dims commented Jul 7, 2018

Update as of today

I guess the eventual goal is to be able to run conformance tests on alternate architectures - vmware-tanzu/sonobuoy#181

@mkumatag anything else I missed?

@mkumatag
Member

mkumatag commented Jul 9, 2018

@dims you have covered it really well.. :)

@timothysc
Member

/cc @liztio @chuckha - re: sonobuoy multi-arch support.

@dims
Member

dims commented Sep 12, 2018

As of v1.12.0-beta.2, all the Kubernetes release container images are multi-arch capable. Conformance tests have switched to multi-arch as well.

if there are any other images missing manifests, we should treat it as a bug and close this issue out.

/close

@k8s-ci-robot
Contributor

@dims: Closing this issue.

In response to this:

As of v1.12.0-beta.2, all the Kubernetes release container images are multi-arch capable. Conformance tests have switched to multi-arch as well.

if there are any other images missing manifests, we should treat it as a bug and close this issue out.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mkumatag
Member

@dims This is the only pending PR I see at the moment - #59664, which is needed for the etcd image.

@dims
Member

dims commented Sep 13, 2018

@mkumatag I managed to test the DNS PR too; we need that to merge - kubernetes/dns#259
