
Multi-architecture plan for Kubernetes #38067

Closed
luxas opened this issue Dec 4, 2016 · 40 comments

Comments

@luxas (Member) commented Dec 4, 2016

Background: I've implemented most of the multi-architecture support Kubernetes has today, and wrote the proposal here: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/multi-platform.md

Now it's time to continue improving the multi-arch experience.
Tasks to do:

  • Deprecate armel and use armhf images instead; use GOARM=7 instead of GOARM=6
    • Motivation:
      • The only GOARM=6 board Go will support in go1.8 is the Raspberry Pi 1, which is just too slow to run newer Kubernetes versions.
      • Small performance improvements when using GOARM=7
      • The armel (http://hub.docker.com/u/armel) images are not updated as often as the armhf (http://hub.docker.com/u/armhf) images are.
  • Use go1.8 as soon as possible for arm and ppc64le (and of course generally as well, but that will require the "real" release)
    • Motivation:
      • Brings us a lot of mandatory fixes for arm and ppc64le
      • Brings us the SSA backend, which is ~30% faster
      • We can remove the patched golang for arm and start building ppc64le binaries by default again
      • arm hyperkube will start working again
      • Even if it's beta, it's probably better than a self-patched version of go1.7
    • Proposal: re-enable ppc64le builds by using the go1.8 betas until stable go1.8 is released, and release Kubernetes v1.6 for ppc64le built with go1.8
  • Evaluate s390x as a new platform
    • If no one loudly complains, I'm gonna take care of PRs #37092 and #36050 and merge them in time for v1.6
  • Convert the essential images that are named registry/binary:version to manifest lists
    • TODO: Investigate rkt support for manifest lists: appc/docker2aci#193
    • Wait for gcr.io to roll out a v2 schema 2 registry that supports manifest lists. @aronchick told me it's gonna happen mid-December.
    • Start building manifest lists when releasing Kubernetes (kube-apiserver, kube-scheduler, etc.)
    • Basically, all images will be named registry/binary-arch:version as most of them are now, but the image without the -arch bit will be a manifest list that points to the right -arch image depending on which arch docker runs on (see the sketch after this list).
    • Convert all other essential images to manifest lists, namely:
  • Convert the ingress images to multiple architectures
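
As an illustration of the manifest-list scheme above, here's a minimal sketch using the manifest-tool utility (mentioned later in this thread). The image name, tag, and the two architectures shown are hypothetical examples; the spec format is manifest-tool's own:

```sh
# manifest-list.yaml maps the arch-less image name to the per-arch images.
cat > manifest-list.yaml <<'EOF'
image: gcr.io/google_containers/kube-apiserver:v1.6.0
manifests:
  - image: gcr.io/google_containers/kube-apiserver-amd64:v1.6.0
    platform:
      architecture: amd64
      os: linux
  - image: gcr.io/google_containers/kube-apiserver-arm:v1.6.0
    platform:
      architecture: arm
      os: linux
EOF

# Push the arch-less name as a manifest list pointing at the -arch images.
manifest-tool push from-spec manifest-list.yaml
```

With something like this in place, pulling gcr.io/google_containers/kube-apiserver:v1.6.0 would resolve to the right per-arch image on each node.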

cc-ing involved people here:
@kubernetes/sig-testing @david-mcmahon @saad-ali @pwittrock @Pensu @ixdy @jessfraz @thockin @vishh @gajju26 @brendandburns

@luxas luxas self-assigned this Dec 4, 2016

@luxas luxas added this to the v1.6 milestone Dec 4, 2016

@xiaolou86 (Contributor) commented Dec 5, 2016

mark, to learn later

@vielmetti commented Dec 18, 2016

@thockin - you might sync with @tianon who is doing Alpine builds for aarch64 - we sorted out a few issues and then alpine edge started to work on that platform. Cf. https://hub.docker.com/r/aarch64/alpine/

What I don't know is the schedule for the next Alpine release, i.e. whether we'll have Alpine 3.5 with aarch64 support before or after Kubernetes 1.6.

@luxas (Member, Author) commented Dec 18, 2016

@vielmetti That's great!
To date we've only used busybox everywhere, and we have to keep doing that until all architectures support alpine (s390x and ppc64le don't yet).

The more important thing, though, is getting manifest lists working properly in core, and making manifest lists for the official images available from Docker Hub. cc @tianon

@tianon commented Dec 19, 2016

docker-library/official-images#2289 is the relevant discussion for supporting manifest lists in official images.

@luxas (Member, Author) commented Jan 13, 2017

Things are moving on this now that go1.8rc1 has been released:

Regarding s390x, @ixdy said this in #36050 (comment):

FYI I'd forgotten to push images for s390x, so looking into that now.
Pushed:

  • gcr.io/google_containers/debian-iptables-s390x:v5
  • gcr.io/google_containers/pause-s390x:3.0
  • gcr.io/google-containers/kube-addon-manager-s390x:v6.1

Issues:

  • kubedns: @bowei can you handle this, since it moved to a different repo? also it looks like dnsmasq (https://github.com/kubernetes/dns/blob/master/images/dnsmasq/Makefile) has no s390x support yet.
  • cluster/images/etcd uses go1.6.3, so we can't currently build gcr.io/google_containers/etcd-s390x
  • google_containers/hyperkube-s390x: I suspect this will automatically be released with 1.6. nothing there yet, though.
  • cluster/images/kube-discovery has weird versioning - its tag is just "1.0", but it depends on whatever's been built in the tree, so I'm not sure what commit to push for s390x. @dgoodwin can maybe help here.
  • gcr.io/google_containers/serve_hostname - it looks like nobody has pushed any non-amd64 images for this one yet.

@bowei and I will look into the DNS images soon, I guess.
@ixdy will be able to push the etcd image when #38926 is merged.
hyperkube will be released with the first 1.6 alpha automatically
Yes, kube-discovery has somewhat weird versioning, but it was a one-time addition and hasn't been updated since the initial build and push, so it's safe to just build and push it now, @ixdy (a hypothetical push is sketched below).

We can look into the test images later.
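
For reference, the per-arch image builds and pushes mentioned above generally go through each image directory's Makefile, which takes an ARCH variable. A minimal sketch, assuming a Makefile that follows that convention (the directory and tag are taken from the discussion above; the exact targets and variables vary per image):

```sh
# Build and push the s390x variant of an image from its directory,
# e.g. cluster/images/kube-discovery (targets/variables vary per image).
cd cluster/images/kube-discovery
make build ARCH=s390x TAG=1.0
make push  ARCH=s390x TAG=1.0
```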

I have #38926 ready for review. Highlights there:

  • Removes the golang 1.7 patch
  • Instead of the patched golang, go1.8rc1 is put in the GOROOT that arm and ppc64le now use
  • Reenables ppc64le because it's using go1.8rc1, which has all the changes it needs
  • armel => armhf
  • GOARM 6 => GOARM 7 (see the sketch after this list)
  • Bumps the QEMU version to v2.7.0
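
For context on the GOARM bump: GOARM selects which ARM variant the Go toolchain targets when cross-compiling. A minimal sketch with a plain Go toolchain (the output path here is hypothetical, and the real build goes through Kubernetes' dockerized cross-build rather than a bare `go build`):

```sh
# Cross-compile for ARMv7 (armhf, GOARM=7) instead of ARMv6 (armel, GOARM=6).
GOOS=linux GOARCH=arm GOARM=7 CGO_ENABLED=0 \
  go build -o _output/arm/kubectl ./cmd/kubectl
```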

After this, I'm gonna start looking into making manifest lists for the official images so we can avoid -ARCH suffixes and instead just let docker pull the right layers for the right arch.

@estesp commented Jan 25, 2017

Just FYI: IBM is doing some work to hopefully get alpine supported on ppc64le and s390x (or supported more completely in cases where work has already been underway).

@luxas (Member, Author) commented Jan 25, 2017

@estesp That's great! Please see the conversation about that here: #40248 (comment)

k8s-github-robot pushed a commit that referenced this issue Jan 28, 2017

Kubernetes Submit Queue
Merge pull request #38926 from luxas/update_cross_go18
Automatic merge from submit-queue

Improve the multiarch situation; armel => armhf; reenable ppc64le; remove the patched golang

**What this PR does / why we need it**: 
 - Improves the multiarch situation as described in #38067 
 - Tries to bump to go1.8 for arm (and later enable ppc64le)
 - GOARM 6 => GOARM 7
 - Remove the golang 1.7 patch
 - armel => armhf
 - Bump QEMU version to v2.7.0

**Release note**:

```release-note
Improve the ARM builds and make hyperkube on ARM work again by upgrading the Go version for ARM to go1.8beta2
```

@kubernetes/sig-testing-misc @jessfraz @ixdy @jbeda @david-mcmahon @pwittrock

@xnox commented Feb 7, 2017

Hello, about s390x support: I see, for example, k8s-dns-kube-dns-ppc64le and flannel-ppc64le, but no s390x variants of those. Will gcr.io/google_containers/flannel-s390x and gcr.io/google_containers/k8s-dns-kube-dns-s390x be built and published soon? Is anything outstanding to make that happen?

@cwsolee commented Feb 7, 2017

Both gcr.io/google_containers/flannel-s390x and gcr.io/google_containers/k8s-dns-kube-dns-s390x are work in progress. We just finished porting flannel, and the PR is almost done. Next is adding the build script to create the image on gcr.io; that might take a while, though.

@luxas (Member, Author) commented Feb 7, 2017

cc @bowei

The gcr.io/google_containers/k8s-dns-kube-dns-s390x image is just an extra command for the Googler who's releasing the image.

For flannel, you have to coordinate with the flannel team; gcr.io/google_containers/flannel doesn't exist anymore because the flannel team and I ported it "upstream" to flannel.

Overall, there have been k8s binaries for s390x since v1.6.0-alpha.1, and adding debs for it + kubeadm is in progress as well.

@cwsolee commented Feb 8, 2017

I see. Yes, my team did the enablement for the K8s binaries a while back, and it was just picked up as of 1.6 alpha 1. We'll look at the latest flannel to see what we need to do. For providing gcr.io/google_containers/k8s-dns-kube-dns-s390x, do you know who we can talk to? Otherwise we'll go through our usual community route.

@bowei (Member) commented Feb 8, 2017

@cwsolee: send a pull request enabling the architecture to the repository.

@ethernetdan (Member) commented Mar 13, 2017

@luxas can this be closed or moved to v1.7?

@luxas (Member, Author) commented Mar 14, 2017

This can be moved to v1.7; it couldn't be fully implemented in v1.6 because gcr.io doesn't support manifest lists yet.

Moving milestone...

@luxas luxas modified the milestones: v1.7, v1.6 Mar 14, 2017

@vielmetti commented Mar 14, 2017

Thanks @luxas. Is there a spot where gcr.io support for manifest lists is being discussed or addressed?

@roberthbailey roberthbailey modified the milestones: v1.8, v1.7 May 27, 2017

@luxas luxas removed the milestone/removed label Jan 8, 2018

@fejta-bot commented Apr 8, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@mkumatag (Member) commented Apr 8, 2018

/remove-lifecycle stale

@mschrupp commented Jul 3, 2018

@luxas or @mkumatag, just to make sure I understand what's meant here by manifest lists: will container images such as kube-proxy soon be multi-arch via manifest lists (i.e. for image names without an explicit architecture)?

Background: I currently have the problem that kubeadm generates a DaemonSet
with gcr.io/google_containers/kube-proxy-amd64:v1.11.0, which doesn't allow kubeadm join from arm devices (e.g. RPi).

It should rather use gcr.io/google_containers/kube-proxy:v1.11.0, but as of now that image also contains only amd64, with no manifest list.

As a workaround, I'm putting a nodeSelector for beta.kubernetes.io/arch: amd64 in the kube-system/kube-proxy DaemonSet and duplicating it for beta.kubernetes.io/arch: arm with gcr.io/google_containers/kube-proxy-arm:v1.11.0 (roughly as sketched below).
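
A minimal sketch of that kind of workaround (the patch syntax is standard kubectl; the duplicated DaemonSet name is hypothetical):

```sh
# Pin the stock kube-proxy DaemonSet to amd64 nodes only.
kubectl -n kube-system patch daemonset kube-proxy --patch \
  '{"spec":{"template":{"spec":{"nodeSelector":{"beta.kubernetes.io/arch":"amd64"}}}}}'

# A duplicated DaemonSet (e.g. "kube-proxy-arm", name hypothetical) would then
# swap in the image gcr.io/google_containers/kube-proxy-arm:v1.11.0 and use
# nodeSelector beta.kubernetes.io/arch: arm so it only lands on arm nodes.
```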

@vielmetti commented Jul 3, 2018

@jesusofsuburbia - that sounds like a problem that should be raised as an issue of its own!

@mkumatag (Member) commented Jul 4, 2018

Background: I currently have the problem that kubeadm generates a DaemonSet
with gcr.io/google_containers/kube-proxy-amd64:v1.11.0, which doesn't allow kubeadm join from arm devices (e.g. RPi).

It should rather use gcr.io/google_containers/kube-proxy:v1.11.0, but this image also only contains amd64 as of now, no manifest list

@jesusofsuburbia do you have a mixed cluster? Because in kubeadm we already have code to take care of the arch, and additional code was also recently added to restrict the target nodes: #64696

@mschrupp commented Jul 4, 2018

@mkumatag oh wow, thanks for the link. Yes, I have a mixed cluster! It looks like the restriction takes the same approach I'm using manually at the moment. Still waiting for the manifest lists, though, but at least it's documented for now. Thanks again, also for your work!

@dims (Member) commented Jul 7, 2018

Update as of today

  • docker/cli#1156 has merged. This fixes a problem with manifest bytes; we then need a docker release with that patch for subsequent tests.
  • #63453 has merged. We need folks with enough privileges to run the commands in #63453 (comment) to upload images to k8s.gcr.io.
  • kubernetes/release#516 is still waiting to be merged; that will mint images when we push an alpha or beta, which we can then use to test.
  • #61097 will allow using a manifest-based busybox image (manifest-tool inspect docker.io/busybox shows it has a bunch of arches, though not the Windows support that #61097 mentions).

I guess the eventual goal is to be able to run conformance tests on alternate architectures: heptio/sonobuoy#181

@mkumatag anything else I missed?

@mkumatag (Member) commented Jul 9, 2018

@dims you have covered it really well :)

@timothysc (Member) commented Jul 11, 2018

/cc @liztio @chuckha - re: sonobuoy multi-arch support.

@dims (Member) commented Sep 12, 2018

As of v1.12.0-beta.2, all the Kubernetes release container images are multi-arch capable. Conformance tests have switched to multi-arch as well. (A quick way to check an image is sketched below.)

If there are any other images missing manifests, we should treat that as a bug and close this issue out.

/close
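
As a quick check that a release image is now a manifest list, one can inspect the arch-less name with manifest-tool (mentioned earlier in this thread); the image name below is an assumed example on k8s.gcr.io, using the beta tag from this comment:

```sh
# Print the per-architecture entries behind the arch-less image name.
manifest-tool inspect k8s.gcr.io/kube-proxy:v1.12.0-beta.2
```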

@k8s-ci-robot (Contributor) commented Sep 12, 2018

@dims: Closing this issue.

In response to this:

as of v1.12.0-beta.2, all the kubernetes release container images are multi-arch capable. Also conformance tests have switched to multi-arch as well.

if there are any other images missing manifests, we should treat it as a bug and close this issue out.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mkumatag (Member) commented Sep 13, 2018

@dims This is the only pending PR I see at the moment: #59664, which is needed for the etcd image.

@dims (Member) commented Sep 13, 2018

@mkumatag I managed to test the DNS PR too; we need that to merge: kubernetes/dns#259
