Gaps when porting to AArch64 #2551

Closed
3 of 4 tasks
dixudx opened this issue Mar 29, 2018 · 18 comments
Labels
lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

@dixudx
Member

dixudx commented Mar 29, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
FEATURE REQUEST

Environment:

  • Cloud provider or hardware configuration:
    Hardware

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 4.12.0-221-arm64 aarch64
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Version of Ansible (ansible --version):
ansible 2.3.1.0
  config file = /xxx/ansible.cfg
  configured module search path = [u'./library']
  python version = 2.7.5 (default, Aug 25 2017, 09:08:42) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

Kubespray version (commit) (git rev-parse --short HEAD): Latest master branch

Anything else we need to know:

I've found several gaps when porting Kubespray to AArch64.

@xd007 has submitted several PRs to address all of the issues below.

1. Image arch
2. Docker incompatibility
3. Etcd start-up

/cc @mattymo @rsmitty

@ant31
Contributor

ant31 commented Aug 17, 2018

Thank you @xd007 and @dixudx! That's awesome!

A few things:
I would use set_fact for the architecture of each node. Instead of being manual, the detection would then be automatic per node (fewer configuration errors); see the sketch below.
I think the nodes are automatically labeled with the arch, but if not, we should probably do it ourselves (for hybrid clusters): built-in-node-labels
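A minimal sketch of what that per-node detection could look like, assuming an image_arch fact derived from Ansible's gathered facts (the task and mapping are illustrative, not the actual implementation):

# Illustrative sketch only: derive image_arch per node from Ansible facts
# instead of setting it manually in group_vars.
- name: Detect image architecture for this node
  set_fact:
    image_arch: "{{ 'arm64' if ansible_architecture == 'aarch64' else 'amd64' }}"
  # A real mapping would also need to cover ppc64le and other architectures.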

For the etcd PR: it uses a different way to switch between image tags. I would like it to be consistent with the other images.

Let me know if you need help or more info

@dixudx
Member Author

dixudx commented Aug 20, 2018

I think the nodes are automatically labeled with the arch, but if not we probably should do it (hybrid clusters): built-in-node-labels

@ant31 Those labels are applied automatically by the kubelet.
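For reference, the kubelet of that era applied the beta.kubernetes.io/arch label (kubernetes.io/arch on newer releases), so in a hybrid cluster a workload can be pinned to arm64 nodes with a nodeSelector. A minimal illustrative manifest, not part of Kubespray:

apiVersion: v1
kind: Pod
metadata:
  name: arch-pinned-example   # hypothetical name, illustration only
spec:
  nodeSelector:
    kubernetes.io/arch: arm64   # older clusters: beta.kubernetes.io/arch
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.1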

I would like to have it consistent with other images.

Adding such an environment variable, ETCD_UNSUPPORTED_ARCH, for other architectures like amd64 or ppc64le does no harm, but it seems misleading. Is that what you want?

For etcd, the architecture check only applies to non-amd64/ppc64le platforms:

	// TODO qualify arm64
	if runtime.GOARCH == "amd64" || runtime.GOARCH == "ppc64le" {
		return
	}
	// unsupported arch only configured via environment variable
	// so unset here to not parse through flag
	defer os.Unsetenv("ETCD_UNSUPPORTED_ARCH")
	if env, ok := os.LookupEnv("ETCD_UNSUPPORTED_ARCH"); ok && env == runtime.GOARCH {
		fmt.Printf("running etcd on unsupported architecture %q since ETCD_UNSUPPORTED_ARCH is set\n", env)
		return
	}
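In other words, on arm64 the variable has to be present in etcd's environment before it starts. A rough Ansible-style sketch of how that could be wired up (the /etc/etcd.env path, task name and handler are assumptions, not the actual Kubespray code):

# Illustrative sketch only: make sure etcd sees ETCD_UNSUPPORTED_ARCH on arm64 hosts.
- name: Allow etcd to start on arm64
  lineinfile:
    path: /etc/etcd.env
    line: "ETCD_UNSUPPORTED_ARCH=arm64"
    create: true
  when: ansible_architecture == "aarch64"
  notify: restart etcd   # assumes a "restart etcd" handler exists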

@ant31
Contributor

ant31 commented Aug 20, 2018

I'm not saying to drop your approach of adding the ETCD_UNSUPPORTED_ARCH env var, just to be consistent with the other images in how tags are chosen.

@dixudx
Member Author

dixudx commented Aug 20, 2018

adding the ETCD_UNSUPPORTED_ARCH env var, just to be consistent with the other images in how tags are chosen.

@ant31 Yeah, right, I know. That would seem more native and elegant.

But currently I can't find a better way to handle this case.

@ant31
Contributor

ant31 commented Aug 20, 2018

@dixudx I've proposed a solution in #3140, could you please review it?
The two commits to check are 6de0076 and 5c47d8a.

@vielmetti
Contributor

Looks like #3140 was merged. @ant31, are there other open tasks, or does this work now?

@karlskewes
Contributor

karlskewes commented Jan 4, 2019

I'm just going through this now with fresh hosts, and it looks like a few more changes are required for cluster.yml to succeed:

  1. {{ image_arch }} added to download role defaults - PR Update kubectl and etcd download urls for mult-arch #3975
  2. Deleting ubuntu-bionic.yml in Docker CE install on Ubuntu Bionic arm64 #3972 - PR Remove Ubuntu Bionic specific vars file - breaks multi-arch #3974
  3. Image SHAs added for arm64 binaries, perhaps extending this pattern? Open to suggestions, happy to PR. I've just put local overrides in a .yml for now (see the sketch after this list).
  4. Bumping the Calico version to 3.2.x, where arm64 was enabled, or even better to 3.3.x, where the Calico Quay repositories seem to offer a v3.3.x-{amd64|arm64} naming convention that suits Add support for etcd arm64 #3140 above. EDIT: This could also be a TODO comment for arm users to be aware of.
    EDITv2: It seems part of Add support for etcd arm64 #3140 was reverted due to Calico breaking changes in the 3.2 and 3.3 releases, so right now I'm trying Weave (used before without Kubespray). Will come back to Calico.
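For item 3, a rough sketch of what arch-aware download defaults could look like; the variable names only loosely mirror the download role, and the checksum values are placeholders rather than real digests:

# Illustrative sketch only: per-arch download URL and checksum selection.
image_arch: "{{ 'arm64' if ansible_architecture == 'aarch64' else 'amd64' }}"

kubectl_download_url: "https://storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl"

kubectl_checksums:
  amd64: "<sha256 of the amd64 kubectl binary>"
  arm64: "<sha256 of the arm64 kubectl binary>"

kubectl_checksum: "{{ kubectl_checksums[image_arch] }}"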

@vielmetti
Contributor

@vielmetti and @Miouge1 are working on WorksOnArm/equinix-metal-arm64-cluster#127, which is relevant to this.

@Miouge1
Contributor

Miouge1 commented Feb 3, 2019

@kskewes The Calico version has already been bumped to v3.4, and I've opened #4176 to handle the arm64 checksums.

As far as I've tested, this is enough to get Kubespray with Calico up on arm64. Other CNI plugins will need more work, but Calico is a good start since it's the Kubespray default.

@karlskewes
Contributor

Nice work!
I'm still running Weave but can rebase and try Calico.

@vielmetti mentioned this issue on Feb 7, 2019
@karlskewes
Contributor

Good work getting all those SHAs in there.
I changed to Calico 3.4 after a cluster reset.
Everything is running great, with MetalLB (deployed separately) advertising LoadBalancer routes via BGP.

@Miouge1
Contributor

Miouge1 commented Feb 15, 2019

I did further tests around arm64 support:

  • Ran into a problem with a mixed x86 and arm64 cluster, see PR Use docker.io for calico #4253
  • AFAIK Flannel has no Docker image for arm64
  • Tests of Weave on arm64 look promising

@vielmetti
Contributor

vielmetti commented Feb 15, 2019

Here is a related Flannel issue for arm64: flannel-io/flannel#663

and a request to address the Flannel images: coreos/flannel-cni#10, with this PR: coreos/flannel-cni#13

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on May 16, 2019
@vielmetti
Contributor

It appears that we're still stuck on this with Flannel.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on Jun 16, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
