
Track the gaps when porting to ARM (arm7l) #4294

Closed
1 of 5 tasks
lwolf opened this issue Feb 22, 2019 · 28 comments

Comments


@lwolf lwolf commented Feb 22, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
FEATURE REQUEST

Environment:

  • Cloud provider or hardware configuration:
    Hardware

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 4.14.78-150 armv7l
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Version of Ansible (ansible --version):
    ansible 2.7.2
    python version = 2.7.10 (default, Oct 6 2017, 22:29:07) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)]

Kubespray version (commit) (git rev-parse --short HEAD): Latest master branch

Anything else we need to know:

At the moment it is impossible to install Kubespray on pure arm7l hardware.
Most of the components do not provide binaries/containers for non-amd64 architectures.

The aim of this ticket is to make it possible to use ARM hardware as a pool of worker nodes alongside the amd64 masters/nodes.
I've found several gaps when trying to install Kubespray on arm7l devices:

  1. Checksums aren't available for arm, only for amd64 and arm64
  • add checksums for the main components - hyperkube, kubeadm and cni_binary #4261
  2. Add NodeSelector to some manifests
  • find all the places where only amd64 containers are available (tiller, dashboard, dnsautoscaler)
  3. Overlay network support: provide per-architecture daemonsets gated by NodeSelector
  • flannel: can be deployed on all architectures (arm, arm64, amd64, etc.)
  • calico: can run on amd64 and arm64
  4. etcd does not provide binaries for arm32. Until one exists, ARM nodes can't act as master nodes.
  • build etcd for arm7l and create a container
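
Item 3 above describes the pattern flannel's own upstream manifests use: one DaemonSet per architecture, each gated by a nodeSelector on the node's arch label. A minimal sketch (abbreviated; the image tag is illustrative, and the label is `beta.kubernetes.io/arch` on clusters of that era, `kubernetes.io/arch` on newer ones):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-arm
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        app: flannel
    spec:
      nodeSelector:
        beta.kubernetes.io/arch: arm      # schedule only on arm32 nodes
      containers:
        - name: kube-flannel
          image: quay.io/coreos/flannel:v0.11.0-arm   # per-arch image tag
```

A sibling DaemonSet per architecture (amd64, arm64, ...) with the matching image tag covers a mixed-arch cluster.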

related issues: #4261 #4065


@nmiculinic nmiculinic commented Feb 25, 2019

Also, the default pause container is invalid (( plus it has a '\n' in it for arm7l, which breaks the systemd unit ))


@nmiculinic nmiculinic commented Feb 25, 2019

pod_infra_image_repo: k8s.gcr.io/pause
pod_infra_image_tag: "3.1"

works for me


@ant31 ant31 commented Feb 25, 2019

What are the production use cases for running workloads on arm7?
It's already too complex to have both amd64 and arm64; I would limit the options to only real production use cases.


@nmiculinic nmiculinic commented Feb 25, 2019

IoT on BBB devices is my use case, for example


@nmiculinic nmiculinic commented Feb 25, 2019

(( Beagle bone black ))


@ant31 ant31 commented Feb 26, 2019

What kind of load are you running on those machines? Why Kubernetes?


@nmiculinic nmiculinic commented Feb 26, 2019

A quite simple load: reading from serial UARTs and sending the data to a message queue.

Why kubernetes?

Because I'm familiar with it, and it gives me health checks, deployments, and a nice upgrade process. It also manages secrets and monitoring nicely. With other solutions such as consul + ansible + docker I'd have to build some verification that a deployment completed successfully, plus gradual rollout. Maybe I could also use Spinnaker or something like that, though I'm not familiar with it, and I run k8s for the rest of the infrastructure.

The K8s downside on edge devices is CPU usage... it's around 15% CPU time just on kubelet, even after tuning various housekeeping/node-status frequency parameters. Most of the time is spent in syscalls, the runtime, and some JSON decoding.

EDIT: This is an AM335x 1 GHz ARM® Cortex-A8, and I see a 30% branch misprediction rate system-wide (also a similar amount for kubelet)... so yeah, not the best processor in the world nor the most powerful.


@b23prodtm b23prodtm commented Apr 1, 2019

According to the CoreOS docs:

etcd has known issues on 32-bit systems due to a bug in the Go runtime

etcd-io doesn't provide any 32-bit ARM binaries because of a Go language issue. Otherwise we could have downloaded tarballs from coreos/etcd (https://github.com/coreos/etcd/releases/download/) and checksums.


@Miouge1 Miouge1 commented Apr 5, 2019

/kind feature
/help


@k8s-ci-robot k8s-ci-robot commented Apr 5, 2019

@Miouge1:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/kind feature
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@rich-nahra rich-nahra commented May 5, 2019

I ran into this issue today. I have an ESXi home lab with a limited amount of RAM, so I thought it would be nice to run masters/etcd on ARM and minions on VMs. etcd seems to be the only blocker.


@fejta-bot fejta-bot commented Aug 3, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


@lwolf lwolf commented Aug 4, 2019

/remove-lifecycle stale


@fejta-bot fejta-bot commented Nov 2, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


@fejta-bot fejta-bot commented Dec 2, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten


@Miouge1 Miouge1 commented Dec 12, 2019

@rich-nahra well etcd works on arm64, but not on arm32. I've seen lots of people running k3s on arm32 (Raspberry Pi), which uses sqlite instead of etcd, so I guess that's a possible workaround for test labs.

@lwolf and @nmiculinic I would consider "etcd for arm32" out of scope of Kubespray, is there anything else we can do to make life easier for Kubespray on ARM 32bits?


@lwolf lwolf commented Dec 13, 2019

@Miouge1 I agree that etcd is probably out of scope. I recently migrated my arm32 cluster to k3s.

I need to check if it's still relevant, but last time I checked, this one was still an issue - 2. Add NodeSelector to some manifests.
It's about gating deployments to specific node types when a container exists only for a specific arch, like gating helm/tiller to amd64 only.
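
The gating described here is a one-field change in each affected manifest. A rough sketch for tiller (fragment only, surrounding Deployment spec omitted; the label name assumes a cluster of that era):

```yaml
# Fragment of the tiller-deploy Deployment's pod template
spec:
  template:
    spec:
      nodeSelector:
        beta.kubernetes.io/arch: amd64   # keep tiller off arm32 nodes
```

The same fragment, with the label value adjusted, would apply to the dashboard and dnsautoscaler manifests mentioned above.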


@visago visago commented Jan 9, 2020

There's no calico release for arm (32-bit) either; is there a workaround for that too? (I have a separate 64-bit etcd node to get around the lack of a 32-bit etcd.)


@lwolf lwolf commented Jan 9, 2020

The only CNI that works on arm32 at the moment is flannel.


@thiscantbeserious thiscantbeserious commented Jan 25, 2020

@Miouge1 I agree that etcd is probably out of scope. I recently migrated my arm32 cluster to k3s.

I need to check if it's still relevant, but last time I checked, this one was still an issue - 2. Add NodeSelector to some manifests.
It's about gating deployments to specific node types when a container exists only for a specific arch, like gating helm/tiller to amd64 only.

Well, arm is supported, there are just no builds in the official repo and it's flagged as unstable:

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/supported-platform.md

For example Ubuntu has it in the official repo:

https://packages.ubuntu.com/search?keywords=etcd

But it seems to require an experimental flag to be set - it's unstable due to a bug in Go that has existed for 9 years now. Funnily enough, just recently it picked up traction and a prototype was implemented to fix this bug for 32-bit systems:

golang/go#599

That said, we're speaking about armhf only here, so armv7+ with a hardware floating point unit - no other architectures are likely possible (???)

That includes the Raspberry Pi 3 Pre-B or the Odroid XU4 platform (e.g. Odroid HC1)

Anyway, it runs for me on Ubuntu Bionic Beaver 18.04 on the Odroid HC1, with the mentioned flag set in the systemd service:

doh@node1:~$ etcd
2020-01-26 00:35:41.336453 W | etcdmain: running etcd on unsupported architecture "arm" since ETCD_UNSUPPORTED_ARCH is set
2020-01-26 00:35:41.338147 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=arm
2020-01-26 00:35:41.338300 I | etcdmain: etcd Version: 3.2.17
2020-01-26 00:35:41.338387 I | etcdmain: Git SHA: Not provided (use ./build instead of go build)
2020-01-26 00:35:41.338470 I | etcdmain: Go Version: go1.10
2020-01-26 00:35:41.338552 I | etcdmain: Go OS/Arch: linux/arm
2020-01-26 00:35:41.338636 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2020-01-26 00:35:41.338740 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2020-01-26 00:35:41.341755 C | etcdmain: listen tcp 127.0.0.1:2380: bind: address already in use
doh@node1:~$
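
The first warning line above shows the override taking effect. One way to set it is a systemd drop-in for the etcd unit (unit name and path assumed; adjust to however etcd is installed on your system):

```ini
# /etc/systemd/system/etcd.service.d/10-arm.conf
[Service]
Environment=ETCD_UNSUPPORTED_ARCH=arm
```

Then `systemctl daemon-reload && systemctl restart etcd` to pick up the drop-in.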


@fejta-bot fejta-bot commented Feb 25, 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


@k8s-ci-robot k8s-ci-robot commented Feb 25, 2020

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@hadrien-toma hadrien-toma commented Feb 25, 2020

/reopen


@k8s-ci-robot k8s-ci-robot commented Feb 25, 2020

@hadrien-toma: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@lwolf lwolf commented Feb 25, 2020

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Feb 25, 2020

@k8s-ci-robot k8s-ci-robot commented Feb 25, 2020

@lwolf: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@fejta-bot fejta-bot commented Mar 26, 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


@k8s-ci-robot k8s-ci-robot commented Mar 26, 2020

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
