Add support for arm images for hyperkube, kubeadm and cni_binary #4261

Merged
merged 8 commits into from Jun 5, 2019

Contributor

@lwolf lwolf commented Feb 17, 2019

This adds support for arm checksums for hyperkube, kubeadm and cni images.

Related to #4065
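
For readers following the thread, here is a minimal sketch of what a per-architecture checksum entry is used for: the downloaded binary is hashed and compared against the recorded digest for the target architecture. It assumes SHA-256 digests and a table keyed by architecture and then version; the function names, the table layout, and the fake payload are all hypothetical and are not taken from kubespray.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, checksums: dict, arch: str, version: str) -> bool:
    """Compare the digest of the downloaded bytes with the recorded checksum.

    `checksums` is a hypothetical table: architecture -> version -> hex digest.
    """
    expected = checksums[arch][version]
    return sha256_hex(data) == expected


# Illustrative data only: the "binary" is a fake payload and the digest is
# computed from it here, so it is not a real release checksum.
fake_binary = b"pretend this is a kubeadm binary built for arm"
checksums = {"arm": {"v1.12.5": sha256_hex(fake_binary)}}
```

With a table like this, a corrupted or wrong-architecture download is rejected because its digest does not match the recorded one.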

Member

@Miouge1 Miouge1 commented Feb 18, 2019

@lwolf thank you for your contribution. Great to see more ARM support. Do you mind sharing the tests you did around this? Which HW platform did you use, etc.?

ci check this
/lgtm

Contributor Author

@lwolf lwolf commented Feb 18, 2019

As a test, I ran an upgrade of my current multiarch cluster from v1.12.3 to v1.12.5 with the changes from this PR and #4176 backported to the 2.8 branch.

Arm nodes are based on https://www.hardkernel.com/shop/odroid-hc1-home-cloud-one/

$ uname -a
Linux odroid-hc-01 4.14.78-150 #1 SMP PREEMPT Tue Oct 23 10:43:36 -03 2018 armv7l armv7l armv7l GNU/Linux

To be clear, this PR is not enough to install kubespray on arm nodes, but it reduces the number of hacks needed to do that.

Let me know if there are any specific tests I can run

Member

@Miouge1 Miouge1 commented Feb 18, 2019

@lwolf if you can, it would be good to make an issue tracking all hacks necessary for armv7l support, kind of like #2551 is for armv8/arm64.

Also adding an arm hash for etcd_binary_checksums would be nice.

Contributor Author

@lwolf lwolf commented Feb 18, 2019

I'll run a more or less clean deploy in the next few days and try to collect all the issues I hit in a ticket.

Regarding etcd, unfortunately there is no arm build available.


@laimison laimison commented Feb 20, 2019

There is a preview/unofficial version of Debian Buster for the Raspberry Pi that enables arm64:
https://wiki.debian.org/RaspberryPi3

@lwolf do you think using arm64 instead of 32-bit could reduce the issues you have faced?

Contributor Author

@lwolf lwolf commented Feb 20, 2019

@laimison I'm not sure how exactly an arm64 Debian will help in an arm32 setup.


@b23prodtm b23prodtm commented Feb 21, 2019

There is also an open issue, #4259, for etcd arm (armv7l, 32-bit) binary support.
Actually, the Pi 3 has a 64-bit capable CPU; 64-bit mode doesn't come enabled with "stretch" but does with "buster", as @laimison mentioned.

Contributor Author

@lwolf lwolf commented Feb 21, 2019

So, if I'm not mistaken, the RPI3 (actually a 64-bit CPU) is wrongly identified by older OS/kernel versions, which will be fixed in newer OS versions.
But most real armv7l devices are 32-bit and require arm32 binaries.
So it doesn't help in the current context.

@b23prodtm thanks for linking related issue
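
The distinction being argued here can be sketched in a few lines: which binaries a node needs depends on the machine type its running kernel reports (the `armv7l` in the `uname -a` output earlier), not on what the CPU could run under a different OS. The mapping below is an assumption for illustration; the architecture names on the right follow the suffix style seen in release file names such as `linux-arm64`, and `binary_arch` is a hypothetical helper.

```python
# Assumed mapping from `uname -m` style machine strings to the architecture
# suffix used in release artifact names. Illustrative, not exhaustive.
MACHINE_TO_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "armv7l": "arm",
}


def binary_arch(machine: str) -> str:
    """Translate a kernel-reported machine type into a binary architecture name."""
    try:
        return MACHINE_TO_ARCH[machine]
    except KeyError:
        # Fail with a readable message instead of a bare KeyError.
        raise ValueError(f"unsupported machine type: {machine!r}") from None
```

Under this mapping, a board whose kernel reports `armv7l` needs `arm` binaries, whether or not its CPU is 64-bit capable.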


@laimison laimison commented Feb 21, 2019

I mentioned the 64-bit solution because an etcd binary is available at https://github.com/etcd-io/etcd/releases/download/v3.3.12/etcd-v3.3.12-linux-arm64.tar.gz and the Raspberry Pi itself is capable of running Buster. So the question is whether other Kubernetes binaries are missing and whether Buster is ready/worth trying. Am I missing something, especially about that etcd binary?

Contributor Author

@lwolf lwolf commented Feb 22, 2019

@laimison It seems you are assuming that the Raspberry Pi 3 is the only existing ARM hardware, or at least the only one people will use with kubespray, and that upgrading it to Buster and running arm64 binaries will therefore solve all the problems.

This PR is about adding support for truly 32-bit ARM hardware, so the only way I see to solve the issue with etcd is to compile it for that architecture.

Let me know if I have misunderstood or am missing something.

Contributor

@nmiculinic nmiculinic commented Feb 22, 2019

When I try to run it on the worker node:

    "downloads": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute u'arm'"

Somewhere in kubespray/extra_playbooks/roles/download/defaults/main.yml the arm key is missing.
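
A rough Python analogue of this failure, under the assumption that the download definitions are built eagerly for every component at once: if any one per-architecture table lacks an `arm` key, building the whole structure fails, even on a node that never needs that component. The component names mirror the thread; the table contents and `build_downloads` are hypothetical placeholders.

```python
# Placeholder digests; real entries would be hex checksums.
checksums_by_component = {
    "kubeadm": {"amd64": "<digest>", "arm64": "<digest>", "arm": "<digest>"},
    "etcd": {"amd64": "<digest>", "arm64": "<digest>"},  # no "arm" entry
}


def build_downloads(arch: str, tables: dict) -> dict:
    """Resolve the checksum of every component for one architecture.

    Every table is indexed, so a single missing key aborts the whole build.
    """
    return {name: table[arch] for name, table in tables.items()}


def add_dummy_entry(tables: dict, component: str, arch: str) -> None:
    """Workaround discussed in the thread: add a dummy value so the lookup succeeds."""
    tables[component].setdefault(arch, "dummy")
```

Calling `build_downloads("arm", checksums_by_component)` raises `KeyError` until `add_dummy_entry(checksums_by_component, "etcd", "arm")` has been applied. The dummy value only silences the lookup; it does not make an arm etcd binary exist.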

Member

@Miouge1 Miouge1 commented Feb 25, 2019

@nmiculinic is this on a kube-node without any kube-master or etcd role?

Contributor

@nmiculinic nmiculinic commented Feb 25, 2019

Yes, that's right. Adding a dummy etcd entry for arm kind of fixes things (at least this bug, for me).

Member

@Miouge1 Miouge1 commented Feb 25, 2019

@lwolf is it possible to add something (documentation or an assert) to fail gracefully if someone tries to run etcd on arm32?
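
The idea of failing gracefully can be sketched as a pre-flight check that runs before anything is installed and names the offending hosts. This is a Python illustration of the logic only, not the Ansible assert that was later added to the PR; the host names, the `host_arch` mapping, and `assert_etcd_supported` are hypothetical.

```python
# Assumption taken from the thread: no etcd build exists for 32-bit arm.
UNSUPPORTED_ETCD_ARCHS = {"arm"}


def assert_etcd_supported(host_arch: dict, etcd_hosts: list) -> None:
    """Raise a descriptive error if any etcd host has an unsupported architecture.

    `host_arch` maps host name -> architecture name; `etcd_hosts` lists the
    hosts that are meant to run etcd.
    """
    unsupported = sorted(
        host for host in etcd_hosts if host_arch.get(host) in UNSUPPORTED_ETCD_ARCHS
    )
    if unsupported:
        raise ValueError(
            "etcd is not supported on 32-bit arm hosts: " + ", ".join(unsupported)
        )
```

A worker-only arm node passes this check because it is not in `etcd_hosts`; only placing etcd on such a node triggers the error.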

Contributor

@nmiculinic nmiculinic commented Feb 25, 2019

This is what I get when I run this playbook. For arm etcd I put in a dummy hash so the dict lookup doesn't crash and burn.


TASK [kubernetes/kubeadm : Join to cluster] ****************************************************************************
skipping: [ip-10-100-60-75.eu-west-1.compute.internal]
fatal: [bbb-test]: FAILED! => {"ansible_job_id": "848543574878.27806", "changed": true, "cmd": ["/usr/local/bin/kubeadm", "join", "--config", "/etc/kubernetes/kubeadm-client.conf", "--ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests"], "delta": "0:00:05.137759", "end": "2019-02-25 08:24:26.577483", "finished": 1, "msg": "non-zero return code", "rc": 2, "start": "2019-02-25 08:24:21.439724", "stderr": "\t[WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty\n[preflight] Some fatal errors occurred:\n\t[ERROR Port-10250]: Port 10250 is in use\n[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`", "stderr_lines": ["\t[WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty", "[preflight] Some fatal errors occurred:", "\t[ERROR Port-10250]: Port 10250 is in use", "[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`"], "stdout": "[preflight] Running pre-flight checks", "stdout_lines": ["[preflight] Running pre-flight checks"]}

TASK [kubernetes/kubeadm : Join to cluster with ignores] ***************************************************************
fatal: [bbb-test]: FAILED! => {"changed": false, "msg": "async task did not complete within the requested time"}

TASK [kubernetes/kubeadm : Display kubeadm join stderr if any] *********************************************************
fatal: [bbb-test]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stderr_lines'\n\nThe error appears to have been in '/Users/lpp/Desktop/ascalia/bb/kubespray/roles/kubernetes/kubeadm/tasks/main.yml': line 95, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Display kubeadm join stderr if any\n      ^ here\n"}
skipping: [ip-10-100-60-75.eu-west-1.compute.internal]

Contributor Author

@lwolf lwolf commented Feb 25, 2019

@Miouge1 I was thinking about adding a dummy value, but assert sounds like a better option. Will update the PR

Contributor

@nmiculinic nmiculinic commented Feb 25, 2019

The big error I copy-pasted is the result of armv7 having "\n" at the end, which misconfigures the kubelet systemd service.
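
To illustrate the kind of breakage a trailing newline can cause, here is a small sketch that assumes the architecture string comes from captured command output and is then interpolated into a single-line template. The template is made up for the example and is not the actual kubelet unit file.

```python
# Hypothetical single-line template; not taken from kubespray.
TEMPLATE = "example-image-{arch}:v1"

# Captured command output typically ends with a newline.
raw_arch = "armv7l\n"

# Interpolating the raw value splits what should be one line into two.
broken = TEMPLATE.format(arch=raw_arch)

# Stripping surrounding whitespace before use keeps the value on one line.
fixed = TEMPLATE.format(arch=raw_arch.strip())
```

Here `broken` contains an embedded line break in the middle of the value, while `fixed` does not.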

Contributor Author

@lwolf lwolf commented Feb 25, 2019

@nmiculinic yeah, I mentioned this issue and the workaround I use in #4065 (comment),
but I feel like it is too "hacky" to put upstream.
I haven't had time yet to find a more elegant solution.

Contributor

@nmiculinic nmiculinic commented Feb 25, 2019

Oh I missed it, I had a pause of a couple of weeks while I was doing other stuff, and now I'm back at this.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 26, 2019
Contributor Author

@lwolf lwolf commented Feb 26, 2019

@Miouge1 I added a dummy etcd checksum and a comment as a workaround for the "no attribute" error for now.

// I'll be away for 2 weeks with very limited connection.

Contributor Author

@lwolf lwolf commented Mar 18, 2019

@Miouge1 I added an assert check for etcd and new checksums.
Let me know if there is anything else I should add.

For some reason CI is not happy about gather_facts=true, am I doing it wrong?

Contributor

@nmiculinic nmiculinic commented Apr 23, 2019

I've managed to install it now (I only had to apply the patches I provided a few comments earlier).

Contributor Author

@lwolf lwolf commented Apr 23, 2019

Thanks, I added dummy checksums for the calicoctl_binary_checksums

Member

@Miouge1 Miouge1 commented Apr 23, 2019

CI is broken in master, once it's fixed in master you can rebase to get CI to run.

@lwolf lwolf force-pushed the arm branch 2 times, most recently from 1295039 to ab12c34 Apr 23, 2019
Contributor Author

@lwolf lwolf commented Apr 24, 2019

What's the right way to restart failed builds (for non-code-related reasons): amend the last commit or ask somebody to trigger a restart?

Contributor Author

@lwolf lwolf commented Apr 24, 2019

@Miouge1 could you please advise?
I can't get the architecture check to pass the tests.

If I set gather_facts: True it fails with python not found
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/201813766

If I set gather_facts: False it fails with ansible_architecture is undefined.
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/201820403
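
The dilemma described here (the check needs a fact that is only present when facts are gathered) can be illustrated with a guard that skips the check when the fact is absent instead of crashing on an undefined variable. This is only a sketch of the pattern in Python, not what the PR ended up doing; the fact name `ansible_architecture` comes from the thread, while the function, its return values, and the assumption that 32-bit arm hosts report `armv7l` are illustrative.

```python
def etcd_arch_check(facts: dict) -> str:
    """Classify a host for the etcd architecture check.

    Returns "skipped" when the architecture fact was never gathered,
    "unsupported" for 32-bit arm, and "ok" otherwise.
    """
    arch = facts.get("ansible_architecture")
    if arch is None:
        # Facts were not gathered: skip rather than fail on an undefined value.
        return "skipped"
    if arch == "armv7l":
        return "unsupported"
    return "ok"
```

The trade-off is that a skipped check gives no protection, so hosts without gathered facts would still reach a later failure.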

Member

@Miouge1 Miouge1 commented Apr 27, 2019

@lwolf special access is required to Retry CI jobs, otherwise an amend and force push can retry the whole pipeline.

As we are trying to improve CI coverage, there have been significant changes in CI in the past few weeks.

Contributor Author

@lwolf lwolf commented Apr 27, 2019

I see, thanks. I wasn't sure that amending was the right way to do it.
I removed the new architecture check that was causing the build to fail, and now it seems fine.
Is there anything else that needs to be done to get this merged?

Member

@woopstar woopstar commented May 1, 2019

@Miouge1 what do you think of this?

Member

@Miouge1 Miouge1 commented Jun 5, 2019

@woopstar this looks good to me

/lgtm

Member

@woopstar woopstar commented Jun 5, 2019

/approve

@lwolf can you do a new PR where we add the latest ARM checksums for 1.14.1, 1.14.2, etc.? There are also some missing for 1.13.

Contributor

@k8s-ci-robot k8s-ci-robot commented Jun 5, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lwolf, woopstar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 07cb8eb into kubernetes-sigs:master Jun 5, 2019
18 of 19 checks passed
unbreakab1e added a commit to joomcode/kubespray that referenced this issue Jun 5, 2019
…ernetes-sigs#4261)

* Add support for arm images for hyperkube, kubeadm and cni_binary

* Add dummy etcd checksum for arm

This commit adds dummy etcd checksum for arm to avoid "no attribute" error
during setup.

* Add etcd host assert check

* Add 1.13.4 checksums of kubeadm and hyperkube for arm

* Update checksums of kubeadm and hyperkube for arm

* Add dummy checksums for calicoctl_binary_checksums dict

* disable gather_facts because it causes tests to fail

* Remove architecture check for etcd, due to unable to run tests
Contributor Author

@lwolf lwolf commented Jun 6, 2019

Thanks @woopstar, I created a PR with the missing checksums: #4850

LuckySB added a commit to southbridgeio/kubespray that referenced this issue Aug 4, 2019