coredns:v1.8.0 and etcd:3.4.13-0 on arm64 have the wrong architecture in their manifests; kubeadm init fails #104085

sysedwinistrator · 2021-08-02T21:39:50Z

What happened:

Cluster initialization on arm64 via kubeadm init using CRI-O would fail after a timeout.
watch crictl ps wouldn't show any containers starting.
CRI-O logs show the following:

Aug 02 20:42:19 n2 crio[852185]: time="2021-08-02 20:42:19.803721746+02:00" level=info msg="Image operating system mismatch: image uses OS \"linux\"+architecture \"amd64\", expecting one of \"linux+arm64\""

The inspection of the images' manifests seems to confirm the initial assumption that some of them were pulled with the wrong architecture:

# crictl images | awk 'NR>1 {print $3}' | while read -r line; do crictl inspecti $line | jq '"IMAGE: "+.status.repoTags[]+", ARCH: "+ .info.imageSpec.architecture'; done
"IMAGE: k8s.gcr.io/coredns/coredns:v1.8.0, ARCH: amd64"
"IMAGE: k8s.gcr.io/etcd:3.4.13-0, ARCH: amd64"

But all the other images are fine:

"IMAGE: k8s.gcr.io/kube-apiserver:v1.21.3, ARCH: arm64"
"IMAGE: k8s.gcr.io/kube-controller-manager:v1.21.3, ARCH: arm64"
"IMAGE: k8s.gcr.io/kube-proxy:v1.21.3, ARCH: arm64"
"IMAGE: k8s.gcr.io/kube-scheduler:v1.21.3, ARCH: arm64"
"IMAGE: k8s.gcr.io/pause:3.4.1, ARCH: arm64"

To rule out that CRI-O is somehow at fault, I ran the images through skopeo inspect, which confirmed the previous results even when passing --override-arch arm64.
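The per-image architecture check above can be automated along these lines. This is only a minimal sketch: the function names `oci_arch` and `report_mismatches` are hypothetical, and it assumes the input has already been reduced to one `IMAGE ARCH` pair per line.

```shell
# Map a `uname -m` machine name to the architecture name used in image configs.
# (Hypothetical helper; only a few common machine names are covered.)
oci_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        armv7l|armv6l) echo arm ;;
        *)             echo "$1" ;;
    esac
}

# Read "IMAGE ARCH" pairs on stdin and print every image whose declared
# architecture differs from the expected one passed as $1.
report_mismatches() {
    expected="$1"
    while read -r image arch; do
        [ -n "$image" ] || continue
        if [ "$arch" != "$expected" ]; then
            echo "MISMATCH: $image declares $arch, expected $expected"
        fi
    done
}
```

The pairs could be produced with the same `crictl inspecti` and `jq` fields used above (`.status.repoTags[]` and `.info.imageSpec.architecture`) and piped into `report_mismatches "$(oci_arch "$(uname -m)")"`.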

So I checked what architecture the binaries in the container storage location actually are:

# find . -name 'etcd*' -exec file {} \;
./22bafa088de22425c1186532aaaae80ee77823a4dba938b6a09310313804e001/usr/local/bin/etcd: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=C2gD-OhOm0EaEE6O5P-G/E0Rtpt8guBmaGHRO19gQ/OcfznmXfak8UQlpC_k77/c5upPp32vabmtzAPWE5y, not stripped
......
./22bafa088de22425c1186532aaaae80ee77823a4dba938b6a09310313804e001/usr/local/bin/etcdctl: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=Cj-D8mbvp8S4ehUJ-NSL/_qmGy8_7NS8-Lo8YkEWC/dPf7-SxGllpI-tniKRD5/7fldA6Ky3GsgJkND3PO2, not stripped
......
# find . -name 'coredns*' -exec file {} \;
./275d9308c04eff2ed9bc4723b14ea6e45baba29bf6c0bff6ce8ff5353fafa29f/coredns: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=39m9lQWm4aqmpdYUY6OJ/iNA5d9riIL95H_GDUSTD/H-Quk1-OcGkg7U4wkTDn/akmRBs5wZkoZ_CH88xMx, stripped

These are arm64 binaries; the images' manifests are wrong.
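To compare what `file` reports against what the manifest declares, the `file` output can be reduced to an image-style architecture name. A rough sketch follows; `elf_arch` is a hypothetical helper that only recognises a few common ELF descriptions.

```shell
# Translate one line of `file` output into an image-config architecture name.
# (Hypothetical helper; unrecognised descriptions yield "unknown".)
elf_arch() {
    case "$1" in
        *"ARM aarch64"*) echo arm64 ;;
        *"x86-64"*)      echo amd64 ;;
        *", ARM,"*)      echo arm ;;
        *)               echo unknown ;;
    esac
}
```

For example, `elf_arch "$(file ./coredns)"` yields `arm64` for the binaries listed above, which can then be compared with the `amd64` value declared by the image.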

I reproduced this issue on another arm64 host via podman. The image digests are the same, and podman inspect also shows the architecture to be amd64. Unlike CRI-O, however, podman can run the images.

What you expected to happen:

That the images of the correct architecture also list that correct architecture in their manifests.

How to reproduce it (as minimally and precisely as possible):

kubeadm init with CRI-O should do the trick.

Anything else we need to know?:

This has already been reported in #99656.
But both etcd and coredns still show amd64 on their latest tags.

Environment:

Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"archive", BuildDate:"2021-08-02T11:40:12Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/arm64"}
Cloud provider or hardware configuration: ODROID-N2 SBC
OS (e.g: cat /etc/os-release): Arch Linux ARM
Kernel (e.g. uname -a): 5.13.3 custom based on Arch Linux ARM config
Install tools: kubeadm
Network plugin and version (if this is a network-related bug):
Others: CRI-O


sysedwinistrator · 2021-08-02T21:46:38Z

@sysedwinistrator: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:
* `/sig <group-name>`
* `/wg <group-name>`
* `/committee <group-name>`

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/sig release

sysedwinistrator · 2021-08-03T14:39:47Z

It turns out the wrong architecture in the manifests was not the reason the pods weren't starting.
I couldn't quite figure out what the issue was. Cluster initialisation worked after kubeadm reset, completely resetting the container storage, and "stopping" all cgroup slice units related to k8s.

Since the latest images of etcd and coredns are still affected, I'm going to keep the issue open.

saschagrunert · 2021-08-03T15:08:20Z

CoreDNS as well as etcd are released in their own process and are not in the scope of SIG Release. For etcd, the image lives there: https://github.com/kubernetes/kubernetes/blob/master/cluster/images/etcd

Indeed, the architectures for non amd64 images are wrong, so we can do two things:

1. use the --from syntax in the Dockerfile

   kubernetes/cluster/images/etcd/Dockerfile
   Line 29 in 9ff3b7e

   FROM ${RUNNERIMAGE}

2. switch to buildx like we did for other images

I assume the same applies to CoreDNS, but I'm not sure right now where the image is being built.
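As an illustration of the buildx option, a multi-platform build command could be assembled roughly as follows. This is a sketch under assumptions, not the change that was eventually merged: `buildx_cmd` is a hypothetical dry-run helper, and the image name in the usage note is a placeholder. It only prints the command so it can be reviewed before running.

```shell
# Print (do not run) a `docker buildx build` command for the given image tag
# and list of target platforms. Hypothetical helper for illustration only.
buildx_cmd() {
    image="$1"
    shift
    # Join the remaining arguments with commas, as --platform expects.
    platforms=$(IFS=,; printf '%s' "$*")
    printf '%s\n' "docker buildx build --platform $platforms --tag $image --push ."
}
```

Calling `buildx_cmd registry.example/etcd:test linux/amd64 linux/arm64` prints a single command covering both platforms. The point of buildx here is that each platform's image config is written with the platform it was built for, instead of inheriting the builder's architecture.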

neolit123 · 2021-08-03T15:41:23Z

I reproduced this issue on another arm64 host via podman. The image digests are the same, and podman inspect also shows the architecture to be amd64. Unlike CRI-O, however, podman can run the images.

is there a way to tell cri-o to temporarily ignore the mismatch in the images?
we should certainly fix these images though.

https://github.com/kubernetes/kubernetes/blob/master/cluster/images/etcd

cc @jpbetz @wenjiaswe

I assume the same applies to CoreDNS, but I'm not sure right now where the image is being built.

cc @chrisohaver @johnbelamaric @rajansandeep

i think the image is built from the Dockerfile at:
https://github.com/coredns/coredns

/sig api-machinery network

neolit123 · 2021-08-03T15:55:01Z

related:
#102315

note, this test is catching problems in the etcd, coredns, kube-proxy and conformance images:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-all#periodic-manifest-lists
https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-kubernetes-e2e-manifest-lists/1422389464386244608/build-log.txt

WARNING: in config digest sha256:72d94efa1c317db2557d31f62a3384ef4959e5db59fdf9e49ca121a1afe5021e: found architecture "amd64", expected "arm"

WARNING: in config digest sha256:1a771bad15fcca72ab59198b3724a42871b3197bd8afc88feff53a2d045ac8db: found architecture "amd64", expected "arm"

it's just reporting warnings because otherwise this would fail the entire test job, and some container runtimes tolerate the arch mismatch.
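Since the job only emits warnings, the mismatches have to be picked out of the build log by hand. A small sketch of how that could be done, assuming the log lines keep exactly the `WARNING: in config digest ...` shape quoted above (`summarize_arch_warnings` is a hypothetical name):

```shell
# Read a build log on stdin and print "DIGEST FOUND EXPECTED" for every
# architecture-mismatch warning; all other lines are dropped.
summarize_arch_warnings() {
    sed -n 's/^WARNING: in config digest \(sha256:[0-9a-f]*\): found architecture "\([^"]*\)", expected "\([^"]*\)".*$/\1 \2 \3/p'
}
```

Feeding it the downloaded `build-log.txt` gives one line per affected config digest, which is easier to scan or count than the raw log.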

saschagrunert · 2021-08-04T08:58:42Z

Fixes for coredns and etcd are in flight.

caesarxuchao · 2021-08-05T20:09:58Z

/triage accepted

saschagrunert · 2021-08-11T08:23:59Z

The issue should be solved with the next Kubernetes minor release (v1.23) for etcd, and with the next coredns release.

sysedwinistrator added the kind/bug label Aug 2, 2021

k8s-ci-robot added the needs-sig and needs-triage labels Aug 2, 2021

k8s-ci-robot added the sig/release label and removed the needs-sig label Aug 2, 2021

k8s-ci-robot added the sig/api-machinery and sig/network labels Aug 3, 2021

This was referenced Aug 4, 2021

Use docker buildx for etcd image #104116

Merged

Use docker buildx for release image coredns/coredns#4779

Merged

justaugustus added this to the v1.23 milestone Aug 4, 2021

k8s-ci-robot added the triage/accepted label and removed the needs-triage label Aug 5, 2021

saschagrunert closed this as completed Aug 11, 2021

saschagrunert mentioned this issue Oct 5, 2021

Use docker buildx for etcd image #105484

Merged

BenTheElder mentioned this issue Apr 6, 2023

Inconsistency about CPU architecture of VPA images between manifest and image attribute kubernetes/autoscaler#5667

Closed
