quay.io/calico/cni:v3.16.5-arm64 - incorrect architecture in manifest and inside the image? #4256

Closed
artemry-nv opened this issue Dec 14, 2020 · 23 comments · Fixed by projectcalico/node#1044

Comments

@artemry-nv

# docker manifest inspect --verbose quay.io/calico/cni:v3.16.5-arm64
{
        "Ref": "quay.io/calico/cni:v3.16.5-arm64",
        "Descriptor": {
                "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                "digest": "sha256:865ad558ba82230b0d388234b7a1d6ed732e454d0102c0a55890dbc6a7b12456",
                "size": 946,
                "platform": {
                        "architecture": "amd64",
                        "os": "linux"
                }
        },
        "SchemaV2Manifest": {
                "schemaVersion": 2,
                "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                "config": {
                        "mediaType": "application/vnd.docker.container.image.v1+json",
                        "size": 2580,
                        "digest": "sha256:f0a2aae8859671bf87c640e57792e3aa96620c086c8345f9879bd5e5aa165894"
                },
                "layers": [
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 18092,
                                "digest": "sha256:3eb1fafec4d364bb44591106c04bccb36ee6101fa9a637377849d6efcf13230d"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 4064,
                                "digest": "sha256:3bc29d15d8a8bb939e4ee7741d8783ff57b7a7efaa48e3214d4e8c1f9b44e953"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 42446614,
                                "digest": "sha256:4224829040d2f578aa392429f195f88d94f87edaec6f1e36fde7dcf16ccf2286"
                        }
                ]
        }
}

Is it correct that the manifest reports "architecture": "amd64"? It also looks like there are x86_64 binaries inside the image.
For comparison, here is an older version:

# docker manifest inspect --verbose quay.io/calico/cni:v3.15.3-arm64
{
        "Ref": "quay.io/calico/cni:v3.15.3-arm64",
        "Descriptor": {
                "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                "digest": "sha256:7db5bb10620b3af64b141bd2d3ad959b97995c77d6c7e0707f51a8b5ea784719",
                "size": 1573,
                "platform": {
                        "architecture": "arm64",
                        "os": "linux"
                }
        },
        "SchemaV2Manifest": {
                "schemaVersion": 2,
                "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                "config": {
                        "mediaType": "application/vnd.docker.container.image.v1+json",
                        "size": 3459,
                        "digest": "sha256:17e2092f2b36647c82c9ea183c61c68fbaef8cf70a536b54f8227c597dc0054a"
                },
                "layers": [
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 20340179,
                                "digest": "sha256:007027d142c80b166a004bc7265c04036b80df438ac408f1a947e05c581b418e"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 18119,
                                "digest": "sha256:fd6edb08e6506774935240eee44804bab5e75915dc2b63b79ee597dd63a405f1"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 4061,
                                "digest": "sha256:3b95d38f1578719f095a2e3156db9d9267bc447acb84009737ea4b1794ab8b65"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 30695849,
                                "digest": "sha256:b8b44482720a38da86eb0b54d7dd09c38254fe941c60cdb7f1b5a7f06a394dd7"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 3087,
                                "digest": "sha256:8c7602656f2e814f511c5c463a4f33a44d654a78a01ec3bb43c97ada97239f94"
                        },
                        {
                                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                "size": 410,
                                "digest": "sha256:34fcbf8be9e764a3486fea0516d1c28f9bf94547a5a19bf1c566aa91f55bf8c0"
                        }
                ]
        }
}
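
The comparison above can be scripted. What follows is a minimal sketch, not part of the Calico tooling: the function names `tag_arch`, `manifest_arch`, and `check_image_arch` are made up for illustration, and it assumes the tag ends in a `-<arch>` suffix and that the JSON is laid out as in the `docker manifest inspect --verbose` output above.

```shell
# Hypothetical helpers (names are illustrative, not part of Calico or Docker).

# Print the architecture suffix of an image reference such as
# quay.io/calico/cni:v3.16.5-arm64. Assumes the tag ends in "-<arch>";
# a reference without any "-" is printed unchanged.
tag_arch() {
  printf '%s\n' "${1##*-}"
}

# Read `docker manifest inspect --verbose` output on stdin and print the
# value of the first "architecture" field found.
manifest_arch() {
  sed -n 's/.*"architecture": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Compare the tag suffix with what the registry reports.
# Returns non-zero on a mismatch. Requires the docker CLI and registry access.
check_image_arch() {
  image="$1"
  want="$(tag_arch "$image")"
  got="$(docker manifest inspect --verbose "$image" | manifest_arch)"
  if [ "$want" != "$got" ]; then
    echo "MISMATCH: $image is tagged $want but the manifest says $got" >&2
    return 1
  fi
  echo "OK: $image is $got"
}
```

Running `check_image_arch quay.io/calico/cni:v3.16.5-arm64` would then flag the image shown above, since its tag says arm64 while the manifest says amd64.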
@caseydavenport
Member

Yeah, this looks wrong to me. Not sure how that happened.

@caseydavenport
Member

So, I think this might just be a display issue with docker manifest inspect.

I get the same result as you, but when I try to run this on an amd64 machine I get the following:

$ docker run quay.io/calico/cni:v3.16.5-arm64
standard_init_linux.go:211: exec user process caused "exec format error"

I don't have an arm64 env to try this on, though.

@mrunge

mrunge commented Dec 31, 2020

There is something fishy with v3.16.5

docker run quay.io/calico/cni:v3.16.5-arm64
Unable to find image 'quay.io/calico/cni:v3.16.5-arm64' locally
v3.16.5-arm64: Pulling from calico/cni
4d09abd83388: Pull complete 
bd21986765d6: Pull complete 
3965a0607b1a: Pull complete 
Digest: sha256:4c4c854b40e8bcfa5a51c8160c3d8afcfdc8713ca32abc61b0aeadabcf7350e1
Status: Downloaded newer image for quay.io/calico/cni:v3.16.5-arm64
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
time="2020-12-31T15:46:21Z" level=info msg="/host/opt/cni/bin is not writeable, skipping"
time="2020-12-31T15:46:21Z" level=info msg="/host/secondary-bin-dir is not writeable, skipping"
time="2020-12-31T15:46:21Z" level=fatal msg="open /host/etc/cni/net.d/10-calico.conflist: no such file or directory" source="install.go:389"

@lht

lht commented Jan 6, 2021

@caseydavenport This is not just a display issue. The architecture field is indeed set incorrectly. You can also test this with docker run --platform linux/arm64 on amd64 machines.

> docker run --rm -it  --platform linux/arm64  --entrypoint /bin/sh calico/node:v3.16.6-arm64
/ # uname -m
aarch64

> docker run --rm -it  --platform linux/arm64  --entrypoint /bin/sh calico/cni:v3.16.6-arm64
Unable to find image 'calico/cni:v3.16.6-arm64' locally
v3.16.6-arm64: Pulling from calico/cni
Digest: sha256:afd48a8e96fd6090297badf0cbeda126e3a07172eba4e35034733ea4849a224b
Status: Image is up to date for calico/cni:v3.16.6-arm64
docker: Error response from daemon: image with reference calico/cni:v3.16.6-arm64 was found but does not match the specified platform: wanted linux/arm64, actual: linux/amd64.
See 'docker run --help'.

We got the following image pull error on Kubernetes (EC2 c6g node):

failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:b2410e2206f1ed88dc10f14a9a3fb396d7f5f4b117351511e9ad961bb7c319e4: not found

We noticed the same issue with calico/typha.
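
Note that `uname -m` in the session above prints the kernel's machine name (`aarch64`), while Docker platform strings use a different naming (`arm64`). A small sketch of the mapping, covering only the architectures mentioned in this thread; `docker_arch` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: translate a `uname -m` machine name into the
# architecture name Docker uses in platform strings (linux/<arch>).
# Only the architectures discussed in this thread are covered.
docker_arch() {
  case "$1" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    ppc64le) echo ppc64le ;;
    *)       echo "unknown machine type: $1" >&2; return 1 ;;
  esac
}

# Usage on the host, or inside a container started as in the commands above:
#   docker_arch "$(uname -m)"
```

Comparing this value with the suffix in the image tag gives a quick check that the binaries inside the image match what the tag promises.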

@caseydavenport
Member

Yeah, ok. Definitely sounds like the image is borked then. I'll see what I can find, but given that the Calico team doesn't have test infra for other architectures, any help in identifying the issue in the build system would be much appreciated!

@mrunge

mrunge commented Jan 7, 2021

The last working image is 3.15.3. The question is: what changed between 3.15.3 and 3.16?

@caseydavenport
Member

Well, there seem to be substantial build changes between those two versions, namely around moving the container to a scratch-based image. So, seems likely that messed it up.

@artemry-nv
Author

CC @lennybe

@artemry-nv
Author

It looks like the following images are impacted for arm64 and ppc64le:
calico/cni
calico/node
calico/kube-controllers

@pando85

pando85 commented Jan 16, 2021

I tested these images on arm64 and the binaries do not work:

quay.io/calico/cni                                     v3.15.2             5dadc388f979f       39.8MB
quay.io/calico/cni                                     v3.15.3             ca5564c06ea04       39.8MB
quay.io/calico/cni                                     v3.16.5             9165569ec2362       46.3MB
quay.io/calico/kube-controllers                        v3.15.2             fbbc4a1a0e98e       22.3MB
quay.io/calico/kube-controllers                        v3.16.5             1120bf0b8b414       22.4MB
quay.io/calico/node                                    v3.15.2             cc7508d4d2d4b       91.1MB
quay.io/calico/node                                    v3.16.5             c1fa37765208c       57.3MB

Finally, I got it working with the docker.io-based ones:

docker.io/calico/cni                                   v3.16.5             f0a2aae885967       42.5MB
docker.io/calico/node                                  v3.16.5             e7826a139b1b7       39.4MB

Hope this helps somebody.

@artemry-nv
Copy link
Author

Any chance it will be fixed in the default quay.io registry?

@lmm
Contributor

lmm commented Mar 23, 2021

Sorry for the late reply!

Any chance it will be fixed in the default quay.io registry?

If we can get fixes into the v3.15 and v3.16 branches we could have fixed images in the next patch release in those streams.

@artemry-nv
Author

Sorry for the late reply!

Any chance it will be fixed in the default quay.io registry?

If we can get fixes into the v3.15 and v3.16 branches we could have fixed images in the next patch release in those streams.

Any updates?

@aquam8

aquam8 commented Jun 28, 2021

Any chance this gets fixed in the images pushed to quay.io, please? Using Docker Hub introduces the well-known rate-limiting constraints, which we do not want for components as critical as Calico.
Thank you very much

@aquam8

aquam8 commented Jun 28, 2021

Duplicate of #4692 it would seem

@lwr20
Member

lwr20 commented Jun 29, 2021

@frozenprocess are you seeing this in your work for https://github.com/projectcalico/node/pull/1044/files ?

@frozenprocess
Collaborator

@lwr20 Yep

docker manifest inspect --verbose rezareza/calico-node:latest-arm64
{
	"Ref": "docker.io/rezareza/calico-node:latest-arm64",
	"Descriptor": {
		"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
		"digest": "sha256:a2bad0161c075987f4a86f3f3ecbe00100a2c95ac54a11b339baa64dff6f0265",
		"size": 737,
		"platform": {
			"architecture": "amd64",
			"os": "linux"
		}
	},
	"SchemaV2Manifest": {
		"schemaVersion": 2,
		"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
		"config": {
			"mediaType": "application/vnd.docker.container.image.v1+json",
			"size": 2718,
			"digest": "sha256:bf6f8bc2b3012752e34db68635985d2398b0dc7f9f298984fb03c3a766dbf0b5"
		},
		"layers": [
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 56921744,
				"digest": "sha256:c0ba239412730599850413b68c2eccc3ceb0afb5586aa3427ec9b4fe89f8fed8"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 4066,
				"digest": "sha256:bb033881836cec702fdfbd11fd326a72ff31fe34b8eae64f567bb6ba1b466555"
			}
		]
	}
}

@lwr20
Member

lwr20 commented Jun 29, 2021

@frozenprocess But you've actually run AMD64 clusters, right?

Is the problem that Quay AMD64 images are wrong or is it that the Quay manifest is wrong?

@frozenprocess
Collaborator

frozenprocess commented Jun 29, 2021

@lwr20
At this point I'm 99% sure it was an ARM64 cluster, checking it again.
Adding --platform linux/arm64/v8 to the docker build statement in the Makefile fixed my problem.

 docker manifest inspect --verbose rezareza/calico-node:latest-arm64v1 | egrep -i arch
			"architecture": "arm64",

@aquam8

aquam8 commented Jun 29, 2021

One thing I don't get here: why is there an -arm64 suffix in the image name? Part of the point is that you do not want to assume the architecture across your containers. Some may run on heterogeneous node groups across AMD and ARM.
The image should be just quay.io/calico/node:3.16.4, and the right image for each architecture should be picked automatically because it is a multi-architecture image built with manifests for arm64 and amd64.

Isn't this your expectation too?
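
For readers unfamiliar with how such a multi-architecture tag is assembled, here is a rough sketch using the `docker manifest` subcommands. It only prints the commands (a dry run); the function name `manifest_list_cmds` and the reference `example.io/demo/node:v1` are placeholders, and it assumes the per-architecture images have already been pushed as `<tag>-<arch>`. It is not a description of how the Calico release process actually works.

```shell
# Hypothetical dry-run helper: print the commands that would combine
# per-architecture tags (<list>-<arch>) into one multi-arch manifest list.
# Usage: manifest_list_cmds <list-reference> <arch>...
manifest_list_cmds() {
  list="$1"; shift
  refs=""
  for arch in "$@"; do
    refs="$refs ${list}-${arch}"
  done
  # One manifest list that references every per-architecture image.
  echo "docker manifest create ${list}${refs}"
  # Record the platform of each entry explicitly.
  for arch in "$@"; do
    echo "docker manifest annotate --os linux --arch ${arch} ${list} ${list}-${arch}"
  done
  echo "docker manifest push ${list}"
}

# Example (prints four commands, runs nothing):
#   manifest_list_cmds example.io/demo/node:v1 amd64 arm64
```

With a manifest list in place, a pull of the plain tag resolves to the entry matching the node's platform, so no architecture suffix is needed in the pod spec. The annotation only labels the list entry, though, so the architecture recorded in each underlying image still has to be correct.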

@lwr20
Member

lwr20 commented Jun 30, 2021

Why is there an -arm64 suffix in the image name?

I think this is a hangover from before Quay supported multi-arch manifest images. Agree that this should be fixed.

@caseydavenport
Member

@frozenprocess is this one that you plan on looking at as part of your arch work?

@frozenprocess
Collaborator

frozenprocess commented Aug 17, 2021

@caseydavenport yes. --target-platform can fix this issue.
https://github.com/projectcalico/node/pull/1044/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R269

an image created using the PR

docker inspect rezareza/calico-node:latest | egrep -i arch
        "Architecture": "arm64",

caseydavenport pushed a commit to projectcalico/node that referenced this issue Aug 30, 2021
Fix: arm64 dockerfile synced with amd64 -> https://github.com/projectcalico/node/issues/524
Add: A new argument `--platform` in the docker build phase to address the incorrect architecture problem. -> projectcalico/calico#4256
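
As a rough illustration of the kind of change the commit describes, the sketch below prints a `docker build` invocation with an explicit `--platform`. It is not the actual Makefile rule from projectcalico/node: the function name `build_cmd`, the `Dockerfile.<arch>` naming, and the image reference are assumptions made for the example.

```shell
# Hypothetical dry-run helper: print a docker build command that pins the
# target platform, so the architecture recorded in the image follows the
# requested platform rather than the build host.
# Usage: build_cmd <image-reference> <arch>
build_cmd() {
  image="$1"
  arch="$2"
  echo docker build --platform "linux/${arch}" -f "Dockerfile.${arch}" -t "${image}-${arch}" .
}

# Example (prints the command, runs nothing):
#   build_cmd example/calico-node:latest arm64
#
# After a real build, the recorded architecture can be checked with:
#   docker inspect --format '{{.Architecture}}' example/calico-node:latest-arm64
```

Without `--platform`, a build on an amd64 host can end up recording amd64 in the image config even when the binaries copied in are arm64, which would match the symptoms reported earlier in this thread.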
9 participants