Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support cross-compiling node-image #2176

Merged
merged 9 commits into from
May 14, 2021

Conversation

BenTheElder
Copy link
Member

@BenTheElder BenTheElder commented Apr 2, 2021

specifically intended to resolve #166

changes:

  • deprecate --kube-root in favor of making it a require argument (this is not strictly related, but I am reworking node image build bit by bit and started there)
  • switch to pulling images with containerd in the running base, instead of to the host. this breaks offline kubernetes development but the previous approach with docker pull / save / export cannot be used reliably for multi-arch
    • fully qualify images to prepull (needed by ctr)
  • fix image architecture in upstream built images, some of which are have the wrong metadata
  • move architecture support to warning (not strictly necessary, but when people ask for new architectures to be possible to build we can allow it without explicitly supporting them, initially)
  • add --arch flag to kind build node-image
  • update push-build.sh to push a multi-arch image using this cross compile mechanism
  • update the default node image to a multi-arch image

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 2, 2021
@BenTheElder
Copy link
Member Author

BenTheElder commented Apr 2, 2021

I've been maintaining a test image at bentheelder/kind-node:arm64-test.
I do not have arm hardware so I can only do pretty minimal testing easily.

kubernetes/release#1842 is relevant, we just patch the image config to workaround this which seems to be fine so far. we may also need this for kubernetes/kubernetes#99656

Switching away from pulling images to the host docker will cause two regressions:

  • users building node images and leveraging a mirror / proxy
  • users building node images offline (currently possible if you build once and none of the pulled / non-built images change)

However without switching away we are blocked by moby/moby#36552 (comment)

docker save does not even have a --platform option, so we'll need a different approach one way or another.

we could re-introduce the pull-to-host approach when we're confident that we're building for the same platform as the host, but I'm not sure this is worth prioritizing right now, these only affect building node images in specific cases, not consuming built images.

because manifest lists can only exist on a registry and not in local docker, kind build node-image is not building multiple architectures at once, instead it can be invoked for each architecture and docker manifest used to construct a multi-arch combined manifest image in the registry.

I will update our push-node.sh to do this.

@BenTheElder
Copy link
Member Author

/retest

@BenTheElder
Copy link
Member Author

1.15/1.16 e2e images have a docker version that is too old for the multi-arch --run without enabling experimental daemon options.

@BenTheElder
Copy link
Member Author

We can probably just spin down these old jobs.

One other annoying aspect remaining besides code cleanup is extremely noisy output from the image pulling now. I'll get back to that tomorrow, as well as updating push-node.sh to produce a multi-arch image using this.

This is done enough though that you can do kind build node-image --arch=arm64 on a non-arm64 host and it should still produce an image for arm64, so it's ready for testing. I've pre-built one for v1.20.2 at bentheelder/kind-node:arm-test.

}
defer f.Close()
return importer.LoadCommand().SetStdout(os.Stdout).SetStderr(os.Stdout).SetStdin(f).Run()
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this always returns nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes? Because the contract is a method that returns an error type but we are only running one thing and it will error and that's fine. We check later that the images are loaded.

Ctr pull does not have the --no-unpack option unlike the option I added to import.

@@ -117,6 +117,15 @@ func EditArchiveRepositories(reader io.Reader, writer io.Writer, editRepositorie
return err
}
hdr.Size = int64(len(b))
// edit image config when we find that
} else if strings.HasSuffix(hdr.Name, ".json") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very obscure, so every json but manifest.json is an image config manifest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they will be named <hash>.json. we could get the names from the manifest.json, but then we need to do two passes over the tar or depend on ordering or ...

we could do this, but it doesn't seem necessary. images following the spac should only contain manifest.json and <hash>.json .json files where the latter are image config files.

links to the spec are in the comments above https://github.com/moby/moby/blob/master/image/spec/v1.2.md

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a typical image archive only has two .json files in it, but it's possible to create a tarball with multiple images, in which case there is one manifest.json and one config hash.json.

The manifest contains an array of structs each with the config file name for an image, it's tags, and it's layers. There is also the legacy repositories file for the tags.

The config file specifies the non-layer data like architecture/os, entrypoint, env, etc. for one image. We are patching the architecture field to avoid kind only working on highly patched kubernetes builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a lot of information, do you think future us can figure it out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are comments and there are links to the spec, which is not very lengthy :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@BenTheElder BenTheElder mentioned this pull request Apr 2, 2021
@@ -87,7 +87,7 @@ func GetArchiveTags(path string) ([]string, error) {
// https://github.com/moby/moby/blob/master/image/spec/v1.md
// https://github.com/moby/moby/blob/master/image/spec/v1.1.md
// https://github.com/moby/moby/blob/master/image/spec/v1.2.md
func EditArchiveRepositories(reader io.Reader, writer io.Writer, editRepositories func(string) string) error {
func EditArchive(reader io.Reader, writer io.Writer, editRepositories func(string) string, architectureOverride string) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to do this because kubernetes has frequently produced images with the wrong architecture metadata when cross compiling images, which prevents success importing / unning them under containerd which is actually checking for the matching image. dockerd is pretty loose with using these but does throw warnings.

@aojea
Copy link
Contributor

aojea commented Apr 4, 2021

/retest
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 4, 2021
@BenTheElder
Copy link
Member Author

I will finish this up when I'm back Monday (at minimum: reduce excessive terminal / log spam from image loading, rebase).

In the meantime I think we should hold off v0.11 until this is good and landed @amwat.

@egandro
Copy link

egandro commented Apr 12, 2021

Following here. Just notice me here - I have a Pi400 and a Pi4/8gb for testing.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2021
@BenTheElder
Copy link
Member Author

rebased and "fixed" the excessive noise issue (silenced output from images pulled to node by importer).

should still add multi-arch to push-node.sh

this will make more sense as we expand the accepted values and as GOPATH goes away in 1.17 and setting this essentially becomes required when we can't autodetect the kubernetes source checkout reliably
- switch to pulling images with containerd in the running base, instead of to the host. this breaks offline kubernetes development but the previous approach with docker pull / save / export cannot be used reliably for multi-arch
- fully qualify images to prepull (needed by ctr)
- fix image architecture in upstream built images, some of which are have the wrong metadata
- move architecture support to warning
@BenTheElder BenTheElder changed the title WIP: support cross-compiling node-image support cross-compiling node-image May 14, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2021
@BenTheElder
Copy link
Member Author

this is ~good enough:

$ DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect kindest/node:v1.21.1
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2821,
         "digest": "sha256:21a1f4a2ff8e62abab5286db9a6d13734b62b702a61247a2500ea3289d2276dd",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2821,
         "digest": "sha256:0e501b88eda8c7618ae9308c07f9d14dd6bb7835dfb83836a3ef0cfec50ba190",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 14, 2021

@BenTheElder: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kind-conformance-parallel-1-16 0a1d35d link /test pull-kind-conformance-parallel-1-16
pull-kind-conformance-parallel-1-15 0a1d35d link /test pull-kind-conformance-parallel-1-15

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@BenTheElder
Copy link
Member Author

note: 1.15 and 1.16 will fail because the docker installed in those environments is simply too old. we will need to retire these jobs, Kubernetes just did the last 1.18 patch release ...

@aojea
Copy link
Contributor

aojea commented May 14, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 14, 2021
@k8s-ci-robot k8s-ci-robot merged commit 247fea5 into kubernetes-sigs:main May 14, 2021
@spiffxp
Copy link
Member

spiffxp commented May 14, 2021

I'm guessing this means kind will no longer support 1.15 and 1.16 in its next release? Do we call out supported versions somewhere?

@aojea
Copy link
Contributor

aojea commented May 14, 2021

I'm guessing this means kind will no longer support 1.15 and 1.16 in its next release? Do we call out supported versions somewhere?

There at least 2 places in the docs, "principles" and "roadmap" that talk about

Support all currently upstream-supported Kubernetes versions

@BenTheElder
Copy link
Member Author

It doesn't mean 1.15/1.16 are totally unsupported it just means we won't be able to run e2e. They will still receive best effort support.

@BenTheElder BenTheElder deleted the new-node-image branch May 14, 2021 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support arm64
6 participants