Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[build/debian*] Cleanup of image building and allow building on Darwin #80909

Closed
wants to merge 3 commits into from
Closed

[build/debian*] Cleanup of image building and allow building on Darwin #80909

wants to merge 3 commits into from

Conversation

hoegaarden
Copy link
Contributor

@hoegaarden hoegaarden commented Aug 2, 2019

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

  • Cleanup docker build mechanism & simplify the makefiles:
    • Use multistage builds
    • No seding of the Dockerfile
    • remove the use of temp directories where possible
  • Allows building those images for different archs on a MacOS machine
  • Allows to skip the binfmt_misc multiarch setup
  • Adds a config for building the images on GCB (for later integration with prow/builder)

Which issue(s) this PR fixes:

none

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 2, 2019
@hoegaarden
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/release Categorizes an issue or PR as relevant to SIG Release. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2019
@hoegaarden hoegaarden changed the title Allow building debian-base images on MacOS & some cleanup Allow building debian-* images on MacOS & build cleanup Oct 11, 2019
@hoegaarden
Copy link
Contributor Author

/retest

@hoegaarden
Copy link
Contributor Author

/assign @tallclair @thockin

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I know about this is what I hacked. @tallclair has fingerprints on this..

@dims
Copy link
Member

dims commented Oct 12, 2019

/assign @mkumatag

@thockin thockin removed their assignment Nov 15, 2019
@hoegaarden hoegaarden changed the title Allow building debian-* images on MacOS & build cleanup [build/debian*] Cleanup of image building and allow building on Darwin Nov 28, 2019
- Use multistage builds for the base
- Don't sed Dockerfiles, use build args instead
- Make (multiarch) builds work on Darwin
- Hide not so useful output
When trying to run the container builds on GCB in parallel steps I see
IO errors, which I believe are triggered when we try to register binfmt
rules in parallel.

When `SKIP_REGISTER_MULTIARCH` is a non-empty string, the registration
will be skipped.

I also took the opportiny to split that out into its own target.

----

Example errors from a GCB run:

```
Step #6 - "s390x": /workspace/go/src/k8s.io/kubernetes/build/debian-hyperkube-base/../../third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh: 246: echo: echo: I/O error
Step #6 - "s390x": /workspace/go/src/k8s.io/kubernetes/build/debian-hyperkube-base/../../third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh: 246: echo: echo: I/O error
Step #6 - "s390x": /workspace/go/src/k8s.io/kubernetes/build/debian-hyperkube-base/../../third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh: 246: echo: echo: I/O error
Step #6 - "s390x": /workspace/go/src/k8s.io/kubernetes/build/debian-hyperkube-base/../../third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh: 246: echo: echo: I/O error
Step #6 - "s390x": /workspace/go/src/k8s.io/kubernetes/build/debian-hyperkube-base/../../third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh: 246: echo: echo: I/O error
Step #6 - "s390x": /workspace/go/src/k8s.io/kubernetes/build/debian-hyperkube-base/../../third_party/multiarch/qemu-user-static/register/qemu-binfmt-conf.sh: 246: echo: echo: I/O error
```
@hoegaarden
Copy link
Contributor Author

/retest

@justaugustus
Copy link
Member

Whoa. I didn't know this PR existed.
/hold for discussion in the Releng meeting today
/area release-eng
/milestone v1.18

@k8s-ci-robot k8s-ci-robot added area/release-eng Issues or PRs related to the Release Engineering subproject do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 9, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 9, 2019
@justaugustus justaugustus added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Dec 9, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 9, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hoegaarden
To complete the pull request process, please assign lavalamp
You can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The config was deliberatly called `debian-images-cloudbuild.yaml` to
signal that this is only responsible for the debian-* images.

The images for all supported architectures are baked in parallel and
pushed to the registry ones all images have been created successfully.
## optional
_K8S_GIT_URL: https://github.com/kubernetes/kubernetes
_K8S_GIT_BRANCH: master
_BUILDER_IMAGE: gcr.io/k8s-staging-release-test/k8s-cloud-builder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justaugustus 🤷‍♂️

@@ -12,8 +12,96 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM scratch
ARG BUILD_BASE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justaugustus I know you requested to do a git rm Dockerfile{.build,} or such, to see a nicer diff. However, git / GitHub show the diff always the same if I change/delete the two files in one commit.

ARCH?=amd64
SKIP_REGISTER_MULTIARCH?=
ALL_ARCH = amd64 arm arm64 ppc64le s390x

TEMP_DIR:=$(shell mktemp -d)
QEMUVERSION=v2.9.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why not bump the version of qemu in this PR ? Lot of improvements have been made in the last releases : https://www.qemu.org/2019/04/24/qemu-4-0-0/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I would rather bump the version in a separate PR than "hiding" that upgrade in this "refactoring PR".

@tallclair
Copy link
Member

To be honest, I'm tempted to abandon debian-base in favor of the upstream debian -slim image.

Before making this decision, I would want to compare 2 data points:

  • Image size of addons using debian-base compared to debian-slim (I don't think we should care that strongly about image size, but we should at least ACK the change)
  • Average number of CVEs in images over time. I.e. grab a couple old versions of debian-base, and a debian-slim image from around the same time and compare the number of CVEs (broken down by fixed) in each. This was the original motivation behind creating debian-base.

Any other arguments against migrating to debian-slim?

@justaugustus
Copy link
Member

I really like that idea, @tallclair!

@hoegaarden @cpanato @ameukam -- Can you work on what Tim mentioned above together?

@justaugustus
Copy link
Member

@hoegaarden @cpanato @ameukam -- Any update w.r.t. Tim's suggestion?

@hoegaarden
Copy link
Contributor Author

hoegaarden commented Jan 14, 2020

TL;DR: I think the base images might still make sense, and it might be more useful to invest in making the building/promotion process open and invest in processes and tooling around that.

What I have learned:

  • debian-base is currently based on debian:buster-slim
  • debian-ipables & debian-hyperkube-base are based on debian-base
  • The repos we publish the images to are suffixed with the architecture (e.g. k8s.gcr.io/debian-base-arm64), but we also publish manifests without the architecture suffix, those point to the images in the suffixed repos (e.g. k8s.gcr.io/debian-base)
  • k8s.gcr.io/kube-* are based on debian-base and debian-iptables (here & here), k8s.io/etcd is based on debian-base (here), ks8.gcr.io/hyperkube is based on debian-hyperkube-base
  • there are other images based on k8s.gcr.io/debian-* in k/k, that are mostly test images (see further down)
  • there are other images out there which are based on k8s.gcr.io/debian-* (see further down)

What I assume or believe I to have heard (might be totally wrong though):

  • Most / all of the mentioned images are built manually (= not triggered by a commit to k/k or so) by someone inside google
  • Images end up in staging-k8s.gcr.io
  • Images are picked up by an google internal image promoter which promotes them from staging-k8s.gcr.io to k8s.gcr.io
  • It might be possible for the OSS image promoter to promote images to k8s.gcr.io without breaking the google internal image promotion

What I don't know:

  • Which environments do and don't need the qemu emulators and their binfmt_misc registration. (e.g. a recent Docker for Mac does seem to not need it, GCB [and therefor other linux envs?] seems to need it)?

Personal take:

  • I think it is still useful to maintain those base images. I like, that we have 1 base image, 2 intermediate (iptables, hyperkube-base) which everything is based off of and allows us to just COPY/ADD specific files into.
  • If it turns out that we need the qemu emulator and the binfmt_misc registration, then I would definitely opt for having a base image which handles that.
  • We need to get better / faster / more open with the image building / updating. Things that might help are:
    • more people which are able to promote images (that might be possible if and when we are able to use the image promoter to promote to k8s.gcr.io without messing up "anything else")
    • the image build auto-triggering on any update of the base image (currently debian:buster-slim), push it to staging and run tests against
    • some agreed upon, ideally automated, gates for promotion
    • have as many images as possible auto-update (in staging) when we bump the base image

Images in k/k based on k8s.gcr.io/debian-* images
→ git grep 'debian-' -- ':!*README*' ':!vendor/' ':!*CHANGELOG*'
build/BUILD:        "base": "@debian-base-{ARCH}//image",
build/BUILD:        "base": "@debian-base-{ARCH}//image",
build/BUILD:        "base": "@debian-base-{ARCH}//image",
build/BUILD:        "base": "@debian-iptables-{ARCH}//image",
build/common.sh:    "kube-apiserver,${KUBE_BASE_IMAGE_REGISTRY}/debian-base-${arch}:${debian_base_version}"
build/common.sh:    "kube-controller-manager,${KUBE_BASE_IMAGE_REGISTRY}/debian-base-${arch}:${debian_base_version}"
build/common.sh:    "kube-scheduler,${KUBE_BASE_IMAGE_REGISTRY}/debian-base-${arch}:${debian_base_version}"
build/common.sh:    "kube-proxy,${KUBE_BASE_IMAGE_REGISTRY}/debian-iptables-${arch}:${debian_iptables_version}"
build/debian-base/Makefile:IMAGE ?= $(REGISTRY)/debian-base
build/debian-hyperkube-base/Makefile:IMAGE?=$(REGISTRY)/debian-hyperkube-base
build/debian-hyperkube-base/Makefile:BASEIMAGE=k8s.gcr.io/debian-base-$(ARCH):0.4.1
build/debian-iptables/Makefile:IMAGE=$(REGISTRY)/debian-iptables
build/debian-iptables/Makefile:BASEIMAGE?=k8s.gcr.io/debian-base-$(ARCH):v2.0.0
build/dependencies.yaml:    - path: build/debian-hyperkube-base/Makefile
build/workspace.bzl:            name = "debian-base-" + arch,
build/workspace.bzl:            repository = "debian-base",
build/workspace.bzl:            name = "debian-iptables-" + arch,
build/workspace.bzl:            repository = "debian-iptables",
build/workspace.bzl:            name = "debian-hyperkube-base-" + arch,
build/workspace.bzl:            repository = "debian-hyperkube-base",
cluster/addons/addon-manager/Makefile:BASEIMAGE=k8s.gcr.io/debian-base-$(ARCH):v1.0.0
cluster/images/conformance/BUILD:    base = "@debian-hyperkube-base-{ARCH}//image",
cluster/images/conformance/Makefile:BASEIMAGE=k8s.gcr.io/debian-hyperkube-base-$(ARCH):0.12.1
cluster/images/etcd-empty-dir-cleanup/Dockerfile:FROM k8s.gcr.io/debian-base:v1.0.0
cluster/images/etcd/Makefile:    BASEIMAGE?=k8s.gcr.io/debian-base:v1.0.0
cluster/images/etcd/Makefile:    BASEIMAGE?=k8s.gcr.io/debian-base-arm:v1.0.0
cluster/images/etcd/Makefile:    BASEIMAGE?=k8s.gcr.io/debian-base-arm64:v1.0.0
cluster/images/etcd/Makefile:    BASEIMAGE?=k8s.gcr.io/debian-base-ppc64le:v1.0.0
cluster/images/etcd/Makefile:    BASEIMAGE?=k8s.gcr.io/debian-base-s390x:v1.0.0
cluster/images/hyperkube/BUILD:    base = "@debian-hyperkube-base-{ARCH}//image",
cluster/images/hyperkube/Makefile:BASEIMAGE=k8s.gcr.io/debian-hyperkube-base-$(ARCH):0.12.1
test/e2e/scheduling/ubernetes_lite_volumes.go:  imageURL := "https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-7-wheezy-v20140606"
test/e2e_node/system/specs/gke.yaml:  # This is not critical, but debian-based container-vm kernel module study
test/images/jessie-dnsutils/fixup-apt-list.sh:# http://security.debian.org/debian-security/dists/jessie/updates/InRelease is missing
test/images/jessie-dnsutils/fixup-apt-list.sh:        # See: https://lists.debian.org/debian-devel-announce/2019/03/msg00006.html
test/images/nonroot/Dockerfile:FROM k8s.gcr.io/debian-base:v1.0.0
test/images/pets/peer-finder/BASEIMAGE:amd64=k8s.gcr.io/debian-base-amd64:0.4.1
test/images/pets/peer-finder/BASEIMAGE:arm=k8s.gcr.io/debian-base-arm:0.4.1
test/images/pets/peer-finder/BASEIMAGE:arm64=k8s.gcr.io/debian-base-arm64:0.4.1
test/images/pets/peer-finder/BASEIMAGE:ppc64le=k8s.gcr.io/debian-base-ppc64le:0.4.1
test/images/pets/peer-finder/BASEIMAGE:s390x=k8s.gcr.io/debian-base-s390x:0.4.1
test/images/pets/redis-installer/BASEIMAGE:amd64=k8s.gcr.io/debian-base-amd64:0.4.1
test/images/pets/redis-installer/BASEIMAGE:arm=k8s.gcr.io/debian-base-arm:0.4.1
test/images/pets/redis-installer/BASEIMAGE:arm64=k8s.gcr.io/debian-base-arm64:0.4.1
test/images/pets/redis-installer/BASEIMAGE:ppc64le=k8s.gcr.io/debian-base-ppc64le:0.4.1
test/images/pets/zookeeper-installer/BASEIMAGE:amd64=k8s.gcr.io/debian-base-amd64:0.4.1
test/images/pets/zookeeper-installer/BASEIMAGE:arm=k8s.gcr.io/debian-base-arm:0.4.1
test/images/pets/zookeeper-installer/BASEIMAGE:arm64=k8s.gcr.io/debian-base-arm64:0.4.1
test/images/pets/zookeeper-installer/BASEIMAGE:ppc64le=k8s.gcr.io/debian-base-ppc64le:0.4.1
test/images/resource-consumer/BASEIMAGE:amd64=k8s.gcr.io/debian-base-amd64:0.4.1
test/images/resource-consumer/BASEIMAGE:arm=k8s.gcr.io/debian-base-arm:0.4.1
test/images/resource-consumer/BASEIMAGE:arm64=k8s.gcr.io/debian-base-arm64:0.4.1
test/images/resource-consumer/BASEIMAGE:ppc64le=k8s.gcr.io/debian-base-ppc64le:0.4.1
test/images/resource-consumer/BASEIMAGE:s390x=k8s.gcr.io/debian-base-s390x:0.4.1


references to k8s.gcr.io/debian-* on github

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 10, 2020
@k8s-ci-robot
Copy link
Contributor

@hoegaarden: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@smourapina
Copy link

smourapina commented Feb 17, 2020

Hello @hoegaarden, @cpanato & @ameukam !
Will this be finished for milestone 1.18? Bug Triage for 1.18 here, with a friendly reminder that code freeze is scheduled for 5 March.

@tallclair
Copy link
Member

@hoegaarden can you expand on why you think we should still maintain the k8s.gcr.io/debian-* images?

As I mentioned in #80909 (comment), the 2 reasons we historically used a custom base image were:

  1. Image size - I really don't think this should matter. We're not talking gigabytes, there are now official -slim variants, and caching + baselayer sharing makes this less of a concern
  2. CVE noise - Base images typically have a lot of packages that we don't use, and when those packages have CVEs detected it creates a lot of noise in our automated scanning, and causes churn if we choose to fix them.

2 seems like the only valid reason to me, but I'm not convinced that the pain of maintaining an intermediate image is worth the benefit we get from dropping ~10 packages out of ~85.

@tallclair tallclair mentioned this pull request Feb 19, 2020
14 tasks
@tallclair
Copy link
Member

tallclair commented Feb 21, 2020

Some data answering the questions in #80909 (comment):

closest build of buster-slim to debian-base:v2.0.0 (built 2019-11-01) is debian:buster-20191014-slim

  • package versions are identical

Packages removed in debian-base:

bash/now 5.0-4 amd64 [installed,local]
e2fsprogs/now 1.44.5-1+deb10u2 amd64 [installed,local]
fdisk/now 2.33.1-0.1 amd64 [installed,local]
libblkid1/now 2.33.1-0.1 amd64 [installed,local]
libcom-err2/now 1.44.5-1+deb10u2 amd64 [installed,local]
libext2fs2/now 1.44.5-1+deb10u2 amd64 [installed,local]
libfdisk1/now 2.33.1-0.1 amd64 [installed,local]
libmount1/now 2.33.1-0.1 amd64 [installed,local]
libncursesw6/now 6.1+20181013-2+deb10u1 amd64 [installed,local]
libsmartcols1/now 2.33.1-0.1 amd64 [installed,local]
libss2/now 1.44.5-1+deb10u2 amd64 [installed,local]
libtinfo6/now 6.1+20181013-2+deb10u1 amd64 [installed,local]
libuuid1/now 2.33.1-0.1 amd64 [installed,local]
mount/now 2.33.1-0.1 amd64 [installed,local]
ncurses-base/now 6.1+20181013-2+deb10u1 all [installed,local]
ncurses-bin/now 6.1+20181013-2+deb10u1 amd64 [installed,local]
sysvinit-utils/now 2.93-8 amd64 [installed,local]
tzdata/now 2019c-0+deb10u1 all [installed,local]
util-linux/now 2.33.1-0.1 amd64 [installed,local]

Comparison:

debian-base-amd64:v2.0.0 debian-buster-20191014-slim
Vulnerabilities
as of 2020-02-20
2 fixed vulnerabilities

(in systemd, libidn2)

45 total

5 fixed vulnerabilities

(in e2fsprogs, 2 in ncurses, systemd, libidn2)

49 total

Image Size 52 MB 69 MB

Observations:

  • Ideally we'd compare an older image, but v1.0.0 is based on stretch (not stretch-slim), so it's not a fair comparison.
  • Even with relatively few packages removed, debian-base cuts the vulnerabilities by more than half. That said, these are all low/med, and we mostly only care about high/critical, so more soak time is needed.
  • One of the vulnerabilities is in systemd, which we had removed in debian-base:v1.0.0, but got added back in v2.0.0. This is likely to be a source of vulnerabilities in the future.
  • debian:buster-slim is updated on approximately a monthly cadence, which means there could be up to 1 month of lag from a package being fixed to it being fixed in the base image. In practice, none of the versions changed over the 2 weeks between these builds

@justaugustus
Copy link
Member

@hoegaarden -- I'm going to take a run at removing the debian images in #88603 given @tallclair's analysis.

Depending on how much headway I make/availability I may need someone to pick up that PR and carry it over the finish line. Stay tuned.

@smourapina
Copy link

@hoegaarden, @tallclair & @justaugustus: are you still confident that you will complete this for milestone 1.18? Would like to leave a friendly reminder that code freeze happens in 1 day (5 March EOD). Thanks!

@justaugustus
Copy link
Member

We're not going to land this for 1.18 and it may end up obsoleted altogether.
/milestone v1.19

@justaugustus
Copy link
Member

justaugustus commented May 8, 2020

Apologies for allowing this to go stale.
@dims picked up the work and I've continued enabling image pushing for base images in this umbrella issue: #90698 #58012

When I have some time, I'm going to come back to this and see if we can pull out some additional functionality from what @hoegaarden set up, but for now:
/close

@k8s-ci-robot
Copy link
Contributor

@justaugustus: Closed this PR.

In response to this:

Apologies for allowing this to go stale.
@dims picked up the work and I've continued enabling image pushing for base images in this umbrella image: #90698

When I have some time, I'm going to come back to this and see if we can pull out some additional functionality from what @hoegaarden set up, but for now:
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/release Categorizes an issue or PR as relevant to SIG Release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants