
Don't redownload kicbase when already loaded #15528

Merged
merged 6 commits into from Jan 3, 2023

Conversation


@spowelljr spowelljr commented Dec 16, 2022

Problem:
When starting minikube without the kicbase image loaded into Docker (mainly first-time users, or the first start after an update), minikube loads the kicbase image from the cache into Docker and then completely redownloads the kicbase image from the internet directly into Docker. This unnecessary download makes the start take much longer than it needs to.

Cause:
When the image is loaded from the minikube dir into Docker, the SHA of the image is lost (expected behavior). Following the load from cache, a call is made to download the image from the internet. Before the download starts, it checks whether the image already exists, but it includes the SHA in that comparison. Since the image loaded from the cache doesn't have the SHA, the image-exists check fails, and daemon.Write is run to download the kicbase image. daemon.Write does not respect Docker's cache, so it completely redownloads the kicbase image (385 MB), which takes a considerable amount of time.

Solution:
I learned that daemon.Write doesn't even include the hash; we do a docker pull <image> after the download to get the SHA. So I added the same to the end of the func that loads the image from the cache into Docker, logging only a warning if it fails in case the user is offline. With this fix, ImageToDaemon is no longer needed, as the only value that function was adding was setting the hash via docker pull <image>, which the cache load now does, so I fully removed ImageToDaemon.
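A minimal sketch of the flow described above (illustrative only, not minikube's actual code — `cacheToDaemon` and `loadFromCache` are hypothetical names): load the tarball from the cache, then do a best-effort `docker pull` so the daemon learns the image's SHA, downgrading a pull failure to a warning since the user may be offline and the loaded image is already usable.

```go
package main

import (
	"fmt"
	"os/exec"
)

// cacheToDaemon loads an image tarball into Docker via the supplied
// loadFromCache func, then does a best-effort `docker pull` so the daemon
// records the image's digest. The pull failing is only a warning: the
// user may be offline, and the loaded image works without the digest.
func cacheToDaemon(img string, loadFromCache func(string) error) error {
	if err := loadFromCache(img); err != nil {
		return fmt.Errorf("loading %s from cache: %w", img, err)
	}
	// Best-effort: fills in the digest; failure is non-fatal.
	if out, err := exec.Command("docker", "pull", img).CombinedOutput(); err != nil {
		fmt.Printf("warning: failed to pull %s (possibly offline): %v: %s\n", img, err, out)
	}
	return nil
}
```

Because the pull already exists in the daemon's cache after the load, Docker's layer cache makes it cheap compared to a full daemon.Write redownload.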

Before:

$ time minikube start
πŸ˜„  minikube v1.28.0 on Darwin 13.1 (arm64)
✨  Automatically selected the docker driver. Other choices: ssh, qemu2 (experimental)
πŸ“Œ  Using Docker Desktop driver with root privileges
πŸ‘  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
    > gcr.io/k8s-minikube/kicbase...:  0 B [_____________________] ?% ? p/s 41s
πŸ”₯  Creating docker container (CPUs=2, Memory=4000MB) ...
🐳  Preparing Kubernetes v1.25.3 on Docker 20.10.21 ...
    β–ͺ Generating certificates and keys ...
    β–ͺ Booting up control plane ...
    β–ͺ Configuring RBAC rules ...
πŸ”Ž  Verifying Kubernetes components...
    β–ͺ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
πŸ„  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

real	1m14.976s
user	0m6.138s
sys	0m10.619s

After:

$ time ./out/minikube start
πŸ˜„  minikube v1.28.0 on Darwin 13.1 (arm64)
✨  Automatically selected the docker driver. Other choices: ssh, qemu2 (experimental)
πŸ“Œ  Using Docker Desktop driver with root privileges
πŸ‘  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
πŸ”₯  Creating docker container (CPUs=2, Memory=4000MB) ...
🐳  Preparing Kubernetes v1.25.3 on Docker 20.10.21 ...
    β–ͺ Generating certificates and keys ...
    β–ͺ Booting up control plane ...
    β–ͺ Configuring RBAC rules ...
πŸ”Ž  Verifying Kubernetes components...
    β–ͺ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
πŸ„  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

real	0m31.400s
user	0m3.205s
sys	0m2.292s

Result:
43.6s or 58% reduction in start time

Additional Fix 1:
When starting minikube while offline the following confusing message is printed:
❗ minikube was unable to download gcr.io/k8s-minikube/kicbase:v0.0.36, but successfully downloaded gcr.io/k8s-minikube/kicbase:v0.0.36 as a fallback image

This was caused by the behind-the-scenes comparison: the expected image had a SHA, but the image actually loaded didn't, because the SHA couldn't be pulled from the internet. The comparison therefore failed and the message was printed. Stripping the SHAs before the comparison gives the expected behavior of not printing the message.
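The comparison fix can be sketched like this (hypothetical helper names, not minikube's actual code): strip any `@sha256:...` digest suffix from both references before comparing, so an image loaded from the cache (no digest) matches the expected image (with digest).

```go
package main

import "strings"

// stripDigest removes an "@sha256:..." (or any "@...") digest suffix
// from an image reference, leaving just the repository and tag.
func stripDigest(ref string) string {
	if i := strings.IndexByte(ref, '@'); i >= 0 {
		return ref[:i]
	}
	return ref
}

// sameImage compares two image references with digests ignored, so a
// digest-less cached image still matches the expected digested reference.
func sameImage(expected, actual string) bool {
	return stripDigest(expected) == stripDigest(actual)
}
```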

Before (no internet connection and message is printed):

$ minikube start
πŸ˜„  minikube v1.28.0 on Darwin 13.1 (arm64)
✨  Automatically selected the docker driver. Other choices: ssh, qemu2 (experimental)
πŸ“Œ  Using Docker Desktop driver with root privileges
πŸ‘  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
❗  minikube was unable to download gcr.io/k8s-minikube/kicbase:v0.0.36, but successfully downloaded gcr.io/k8s-minikube/kicbase:v0.0.36 as a fallback image
πŸ”₯  Creating docker container (CPUs=2, Memory=4000MB) ...
❗  This container is having trouble accessing https://registry.k8s.io
πŸ’‘  To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/
🐳  Preparing Kubernetes v1.25.3 on Docker 20.10.21 ...
    β–ͺ Generating certificates and keys ...
    β–ͺ Booting up control plane ...

After (no internet connection and no message is printed)

$ minikube start
πŸ˜„  minikube v1.28.0 on Darwin 13.1 (arm64)
✨  Automatically selected the docker driver. Other choices: ssh, qemu2 (experimental)
πŸ“Œ  Using Docker Desktop driver with root privileges
πŸ‘  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
πŸ”₯  Creating docker container (CPUs=2, Memory=4000MB) ...
❗  This container is having trouble accessing https://registry.k8s.io
πŸ’‘  To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/
🐳  Preparing Kubernetes v1.25.3 on Docker 20.10.21 ...
    β–ͺ Generating certificates and keys ...
    β–ͺ Booting up control plane ...

Additional Fix 2:
When using --download-only, the kicbase image download would return after the first download attempt, regardless of whether it succeeded. Fixed the logic so that if the first image fails, it retries with the fallback images as expected.

Before:

I1216 14:19:26.618571   11726 cache.go:147] Downloading gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c to local cache
I1216 14:19:26.618711   11726 image.go:60] Checking for gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c in local cache directory
I1216 14:19:26.618849   11726 image.go:120] Writing gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c to local cache
E1216 14:19:26.620027   11726 cache.go:203] Error downloading kic artifacts:  getting remote image: Get "https://gcr.io/v2/": dial tcp: lookup gcr.io: no such host

After:

I1216 14:24:03.568076   12993 cache.go:146] Downloading gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c to local cache
I1216 14:24:03.568211   12993 image.go:60] Checking for gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c in local cache directory
I1216 14:24:03.568347   12993 image.go:118] Writing gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c to local cache
I1216 14:24:03.570327   12993 cache.go:165] failed to download gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c, will try fallback image if available: getting remote image: Get "https://gcr.io/v2/": dial tcp: lookup gcr.io: no such host
I1216 14:24:03.570338   12993 image.go:76] Checking for docker.io/kicbase/build:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c in local docker daemon
I1216 14:24:03.632463   12993 cache.go:146] Downloading docker.io/kicbase/build:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c to local cache
I1216 14:24:03.632605   12993 image.go:60] Checking for docker.io/kicbase/build:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c in local cache directory
I1216 14:24:03.632657   12993 image.go:118] Writing docker.io/kicbase/build:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c to local cache
I1216 14:24:03.634069   12993 cache.go:165] failed to download docker.io/kicbase/build:v0.0.36-1668787669-15272@sha256:06094fc04b5dc02fbf1e2de7723c2a6db5d24c21fd2ddda91f6daaf29038cd9c, will try fallback image if available: getting remote image: Get "https://index.docker.io/v2/": dial tcp: lookup index.docker.io: no such host
I1216 14:24:03.634088   12993 image.go:76] Checking for gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272 in local docker daemon
I1216 14:24:03.688544   12993 cache.go:146] Downloading gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272 to local cache
I1216 14:24:03.688697   12993 image.go:60] Checking for gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272 in local cache directory
I1216 14:24:03.688736   12993 image.go:118] Writing gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272 to local cache
I1216 14:24:03.690219   12993 cache.go:165] failed to download gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1668787669-15272, will try fallback image if available: getting remote image: Get "https://gcr.io/v2/": dial tcp: lookup gcr.io: no such host
I1216 14:24:03.690234   12993 image.go:76] Checking for docker.io/kicbase/build:v0.0.36-1668787669-15272 in local docker daemon
I1216 14:24:03.752520   12993 cache.go:146] Downloading docker.io/kicbase/build:v0.0.36-1668787669-15272 to local cache
I1216 14:24:03.752638   12993 image.go:60] Checking for docker.io/kicbase/build:v0.0.36-1668787669-15272 in local cache directory
I1216 14:24:03.752664   12993 image.go:118] Writing docker.io/kicbase/build:v0.0.36-1668787669-15272 to local cache
I1216 14:24:03.754372   12993 cache.go:165] failed to download docker.io/kicbase/build:v0.0.36-1668787669-15272, will try fallback image if available: getting remote image: Get "https://index.docker.io/v2/": dial tcp: lookup index.docker.io: no such host

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 16, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2022
@spowelljr
Member Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Dec 16, 2022
@@ -197,118 +195,36 @@ func parseImage(img string) (*name.Tag, name.Reference, error) {
}

// CacheToDaemon loads image from tarball in the local cache directory to the local docker daemon
Add to the comment what this returns (i.e., what the returned string is).

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 15528) |
+----------------+----------+---------------------+
| minikube start | 53.4s    | 54.4s               |
| enable ingress | 26.7s    | 25.7s               |
+----------------+----------+---------------------+

Times for minikube ingress: 28.7s 24.1s 27.6s 25.6s 27.6s
Times for minikube (PR 15528) ingress: 24.6s 27.1s 25.1s 24.1s 27.6s

Times for minikube start: 53.2s 52.2s 54.4s 55.2s 52.3s
Times for minikube (PR 15528) start: 53.5s 54.1s 54.5s 56.6s 53.3s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 15528) |
+----------------+----------+---------------------+
| minikube start | 25.2s    | 26.5s               |
| enable ingress | 21.7s    | 21.1s               |
+----------------+----------+---------------------+

Times for minikube start: 24.9s 25.5s 24.1s 25.5s 26.0s
Times for minikube (PR 15528) start: 27.5s 25.3s 25.8s 28.5s 25.5s

Times for minikube ingress: 22.4s 20.9s 22.5s 21.5s 21.4s
Times for minikube (PR 15528) ingress: 22.5s 22.0s 20.9s 20.0s 19.9s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 15528) |
+----------------+----------+---------------------+
| minikube start | 21.9s    | 21.6s               |
| enable ingress | 26.5s    | 26.3s               |
+----------------+----------+---------------------+

Times for minikube start: 21.6s 21.5s 21.7s 22.4s 22.2s
Times for minikube (PR 15528) start: 21.5s 21.8s 21.8s 21.9s 21.2s

Times for minikube ingress: 26.5s 26.4s 27.4s 26.4s 25.9s
Times for minikube (PR 15528) ingress: 25.9s 26.4s 26.4s 26.0s 26.4s

@minikube-pr-bot

These are the flake rates of all failed tests.

Environment Failed Tests Flake Rate (%)
Docker_Windows TestNetworkPlugins/group/false/DNS (gopogh) 23.81 (chart)
Docker_macOS TestStartStop/group/newest-cni/serial/Pause (gopogh) 25.19 (chart)
Hyperkit_macOS TestPause/serial/SecondStartNoReconfiguration (gopogh) 26.03 (chart)
Docker_macOS TestMultiNode/serial/RestartMultiNode (gopogh) 29.45 (chart)
Docker_Windows TestNetworkPlugins/group/bridge/DNS (gopogh) 42.86 (chart)
KVM_Linux TestPause/serial/SecondStartNoReconfiguration (gopogh) 44.44 (chart)
Hyper-V_Windows TestErrorSpam/setup (gopogh) 46.43 (chart)
Hyper-V_Windows TestMultiNode/serial/DeployApp2Nodes (gopogh) 46.43 (chart)
Docker_Windows TestStartStop/group/newest-cni/serial/Pause (gopogh) 47.62 (chart)
Docker_Linux_containerd TestStartStop/group/old-k8s-version/serial/DeployApp (gopogh) 48.60 (chart)
Docker_Linux_containerd TestStartStop/group/old-k8s-version/serial/FirstStart (gopogh) 48.60 (chart)
Docker_Linux_containerd TestStartStop/group/old-k8s-version/serial/UserAppExistsAfterStop (gopogh) 48.60 (chart)
Docker_Linux_containerd TestStartStop/group/old-k8s-version/serial/SecondStart (gopogh) 50.00 (chart)
Docker_Linux_containerd TestStartStop/group/embed-certs/serial/DeployApp (gopogh) 50.47 (chart)
Docker_Linux_containerd TestStartStop/group/embed-certs/serial/FirstStart (gopogh) 50.47 (chart)
Docker_Linux_containerd TestStartStop/group/embed-certs/serial/UserAppExistsAfterStop (gopogh) 50.47 (chart)
Docker_Linux_containerd TestStartStop/group/embed-certs/serial/SecondStart (gopogh) 52.68 (chart)
Docker_Windows TestNetworkPlugins/group/kubenet/DNS (gopogh) 57.14 (chart)
Docker_Linux_containerd TestStartStop/group/default-k8s-diff-port/serial/SecondStart (gopogh) 58.88 (chart)
Docker_Linux_containerd TestStartStop/group/default-k8s-diff-port/serial/UserAppExistsAfterStop (gopogh) 58.88 (chart)
Docker_Linux_containerd TestStartStop/group/default-k8s-diff-port/serial/DeployApp (gopogh) 59.63 (chart)
Docker_Linux_containerd TestStartStop/group/default-k8s-diff-port/serial/FirstStart (gopogh) 60.00 (chart)
Docker_Linux TestNetworkPlugins/group/calico/Start (gopogh) 72.14 (chart)
Hyper-V_Windows TestNoKubernetes/serial/StartWithK8s (gopogh) 75.00 (chart)
Hyper-V_Windows TestPause/serial/SecondStartNoReconfiguration (gopogh) 75.00 (chart)
Docker_Linux TestNetworkPlugins/group/false/DNS (gopogh) 76.43 (chart)
KVM_Linux TestMultiNode/serial/ValidateNameConflict (gopogh) 95.24 (chart)
Docker_Linux_containerd TestNetworkPlugins/group/calico/Start (gopogh) 98.13 (chart)
Docker_macOS TestIngressAddonLegacy/serial/ValidateIngressDNSAddonActivation (gopogh) 99.32 (chart)
Docker_Linux_containerd TestKubernetesUpgrade (gopogh) 100.00 (chart)
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@medyagh

medyagh commented Dec 21, 2022

lgtm, ready to be merged when you're ready

@medyagh medyagh merged commit a670a21 into kubernetes:master Jan 3, 2023
@spowelljr spowelljr deleted the fixDoubleKicDownload branch March 16, 2023 21:34