Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: Pull images early in `kubeadm init` by default #64105

Merged
merged 1 commit into from May 29, 2018

Conversation

@chuckha
Copy link
Member

chuckha commented May 21, 2018

kubeadm now pulls container images before the init step.

  • This commit also fixes a dependency cycle

What this PR does / why we need it:
This PR adds a check to kubeadm preflight checks that pulls images down before kubeadm tries to run them so boot up time of containers does not have to pull.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes kubernetes/kubeadm#825

Special notes for your reviewer:
The interface stuff is to remove a dependency cycle where utils pulled in preflight and preflight pulled in utils. I do not think this utils package should be importing preflight.

Release note:

kubeadm will pull required images during preflight checks if it cannot find them on the system
@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 21, 2018

/test pull-kubernetes-e2e-gce

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 21, 2018

/test pull-kubernetes-kubemark-e2e-gce

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 22, 2018

/hold

@chuckha chuckha force-pushed the chuckha:prepull-images branch from 722d7bc to b7e9388 May 23, 2018

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/M labels May 23, 2018

@chuckha chuckha force-pushed the chuckha:prepull-images branch from b7e9388 to aea69a2 May 23, 2018

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 23, 2018

/hold cancel

@luxas

luxas approved these changes May 23, 2018

Copy link
Member

luxas left a comment

Great PR!
Left a couple of comments. Main feedback is to use docker inspect.
But good to see this coming, so we can improve our UX.

func (i *imgs) Exists(image string) (int, error) {
switch image {
case "a":
return 0, nil

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

for readability, could you add a comment on each case here, like
// image a not present locally, should be pulled
// image b cached locally, should not be pulled
// image c failed to inspect
etc.

if len(warnings) != 0 {
t.Fatalf("did not expect any warnings but got %q", warnings)
}
if len(errors) != 2 {

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

// expects one pull error and one inspect error

@@ -60,13 +59,17 @@ func CheckErr(err error) {
checkErr("", err, fatal)
}

type preflightError interface {

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

👍 for fixing this. You could add a short explanation why we need this though

continue
}
if err != nil {
errors = append(errors, fmt.Errorf("failed to check if image exists [%s]: %v", image, err))

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

nit: %q instead of [%s]. For me it doesn't actually matter, but FYI 😄

}
var buf bytes.Buffer
// docker images *always* returns exit code 0 even if the image doesn't exist so have to check the output
dockerCmd := cri.exec.Command("sh", "-c", fmt.Sprintf("docker images %v", image))

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

So either you can do a filter directly like docker images gcr.io/google_containers/etcd-amd64:3.0.17, but I would prefer this to do docker inspect %s. inspect returns 1 if not found. Need to translate that, but you don't have to parse the output at least.

This comment has been minimized.

@stealthybox

This comment has been minimized.

@stealthybox

stealthybox May 23, 2018

Contributor

Available since 1.13

docker image inspect %s --format '{{.Id}}'

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

We need to support v1.11+ though, but thanks for the heads up

This comment has been minimized.

@stealthybox

stealthybox May 23, 2018

Contributor

The non nested version should work:

docker inspect %s --format '{{.Id}}'

This comment has been minimized.

@chuckha

chuckha May 23, 2018

Author Member

yes! this was a great suggestion :)

return errors.New(images.ImageNotFoundErrorMessage)
}

if f.cri == "docker" {

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

if you can move to docker inspect this should change

expectError: false,
},
{
name: "using non-docker cri and image does exist",

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

Add one more case using non-docker cri which errors

expectError: false,
},
{
name: "use docker but docker errors",

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

be consequent in using using here 😄

}

func (f *fakeCmd) Run() error {
fmt.Fprintf(f.out, "%v %v", f.cmd, strings.Join(f.args, " "))

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

%s %s if they are strings

errorExpected bool
}{
{
name: "New succeeds even if crictl is not in path",

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

These seems like unit tests for NewCRInterfacer rather than the pull functionality. Maybe you can keep this, make it focus only on NewCRInterfacer, and then create one more unit test to specifically test the pull functionality?

@luxas

This comment has been minimized.

Copy link
Member

luxas commented May 23, 2018

I'd prefer to let kubelet handle this, as what k/k does. Kubeadm is more focused on setting up conformant clusters quickly. Kubelet can better handle such images. There is no need to have another same functionality in kubeadm, which will bring in complexity as well.

@dixudx so the reason we do this is purely because we want to have happier users, which can diagnose what goes wrong in the image pulling stage. Right now we have loads of users not understanding what happens when kubeadm init waits for the kubelet to pull for a long time, and we really want to fix that, make kubeadm not "deadlock" UX-wise in the future. Right now we can't set a timeout on the wait for the kubelet to pull either as you may have slow internet connection but everything's going well. With this, we can dramatically reduce the time-to-wait for the API server to be started in the later case, and move that logic to fail fast if we can't pull the images. With that in mind, does this make sense to you? It's a valid concern for sure.

@dixudx

This comment has been minimized.

Copy link
Member

dixudx commented May 23, 2018

Right now we have loads of users not understanding what happens when kubeadm init waits for the kubelet to pull for a long time, and we really want to fix that

Yeah, this confused me as well when I used kubeadm in the first time.

With this, we can dramatically reduce the time-to-wait for the API server to be started in the later case, and move that logic to fail fast if we can't pull the images. With that in mind, does this make sense to you?

SGTM. 👍

@@ -864,6 +904,7 @@ func RunInitMasterChecks(execer utilsexec.Interface, cfg *kubeadmapi.MasterConfi
HTTPProxyCheck{Proto: "https", Host: cfg.API.AdvertiseAddress},
HTTPProxyCIDRCheck{Proto: "https", CIDR: cfg.Networking.ServiceSubnet},
HTTPProxyCIDRCheck{Proto: "https", CIDR: cfg.Networking.PodSubnet},
ImagePullCheck{Images: criInterfacer, ImageList: images.GetAllImages(cfg)},

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

Idea, can we create a dedicated RunPullImagesCheck function that we in kubeadm init run after all the init checks have succeeded? That way we won't even start pulling if something else is really wrong, but reuse the preflight framework anyway.

This comment has been minimized.

@stealthybox

stealthybox May 23, 2018

Contributor

^ I like this
Should we allow the ImagePullCheck to be skippable if it is now part of the primary logic?
Since crictl is new, there may be users who do not have it and are not using docker.

( We're shipping it with the new packages, but that doesn't always help somebody in the upgrade case )

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

Should we allow the ImagePullCheck to be skippable if it is now part of the primary logic?

Yes, that's one of the reasons we're hosting it as a preflight check, you can set the ignore preflight errors flag and boom you won't be bothered with this anymore if you don't like it

This comment has been minimized.

@chuckha

chuckha May 23, 2018

Author Member

Sure, this is better than shoving it somewhere in the middle.

We'll also want to think about a list of images needed for joining.

errors = append(errors, fmt.Errorf("failed to check if image exists [%s]: %v", image, err))
continue
}
if err := i.Images.Pull(image); err != nil {

This comment has been minimized.

@luxas

luxas May 23, 2018

Member

Does this print to STDOUT? I think that would be worthwhile to do, to show progress to the user.
This would be a special preflight check in that sense, something in between.

@luxas luxas changed the title Prepulls images by default kubeadm: Pull images early in `kubeadm init` by default May 23, 2018

@stealthybox
Copy link
Contributor

stealthybox left a comment

Thanks for working to improve this @chuckha :)

Quick pass with code and UX considerations:

// Returns
// * -1 if an error occurred with the exec command
// * 0 if no error occurred but image is not found
// * 1 if no error occurred and the image is found

This comment has been minimized.

@stealthybox

stealthybox May 23, 2018

Contributor

Please create public constants for these: https://golang.org/ref/spec#Iota 🙂
They should be public since the method is also public and can be used by another package.

const (
	ImageCheckerUnavailable = iota - 2
	ImageCheckError
	ImageNotFound
	ImageFound
)
// results in -2, -1, 0, 1

https://play.golang.org/p/sucigHDnCai

func (i ImagePullCheck) Check() (warnings, errors []error) {
for _, image := range i.ImageList {
found, err := i.Images.Exists(image)
if found == 1 {

This comment has been minimized.

@stealthybox

stealthybox May 23, 2018

Contributor

if result == image.ImageFound {

etc ...

}
var buf bytes.Buffer
// docker images *always* returns exit code 0 even if the image doesn't exist so have to check the output
dockerCmd := cri.exec.Command("sh", "-c", fmt.Sprintf("docker images %v", image))

This comment has been minimized.

@stealthybox

stealthybox May 23, 2018

Contributor

We should lookup the dockerPath and add it to CRInterfacer.

This will allow us to avoid an unecessary subshell.

Additionally, we can add
else if cri.dockerPath != "" {

The default case would be to return ImageCheckerUnavailable(-2) when neither crictl or docker are available to check for images.
This will allow us to return a user-friendly error message to people who could potentially be using containerd or rkt without crictl in their PATH.
The error message would be a better signal for users to skip the pre-flight check than "sh: 1: docker: not found"

Exists(string) error
}

// Images defines the set of behaviors needed for images realting to the CRI

This comment has been minimized.

@chuckha

chuckha May 23, 2018

Author Member

TYPO s/realting/relating

limitations under the License.
*/

package images_test

This comment has been minimized.

@chuckha

chuckha May 23, 2018

Author Member

are these tests too obsessive?

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 24, 2018

/test pull-kubernetes-e2e-gce

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 24, 2018

/test pull-kubernetes-e2e-kops-aws

1 similar comment
@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 24, 2018

/test pull-kubernetes-e2e-kops-aws

var err error
if criSocket != kubeadmapiv1alpha2.DefaultCRISocket {
if crictlPath, err = execer.LookPath("crictl"); err != nil {
return nil, fmt.Errorf("crictl is required for non docker container runtimes: %v", err)

This comment has been minimized.

@stealthybox

stealthybox May 24, 2018

Contributor

Are these errors a new addition?
They seem to halt the method preventing the new struct from being created.
It looks like you would not be able to use this with any other runtime than docker (and only with crictl also on the path).

Is this the expected behavior?

This comment has been minimized.

@chuckha

chuckha May 24, 2018

Author Member

The docker bit is a necessary hack that we'd like to eventually totally replace with crictl. We don't want to get into the business of writing this kind of logic for every container runtime tool, so we are only supporting crictl.

Also I have a ticket (holy cow i have a lot of tickets i need to do) to include crictl with the system packages so we can start depending on it more.

The reason for the docker hack is that since the kubelet is the thing that hosts the cri dockershim and the kubelet isn't necessarily running by the time preflights run (I believe that is still true), we have to use the docker binary IF and only if we know we're using docker as a runtime (which we assert criSocket == default means we're using docker as the runtime).

The longterm goal is to only use crictl.

@chuckha chuckha force-pushed the chuckha:prepull-images branch from a1b4bcd to 4ac1941 May 25, 2018

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/test pull-kubernetes-e2e-kops-aws

Prepulls images by default
kubeadm now pulls container images before the init step if it cannot find them on the system

* This commit also cleans up a dependency cycle

Closes #825

@chuckha chuckha force-pushed the chuckha:prepull-images branch from 4ac1941 to 2f2de31 May 25, 2018

@stealthybox

This comment has been minimized.

Copy link
Contributor

stealthybox commented May 25, 2018

/test pull-kubernetes-node-e2e

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/test pull-kubernetes-node-e2e

1 similar comment
@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/test pull-kubernetes-node-e2e

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/test pull-kubernetes-node

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/retest

1 similar comment
@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/retest

@timothysc
Copy link
Member

timothysc left a comment

/approve

Others have been involved in the review they can lgtm.

@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/retest

1 similar comment
@chuckha

This comment has been minimized.

Copy link
Member Author

chuckha commented May 25, 2018

/retest

@luxas

luxas approved these changes May 29, 2018

Copy link
Member

luxas left a comment

/lgtm
/retest

Thanks a lot @chuckha 💯!

if err := i.Images.Exists(image); err == nil {
continue
}
if err := i.Images.Pull(image); err != nil {

This comment has been minimized.

@luxas

luxas May 29, 2018

Member

Can be done as a follow-up

@k8s-ci-robot k8s-ci-robot added the lgtm label May 29, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented May 29, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chuckha, luxas, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented May 29, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented May 29, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented May 29, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit e3a4104 into kubernetes:master May 29, 2018

15 of 18 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-kubemark-e2e-gce
Details
pull-kubernetes-e2e-gce-100-performance Job triggered.
Details
pull-kubernetes-kubemark-e2e-gce-big Job triggered.
Details
cla/linuxfoundation chuckha authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.