kubeadm: support automatic retry after failing to pull image #86899

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from SataQiu:enable-pull-retry-20200107 on Jan 13, 2020

Conversation

@SataQiu
Member

SataQiu commented Jan 7, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
kubeadm: support automatic retry after failing to pull image

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#1844

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: support automatic retry after failing to pull image

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@SataQiu

Member Author

SataQiu commented Jan 7, 2020

/assign @neolit123

@SataQiu SataQiu force-pushed the SataQiu:enable-pull-retry-20200107 branch from 0d8f42b to 8630ec6 Jan 7, 2020
@k8s-ci-robot k8s-ci-robot added size/M and removed size/S labels Jan 7, 2020
}
-return nil
+return lastError

@aojea

aojea Jan 7, 2020

Member

nit: do you need lastError?
It seems you can directly return errors.Wrapf(err, "output: %s, error", string(out))

@SataQiu SataQiu force-pushed the SataQiu:enable-pull-retry-20200107 branch from 8630ec6 to 7fe5a30 Jan 7, 2020
func(cmd string, args ...string) exec.Cmd { return fakeexec.InitFakeCmd(&fcmd, cmd, args...) },
func(cmd string, args ...string) exec.Cmd { return fakeexec.InitFakeCmd(&fcmd, cmd, args...) },
func(cmd string, args ...string) exec.Cmd { return fakeexec.InitFakeCmd(&fcmd, cmd, args...) },
func(cmd string, args ...string) exec.Cmd { return fakeexec.InitFakeCmd(&fcmd, cmd, args...) },

@neolit123

neolit123 Jan 7, 2020

Member

this looks really silly, but it's not the fault of this PR - the same function repeated multiple times.

@bart0sh

bart0sh Jan 10, 2020

Contributor

Maybe something similar to this can reduce this noise.
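
The suggestion linked in this comment is not reproduced on this page. As one possible way to cut the repetition, a small helper could build the slice of identical callbacks. This is a sketch under assumptions: `commandAction` and `repeat` are hypothetical names, and `commandAction` is a simplified local stand-in for the fake-exec callback type (it returns a string here so the example stays self-contained).

```go
package main

import "fmt"

// commandAction is a simplified, hypothetical stand-in for the fake-exec
// command callback used in the test.
type commandAction func(cmd string, args ...string) string

// repeat returns a slice holding n copies of action.
func repeat(n int, action commandAction) []commandAction {
	actions := make([]commandAction, n)
	for i := range actions {
		actions[i] = action
	}
	return actions
}

func main() {
	describe := func(cmd string, args ...string) string {
		return fmt.Sprint(cmd, " ", args)
	}
	// One call replaces four identical literal lines.
	script := repeat(4, describe)
	fmt.Println(len(script), script[0]("crictl", "pull", "img"))
}
```

In the real test the element type would be whatever the fake-exec package expects, and the count would have to account for the retries.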

@@ -761,10 +761,18 @@ func TestImagePullCheck(t *testing.T) {
CombinedOutputScript: []fakeexec.FakeAction{
// Test case1: pull only img3
func() ([]byte, []byte, error) { return nil, nil, nil },
-// Test case 2: fail to pull image2 and image3
+// Test case 2: fail to pull image2 and image3 (if the pull fails, it will be retried 5 times by default)

@neolit123

neolit123 Jan 7, 2020

Member

please format the comment like so:

// Test case 2: fail to pull image2 and image3
// If the pull fails, it will be retried 5 times (see PullImageRetry in constants/constants.go)
@@ -181,6 +181,8 @@ const (
PatchNodeTimeout = 2 * time.Minute
// TLSBootstrapTimeout specifies how long kubeadm should wait for the kubelet to perform the TLS Bootstrap
TLSBootstrapTimeout = 2 * time.Minute
// PullImageRetry specifies how many times ContainerRuntime retries when pulling image failed
PullImageRetry = 5

@neolit123

neolit123 Jan 7, 2020

Member

when we originally discussed this there were also comments about using exponential backoff.
i'm fine with using 5 retries, but let's see if anyone has more comments.

exp. backoff will make this harder to unit test.

@neolit123

Member

neolit123 commented Jan 7, 2020

/priority backlog

@SataQiu SataQiu force-pushed the SataQiu:enable-pull-retry-20200107 branch from 7fe5a30 to 92100f3 Jan 8, 2020
@aojea

Member

aojea commented Jan 8, 2020

/test pull-kubernetes-e2e-kind-ipv6
unrelated failures, known flakes, PR is ok

Member

neolit123 left a comment

/approve

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jan 8, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

var err error
var out []byte
for i := 0; i < constants.PullImageRetry; i++ {
out, err = runtime.exec.Command("crictl", "-r", runtime.criSocket, "pull", image).CombinedOutput()

@bart0sh

bart0sh Jan 10, 2020

Contributor

Ignoring errors here doesn't look good to me.

@yastij

yastij Jan 10, 2020

Member

It's not ignoring errors: it retries until it succeeds, and if it never does, it logs the error. We can still log the error on each retry, but I'm not sure that's needed.

}
-return nil
+return errors.Wrapf(err, "output: %s, error", string(out))

@bart0sh

bart0sh Jan 10, 2020

Contributor

string() call is not needed here.

@SataQiu

SataQiu Jan 11, 2020

Author Member

Thanks @bart0sh
Updated!
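
To illustrate the point of this thread: Go's `%s` verb formats a `[]byte` as the string it contains, so the explicit `string(out)` conversion is redundant. The sketch below uses `fmt.Sprintf` directly. `formatOutput` is a hypothetical helper, and it is an assumption here that `errors.Wrapf` formats its arguments the same way `fmt.Sprintf` does.

```go
package main

import "fmt"

// formatOutput formats raw command output without converting it to a string
// first. The %s verb handles []byte directly.
func formatOutput(out []byte) string {
	return fmt.Sprintf("output: %s, error", out)
}

func main() {
	out := []byte("image not found")
	withCast := fmt.Sprintf("output: %s, error", string(out))
	withoutCast := formatOutput(out)
	fmt.Println(withCast == withoutCast)
}
```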

Member

yastij left a comment

/lgtm
/hold

@bart0sh feel free to unhold

@SataQiu SataQiu force-pushed the SataQiu:enable-pull-retry-20200107 branch from 92100f3 to c7234aa Jan 11, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm label Jan 11, 2020
@SataQiu

Member Author

SataQiu commented Jan 11, 2020

/test pull-kubernetes-e2e-gce

@SataQiu

Member Author

SataQiu commented Jan 13, 2020

@bart0sh @yastij @neolit123 Do you have time to look at it again?

Member

yastij left a comment

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm and removed do-not-merge/hold labels Jan 13, 2020
@k8s-ci-robot k8s-ci-robot merged commit 3e8155e into kubernetes:master Jan 13, 2020
16 checks passed
cla/linuxfoundation SataQiu authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-kind Job succeeded.
pull-kubernetes-e2e-kind-ipv6 Job succeeded.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
tide In merge pool.
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 13, 2020