kubeadm should always fall back to client version when there is any internet issue #80024

@RainbowMango
Member

commented Jul 11, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #79997
Fixes kubernetes/kubeadm#1662

Special notes for your reviewer:
Please help check whether more test cases are needed.
I think it's enough to check the return value of the fetcher function.

Does this PR introduce a user-facing change?:

kubeadm: fall back to client version in case of certain HTTP errors
@k8s-ci-robot

Contributor

commented Jul 11, 2019

Hi @RainbowMango. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RainbowMango

Member Author

commented Jul 11, 2019

Member

left a comment

Thanks @RainbowMango !
Not sure if this is the actual fix. Let me investigate a bit.
/assign
/ok-to-test
/cc @kad

@k8s-ci-robot k8s-ci-robot requested a review from kad Jul 11, 2019
// If the network operation was successful but the server did not reply with StatusOK
if body != "" {
	return "", err
}

@neolit123

neolit123 Jul 11, 2019

Member

for the attached issue:

> The CA generation failed because of 502 bad gateway returned and the response body isn't empty when it's trying to fetch stable version from internet.

what does the body contain?

should we just handle StatusOK?
@kad had comments that we should not do that, thus we ended with the body check.

@kad

kad Jul 11, 2019

Member

It was specifically 404 that should lead to propagating the error, but the code was written more simply: "if not StatusOK"... We should probably handle 5xx statuses better and allow fallback in that case, as we do when detecting "air gap" scenarios. I need to dig a bit more into the code to see how to do it better.
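
To make the distinction above concrete, here is a minimal sketch of a status policy: 5xx responses allow fallback, while 404 and other non-OK statuses propagate the error. The helper name `fallbackAllowed` is hypothetical and is not taken from the kubeadm source; it only illustrates the idea under discussion.

```go
package main

import (
	"fmt"
	"net/http"
)

// fallbackAllowed is a hypothetical helper, not part of kubeadm. It reports
// whether a response status could be treated like an unreachable network,
// so that the caller may fall back to the local client version.
func fallbackAllowed(status int) bool {
	switch {
	case status == http.StatusOK:
		return false // success: there is nothing to fall back from
	case status >= 500 && status <= 599:
		return true // server-side trouble, e.g. 502 Bad Gateway
	default:
		return false // e.g. 404: the requested version label does not exist
	}
}

func main() {
	for _, s := range []int{http.StatusOK, http.StatusNotFound, http.StatusBadGateway} {
		fmt.Printf("%d %s: fallback=%v\n", s, http.StatusText(s), fallbackAllowed(s))
	}
}
```

Whether 4xx statuses other than 404 should also propagate is a design choice this sketch does not settle.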

@leakingtapan

leakingtapan Jul 11, 2019

Contributor

> what does the body contain?

The body is not logged anywhere, so I don't have it at the moment, unless we can reproduce the issue.

> should we just handle StatusOK?

It looks like StatusOK is already handled here, so fetchFromURL returns an error for any non-OK status.
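
For readers without the source at hand, the behavior described above (a non-empty body returned together with an error for any non-OK status) can be sketched as follows. This is a simplified illustration, not the kubeadm code: `fetchBody`, `cannedTransport`, and `fetchWithStatus` are hypothetical names, and the URL is a placeholder that is never actually contacted.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// cannedTransport is an illustrative http.RoundTripper that answers every
// request with a fixed status and body, so the example needs no network.
type cannedTransport struct {
	status int
	body   string
}

func (c cannedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: c.status,
		Status:     fmt.Sprintf("%d %s", c.status, http.StatusText(c.status)),
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(c.body)),
		Request:    req,
	}, nil
}

// fetchBody is a hypothetical stand-in for the fetcher discussed above.
// It always returns the trimmed body, plus an error for any non-OK status.
func fetchBody(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", fmt.Errorf("unable to get URL %q: %w", url, err)
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("unable to read content of URL %q: %w", url, err)
	}
	body := strings.TrimSpace(string(raw))

	if resp.StatusCode != http.StatusOK {
		return body, fmt.Errorf("unable to fetch URL %q, status: %s", url, resp.Status)
	}
	return body, nil
}

// fetchWithStatus runs fetchBody against a canned response.
func fetchWithStatus(status int, body string) (string, error) {
	client := &http.Client{Transport: cannedTransport{status: status, body: body}}
	return fetchBody(client, "https://example.invalid/stable.txt")
}

func main() {
	body, err := fetchWithStatus(http.StatusBadGateway, "<html>502 Bad Gateway</html>\n")
	fmt.Printf("body=%q err=%v\n", body, err)
}
```

With a fetcher shaped like this, a caller that checks `body != ""` before falling back would treat a 502 error page as a reason to abort, which matches the failure reported in the linked issue.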

@neolit123

neolit123 Jul 11, 2019

Member

@kad @leakingtapan
we should probably just handle 5** as a quick fix?

long term we discussed that we should make the "fetch remote version code" on demand only (if a label is passed) and use the local version by default.

@rosti

rosti Jul 12, 2019

Member

Here are my impressions ATM:

  1. 4** and 5** codes should always be considered errors (especially 5**).
  2. The body of any response other than 200 is unreliable. Even if it contains a version, it may not be a Kubernetes version.
  3. We need a test case that handles this.

@RainbowMango

RainbowMango Jul 17, 2019

Author Member

I'm confused now.

Let me know how I can help shape it up. :)

@RainbowMango

RainbowMango Jul 18, 2019

Author Member

@rosti @neolit123 @kad

I suggest we not make this too complicated. As mentioned by @leakingtapan, we are going to fall back whenever fetcher() returns an error. We don't need to consider whether the error code is 4xx or 5xx, and if there is an error it makes no sense to parse the body.

I don't think it's worth breaking fetchFromURL() into two parts just for a low-value test case.

I look forward to your final conclusion.
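
The proposal above, together with the earlier request for a test case, can be sketched like this. The resolver below is a simplified, hypothetical illustration (`resolveVersion`, `fakeFetcher`, and `errFetch` are not names from the kubeadm source): on any fetcher error the body is ignored and the client version is used.

```go
package main

import (
	"errors"
	"fmt"
)

// fetcher mirrors the shape discussed in the thread: a body plus an error.
type fetcher func(url string) (string, error)

// errFetch is an illustrative sentinel error for the fake fetchers below.
var errFetch = errors.New("fetch failed")

// fakeFetcher returns a fetcher that always yields the given body and error.
func fakeFetcher(body string, err error) fetcher {
	return func(string) (string, error) { return body, err }
}

// resolveVersion is a hypothetical, simplified resolver. On any fetcher
// error it ignores the body and falls back to clientVersion.
func resolveVersion(fetch fetcher, url, clientVersion string) string {
	body, err := fetch(url)
	if err != nil {
		return clientVersion
	}
	return body
}

func main() {
	const clientVersion = "v1.15.0" // illustrative value only
	cases := []struct {
		name  string
		fetch fetcher
	}{
		{"healthy endpoint", fakeFetcher("v1.15.1", nil)},
		{"502 with error page", fakeFetcher("<html>502 Bad Gateway</html>", errFetch)},
		{"no network, empty body", fakeFetcher("", errFetch)},
	}
	for _, c := range cases {
		got := resolveVersion(c.fetch, "https://example.invalid/stable.txt", clientVersion)
		fmt.Printf("%s: %s\n", c.name, got)
	}
}
```

Because the fetcher is passed in as a function, the error path can be exercised with a fake and without splitting fetchFromURL().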

@leakingtapan

leakingtapan Jul 18, 2019

Contributor

@kad please clarify your points so we can move the fix forward

@neolit123

neolit123 Jul 18, 2019

Member

we discussed this problem again today with @rosti
@RainbowMango please fix the failing unit tests https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/80024/pull-kubernetes-bazel-test/1151838526606675968

and we can merge this PR, follow up PRs can still be accepted (cc @kad).

@RainbowMango
Member Author

commented Jul 18, 2019

/retest

@RainbowMango RainbowMango force-pushed the RainbowMango:pr_fix_issue_79997_kubeadm_fall_back branch from 3e2f9b8 to a4ca944 Jul 19, 2019
@k8s-ci-robot k8s-ci-robot added size/M and removed size/XS labels Jul 19, 2019
Member

left a comment

/lgtm
/approve
/priority backlog

@k8s-ci-robot

Contributor

commented Jul 19, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123

Member

commented Jul 19, 2019

/hold
before this is merged can you please add a release note under Does this PR introduce a user-facing change?: in the PR description instead of NONE.

kubeadm: fall back to client version in case of certain HTTP errors

@neolit123

Member

commented Jul 19, 2019

/hold cancel

@fejta-bot


commented Jul 19, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@neolit123

Member

commented Jul 19, 2019

@k8s-ci-robot k8s-ci-robot merged commit 65fc256 into kubernetes:master Jul 19, 2019
23 checks passed
cla/linuxfoundation: RainbowMango authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-iscsi: Skipped.
pull-kubernetes-e2e-gce-iscsi-serial: Skipped.
pull-kubernetes-e2e-gce-storage-slow: Skipped.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.
@RainbowMango RainbowMango referenced this pull request Aug 29, 2019
6 of 6 tasks complete