Cherrypick #95981 to 1.18, Enables HTTP/2 health check #100376

Merged
merged 4 commits into kubernetes:release-1.18 from http2-1.18 on Mar 25, 2021

liggitt
Member

@liggitt liggitt commented Mar 18, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

Backports use of the golang.org/x/net PingTimeout to 1.18 (replay of #96778 on 1.18)

Rationale for backporting:

  • how unpredictable the stuck connection issue is (can be triggered by arbitrary network interruptions)
  • how many components it affects (all components using client-go)
  • how problematic a stuck connection is for informer clients (can cause permanently stale data with no indication anything is wrong)
  • we've successfully used these sys and net levels in 1.19 for O(months) now
  • other consumers have successfully backported these changes to 1.18 and used them in production

Does this PR introduce a user-facing change?

HTTP/2 connection health check is enabled by default in all Kubernetes clients to fix persistently broken connections (https://github.com/kubernetes/client-go/issues/374). If needed, users can tune the feature via the HTTP2_READ_IDLE_TIMEOUT_SECONDS and HTTP2_PING_TIMEOUT_SECONDS environment variables. The feature is disabled if HTTP2_READ_IDLE_TIMEOUT_SECONDS is set to 0.

/cc @caesarxuchao

@liggitt liggitt changed the base branch from master to release-1.18 March 18, 2021 17:19
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Mar 18, 2021
@k8s-ci-robot k8s-ci-robot added the release-note, do-not-merge/cherry-pick-not-approved, do-not-merge/work-in-progress, kind/bug, size/L, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, area/apiserver, area/cloudprovider, approved, area/code-generation, area/dependency, area/kubectl, sig/api-machinery, sig/cli, sig/cloud-provider, sig/cluster-lifecycle, sig/instrumentation, and sig/node labels and removed the do-not-merge/needs-sig label Mar 18, 2021
@caesarxuchao
Member

/assign

Thanks, Jordan!

@caesarxuchao
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 18, 2021
@liggitt liggitt changed the title WIP - Cherrypick #95981 to 1.18, Enables HTTP/2 health check Cherrypick #95981 to 1.18, Enables HTTP/2 health check Mar 18, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2021
@liggitt liggitt added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 18, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 18, 2021
@liggitt
Member Author

liggitt commented Mar 18, 2021

/cc @lavalamp

@liggitt
Member Author

liggitt commented Mar 18, 2021

failure looks like #98182

/retest

@caesarxuchao
Member

/retest

@caesarxuchao
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, liggitt

@liggitt
Member Author

liggitt commented Mar 23, 2021

/retest

@liggitt
Member Author

liggitt commented Mar 24, 2021

/assign @deads2k

@liggitt
Member Author

liggitt commented Mar 24, 2021

conceptually acked by @deads2k in #95981 (comment), leaning on @caesarxuchao for review

@justaugustus justaugustus added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Mar 24, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Mar 24, 2021
@k8s-ci-robot k8s-ci-robot merged commit e680574 into kubernetes:release-1.18 Mar 25, 2021
@liggitt liggitt deleted the http2-1.18 branch March 29, 2021 19:58
@LuChenjing

@liggitt Hi, I still hit this issue after upgrading Kubernetes from 1.18.5 to 1.18.20. I noticed this PR was released after 1.18.18.
I upgraded by replacing the apiserver, scheduler, controller manager, kubelet, and kube-proxy Docker images. Are there any other configs I should set?

@xuchen-xiaoying

xuchen-xiaoying commented Mar 25, 2022

It seems that even after this cherry-pick to 1.18, the "use of closed network connection" problem can still be reproduced.
