kubectl: fix timeout=32s for some rest APIs when --request-timeout=0 #103619

Closed
wants to merge 1 commit into from

@BoleynSu BoleynSu commented Jul 9, 2021

According to kubectl options, for --request-timeout, a value
of zero means don't timeout requests. This PR fixes the wrong
behavior.

What type of PR is this?

/kind bug

What this PR does / why we need it:

Please check #103618

Which issue(s) this PR fixes:

Fixes #103618

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix kubectl timing out on slow connections.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 9, 2021
@k8s-ci-robot
Contributor

Welcome @BoleynSu!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @BoleynSu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 9, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BoleynSu
To complete the pull request process, please assign lavalamp after the PR has been reviewed.
You can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BoleynSu
Author

BoleynSu commented Jul 9, 2021

This patch is to correct the current behavior. However, as it has been there for 3 years, it is very likely that someone is already depending on the current behavior. If we do not want to change the current behavior, maybe we can change the usage message instead?

@fedebongio
Contributor

/assign @jpbetz
/cc @roycaihw
/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 13, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 13, 2021
@@ -462,9 +459,6 @@ func withRetries(maxRetries int, f func() ([]*metav1.APIGroup, []*metav1.APIReso
 func setDiscoveryDefaults(config *restclient.Config) error {
 	config.APIPath = ""
 	config.GroupVersion = nil
 	if config.Timeout == 0 {
Member

We shouldn't remove the default timeout to preserve backwards compatibility.

What if we add -1 for no timeout instead?

Member

Actually I think this is already the current behavior?

kubectl get nodes --request-timeout=-1s -v10

...

I0719 14:20:47.145520   41053 round_trippers.go:435] curl ... foo.com/api/v1/nodes?limit=500'

Member

it looks like it's an issue specific to the kubectl documentation. Unless there is a mismatch between the client-go documentation and client-go behavior, we shouldn't change client-go.

Author

I believe this is a doc-code mismatch. Other places have the correct behavior; only this one misbehaves.

Author

@BoleynSu BoleynSu Jul 20, 2021

kubectl options shows the following.

      --request-timeout='0': The length of time to wait before giving up on a single server request.
Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means
don't timeout requests.

Note that I did not check whether other places work as documented. I only searched the codebase for this particular flag and did not find any other code that uses it incorrectly.

it looks like it's an issue specific to the kubectl documentation. Unless there is a mismatch between the client-go documentation and client-go behavior, we shouldn't change client-go.

In staging/src/k8s.io/client-go/rest/config.go, we also have

        // The maximum length of time to wait before giving up on a server request. A value of zero means no timeout.
        Timeout time.Duration
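
To make the mismatch concrete, here is a minimal sketch of the defaulting step under discussion. `Config` is a simplified stand-in for `restclient.Config`, and the body of the `if` is an assumption: the diff above is cut off after the condition, and the 32s value is taken from the PR title rather than from the quoted code.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a simplified stand-in for restclient.Config; only the field
// relevant to this thread is modeled.
type Config struct {
	// Timeout follows the documented contract: zero means no timeout.
	Timeout time.Duration
}

// defaultTimeout is assumed to be the source of the ?timeout=32s seen in the
// PR title.
const defaultTimeout = 32 * time.Second

// setDiscoveryDefaults mimics the check shown in the diff: a zero Timeout is
// treated as "unset" and replaced with the default (assumed body).
func setDiscoveryDefaults(c *Config) {
	if c.Timeout == 0 {
		c.Timeout = defaultTimeout
	}
}

func main() {
	for _, requested := range []time.Duration{0, -1 * time.Second, 5 * time.Second} {
		c := Config{Timeout: requested}
		setDiscoveryDefaults(&c)
		fmt.Printf("requested=%v effective=%v\n", requested, c.Timeout)
	}
}
```

In this sketch only the zero value is rewritten, which is consistent with `--request-timeout=-1s` behaving differently from `--request-timeout=0` in the experiment above.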

Contributor

@jpbetz jpbetz Jul 26, 2021

+1 to @roycaihw's comment. If we want to respect the --request-timeout documentation in kubectl (which seems reasonable) we should do it by passing in a value to client-go to tell it to not timeout. We cannot delete (or otherwise change) the default in client-go (which is used by more clients than just kubectl), since that would be a breaking change to the other clients.
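
One way to read this suggestion is that the CLI translates an explicit zero into a value the library's zero check will not overwrite. The sketch below is illustrative only: it uses the standard library `flag` package rather than kubectl's real flag handling, and both `resolveRequestTimeout` and the `noTimeout` sentinel are hypothetical names chosen to mirror the -1 idea from earlier in this thread.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// noTimeout is a hypothetical sentinel, following the -1 idea from the thread.
const noTimeout = -1 * time.Second

// resolveRequestTimeout parses args and returns the duration a CLI could hand
// to the client library. An explicit zero is translated to noTimeout so that
// a library-side "zero means unset" check does not replace it.
func resolveRequestTimeout(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	timeout := fs.Duration("request-timeout", 0, "per-request timeout; 0 disables it")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}

	// Visit only calls the function for flags that were actually set.
	explicit := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "request-timeout" {
			explicit = true
		}
	})

	if explicit && *timeout == 0 {
		return noTimeout, nil
	}
	return *timeout, nil
}

func main() {
	for _, args := range [][]string{{}, {"--request-timeout=0s"}, {"--request-timeout=5s"}} {
		d, err := resolveRequestTimeout(args)
		fmt.Println(args, d, err)
	}
}
```

With this shape the library default stays in place for every caller that never sets a timeout, and only a user who explicitly asks for zero gets the sentinel.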

Member

@liggitt liggitt Jul 28, 2021

I agree with the comments about driving behavior from the kubectl side, rather than here

even in kubectl, I'm not sure removing timeout entirely makes sense... the server will still time out eventually, regardless of what the client does (and at around 60 seconds for short-lived requests, I think, at least for REST API requests)
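
The point about the server bounding the request can be illustrated with a small, self-contained sketch. It does not model kube-apiserver: `http.TimeoutHandler` and the short `serverLimit` are stand-ins chosen so the example runs quickly, and `statusWithoutClientTimeout` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

const (
	serverLimit  = 50 * time.Millisecond // stand-in for the server-side bound
	handlerDelay = 2 * time.Second       // how long the slow handler would take
)

// slowHandler waits for handlerDelay unless the request context ends first.
func slowHandler(w http.ResponseWriter, r *http.Request) {
	select {
	case <-time.After(handlerDelay):
		fmt.Fprintln(w, "done")
	case <-r.Context().Done():
	}
}

// statusWithoutClientTimeout issues a GET with no client-side timeout and
// returns the HTTP status the server answered with.
func statusWithoutClientTimeout() (int, error) {
	handler := http.TimeoutHandler(http.HandlerFunc(slowHandler), serverLimit, "server-side timeout")
	srv := httptest.NewServer(handler)
	defer srv.Close()

	client := &http.Client{Timeout: 0} // zero: the client itself never gives up
	resp, err := client.Get(srv.URL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	code, err := statusWithoutClientTimeout()
	fmt.Println(code, err)
}
```

Even though the client sets no deadline, the request ends after roughly `serverLimit` with a 503 from the timeout handler, so removing the client timeout does not by itself make a request unbounded.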

@BoleynSu
Author

/sig api-machinery

Author

@BoleynSu BoleynSu left a comment

Friendly ping.

I just noticed that one of my comments has been in pending status for a few days, and I do not know how to make it public, so I updated my last comment instead. PTAL.

@BoleynSu
Author

BoleynSu commented Jul 27, 2021 via email

@eddiezane
Member

I ran through this in a debugger and now understand what is happening. @BoleynSu apologies for not understanding earlier.

To summarize:

When running kubectl get nodes --request-timeout=0s --cache-dir=/dev/null the parsed and merged config does indeed have config.Timeout set to 0 here. The issue is that we are not able to differentiate between the user-supplied value of 0 and Go's zero value of 0, which is why using -1 worked in my testing.

The callstack looks like this:

k8s.io/client-go/discovery.setDiscoveryDefaults(discovery_client.go:462)
k8s.io/client-go/discovery.NewDiscoveryClientForConfig(discovery_client.go:487)
k8s.io/client-go/discovery/cached/disk.NewCachedDiscoveryClientForConfig(cached_discovery.go:281)
k8s.io/cli-runtime/pkg/genericclioptions.(*ConfigFlags).ToDiscoveryClient(config_flags.go:253)
k8s.io/kubectl/pkg/cmd/util.(*MatchVersionFlags).ToDiscoveryClient(kubectl_match_version.go:91)
k8s.io/cli-runtime/pkg/resource.NewBuilder.func1(builder.go:206)

As it stands now we aren't able to set discovery clients (created by NewDiscoveryClientForConfig) to use a timeout of 0.

We aren't able to remove the default or make it a pointer, but maybe we can pass metadata indicating that the timeout was set in the config (like @jpbetz suggested above) or change the expectation so that -1 is used for no timeout.

// The maximum length of time to wait before giving up on a server request. A value of zero means no timeout.
Timeout time.Duration
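
A minimal sketch of the "pass metadata" option follows, assuming a hypothetical `TimeoutSet` field that does not exist in client-go today. `Config` here is a stand-in type and the 32s default is taken from the PR title.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTimeout is assumed from the ?timeout=32s in the PR title.
const defaultTimeout = 32 * time.Second

// Config is a stand-in for a client config. TimeoutSet is a hypothetical
// extra field carrying the "was this set by the user?" metadata.
type Config struct {
	Timeout    time.Duration
	TimeoutSet bool
}

// applyDefault only fills in the default when the caller did not set a value,
// so an explicit zero survives as "no timeout".
func applyDefault(c Config) Config {
	if !c.TimeoutSet && c.Timeout == 0 {
		c.Timeout = defaultTimeout
	}
	return c
}

func main() {
	unset := Config{}
	explicitZero := Config{Timeout: 0, TimeoutSet: true}

	// Without the marker, the two Timeout values are indistinguishable.
	fmt.Println(unset.Timeout == explicitZero.Timeout)

	fmt.Println(applyDefault(unset).Timeout)
	fmt.Println(applyDefault(explicitZero).Timeout)
}
```

Callers that never touch the new field keep the existing default, which is the backwards-compatibility property the reviewers asked for. Only a caller that opts in by setting the marker gets a true zero.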

@liggitt @soltysh thoughts?

@BoleynSu
Author

BoleynSu commented Jul 29, 2021 via email

@liggitt
Member

liggitt commented Jul 29, 2021

IIUC, they are only for create/patch/... which change the data, not for retrieving data. @liggitt

get requests still have a server-side upper bound timeout (defaulting to 60 seconds, I think)

@BoleynSu
Author

BoleynSu commented Jul 30, 2021 via email

@BoleynSu
Author

BoleynSu commented Aug 8, 2021

Friendly ping.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2021
@BoleynSu
Author

BoleynSu commented Nov 9, 2021

Friendly ping.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 9, 2021
@dims
Member

dims commented Jan 5, 2022

@fedebongio this probably needs an additional assignee?

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.


Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

kubectl apply timeouts on /openapi/v2?timeout=32s
9 participants