Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UPSTREAM: <carry>: return 429 instead of 404 when the server hasn't been ready #820

Closed

Conversation

p0lyn0mial
Copy link

@p0lyn0mial p0lyn0mial commented Jun 21, 2021

WithNotFoundProtectorHandler will return 429 instead of 404 iff:

  • server hasn't been ready (/readyz=false)
  • the user is GC or the namespace lifecycle controller
  • the path is for an aggregated API or CR

This handler ensures that the system stays consistent even when requests are received before the server is ready.
In particular it prevents child deletion in case of GC or/and orphaned content in case of the namespaces controller.

@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Jun 21, 2021
@openshift-ci-robot
Copy link

@p0lyn0mial: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@openshift-ci openshift-ci bot requested review from deads2k and soltysh June 21, 2021 14:28
@p0lyn0mial p0lyn0mial changed the title With 429 when not ready UPSTREAM: <carry>: return 429 instead of 404 when the server hasn't been ready Jun 21, 2021
@p0lyn0mial
Copy link
Author

/assign @sttts @deads2k

}

func patchMatches(path string) bool {
return strings.HasPrefix(path, "/apis") || strings.HasPrefix(path, "/apis/")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this enough?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this enough?

Yes. Add a comment explaining that since discovery contains all groups, we have to block the discovery paths until CRDs and APIServices are synced. Otherwise someone will optimize this to expose built-in APIs.

I'm open to allowing direct GETS and PUTs for built-in APIs, but not their discovery


if patchMatches(r.URL.Path) && userMatches(attribs.GetUser().GetName()) {
w.Header().Set("Retry-After", "3")
http.Error(w, "Too many requests, please try again later.", http.StatusTooManyRequests)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this match the 429 returned by p&f? I thought we tried to return json.

Can the message here be distinct so we can find it while tracing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this match the 429 returned by p&f? I thought we tried to return json

yes, p&f uses tooManyRequest function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the message here be distinct so we can find it while tracing?

+1

}

func userMatches(user string) bool {
return user == "system:serviceaccount:kube-system:generic-garbage-collector" || user == "system:serviceaccount:kube-system:namespace-controller"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GC uses its own account for informers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, informers use system:kube-controller-manager, do we want to add it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it only matters to the discovery and the normal client

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that informers will simply retry on 404

…een ready

WithNotFoundProtectorHandler will return 429 instead of 404 iff:
 - server hasn't been ready (/readyz=false)
 - the user is GC or the namespace lifecycle controller
 - the path is for an aggregated API or CR

This handler ensures that the system stays consistent even when requests are received before the server is ready.
In particular it prevents child deletion in case of GC or/and orphaned content in case of the namespaces controller.
@openshift-ci-robot
Copy link

@p0lyn0mial: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@openshift-ci
Copy link

openshift-ci bot commented Jun 22, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: p0lyn0mial
To complete the pull request process, please ask for approval from deads2k after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

}

if patchMatches(r.URL.Path) && userMatches(attribs.GetUser().GetName()) {
w.Header().Set("Retry-After", "3")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to randomize the timeout?

@openshift-ci
Copy link

openshift-ci bot commented Jun 22, 2021

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp 003809e link /test e2e-gcp
ci/prow/e2e-gcp-upgrade 003809e link /test e2e-gcp-upgrade
ci/prow/e2e-aws-serial 003809e link /test e2e-aws-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 20, 2021
@openshift-ci
Copy link

openshift-ci bot commented Sep 20, 2021

@p0lyn0mial: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 20, 2021
@openshift-bot
Copy link

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Nov 19, 2021
@openshift-ci
Copy link

openshift-ci bot commented Nov 19, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants