UPSTREAM: <carry>: return 429 instead of 404 when the server hasn't been ready #820

p0lyn0mial · 2021-06-21T14:28:26Z

WithNotFoundProtectorHandler will return 429 instead of 404 iff:

- the server hasn't been ready (/readyz=false)
- the user is GC or the namespace lifecycle controller
- the path is for an aggregated API or CR

This handler keeps the system consistent even when requests arrive before the server is ready.
In particular, it prevents child deletion by the GC and orphaned content left behind by the namespace controller.
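
The three conditions above can be sketched as a plain `net/http` wrapper. This is a minimal illustration rather than the code in this PR: `withNotFoundProtector`, `isReady`, `userOf`, `protectedUsers`, and the `X-Demo-User` header are hypothetical names, and the real handler reads the user from request attributes instead of a header.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// protectedUsers is an illustrative set; the names are the service accounts
// quoted in this PR, but the variable itself is hypothetical.
var protectedUsers = map[string]bool{
	"system:serviceaccount:kube-system:generic-garbage-collector": true,
	"system:serviceaccount:kube-system:namespace-controller":      true,
}

// withNotFoundProtector wraps a NotFound handler. isReady and userOf are
// hypothetical hooks standing in for the /readyz check and the request user.
func withNotFoundProtector(notFound http.Handler, isReady func() bool, userOf func(*http.Request) string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		underAPIs := strings.HasPrefix(r.URL.Path, "/apis")
		if !isReady() && underAPIs && protectedUsers[userOf(r)] {
			// Ask the controller to retry instead of letting it act on a 404.
			w.Header().Set("Retry-After", "3")
			http.Error(w, "Too many requests, please try again later.", http.StatusTooManyRequests)
			return
		}
		notFound.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the wrapper and returns the status code.
func statusFor(ready bool, user, path string) int {
	h := withNotFoundProtector(http.NotFoundHandler(),
		func() bool { return ready },
		func(r *http.Request) string { return r.Header.Get("X-Demo-User") })
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.Header.Set("X-Demo-User", user)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	gc := "system:serviceaccount:kube-system:generic-garbage-collector"
	fmt.Println(statusFor(false, gc, "/apis/example.com/v1/widgets"))      // not ready, GC
	fmt.Println(statusFor(true, gc, "/apis/example.com/v1/widgets"))       // ready, GC
	fmt.Println(statusFor(false, "alice", "/apis/example.com/v1/widgets")) // not ready, other user
}
```

Only the first request is turned into a 429; the other two fall through to the wrapped NotFound handler.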

openshift-ci-robot · 2021-06-21T14:28:34Z

@p0lyn0mial: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

2a78f0c|UPSTREAM: <carry>: return 429 instead of 404 when the server hasn't been ready: does not specify an upstream backport in the commit message
8f3cbdf|UPSTREAM: <carry>: wires WithNotFoundProtectorHandler to the default NotFound handler: does not specify an upstream backport in the commit message

p0lyn0mial · 2021-06-21T14:29:09Z

/assign @sttts @deads2k

p0lyn0mial · 2021-06-21T14:29:28Z

staging/src/k8s.io/apiserver/pkg/server/patch_config.go

+}
+
+func patchMatches(path string) bool {
+	return strings.HasPrefix(path, "/apis") || strings.HasPrefix(path, "/apis/")


is this enough?

> is this enough?

Yes. Add a comment explaining that since discovery contains all groups, we have to block the discovery paths until CRDs and APIServices are synced. Otherwise someone will optimize this to expose built-in APIs.

I'm open to allowing direct GETs and PUTs for built-in APIs, but not their discovery.
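
A sketch of what the requested comment could look like on the path check. The name `pathMatches` and the comment wording are assumptions, not the text that landed in the PR. This variant also matches `/apis` exactly instead of by bare prefix, which is a deliberate difference from the diff above: it avoids catching unrelated paths that merely start with the same characters.

```go
package main

import (
	"fmt"
	"strings"
)

// pathMatches reports whether a request targets the /apis tree.
//
// Discovery under /apis contains all groups, so the discovery paths have to
// stay blocked until CRDs and APIServices are synced. Do not "optimize" this
// to let built-in groups through: direct requests to built-in resources might
// be acceptable, but their discovery is not.
func pathMatches(path string) bool {
	return path == "/apis" || strings.HasPrefix(path, "/apis/")
}

func main() {
	for _, p := range []string{"/apis", "/apis/apps/v1", "/api/v1/pods", "/apisix"} {
		fmt.Printf("%-16s %v\n", p, pathMatches(p))
	}
}
```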

deads2k · 2021-06-21T15:08:27Z

staging/src/k8s.io/apiserver/pkg/server/patch_config.go

+
+		if patchMatches(r.URL.Path) && userMatches(attribs.GetUser().GetName()) {
+			w.Header().Set("Retry-After", "3")
+			http.Error(w, "Too many requests, please try again later.", http.StatusTooManyRequests)


does this match the 429 returned by p&f? I thought we tried to return json.

Can the message here be distinct so we can find it while tracing?

> does this match the 429 returned by p&f? I thought we tried to return json

yes, p&f uses tooManyRequest function

> Can the message here be distinct so we can find it while tracing?

+1
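
Both review points (a JSON body and a message that is easy to find while tracing) could be combined roughly as follows. This is a hedged sketch that uses only the standard library: the `status` struct is a hand-rolled stand-in with the field names of a Kubernetes `Status` object, and `writeNotReady`, `render`, and the message text are hypothetical. It is not the helper p&f uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// status mirrors the field names of a Kubernetes Status object. A real
// handler would use the apimachinery types rather than this stand-in.
type status struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
	Status     string `json:"status"`
	Message    string `json:"message"`
	Reason     string `json:"reason"`
	Code       int    `json:"code"`
}

// notReadyMessage is a hypothetical, deliberately unique string so this 429
// can be told apart from other 429s in logs and traces.
const notReadyMessage = "NotFoundProtector: the server is not ready yet, please retry later."

// writeNotReady writes a JSON-encoded 429 with a Retry-After hint.
func writeNotReady(w http.ResponseWriter) {
	w.Header().Set("Content-Type", "application/json")
	w.Header().Set("Retry-After", "3")
	w.WriteHeader(http.StatusTooManyRequests)
	_ = json.NewEncoder(w).Encode(status{
		Kind:       "Status",
		APIVersion: "v1",
		Status:     "Failure",
		Message:    notReadyMessage,
		Reason:     "TooManyRequests",
		Code:       http.StatusTooManyRequests,
	})
}

// render runs writeNotReady against a recorder and returns what a client sees.
func render() (code int, contentType string, body status) {
	rec := httptest.NewRecorder()
	writeNotReady(rec)
	_ = json.Unmarshal(rec.Body.Bytes(), &body)
	return rec.Code, rec.Header().Get("Content-Type"), body
}

func main() {
	code, contentType, body := render()
	fmt.Println(code, contentType)
	fmt.Println(body.Message)
}
```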

deads2k · 2021-06-21T15:11:57Z

staging/src/k8s.io/apiserver/pkg/server/patch_config.go

+}
+
+func userMatches(user string) bool {
+	return user == "system:serviceaccount:kube-system:generic-garbage-collector" || user == "system:serviceaccount:kube-system:namespace-controller"


GC uses its own account for informers?

nope, informers use system:kube-controller-manager, do we want to add it?

I thought it only matters to the discovery and the normal client

I think that informers will simply retry on 404

openshift-ci-robot · 2021-06-22T09:18:14Z

@p0lyn0mial: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

003809e|UPSTREAM: <carry>: wires WithNotFoundProtectorHandler to the default NotFound handler: does not specify an upstream backport in the commit message
b071cc8|UPSTREAM: <carry>: return 429 instead of 404 when the server hasn't been ready: does not specify an upstream backport in the commit message

openshift-ci · 2021-06-22T09:18:23Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: p0lyn0mial
To complete the pull request process, please ask for approval from deads2k after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

DOWNSTREAM_OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

p0lyn0mial · 2021-06-22T09:19:13Z

staging/src/k8s.io/apiserver/pkg/server/patch_config.go

+		}
+
+		if patchMatches(r.URL.Path) && userMatches(attribs.GetUser().GetName()) {
+			w.Header().Set("Retry-After", "3")


do we want to randomize the timeout?
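
One way the randomization could look, as a minimal sketch: `jitteredSeconds` and its `base` and `jitter` parameters are hypothetical, and the range 3 to 5 seconds is only an example. Spreading the hint keeps the affected controllers from all retrying at the same instant.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
)

// jitteredSeconds returns a value drawn uniformly from [base, base+jitter].
// jitter must be >= 0, because rand.Intn panics on a non-positive argument.
func jitteredSeconds(base, jitter int) int {
	return base + rand.Intn(jitter+1)
}

func main() {
	// The result would be passed to w.Header().Set("Retry-After", ...).
	fmt.Println("Retry-After:", strconv.Itoa(jitteredSeconds(3, 2)))
}
```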

openshift-ci · 2021-06-22T11:56:50Z

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/e2e-gcp	`003809e`	link	`/test e2e-gcp`
ci/prow/e2e-gcp-upgrade	`003809e`	link	`/test e2e-gcp-upgrade`
ci/prow/e2e-aws-serial	`003809e`	link	`/test e2e-aws-serial`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2021-09-20T14:43:28Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci · 2021-09-20T14:43:35Z

@p0lyn0mial: PR needs rebase.

openshift-bot · 2021-10-20T15:12:15Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2021-11-19T15:44:16Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2021-11-19T15:45:51Z

@openshift-bot: Closed this PR.

openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Jun 21, 2021

openshift-ci bot requested review from deads2k and soltysh June 21, 2021 14:28

p0lyn0mial changed the title ~~With 429 when not ready~~ UPSTREAM: <carry>: return 429 instead of 404 when the server hasn't been ready Jun 21, 2021

openshift-ci bot assigned deads2k and sttts Jun 21, 2021

p0lyn0mial commented Jun 21, 2021

deads2k reviewed Jun 21, 2021

p0lyn0mial added 2 commits June 22, 2021 11:17

UPSTREAM: <carry>: wires WithNotFoundProtectorHandler to the default NotFound handler (003809e)

p0lyn0mial force-pushed the with-429-when-not-ready branch from 8f3cbdf to 003809e Compare June 22, 2021 09:18

p0lyn0mial commented Jun 22, 2021

openshift-ci bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 20, 2021

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 20, 2021

openshift-ci bot closed this Nov 19, 2021
