Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apiserver returns continue together with the 410 error #67284

Merged
merged 2 commits into from Sep 1, 2018

Conversation

@caesarxuchao
Copy link
Member

commented Aug 11, 2018

Implements #66981 (comment).

Closes #66981.

/sig api-machinery
/assign @lavalamp @liggitt @smarterclayton

Upon receiving a LIST request with expired continue token, the apiserver now returns a continue token together with the 410 "the from parameter is too old " error. If the client does not care about getting a list from a consistent snapshot, the client can use this token to continue listing from the next key, but the returned chunk will be from the latest snapshot.

@caesarxuchao caesarxuchao changed the title Return continue token for inconsistent chunks with 410 error apiserver returns continue together with the 410 error Aug 11, 2018

// respond with a 410 ResourceExpired error together with a continue token. If the client needs a
// consistent list, it must restart their list without the continue field. Otherwise, the client may
// send another list request with token received with the 410 error, the server will respond with a
// list starting from the next key, but from the latest snapshot.

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

You need to explain more that the snapshot is now inconsistent, i.e. add a clause that explicitly calls out that the user will miss changes that occur between their two calls.

}
err = errors.NewResourceExpired(message +
"Or use the continue token in this response to continue the list, " +
"though the next chunk will be from a different snapshot than the previous chunks.")

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

I'm not sure a regular user will understand what "different snapshot" means. Try to find something more clear.

if paging {
return errors.NewResourceExpired("The provided from parameter is too old to display a consistent list result. You must start a new list without the from.")
}
func interpretListError(err error, continueKey, keyPrefix string) error {

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

I don't like changing the signature of this method to key off of "continueKey" present. This method above all needs to be easy to read, even if you have to duplicate messages (put the string in a constant).

// for this purpose to distinguish from a bad token that has empty rv.
newToken, err := encodeContinue(continueKey, keyPrefix, -1)
if err != nil {
return errors.NewResourceExpired(message +

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

This could leak database state information to the caller. You should probably be using utilruntime.HandleError and return a generic error message.

This comment has been minimized.

Copy link
@caesarxuchao
err = errors.NewResourceExpired(message +
"Or use the continue token in this response to continue the list, " +
"though the next chunk will be from a different snapshot than the previous chunks.")
err.ErrStatus.ListMeta.Continue = newToken

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

I don't like the way this method flows, please refactor.

options = append(options, clientv3.WithRev(continueRV))
returnedRV = continueRV

if continueRV > 0 {

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

This method is already complex, and this makes it harder to read. Add a comment.

This comment has been minimized.

Copy link
@caesarxuchao

caesarxuchao Aug 15, 2018

Author Member

Done.

@@ -563,7 +565,7 @@ func (s *store) List(ctx context.Context, key, resourceVersion string, pred stor
for {
getResp, err := s.client.KV.Get(ctx, key, options...)
if err != nil {
return interpretListError(err, len(pred.Continue) > 0)
return interpretListError(err, continueKey, keyPrefix)

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

You need unit tests for all of these conditions and errors, not just e2e tests.

By("retrieving the second page until the token expires")
opts.Continue = firstToken
var inconsistentToken string
wait.Poll(20*time.Second, 2*storagebackend.DefaultCompactInterval, func() (bool, error) {

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 12, 2018

Contributor

This is a really questionable test. I don't think this should be an e2e test, I think you need to be testing it as an integration test or in unit level. Waiting 15 minutes in an e2e test is dumb.

This comment has been minimized.

Copy link
@caesarxuchao

caesarxuchao Aug 15, 2018

Author Member

Added unit test.

I want to keep the e2e test. The continue token is a critical API, I want test it in an as realistic as possible setup. The test will take 0-5 mins, as the default compact time is 5 mins. Also, it's a parallel test, it shouldn't add too much time to the CI.

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 17, 2018

Contributor

Hrm - that's about 3-4x longer than almost every other test except the very slow stateful sets.

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 17, 2018

Contributor

I guess I can buy it.

@smarterclayton
Copy link
Contributor

left a comment

Needs some work

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:token-with-410 branch from 86a4351 to 13504b0 Aug 13, 2018

@k8s-ci-robot k8s-ci-robot added size/XXL and removed size/L labels Aug 13, 2018

@lavalamp
Copy link
Member

left a comment

started a review last week but it looks like clayton beat me to it. Here's the one comment I had, haha

// the value in the first response.
// consistent list may not be possible if the server configuration has changed or more than a few
// minutes have passed. The resourceVersion field returned when using this continue value will be
// identical to the value in the first response, unless if you have received this token from an error

This comment has been minimized.

Copy link
@lavalamp

lavalamp Aug 13, 2018

Member

s/if//

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:token-with-410 branch 2 times, most recently from 6f3b60c to 06e9d33 Aug 15, 2018

@k8s-ci-robot k8s-ci-robot added size/L and removed size/XXL labels Aug 15, 2018

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 15, 2018

@smarterclayton thanks for the suggestions. PTAL. I tried hard to make the code and comments clear, but probably need more of your help.

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 17, 2018

@smarterclayton PTAL. Thanks.

framework.Logf("Retrieved inconsistent continue %s", inconsistentToken)
return true, nil
})
By("retrieving the second page again with the token received with the error message")

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 17, 2018

Contributor

nit can you add newlines before the By(...) lines because it's hard to read

This comment has been minimized.

Copy link
@caesarxuchao

caesarxuchao Aug 18, 2018

Author Member

Done.

"retrieve the remainder of the results. Continuing with the provided " +
"token results in an inconsistent list - objects that were created, " +
"modified, or deleted after the first chunk of results will be returned " +
"and the current time may show up in the list."

This comment has been minimized.

Copy link
@caesarxuchao

caesarxuchao Aug 24, 2018

Author Member

@smarterclayton comments addressed. PTAL. Not sure what you meant with the "current time" here, the ListMeta doesn't have creationTimestamp.

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Aug 29, 2018

Contributor

hrm, was supposed to be "between the time the first chunk was returned and now may show up in the list."

This comment has been minimized.

Copy link
@caesarxuchao

caesarxuchao Aug 29, 2018

Author Member

Got it. PTAL. Thanks.

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2018

@smarterclayton could you give me a 1.12 milestone? The code slush began yesterday. Thanks.

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:token-with-410 branch from 38d0ce7 to 06911fd Aug 29, 2018

@smarterclayton smarterclayton added this to the v1.12 milestone Aug 29, 2018

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:token-with-410 branch from 06911fd to 94f599f Aug 29, 2018

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2018

/retest

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:token-with-410 branch from 94f599f to 6e81eff Aug 29, 2018

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 30, 2018

/retest

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2018

@smarterclayton do you have more comments? Can we get this merged before code freeze? Thank you.

@smarterclayton

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2018

/approve
/lgtm

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:token-with-410 branch from 6e81eff to 5273182 Sep 1, 2018

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 1, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 1, 2018

New changes are detected. LGTM label has been removed.

@caesarxuchao

This comment has been minimized.

Copy link
Member Author

commented Sep 1, 2018

Solved a conflict in the generated file. Re-applied the lgtm.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 1, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@caesarxuchao caesarxuchao added the lgtm label Sep 1, 2018

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 1, 2018

Automatic merge from submit-queue (batch tested with PRs 67571, 67284, 66835, 68096, 68152). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

@k8s-github-robot k8s-github-robot merged commit 5b916f8 into kubernetes:master Sep 1, 2018

18 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation caesarxuchao authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.