
How to deal with an expired continue token during list_namespaced_job and similar calls is not documented #953

Closed
adamnovak opened this issue Sep 12, 2019 · 17 comments
Labels
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@adamnovak

Link to the issue (please include a link to the specific documentation or example):

See the documentation for list_namespaced_job, specifically the description of the _continue parameter and its guidance on dealing with expired tokens:

If the specified continue value is no longer valid whether due to expiration (generally five to fifteen minutes) or a configuration change on the server, the server will respond with a 410 ResourceExpired error together with a continue token. If the kubernetes.client needs a consistent list, it must restart their list without the continue field. Otherwise, the kubernetes.client may send another list request with the token received with the 410 error, ...

Description of the issue (please include outputs or screenshots if possible):

How an expired token manifests at the Python level is not specified. Does the 410 error response from the server result in an ApiError being raised when the response is received by the RESTClient? If so, then to get the token you would have to:

  • Catch the ApiError
  • Check its status for 410 and its reason for 'ResourceExpired'.
  • Look at its data field, which contains the body of the error response.
  • Somehow extract the new continuation token from the response body.

The documentation does not specify how the token is sent "together with" the error. Is it the entire response body? Is it contained within some field of a serialized object? If so, what type is it and is there a way to get the Kubernetes Python API to deserialize it for me?

The underlying problem may go back to the actual REST API docs at https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#list-job-v1-batch where there is a possible 410 response code documented in the description of the continue parameter, but no documentation for it or its body in the table of possible response codes.

I think the answers may be in https://github.com/kubernetes/community/blob/79748734e6225769eb5186d0496adeb9f64789cf/contributors/design-proposals/api-machinery/api-chunking.md#handling-expired-resource-versions which gives an example in which a JSON response body appears to be returned. Is it safe to rely on the response body for this error type always being JSON? If not, is there a class I can use to deserialize it with deserialize()?
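
For concreteness, this is the sort of handling I am imagining. It is only a guess: it assumes the 410 surfaces as kubernetes.client.rest.ApiException and that its body attribute is the JSON-encoded Status carrying the replacement token, which is exactly what I am asking to have confirmed and documented.

```python
# A guess at the handling, not something the docs confirm. Assumptions:
# the 410 is raised as kubernetes.client.rest.ApiException, and e.body is the
# JSON-encoded Status whose metadata.continue holds the replacement token.
import json

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch = client.BatchV1Api()

stale_token = "..."  # a continue token from an earlier page that has since expired

try:
    page = batch.list_namespaced_job("default", limit=500, _continue=stale_token)
except ApiException as e:
    if e.status != 410:
        raise
    # Is the body always JSON? Is there a model class to deserialize it with?
    status = json.loads(e.body)
    new_token = status["metadata"]["continue"]
```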

I think @smarterclayton originally wrote up the whole system, and so may have some insight at least on the REST end of things.

@adamnovak adamnovak added the kind/documentation Categorizes issue or PR as related to documentation. label Sep 12, 2019
@smarterclayton

smarterclayton commented Sep 13, 2019 via email

@adamnovak
Author

How do I distinguish a list returned by list_namespaced_job that describes a successful next page from one that represents a 410 error caused by my having submitted an expired continue token?

@roycaihw
Member

IIUC the question is "how to read the continue token using the Python client from a 410 response from the apiserver".

From a high level, the apiserver always returns a response body of Status kind when an error occurs, so the client should deserialize the response body into a Status object (Python model, Golang type) and read the .metadata.continue field.

You're right that using this client you can catch the ApiError and deserialize the data field.

It would be better if the client deserialized the Status for you whenever it sees an HTTP status code >= 300 or < 200, but that requires more plumbing.
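
In the meantime, a rough sketch of the manual route (assuming the caught exception is kubernetes.client.rest.ApiException and its body holds the raw JSON; ApiClient.deserialize() expects a response-like object with a .data attribute, hence the small wrapper):

```python
# Rough sketch, not an official helper. Assumes the caught exception is
# kubernetes.client.rest.ApiException and that exc.body holds the raw JSON
# Status document returned with the 410.
from kubernetes import client
from kubernetes.client.rest import ApiException


class _FakeResponse:
    """ApiClient.deserialize() reads a .data attribute, so wrap the raw body."""

    def __init__(self, data):
        self.data = data


def continue_token_from_410(exc: ApiException) -> str:
    """Deserialize the error body into V1Status and return metadata.continue."""
    status = client.ApiClient().deserialize(_FakeResponse(exc.body), "V1Status")
    return status.metadata._continue
```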

@roycaihw roycaihw added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 18, 2019
@roycaihw
Member

@scottilee An example using a paged list and showing how to deal with an expired continue token may be useful

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 17, 2019
@adamnovak
Author

Has anyone decided to actually do this? Or should we just close it as a component of kubernetes/kubernetes#69014?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 18, 2020
@adamnovak
Author

I don't believe that this has yet been documented/implemented; it is still an open problem.

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 21, 2020
@roycaihw
Member

I agree this is still a problem. I'm trying to convert this into a good first issue / help wanted issue, so I'm writing down what I think is needed here.

I think we first need an example that does the following (rough sketch below):

  • send a list request with an expired token that will trigger a 410 (either by sleeping 15 minutes or by using a fake token)
  • catch the ApiError
  • read the data field, either as a dict or by deserializing it into V1Status
  • read the _continue token out of the metadata
  • send a list request with the continue token to retrieve an inconsistent list.
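
Something along these lines (untested sketch; the "default" namespace, the 500-item page size, and the expired_token placeholder are illustrative, and it assumes the 410 arrives as kubernetes.client.rest.ApiException with the Status JSON in .body):

```python
# Untested sketch of the example described above.
import json

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch = client.BatchV1Api()

# A continue token saved from an earlier paged list and left to expire;
# placeholder value for illustration.
expired_token = "..."

try:
    # 1. send a list request with the expired token to trigger the 410
    batch.list_namespaced_job("default", limit=500, _continue=expired_token)
except ApiException as e:
    # 2. catch the error; anything other than 410 is not handled here
    if e.status != 410:
        raise
    # 3. read the error body as a dict (it follows the Status schema);
    #    alternatively deserialize it into V1Status
    status = json.loads(e.body)
    # 4. the replacement continue token sits in the Status metadata
    fresh_token = status["metadata"]["continue"]
    # 5. resume listing from that token; the result is no longer a
    #    consistent snapshot of the collection
    page = batch.list_namespaced_job("default", limit=500, _continue=fresh_token)
    for job in page.items:
        print(job.metadata.name)
```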

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2020
@adamnovak
Author

I don't think this has yet been addressed.

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2020
@adamnovak
Author

/remove-lifecycle stale

This is still outstanding, I think.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 14, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 12, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
