
How to deal with an expired continue token during list_namespaced_job and similar calls is not documented #953

Closed
adamnovak opened this issue Sep 12, 2019 · 17 comments
Labels
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@adamnovak

Link to the issue (please include a link to the specific documentation or example):

See the documentation for list_namespaced_job, specifically the description of the _continue parameter and its guidance on dealing with expired tokens:

If the specified continue value is no longer valid whether due to expiration (generally five to fifteen minutes) or a configuration change on the server, the server will respond with a 410 ResourceExpired error together with a continue token. If the kubernetes.client needs a consistent list, it must restart their list without the continue field. Otherwise, the kubernetes.client may send another list request with the token received with the 410 error, ...

Description of the issue (please include outputs or screenshots if possible):

How an expired token manifests at the Python level is not specified. Does the 410 error response from the server result in an ApiError being raised when the response is received by the RESTClient? If so, then to get the token you would have to:

  • Catch the ApiError
  • Check its status for 410 and its reason for 'ResourceExpired'.
  • Look at its data field, which contains the body of the error response.
  • Somehow extract the new continuation token from the response body.

The documentation does not specify how the token is sent "together with" the error. Is it the entire response body? Is it contained within some field of a serialized object? If so, what type is it and is there a way to get the Kubernetes Python API to deserialize it for me?

The underlying problem may go back to the actual REST API docs at https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#list-job-v1-batch where there is a possible 410 response code documented in the description of the continue parameter, but no documentation for it or its body in the table of possible response codes.

I think the answers may be in https://github.com/kubernetes/community/blob/79748734e6225769eb5186d0496adeb9f64789cf/contributors/design-proposals/api-machinery/api-chunking.md#handling-expired-resource-versions which gives an example in which a JSON response body appears to be returned. Is it safe to rely on the response body for this error type always being JSON? If not, is there a class I can use to deserialize it with deserialize()?
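
For concreteness, this is the sort of handling I am imagining. It is only a guess: it assumes the 410 surfaces as kubernetes.client.rest.ApiException and that its body attribute is the JSON-encoded Status carrying the replacement token, which is exactly what I am asking to have confirmed and documented.

```python
# A guess at the handling, not something the docs confirm. Assumptions:
# the 410 is raised as kubernetes.client.rest.ApiException, and e.body is the
# JSON-encoded Status whose metadata.continue holds the replacement token.
import json

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch = client.BatchV1Api()

stale_token = "..."  # a continue token from an earlier page that has since expired

try:
    page = batch.list_namespaced_job("default", limit=500, _continue=stale_token)
except ApiException as e:
    if e.status != 410:
        raise
    # Is the body always JSON? Is there a model class to deserialize it with?
    status = json.loads(e.body)
    new_token = status["metadata"]["continue"]
```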

I think @smarterclayton originally wrote up the whole system, and so may have some insight at least on the REST end of things.

@adamnovak adamnovak added the kind/documentation Categorizes issue or PR as related to documentation. label Sep 12, 2019
@smarterclayton

smarterclayton commented Sep 13, 2019 via email

@adamnovak
Author

How do I distinguish a list returned by list_namespaced_job that describes a successful next page from one that represents a 410 error caused by my having submitted an expired continue token?

@roycaihw
Member

IIUC the question is "how to read the continue token using the Python client from a 410 response from the apiserver".

From a high level, the apiserver always returns a response body of Status kind when an error occurs, so the client should deserialize the response body into a Status object (Python model, Golang type) and read the .metadata.continue field.

You're right that using this client you can catch the ApiError and deserialize the data field.

It would be better if the client deserialized the Status for you whenever it sees an HTTP status code >= 300 or < 200, but that requires more plumbing.
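
In the meantime, a rough sketch of the manual route (assuming the caught exception is kubernetes.client.rest.ApiException and its body holds the raw JSON; ApiClient.deserialize() expects a response-like object with a .data attribute, hence the small wrapper):

```python
# Rough sketch, not an official helper. Assumes the caught exception is
# kubernetes.client.rest.ApiException and that exc.body holds the raw JSON
# Status document returned with the 410.
from kubernetes import client
from kubernetes.client.rest import ApiException


class _FakeResponse:
    """ApiClient.deserialize() reads a .data attribute, so wrap the raw body."""

    def __init__(self, data):
        self.data = data


def continue_token_from_410(exc: ApiException) -> str:
    """Deserialize the error body into V1Status and return metadata.continue."""
    status = client.ApiClient().deserialize(_FakeResponse(exc.body), "V1Status")
    return status.metadata._continue
```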

@roycaihw roycaihw added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 18, 2019
@roycaihw
Member

@scottilee An example using a paged list and showing how to deal with an expired continue token may be useful

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 17, 2019
@adamnovak
Author

Has anyone decided to actually do this? Or should we just close it as a component of kubernetes/kubernetes#69014?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 18, 2020
@adamnovak
Author

I don't believe that this has yet been documented/implemented; it is still an open problem.

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 21, 2020
@roycaihw
Member

I agree this is still a problem. I'm trying to convert this into a good first issue / help wanted issue, so I'm writing down what I think is needed here.

I think we first need an example that does the following (rough sketch below):

  • send a list request with an expired token that will trigger a 410 (either by sleeping 15 minutes or by using a fake token)
  • catch the ApiError
  • read the data field, either as a dict or by deserializing it into V1Status
  • read the _continue token out of the metadata
  • send a list request with the continue token to retrieve an inconsistent list.
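
Something along these lines (untested sketch; the "default" namespace, the 500-item page size, and the expired_token placeholder are illustrative, and it assumes the 410 arrives as kubernetes.client.rest.ApiException with the Status JSON in .body):

```python
# Untested sketch of the example described above.
import json

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
batch = client.BatchV1Api()

# A continue token saved from an earlier paged list and left to expire;
# placeholder value for illustration.
expired_token = "..."

try:
    # 1. send a list request with the expired token to trigger the 410
    batch.list_namespaced_job("default", limit=500, _continue=expired_token)
except ApiException as e:
    # 2. catch the error; anything other than 410 is not handled here
    if e.status != 410:
        raise
    # 3. read the error body as a dict (it follows the Status schema);
    #    alternatively deserialize it into V1Status
    status = json.loads(e.body)
    # 4. the replacement continue token sits in the Status metadata
    fresh_token = status["metadata"]["continue"]
    # 5. resume listing from that token; the result is no longer a
    #    consistent snapshot of the collection
    page = batch.list_namespaced_job("default", limit=500, _continue=fresh_token)
    for job in page.items:
        print(job.metadata.name)
```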

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2020
@adamnovak
Author

I don't think this has yet been addressed.

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2020
@adamnovak
Author

/remove-lifecycle stale

This is still outstanding, I think.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 14, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 12, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
