Keep the watch action working all the time #124

vklonghml · 2017-02-13T07:52:44Z

Edited by @mbohlool:
This question led to an action item to add a retry mechanism to the watch class. It should be controlled by a flag and will result in keeping the watch open all the time.

Original post:

Below is how I use client-python in list.py:

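from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()
w = watch.Watch()
# Print the type and object name of every event on the watch stream.
for event in w.stream(v1.list_persistent_volume_claim_for_all_namespaces):
    print("Event: %s %s" % (event['type'], event['object'].metadata.name))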

When I run the script with the command "python list.py", it shows events normally,
but it exits automatically after several minutes.

Does anybody know how I could keep this watch action working all the time?

mbohlool · 2017-02-13T21:52:10Z

Do you get any error? Did you try to set the timeout_seconds parameter?

webwurst · 2017-02-15T11:14:15Z

I see this happening with one certain cluster, but it works fine with another. So this might be some timeout on something in between, like a load balancer or so. But it would be cool if the watch could just do a retry in case the connection is lost.

vklonghml · 2017-02-15T11:17:07Z

@mbohlool I don't set the timeout_seconds parameter; I use the default value. But how can I keep the watch working all the time, as @webwurst said?

ivar-lazzaro · 2017-05-06T18:23:01Z

+1.
I have several threads using the watcher to monitor different types of K8S resources, and I see their connection close every 30-35 min[0]. My application then resumes the watch but when that happens all the items for a given type are returned, which is undesirable. Not sure what is causing this 30 min timeout, the API server runs in a pod on the same host as my watchers do.

Is there a way to store a watch "version number" every time I receive an event that can be used whenever I resume the stream, so that I only get events subsequent to that point?

[0] I can tell looking at the list response "closed" attribute.

mbohlool · 2017-05-06T21:38:39Z

Look at the resource_version parameter. The returned object's metadata should also have a resource_version. Pass the resource_version of the last received object to the new call.

ivar-lazzaro · 2017-05-09T00:07:46Z

Thanks for the answer.
I'm not sure how to use resource_version without doing a full list for that object type first. When the streaming process is just started it makes sense, but I want to avoid a full list call every time the connection breaks. When I get an event from the watch.stream function I get something like this:

{'raw_object': {u'status': {u'phase': u'Active'},
                u'kind': u'Namespace',
                u'spec': {u'finalizers': [u'kubernetes']},
                u'apiVersion': u'v1',
                u'metadata': {u'name': u'demo',
                              u'resourceVersion': u'921',
                              u'creationTimestamp': u'2017-04-28T22:23:54Z',
                              u'selfLink': u'/api/v1/namespacesdemo',
                              u'uid': u'5d5cd386-2c61-11e7-9d39-84b261c2790e'}},
 u'object': {u'status': {u'phase': u'Active'},
             u'kind': u'Namespace',
             u'spec': {u'finalizers': [u'kubernetes']},
             u'apiVersion': u'v1',
             u'metadata': {u'name': u'demo',
                           u'resourceVersion': u'921',
                           u'creationTimestamp': u'2017-04-28T22:23:54Z',
                           u'selfLink': u'/api/v1/namespacesdemo',
                           u'uid': u'5d5cd386-2c61-11e7-9d39-84b261c2790e'}},
 u'type': u'ADDED'}

The only resourceVersion I see here is the one concerning the specific k8s object. Should I set that value as the resource_version argument?

mbohlool · 2017-05-09T00:40:31Z

Every time you get an event, store event['object'].metadata.resource_version in a variable (let's say last_seen_version), and when you want to reconnect, pass it like this: w.stream(v1.list_namespace, resource_version=last_seen_version)
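
For later readers, the pattern described above might be sketched roughly as follows. This is only an illustration under stated assumptions: watch_with_resume, open_stream, handle and max_reconnects are hypothetical names that are not part of the client library, and the events are stand-ins shaped like the ones w.stream() yields so that the sketch runs without a cluster.

```python
from types import SimpleNamespace


def watch_with_resume(open_stream, handle, max_reconnects=None):
    """Re-open a watch whenever it ends, resuming from the last seen version.

    open_stream(resource_version) is a hypothetical callable that returns an
    iterable of events shaped like those from watch.Watch().stream().
    """
    last_seen_version = None
    reconnects = 0
    while True:
        for event in open_stream(last_seen_version):
            handle(event)
            # Remember the position so the next connection can start from it.
            last_seen_version = event['object'].metadata.resource_version
        # The stream ended, for example because the server closed it.
        if max_reconnects is not None and reconnects >= max_reconnects:
            return last_seen_version
        reconnects += 1


# With the real client, open_stream might look like this (not run here):
#   def open_stream(rv):
#       kwargs = {} if rv is None else {'resource_version': rv}
#       return watch.Watch().stream(v1.list_namespace, **kwargs)


def _fake_event(name, version):
    # Stand-in for a deserialized watch event.
    metadata = SimpleNamespace(name=name, resource_version=version)
    return {'type': 'ADDED', 'object': SimpleNamespace(metadata=metadata)}


def demo_resume():
    requested = []
    batches = [[_fake_event('a', '1'), _fake_event('b', '2')],
               [_fake_event('c', '3')]]

    def open_stream(resource_version):
        requested.append(resource_version)
        return batches.pop(0) if batches else []

    seen = []
    last = watch_with_resume(
        open_stream,
        lambda event: seen.append(event['object'].metadata.name),
        max_reconnects=1,
    )
    return requested, seen, last
```

In the demo, the second connection is opened with the version of the last event from the first one, so nothing is replayed. A real loop would probably also want a short pause between reconnects, and would need to fall back to a fresh list if the server answers 410 Gone because the stored version is too old.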

lichen2013 · 2017-11-03T03:25:05Z

I believe the root cause of this issue is the default timeout setting in kube-apiserver:

https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/get.go#L247

        serveWatch(watcher, scope, req, w, timeout)

where the timeout is computed as:

https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/get.go#L231-L237

        timeout := time.Duration(0)
        if opts.TimeoutSeconds != nil {
        	timeout = time.Duration(*opts.TimeoutSeconds) * time.Second
        }
        if timeout == 0 && minRequestTimeout > 0 {
        	timeout = time.Duration(float64(minRequestTimeout) * (rand.Float64() + 1.0))
        }
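
To make the effect of that snippet concrete, here is a rough Python transcription of the same arithmetic. It is a sketch, not the server code: effective_watch_timeout and its rand parameter are hypothetical names, and durations are plain seconds rather than Go's time.Duration.

```python
import random


def effective_watch_timeout(timeout_seconds, min_request_timeout,
                            rand=random.random):
    """Return the watch timeout in seconds, following the quoted Go logic.

    A result of 0 means no timeout is applied by this code path.
    """
    timeout = float(timeout_seconds) if timeout_seconds is not None else 0.0
    if timeout == 0 and min_request_timeout > 0:
        # rand() is in [0, 1), so the result lies in [min, 2 * min).
        timeout = min_request_timeout * (rand() + 1.0)
    return timeout
```

So when the client does not pass timeout_seconds, the server picks a random value between one and two times its minimum request timeout. Assuming that minimum is 1800 seconds (which I believe is the default for the kube-apiserver --min-request-timeout flag, but check your own cluster), watches would be closed after roughly 30 to 60 minutes, which is in line with the 30-35 minutes reported earlier in this thread.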

mbohlool · 2017-11-03T17:48:23Z

@lichen2013 nice catch, but regardless of this I think we should get your reconnect PR in.

@caesarxuchao, you looked at the timeout issue before; does this mean the API server times out watch calls and our shared informer reconnects?

cc @roycaihw

max-rocket-internet · 2019-02-06T14:12:49Z

Any update on this? I have this code to watch events:

import json
import os
from kubernetes import client, config, watch


if 'KUBERNETES_PORT' in os.environ:
    config.load_incluster_config()
else:
    config.load_kube_config()


v1 = client.CoreV1Api()
w = watch.Watch()

for event in w.stream(v1.list_event_for_all_namespaces, _request_timeout=60):
    print(json.dumps(event['raw_object']))

It runs, but if there are no events for some extended amount of time it dies with this:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 331, in _error_catcher
    yield
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 637, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 569, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "kubernetes/charts/common/event-logger/files/event-watcher.py", line 18, in <module>
    for event in w.stream(v1.list_event_for_all_namespaces, _request_timeout=60):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 130, in stream
    for line in iter_resp_lines(resp):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 45, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 665, in read_chunked
    self._original_response.close()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 336, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='xxxxxxxxxxxx.yl4.eu-west-1.eks.amazonaws.com', port=443): Read timed out.

Am I missing something?

max-rocket-internet · 2019-02-06T14:54:00Z

Ah I think maybe my problem is the _request_timeout=60 part? If I remove this it runs indefinitely 😅
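
For context, _request_timeout is a client-side timeout handed to urllib3, so with a value of 60 a read that sees no data for 60 seconds raises ReadTimeoutError even though the watch itself is healthy. If you want to keep such a timeout anyway, one option is to treat it as a signal to reopen the stream. The sketch below assumes hypothetical names (stream_ignoring_idle_timeouts, open_stream, timeout_errors, max_timeouts) and uses the built-in TimeoutError as a stand-in so it runs without a cluster.

```python
def stream_ignoring_idle_timeouts(open_stream, timeout_errors,
                                  max_timeouts=None):
    """Yield events from open_stream(), reopening it when a read times out."""
    timeouts = 0
    while True:
        try:
            for event in open_stream():
                yield event
            return  # the stream ended normally
        except timeout_errors:
            timeouts += 1
            if max_timeouts is not None and timeouts > max_timeouts:
                raise
            # Otherwise fall through and open a fresh stream.


def demo_timeout():
    attempts = []

    def open_stream():
        attempts.append(len(attempts) + 1)
        if len(attempts) == 1:
            yield 'event-1'
            # Simulate an idle connection hitting the client read timeout.
            raise TimeoutError('Read timed out.')
        yield 'event-2'

    events = list(stream_ignoring_idle_timeouts(open_stream, (TimeoutError,)))
    return events, attempts
```

With the real client you would presumably pass timeout_errors=(urllib3.exceptions.ReadTimeoutError,) and have open_stream call w.stream(...) with a resource_version, since reopening a watch without one returns the existing objects again.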

fejta-bot · 2019-05-07T15:51:39Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-06-06T16:34:38Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-07-06T17:22:37Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-07-06T17:22:44Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xuejiezhang · 2022-03-23T22:33:27Z

I got the same issue. Is it fixed, or is there any workaround?

mbohlool changed the title ~~[QUESTION]how to keep the watch action working all the time~~ Keep the watch action working all the time Mar 7, 2017

mbohlool added the help-needed label Mar 7, 2017

lichen2013 added a commit to lichen2013/python-base that referenced this issue Oct 17, 2017

Add flag to enable keep the watch action working all the time

ee58269

Fixes issue: kubernetes-client/python#124

lichen2013 mentioned this issue Oct 17, 2017

Keep the watch action working forever kubernetes-client/python-base#36

Merged

lichen2013 added a commit to lichen2013/python-base that referenced this issue Oct 17, 2017

Add flag to enable keep the watch action working all the time

94ec32b

Fixes issue: kubernetes-client/python#124

lichen2013 added a commit to lichen2013/python-base that referenced this issue Oct 17, 2017

Add flag to enable keep the watch action working all the time

48223bf

Fixes issue: kubernetes-client/python#124

lichen2013 added a commit to lichen2013/python-base that referenced this issue Oct 17, 2017

Add flag to enable keep the watch action working all the time

4f815d4

Fixes issue: kubernetes-client/python#124

lichen2013 added a commit to lichen2013/python-base that referenced this issue Oct 26, 2017

Add flag to enable keep the watch action working all the time

8f7b490

Fixes issue: kubernetes-client/python#124

umohnani8 mentioned this issue Dec 5, 2017

Use kubernetes client to watch api FNNDSC/pman#26

Merged

cben mentioned this issue Jan 10, 2018

Add an API concepts document and describe terminology and API chunking kubernetes/website#6540

Merged

roycaihw added the help wanted label Oct 30, 2018

missedone mentioned this issue Mar 24, 2019

i/o timeout communicating from kubewatch to master vmware-archive/kubewatch#161

Open

k8s-ci-robot added the lifecycle/stale label May 7, 2019

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 6, 2019

k8s-ci-robot closed this as completed Jul 6, 2019

ellieayla mentioned this issue Jul 8, 2019

Implement an Informer in python-client #868

Open

mudler mentioned this issue Jan 29, 2020

Consume RetryWatcher, or provide an option to switch default implementation cloudfoundry-incubator/eirinix#20

Closed

yqwang-ms mentioned this issue Sep 3, 2020

Add HiveD scheduler config adapter for CA microsoft/pai#4868

Merged

petermicuch mentioned this issue Sep 28, 2020

Watcher could implement either keep alive or retry mechanism kubernetes-client/csharp#486

Closed

chrisayoub mentioned this issue Feb 24, 2021

Fix bug with Watch and 410 retries kubernetes-client/python-base#227

Merged

dudleyhunt86 added a commit to dudleyhunt86/python-base-repository-build that referenced this issue Oct 7, 2022

Add flag to enable keep the watch action working all the time

56beed4

Fixes issue: kubernetes-client/python#124

abdul5497 pushed a commit to abdul5497/python-dapp that referenced this issue Apr 1, 2024

Add flag to enable keep the watch action working all the time

fdbfa47

Fixes issue: kubernetes-client/python#124
