get events from python #37018
yes I believe @DmitryKuzmenko would definitely be the one with the definitive answer. But I would think the reason monitoring events stops after restarting the master is that you would need to re-initialize the connection to the master. The same thing happens if I were to run
@Ch3LL is right. When the master restarts it recreates the unix domain PUB socket. So after a master restart the client continues to listen on the removed socket file while the master publishes to the new socket file. But I'll take a closer look to provide a way to handle this.
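The stale-socket behavior described above can be reproduced without Salt at all. The following self-contained sketch uses plain unix domain sockets (no Salt APIs; the path and socket roles are purely illustrative) to show that a client attached to a socket file stays bound to the old, unlinked inode and sees EOF instead of anything published on the re-created file:

```python
import os
import socket
import tempfile

# Hypothetical stand-in for the master's PUB socket path.
path = os.path.join(tempfile.mkdtemp(), 'pub.ipc')

# "Master" creates the PUB socket and accepts one subscriber.
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
conn, _ = server.accept()

# "Master restart": connections drop, the socket file is removed
# and then re-created at the same path (a brand-new inode).
conn.close()
server.close()
os.unlink(path)

server2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server2.bind(path)
server2.listen(1)

# The old client is still attached to the removed socket: it gets
# EOF (b'') and will never see events published via the new file.
client.settimeout(1)
data = client.recv(1024)
print(repr(data))
```

The only way forward for the client is to reconnect to the path, which is exactly what the event client fails to do here on its own.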
thanks @DmitryKuzmenko, I understand that it is the same socket file, and it would be nice to have a method for checking the connection, or maybe an automatic reconnect. I tried to see where exactly this gets stuck but I wasn't able to. PS: I already applied your patch for syndic reconnection thinking that it might help, but no luck.
@tsaridas So client code gets None in both cases: timeout and error. We have to fix this. |
@DmitryKuzmenko thanks for checking. Would you be able to provide a quick fix so that the client reconnects automatically? For some reason, even when I catch the exception from get_event_noblock() and re-initialize var = salt.utils.event.get_event(), I cannot reconnect any more; var.get_event_noblock() just hangs. I tried setting all the variables to None but I still get the same result. Thanks
@tsaridas ah, sure, there are more special cases to handle:

#!/usr/bin/env python
import salt.config
import salt.utils.event
import sys
import time
import tornado

opts = salt.config.client_config('/etc/salt/master')
event = salt.utils.event.get_event(
    'master',
    sock_dir=opts['sock_dir'],
    transport=opts['transport'],
    opts=opts)

while True:
    try:
        while True:
            data = event.get_event_block()
            print(data)
    except tornado.iostream.StreamClosedError as ex:
        print(ex)
        event.close_pub()
        tries = 0
        while not event.cpub:
            if tries > 0:
                time.sleep(1)
            tries += 1
            print('Reconnecting #{0}'.format(tries))
            event.connect_pub()
    except:
        print('Some exception: {0}'.format(sys.exc_info()[0]))
        break
print('Done')
that doesn't work for me. It gets stuck, as I expected, at event.connect_pub() and never times out. I also tried event.connect_pub(timeout=10), but without success. I see that you are working with the develop branch, so the issue might be fixed there. I'll test with the develop branch later this evening and post the results. PS: did you try restarting the master while running the script :) ? Just checking. Thanks
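When a blocking call like connect_pub() never honors its timeout, one generic workaround is to run it in a worker thread and give up after a deadline. This is a sketch under assumptions, not Salt API; call_with_timeout is a hypothetical helper, and note that the abandoned call keeps its daemon thread alive in the background:

```python
import threading
import time

def call_with_timeout(func, timeout):
    """Run func in a daemon thread; return (finished, result)."""
    result = {}

    def target():
        result['value'] = func()

    t = threading.Thread(target=target)
    t.daemon = True
    t.start()
    t.join(timeout)
    # If the thread is still alive, the call overran its deadline.
    return (not t.is_alive(), result.get('value'))

# A call that returns promptly finishes within the deadline.
done, value = call_with_timeout(lambda: 42, timeout=1)
print(done, value)

# A call that blocks too long is reported as unfinished.
done, value = call_with_timeout(lambda: time.sleep(5), timeout=0.1)
print(done, value)
```

This at least lets the caller detect the hang and fall back to re-creating the event object instead of blocking forever.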
Tested, and it works in the develop branch. Not sure which ticket solved this, though. I guess it was my mistake that I didn't test in develop. Sorry for the inconvenience and thanks for the help @DmitryKuzmenko
@DmitryKuzmenko, any idea how to make it work in 2016.3.3? There is no close_pub() there, I tried to do what it does based on the latest code but it always hangs on connect_pub() even if you wait a minute and the master is surely up again at that time. |
@szjur you can either re-create the SaltEvent or use the following code instead of the missing close_pub():

event.subscriber.close()
event.subscriber = None
event.pending_events = []
event.cpub = False
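The retry loop from the script above can be factored into a reusable helper. The sketch below is generic Python, not Salt code: reconnect_with_backoff and the simulated connection are hypothetical names, standing in for event.connect_pub() and the event.cpub flag:

```python
import time

def reconnect_with_backoff(connect, is_connected, max_tries=5, delay=1):
    """Call connect() until is_connected() is true, sleeping between tries."""
    tries = 0
    while not is_connected():
        if tries >= max_tries:
            raise RuntimeError('gave up after {0} tries'.format(tries))
        if tries > 0:
            time.sleep(delay)
        tries += 1
        connect()
    return tries

# Simulated connection that only comes up on the third attempt.
state = {'attempts': 0, 'up': False}

def connect():
    state['attempts'] += 1
    if state['attempts'] >= 3:
        state['up'] = True

print(reconnect_with_backoff(connect, lambda: state['up'], delay=0))
```

Capping the number of tries matters here: without it, a connect that hangs or always fails reproduces the very "stuck forever" behavior this thread is about.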
@Ch3LL @DmitryKuzmenko thank you for your help. |
@DmitryKuzmenko you do realise that what you suggested is exactly what I said I tried? That doesn't help, at least not with 2016.3.3 on RHEL6. @Ch3LL, I don't know why @tsaridas closed it - most likely he found the code change that fixes this bug but just closing it light that doesn't really help the stable branches to get better and may be a bit confusing to anyone stumbling on this bug and reading this thread. |
@szjur ah! I see, sorry. I'll take a closer look. |
I could reopen the ticket if you think it should track the stable version being buggy. FYI - the PR that fixed the issue in 2016.3.3 was #32329. Not sure why, though, since that one has to do with the serializer... I only patched ipc.py and transport/frame.py together with your syndic patch. I will take another look at the changes and try to see which exact line fixed it. Thanks
I've backported the fix to 2016.3. @cachedout could you please take a look? |
Excellent, thanks a lot @DmitryKuzmenko :-) We'll also test it thoroughly tomorrow. |
@DmitryKuzmenko, I tested 2016.3.3 with just the second part of the backport (d7e3209) + manually doing what the missing close_pub() does and that does the trick in the scenario you proposed (using get_event_block() and catching tornado.iostream.StreamClosedError). I gave the other part of the backport (82e2763) a miss for now, because it didn't apply cleanly over stock 2016.3.3 and I didn't feel like getting too hung up on that. |
I'd like to address this question to @cachedout and @thatch45. |
This is a good question. I think the optimal solution would be to allow the user to pass the option into
In any case, iter_events() hanging indefinitely after a master restart, with no way whatsoever to catch and handle it, is beyond any user's expectation. It should at least return None.
I am just wondering why we should have to pass an option to raise an exception. I mean, if it does not raise anything, this is sort of broken by default, no? Anyway, I'm all in favor of fixing it for get_event, and so for iter_events as well.
With a quick search I see that iter_events is used in the reactor and in CherryPy, so yes, it would make sense to raise an exception and catch it in all the places it is used; otherwise all of the above would have issues when the master is restarted.
My concern is to avoid changing behavior without explicitly adding the option. Many people use the event system api and I don't like pulling the rug out from under people unless we have to. Hence why I would rather make it an option. |
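The backward-compatible shape being discussed, an opt-in flag that surfaces errors instead of silently returning None, can be sketched with a toy event source. This is illustrative only; EventSource, StreamClosed, and the raise_errors keyword are hypothetical names, not the actual Salt API:

```python
class StreamClosed(Exception):
    """Stand-in for a transport-level disconnect error."""
    pass

class EventSource:
    """Toy event source illustrating an opt-in raise_errors flag."""

    def __init__(self, events):
        self.events = list(events)

    def get_event(self, raise_errors=False):
        if not self.events:
            # Default behavior: swallow the failure and return None,
            # so existing callers see no change.
            if not raise_errors:
                return None
            # Opt-in behavior: surface the failure to the caller.
            raise StreamClosed('connection to master lost')
        return self.events.pop(0)

src = EventSource(['a'])
print(src.get_event())               # first event
print(src.get_event())               # exhausted: None by default
try:
    src.get_event(raise_errors=True)
except StreamClosed as exc:
    print('caught: {0}'.format(exc))
```

Defaulting the flag to off preserves the existing contract for every current caller, while callers that need to distinguish "no event" from "connection dead" can opt in.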
verified fix for 2016.3.4 and get_event_noblock |
It looks like I'm done here.
Description of Issue/Question
Trying to follow the guide in the SaltStack documentation on getting events from a Python script using the salt.utils.event class, I noticed that when the master is restarted the events stop coming.
I noticed @DmitryKuzmenko made some changes for the minions and syndics so they reconnect after a disconnection. Maybe he could suggest something similar for this specific case?
I also tried using iter_events, with the same results.
Also, get_event_block in a while loop throws an exception, but when I try to reconnect it fails.
Steps to Reproduce Issue
Follow the steps at the documentation and restart master.
Versions Report
Latest 2016.3.3