
get events from python #37018

Closed
tsaridas opened this issue Oct 14, 2016 · 28 comments
Labels: Bug (broken, incorrect, or confusing behavior), Core (relates to code central or existential to Salt), fixed-pls-verify (fix is linked, bug author to confirm fix)
Milestone: Approved

@tsaridas (Contributor) commented Oct 14, 2016

Description of Issue/Question

Trying to follow the guide in the Saltstack documentation to get events from a Python script using the salt.utils.event class, I noticed that when the master is restarted the events stop coming.

I noticed @DmitryKuzmenko made some changes for the minions and syndics so they reconnect after a disconnection. Maybe he could suggest something similar for this specific case?

I also tried using iter_events with the same results.

Also, get_event_block in a while loop throws an exception, but when I try to reconnect it fails.

Steps to Reproduce Issue

Follow the steps in the documentation and restart the master.

Versions Report

latest 2016.3.3

@Ch3LL (Contributor) commented Oct 14, 2016

Yes, I believe @DmitryKuzmenko would be the one with the definitive answer.

But I would think the reason event monitoring stops after the master is restarted is that you need to re-initialize the connection to the master. The same thing happens if I run salt-run state.event pretty=True: if I restart the master I have to re-run that command to view events. But I think Dmitry will have more knowledge around this and might have a workaround/suggestion. Thanks

@Ch3LL added the Question (The issue is more of a question rather than a bug or a feature request) label Oct 14, 2016
@Ch3LL added this to the Approved milestone Oct 14, 2016
@DmitryKuzmenko (Contributor)

@Ch3LL is right. When the master restarts it recreates the unix domain PUB socket, so after a master restart the client keeps listening on the removed socket file while the master publishes to the new one. But I'll take a closer look to provide a way to handle this.
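
As a minimal sketch of what this means for a client, a replaced PUB socket file can be detected by comparing inode numbers; the path and helper below are illustrative assumptions, not part of Salt's API:

# Minimal sketch, not Salt API: detect that the PUB socket file on disk is no
# longer the one the client originally connected to.
import os

SOCK_PATH = '/var/run/salt/master/master_event_pub.ipc'  # assumed default path

def socket_was_replaced(inode_at_connect, sock_path=SOCK_PATH):
    """Return True if the socket file was removed or recreated since connect."""
    try:
        return os.stat(sock_path).st_ino != inode_at_connect
    except OSError:
        return True  # the socket file is gone entirely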

@tsaridas (Contributor, Author)

thanks @DmitryKuzmenko,

I understand that it is the same socket file, and it would be nice if I had a method to check the connection or maybe to reconnect automatically.
connect_pub() returns True after restart but it still doesn't show any events after reconnecting.

I tried to see exactly where this gets stuck, but I wasn't able to.

ps: I already applied your patch for syndic reconnection thinking it might help, but no luck.

@DmitryKuzmenko (Contributor)

@tsaridas
After thinking about it a bit more, I realized that when the publisher closes the socket, all the clients actually get a StreamClosedError, and the Syndic fix mentioned above introduces handling for that error. After looking through the code, I found that we now have a way to handle the error in async mode, as the syndic does, but we have no way to detect a disconnection in the sync client mode described in the docs. For the user, a disconnect looks the same as a timeout because of this:
http://github.com/saltstack/salt/blob/develop/salt/utils/event.py#L532-L533

So the client code gets None in both cases: timeout and error. We have to fix this.
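
A simplified sketch of the pattern behind those lines (not the exact Salt source) shows why: both a timeout and a closed stream surface to the caller as None.

# Simplified sketch of the behavior described above (not the exact Salt code):
# read_next_event is a stand-in for the internal read; a timeout and a
# disconnect both come back to the caller as None.
def get_event_sync_sketch(read_next_event):
    try:
        return read_next_event()
    except Exception:        # StreamClosedError is swallowed here
        return None          # indistinguishable from "no event yet"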

@DmitryKuzmenko added the Bug, Core, and Confirmed labels and removed the Question label Oct 17, 2016
@tsaridas (Contributor, Author) commented Oct 17, 2016

@DmitryKuzmenko thanks for checking.

Would you be able to provide a quick fix so that the client reconnects automatically?

For some reason, even when I catch the exception from get_event_noblock() and re-initialise with var = salt.utils.event.get_event(), I cannot reconnect any more.

var.get_event_noblock() just hangs.

I tried setting all the variables to None but I still get the same result.

Thanks

@DmitryKuzmenko (Contributor)

@tsaridas ah, sure, there are the more specialized get_event_noblock() and get_event_block() methods that don't hide the exception. I've tried to do what you want, and everything appears to work:

#!/usr/bin/env python
import salt.config
import salt.utils.event
import sys
import time
import tornado.iostream

opts = salt.config.client_config('/etc/salt/master')

event = salt.utils.event.get_event(
        'master',
        sock_dir=opts['sock_dir'],
        transport=opts['transport'],
        opts=opts)

while True:
    try:
        while True:
            data = event.get_event_block()
            print(data)
    except tornado.iostream.StreamClosedError as ex:
        # The master went away: drop the stale subscriber and poll until
        # connect_pub() succeeds against the recreated socket.
        print(ex)
        event.close_pub()
        tries = 0
        while not event.cpub:
            if tries > 0:
                time.sleep(1)
            tries += 1
            print('Reconnecting #{0}'.format(tries))
            event.connect_pub()
    except:
        print('Some exception: {0}'.format(sys.exc_info()[0]))
        break

print('Done')

@DmitryKuzmenko added the Question label and removed the Bug, Confirmed, and Core labels Oct 18, 2016
@tsaridas (Contributor, Author) commented Oct 18, 2016

@DmitryKuzmenko,

That doesn't work for me. It gets stuck, as I expected, at event.connect_pub() and never times out.

I also tried calling event.connect_pub(timeout=10), but without success.

I see that you are working with the develop branch, so the issue might be fixed there.

I'll test with the develop branch later on this evening and post the results.

PS: Did you try restarting the master while running the script? :) Just checking.

Thanks

@tsaridas (Contributor, Author) commented Oct 18, 2016

Tested, and it works in the develop branch. Not sure which ticket solved this, though.

I guess it was my mistake that I didn't test in develop. Sorry for the inconvenience, and thanks for the help @DmitryKuzmenko

@szjur commented Oct 18, 2016

@DmitryKuzmenko, any idea how to make it work in 2016.3.3? There is no close_pub() there; I tried to do what it does based on the latest code, but it always hangs on connect_pub(), even if you wait a minute and the master is surely back up by that time.

@DmitryKuzmenko (Contributor)

@szjur you can either re-create the SaltEvent or use the following code instead of the missing close_pub():

        event.subscriber.close()
        event.subscriber = None
        event.pending_events = []
        event.cpub = False
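
For 2016.3.3 a small helper wrapping those four lines can then stand in for the event.close_pub() call in the script above (attribute names are taken from the snippet; this is a sketch, not an official API):

def manual_close_pub(event):
    # Sketch of a close_pub() stand-in for builds that lack it: reset the
    # subscriber state so a later connect_pub() starts from scratch.
    if event.subscriber is not None:
        event.subscriber.close()
    event.subscriber = None
    event.pending_events = []
    event.cpub = False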

@tsaridas (Contributor, Author)

@Ch3LL @DmitryKuzmenko thank you for your help.
Issue can be closed.

@szjur commented Oct 19, 2016

@DmitryKuzmenko you do realise that what you suggested is exactly what I said I tried? That doesn't help, at least not with 2016.3.3 on RHEL6. @Ch3LL, I don't know why @tsaridas closed it - most likely he found the code change that fixes this bug, but just closing it like that doesn't really help the stable branches get better and may be confusing to anyone stumbling on this bug and reading this thread.

@DmitryKuzmenko (Contributor)

@szjur ah! I see, sorry. I'll take a closer look.

@tsaridas (Contributor, Author)

I could reopen the ticket if you think it should track the bug in the stable version.

FYI, the PR that fixed the issue in 2016.3.3 was #32329.

Not sure why, though, since that one has to do with the serializer ... I only patched ipc.py and transport/frame.py together with your syndic patch.

I will take another look at the changes and try to find the exact line that fixed it.

Thanks

@DmitryKuzmenko added the Bug and Core labels and removed the Question label Oct 20, 2016
@DmitryKuzmenko (Contributor)

I agree with @szjur. Also, I've checked the latest 2016.3 branch and the bug is there: SaltEvent hangs on reconnecting to the event bus.
The issue is fixed in the carbon branch by this PR: #36720
I think it needs to be backported into the 2016.3 branch.

@DmitryKuzmenko (Contributor)

I've backported the fix to 2016.3. @cachedout could you please take a look?

@DmitryKuzmenko (Contributor)

@szjur, @tsaridas thank you for help!

@szjur commented Oct 20, 2016

Excellent, thanks a lot @DmitryKuzmenko :-) We'll also test it thoroughly tomorrow.

@szjur commented Oct 21, 2016

@DmitryKuzmenko, I tested 2016.3.3 with just the second part of the backport (d7e3209) plus manually doing what the missing close_pub() does, and that does the trick in the scenario you proposed (using get_event_block() and catching tornado.iostream.StreamClosedError). I gave the other part of the backport (82e2763) a miss for now, because it didn't apply cleanly over stock 2016.3.3 and I didn't feel like getting too hung up on that.
By the way, iter_events() still blocks forever after a master restart. I'm fine using get_event_block(), but maybe at some point it could also be addressed, as that problem limits the usability of iter_events().

@DmitryKuzmenko (Contributor)

I'd like to address this question to @cachedout and @thatch45.
In simple words: salt.utils.event has different error handling in the functions used to get events:
get_event_block(), get_event_noblock(): return None if there's no data, and raise StreamClosedError if the remote peer is disconnected.
get_event(): returns None if there's no data or an error occurred, so we have no way to detect errors.
iter_events(): keeps waiting for events when it gets None from get_event(), i.e. it hangs if the connection is broken.
Shouldn't we change the behavior of get_event() and iter_events()?
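
To make the last point concrete, a simplified sketch of what iter_events() effectively does (not the exact Salt source) shows why it hangs on a broken connection:

# Simplified sketch of iter_events() (not the exact Salt code): None from
# get_event() is treated as "nothing yet", so a dead connection loops forever.
def iter_events_sketch(event):
    while True:
        data = event.get_event(full=True)
        if data is None:
            continue   # also reached when the connection is broken
        yield data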

@thatch45 (Contributor)

This is a good question. I think the optimal approach would be to allow the user to pass an option into get_event() to expose raised errors.
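
For illustration only, the kind of opt-in flag being suggested could look like the wrapper below; the raise_errors name and the wrapper itself are assumptions, not Salt's actual API:

import tornado.iostream

def get_event_optional_raise(event, raise_errors=False):
    # Hypothetical opt-in wrapper (raise_errors is not a real Salt parameter):
    # legacy callers keep the None-on-error behavior, opted-in callers see the error.
    try:
        return event.get_event_block()
    except tornado.iostream.StreamClosedError:
        if raise_errors:
            raise
        return None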

@szjur commented Oct 24, 2016

In any case, iter_events() hanging indefinitely after a master restart, with no way whatsoever to catch and handle it, is beyond any user's expectation. It should at least return None.

@jeanpralo (Contributor)

I am just wondering why we should be passing an option to raise an exception. I mean, if it does not raise anything, this is sort of broken by default, no?

Anyway, all in favor of fixing it for get_event() and therefore for iter_events() as well.

@cachedout added the fixed-pls-verify (fix is linked, bug author to confirm fix) label Nov 4, 2016
@tsaridas (Contributor, Author) commented Nov 6, 2016

With a quick search I see that iter_events is used in the reactor and in CherryPy, so yes, it would make sense to raise an exception and catch it in all the places it's used; otherwise all of the above would have issues when the master is restarted.

@thatch45 (Contributor) commented Nov 7, 2016

My concern is to avoid changing behavior without explicitly adding the option. Many people use the event system API, and I don't like pulling the rug out from under them unless we have to. That is why I would rather make it an option.

@jeanpralo (Contributor)

True, makes sense.

Well, #37438 is fixing this, so I guess we can close #37335?

@tsaridas (Contributor, Author) commented Nov 7, 2016

Verified the fix for 2016.3.4 with get_event_noblock.

@DmitryKuzmenko (Contributor)

It looks like I'm done here.
