handler threads don't always exit #153

Open
coffeeowl opened this Issue Mar 21, 2012 · 6 comments

3 participants

@coffeeowl

If connection with XMPP server was lost (even if it was clean server shutdown) event handling thread doesn't exit if reconnect attempt was made right after disconnect event. The problem is in while loop https://github.com/fritzy/SleekXMPP/blob/master/sleekxmpp/xmlstream/xmlstream.py#L1311

    try:
        while not self.stop.is_set():
            try:
                wait = self.wait_timeout
                event = self.event_queue.get(True, timeout=wait)
            except queue.Empty:
                event = None
            if event is None:
                continue

due to timeout, self.stop can be cleared before event_queue times out and when while makes new iteration self.stop is already cleared again.

DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
ERROR:sleekxmpp.xmlstream.xmlstream:Error reading from XML stream.
ERROR:sleekxmpp.basexmpp:no element found: line 1, column 734
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/sleekxmpp/xmlstream/xmlstream.py", line 1159, in _process
if not self.__read_xml():
File "/usr/local/lib/python2.7/dist-packages/sleekxmpp/xmlstream/xmlstream.py", line 1195, in __read_xml
for event, xml in ET.iterparse(self.filesocket, (b'end', b'start')):
File "", line 107, in next
ParseError: no element found: line 1, column 734
DEBUG:sleekxmpp.xmlstream.xmlstream:SEND (IMMED): /stream:stream
DEBUG:sleekxmpp.xmlstream.xmlstream:Waiting for /stream:stream from server
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! auto_reconnect: False
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! stop set
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! stop - True
DEBUG:sleekxmpp.thirdparty.statemachine: ==== TRANSITION connected -> disconnected
DEBUG:sleekxmpp.xmlstream.xmlstream:!!!!!!! stop cleared !!!!!!!!
DEBUG:sleekxmpp.xmlstream.xmlstream:Trying to connect to 127.0.0.1:5222
DEBUG:sleekxmpp.xmlstream.xmlstream:Waiting 2.01526779831 seconds before connecting.
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! waiting for events
DEBUG:sleekxmpp.xmlstream.xmlstream:!!! no events

Also this looks strange https://github.com/fritzy/SleekXMPP/blob/master/sleekxmpp/xmlstream/xmlstream.py#L1373

        while not self.stop.is_set():
            while not self.stop.is_set and \
                  not self.session_started_event.is_set():

is_set as a property?

@legastero
Collaborator

As for the strangeness with is_set, that has already been fixed in the develop branch.

@legastero
Collaborator

Would you mind sharing what your main app loop looks like with how you're handling connections/reconnects, etc.

It seems from your recent issues that it is just different enough from the way we write Sleek-based apps here that our testing process isn't catching these kinds of bugs.

@coffeeowl

It is nothing special, simple loop with connect -> process:

while node.run:
    try:
        if node.connect(reattempt=False):
            node.process(threaded=False)
        else:
            raise Exception("connect() failed")
    except Exception, e:
        time.sleep(15)

remove_pid()
sys.exit(0)

Everything else before "while" is code for daemonization, privilege dropping, settings and signal handling and so on.

Maybe I should decrease upper bound for timeout in SleekXMPP and let it handle reconnects... or at least wait a second before calling connect() after process() exits. But anyway, I guess my code is legal, the functions are public and I call them in right order.

@legastero
Collaborator

That is certainly legal code that ought to work. As a temporary measure, adding in a two second delay between connection attempts should be plenty to ensure the threads detect the stop signal.

I'm currently investigating keeping a thread count and not letting disconnect() return until all threads have exited, which effectively would add that two second delay to the disconnect() call.

@duja

Is there any progress on this issue?

@legastero
Collaborator

Develop branch now tracks the main threads and waits for them to exit before returning from disconnect().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment