ENH: webagg: Handle ioloop shutdown correctly #6334

Merged
merged 1 commit into from May 10, 2016

Conversation

Projects
None yet
4 participants
Contributor

perimosocordiae commented Apr 25, 2016

This change correctly triggers the ioloop.stop() method, allowing the user to call plt.show(), then resume control with a SIGINT, then call plt.show() again.

Without this change, any attempt to call plt.show() after killing the webagg server the first time would result in an error because the ioloop instance hadn't been properly stopped.

The motivation for using a signal handler comes from this SO answer, which is the only solution I could find that worked as intended.

mdboom added the needs_review label Apr 25, 2016

Contributor

perimosocordiae commented Apr 25, 2016

The AppVeyor error was caused by one of the builds failing to download Miniconda correctly.

tacaswell added this to the 2.1 (next point release) milestone Apr 26, 2016

Owner

tacaswell commented Apr 26, 2016

wouldn't it be simpler to just set cls.started = False when shutting the loop down?

If we are going to install a signal handler we need to capture the current handler, call it as part of our handler and then clean up by restoring the previous handler.

Contributor

perimosocordiae commented Apr 26, 2016

I'm not sure I understand your question. cls.started is used to prevent the WebAggApplication from starting the tornado IOLoop when it's already running. The problem with the existing code (that this PR fixes) is that the KeyboardInterrupt we catch doesn't actually stop the IOLoop, but does set cls.started = False. That means that the next time someone tries to start it, tornado throws an exception because as far as it knows, the IOLoop is still running.

Owner

tacaswell commented Apr 26, 2016

Can you just add ioloop.stop() into the except block?

I am somewhat surprised that tornado does not stop it's self on a keyboard interupt

Contributor

perimosocordiae commented Apr 26, 2016

That's what I tried first, but it didn't work. As far as I can tell, this is what happens in that scenario:

  1. ioloop.start() -> blocks, transferring control to tornado
  2. KeyboardInterrupt -> takes control back from the ioloop without letting it clean up properly
  3. ioloop.stop() -> does some cleanup, but not all

Specifically, the problem is that the only correct way to stop the IOLoop is by having it run a callback that triggers its stop() method. Otherwise, the cleanup code I linked above never runs.

Contributor

perimosocordiae commented Apr 26, 2016

Correction, that linked code is in a finally block, so it does run, but it runs before the call to stop().

Contributor

perimosocordiae commented Apr 26, 2016

I suppose we could hack around the problem like this:

    try:
        ioloop.start()
    except KeyboardInterrupt:
        # set the relevant flag, bypassing ioloop.stop()
        ioloop._running = False
    print("Server is stopped")
    sys.stdout.flush()
    cls.started = False

I can confirm that this works, ugly as it is.

The key is that for the IOLoop to be restartable, it needs to exit in a state where both self._running and self._stopped are False. The finally block only sets self._stopped = False, and the stop() method sets self._running = False and self._stopped = True. Oddly enough, the start() method exits right away if self._stopped is True, but also sets it to False, so this works, too:

    try:
        ioloop.start()
    except KeyboardInterrupt:
        ioloop.stop()  # clears the _running flag, sets the _stopped flag
        ioloop.start()  # clears the _stopped flag and returns
    print("Server is stopped")
    sys.stdout.flush()
    cls.started = False

This seems like least intrusive change that gets the job done, and apparently it's documented behavior. I'll update the PR to use this now.

Contributor

perimosocordiae commented Apr 26, 2016

Travis failure is unrelated.

@tacaswell tacaswell and 1 other commented on an outdated diff Apr 29, 2016

lib/matplotlib/backends/backend_webagg.py
@@ -22,6 +22,7 @@
import random
import sys
import socket
+import signal
@tacaswell

tacaswell Apr 29, 2016

Owner

do we still need this import?

@perimosocordiae

perimosocordiae Apr 29, 2016

Contributor

Nope, removed.

Owner

tacaswell commented Apr 29, 2016

That makes sense, but I worry that it is going to be very dependent on the internal bits of tornado. I suspect that server people do not understand why you would want to start/stop/restart the server in the same process....

Is it worth trying to get the attention of someone from the tornado project on this?

attn @mdboom

Contributor

perimosocordiae commented Apr 29, 2016

I agree that it's treading pretty close to unexpected usage. The docs for IOLoop.start imply that you should only call stop() from a callback, but docs for IOLoop.stop mention that:

If the event loop is not currently running, the next call to start() will return immediately.

Maybe @bdarnell could clear things up?

After an exception has escaped from IOLoop.start(), the loop is left in an undefined state and should not be used again. To keep the IOLoop restartable, you have to catch the exception before it breaks through IOLoop.start(). The signal handler in the SO answer linked at the top of this thread is the correct approach.

Owner

tacaswell commented Apr 29, 2016

Fair enough. @perimosocordiae Can you do the signal handling with a context manager? Something like

@contextmanager
def signal_int_grabber():
    old_sig = signal.signal(signal.SIGINT, 
                              lambda sig, frame: ioloop.add_callback_from_signal(on_shutdown))
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, old_sig)

with sig_int_grabber():
     ioloop.start()
Contributor

perimosocordiae commented Apr 29, 2016

Sure, pushed and squashed. The datetime import was unused, so I removed it since I was messing with imports anyway.

Owner

tacaswell commented Apr 30, 2016

❤️ pep8

======================================================================
FAIL: matplotlib.tests.test_coding_standards.test_pep8_conformance_installed_files
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/matplotlib/matplotlib/venv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/tests/test_coding_standards.py", line 247, in test_pep8_conformance_installed_files
    expected_bad_files=expected_bad_files)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/tests/test_coding_standards.py", line 143, in assert_pep8_conformance
    assert_equal(result.total_errors, 0, msg)
AssertionError: 1 != 0 : Found code syntax errors (and warnings):
/home/travis/build/matplotlib/matplotlib/lib/matplotlib/backends/backend_webagg.py:346:80: E501 line too long (80 > 79 characters)
----------------------------------------------------------------------

This other wise looks good to me 👍 .

@perimosocordiae perimosocordiae ENH: webagg: Handle ioloop shutdown correctly
This change correctly cleans up after the ioloop is
interrupted, allowing the user to call plt.show(),
then resume control with a SIGINT, then call plt.show() again.
500e6fd
Contributor

perimosocordiae commented Apr 30, 2016

Curses, off by one! Fixed and squashed.

tacaswell closed this May 2, 2016

tacaswell reopened this May 2, 2016

@tacaswell tacaswell added needs_review and removed needs_review labels May 2, 2016

Owner

tacaswell commented May 6, 2016

@jenshnielsen Can you review and merge this? Given that you found those issues from the nbagg refactor I take it use use webagg?

Owner

mdboom commented May 9, 2016

This looks good to me. The misunderstanding about SIGINT not actually stopping the server was mine originally, and in fairness I only cared about the whole process stopping and never really considered starting up another server. This looks good to me, but it wouldn't hurt to get @jenshnielsen's opinion as well.

@tacaswell tacaswell merged commit 1d81f97 into matplotlib:master May 10, 2016

1 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
coverage/coveralls Coverage decreased (-0.01%) to 69.648%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

tacaswell removed the needs_review label May 10, 2016

Owner

tacaswell commented May 10, 2016

perimosocordiae deleted the perimosocordiae:restartable-webagg branch May 10, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment