
Update libev #1504

Closed · jamadden opened this issue Jan 6, 2020 · 3 comments
@jamadden jamadden commented Jan 6, 2020

gevent currently tests against and bundles libev 4.25, but there have been several releases since; the current upstream version is 4.31. The newer releases bring some new backends for Linux that could be really nice on newer kernels. Building against a non-embedded libev that includes those backends breaks gevent's tests (#1501).

Unfortunately, this update is not as easy as usual. That's because version 4.27 contains these two changes:

  • fail assertions more aggressively on invalid fd's detected in the event loop, do not just silently fd_kill in case of user error.
  • ev_io_start/ev_io_stop now verify the watcher fd using a syscall when EV_VERIFY is 2 or higher.
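
To make the second change concrete, here's what a syscall-based fd check amounts to, translated to Python (libev does this in C, reportedly via `fcntl(fd, F_GETFD)`): a closed descriptor makes the call fail with `EBADF`, and with `EV_VERIFY >= 2` that failure becomes a fatal assertion inside `ev_io_start`/`ev_io_stop`.

```python
# Illustration only: the kind of syscall-level liveness check libev
# 4.27+ performs on a watcher's fd when EV_VERIFY >= 2.
import errno
import fcntl
import socket

sock = socket.socket()
fd = sock.fileno()
fcntl.fcntl(fd, fcntl.F_GETFD)      # fine: the fd is live

sock.close()
try:
    fcntl.fcntl(fd, fcntl.F_GETFD)  # dead fd: the kernel reports EBADF
except OSError as exc:
    assert exc.errno == errno.EBADF
```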

The latter is the most immediately problematic. It's not uncommon in gevent for file descriptors to be closed before the corresponding watcher object is closed. (Watcher objects can be closed in a loop callback; file descriptors can be implicitly closed due to refcounting, and watchers don't hold a reference to the object that owns the file descriptor.) This was never a problem before, but the extra validation in ev_io_stop, which actually aborts the whole process, now is (even libuv isn't that strict). Issues with invalid descriptors at start had previously been dealt with, mostly but not entirely, thanks to libuv.
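
Here's a minimal sketch of that failure mode using gevent's public loop/watcher API. Under a libev >= 4.27 built with `EV_VERIFY >= 2`, the `stop()` call can abort the entire process:

```python
# Sketch of the hazard: the fd dies before the watcher does.
import socket
from gevent import get_hub

sock = socket.socket()
sock.bind(('127.0.0.1', 0))
sock.listen(1)

watcher = get_hub().loop.io(sock.fileno(), 1)  # 1 == read interest
watcher.start(lambda: None)

sock.close()    # the fd is gone (in real code, often via refcounting)...
watcher.stop()  # ...and libev's syscall verification now aborts the process
watcher.close()
```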

One option is simply not to support building with high values of EV_VERIFY. We currently run our own tests with a high EV_VERIFY value, so that would have to change, which is not a big deal. For non-embedded builds, though (not recommended, because they're usually bigger and slower, but downstream Linux distros typically prefer that packaging style), we can't control, or even check, what libev was built with: everything is fine, until suddenly it isn't and the whole process dies.
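
For reference, here's roughly how an embedded build could pin the macro at compile time. This is an illustrative setuptools sketch, not gevent's actual build code; the module and source names are hypothetical:

```python
# Illustrative only: keeping EV_VERIFY below the syscall-checking level
# when compiling an embedded libev into an extension module.
from setuptools import Extension, setup

ext = Extension(
    'example.corecext',                  # hypothetical module name
    sources=['src/example/corecext.c'],  # hypothetical sources
    define_macros=[
        ('LIBEV_EMBED', '1'),
        ('EV_VERIFY', '1'),  # below 2: no syscall check per start/stop
    ],
)

setup(name='example', version='0', ext_modules=[ext])
```

The non-embedded caveat stands, of course: no amount of `define_macros` helps when the system libev was compiled by someone else.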

The other option I see is to take whatever steps are necessary to make sure the FD isn't closed before the watcher is, at least in the common cases. For sockets, for example, that will mean tying the lifetime of the socket object to the watcher and not letting the socket object die first. For the Cython extension, that will make those watcher objects bigger. A sketch of the idea follows.
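
This sketch assumes nothing about gevent's internals beyond the public `loop.io` watcher; the wrapper class and its names are hypothetical:

```python
# Hypothetical sketch: tie the socket's lifetime to the watcher so the
# fd cannot be implicitly closed (e.g. by refcounting) while the
# watcher is still live.
from gevent import get_hub

class SocketReadWatcher(object):
    def __init__(self, sock):
        self._sock = sock  # strong reference keeps the fd open
        self._watcher = get_hub().loop.io(sock.fileno(), 1)

    def start(self, callback):
        self._watcher.start(callback)

    def close(self):
        # Stop and close the watcher while the fd is still valid,
        # and only then let go of the socket.
        self._watcher.stop()
        self._watcher.close()
        self._sock = None
```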

@jamadden jamadden added the Loop: libev label Jan 6, 2020
@jamadden jamadden commented Jan 6, 2020

For some reason, the EV_VERIFY behaviour isn't consistent. I can't get it to break on macOS without resorting to a non-embedded build; Travis shows the same behaviour. But my local Linux box breaks with a standard embedded build. Probably something with macro definitions...

@jamadden jamadden commented Jan 8, 2020

Part of the EV_VERIFY behaviour may be timing related: we're substantially faster with an embedded build than a non-embedded build. But part of it is because distutils compiles libev with -DNDEBUG when embedded, which turns off assertions.
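
That connection is easy to check: C's `assert()` compiles to nothing when `NDEBUG` is defined, and the flags distutils inherits from CPython's own build typically include it:

```python
# On most CPython builds, the inherited OPT/CFLAGS contain -DNDEBUG,
# which compiles assert() (and thus libev's fd verification) out of
# the embedded libev entirely.
import sysconfig
print(sysconfig.get_config_var('OPT'))  # e.g. '-DNDEBUG -g -fwrapv -O3 -Wall'
```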

The one failure I haven't been able to easily fix is essentially a race condition in test_ssl.py:ThreadedTests.test_asyncore_server. A thread is started to run the asyncore loop; this thread uses select.select() to poll for IO in a short loop. In the meantime, the foreground thread closes the listening socket. If the timing is right, that happens while we're in the middle of select.select(). Because select.select() starts and stops watchers each time it is called, as soon as it wakes up it stops the watcher that used to be the listening socket… but now the listening socket is closed, and stopping blows up. Thanks, libev!
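
Condensed, the race looks something like this (a sketch, not the actual test; under gevent these "threads" are greenlets, and gevent's select() starts an io watcher per fd on entry and stops it on return):

```python
# Sketch of the race: one greenlet polls in a loop, the other closes
# the fd being polled. When select() wakes, it stops a watcher whose
# fd no longer exists, which trips libev's verification.
import socket
import gevent
from gevent.select import select

listener = socket.socket()
listener.bind(('127.0.0.1', 0))
listener.listen(5)

def poll_forever():
    while True:
        select([listener], [], [], 0.01)  # starts, then stops, an io watcher

poller = gevent.spawn(poll_forever)
gevent.sleep(0.005)   # let the poller block inside select()
listener.close()      # fd dies mid-select; waking up now stops a dead watcher
poller.join(timeout=0.1)
poller.kill()
```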

It'd be easy to consider this a bug in the test, since it's operating on the same socket file descriptor from two different threads. But it's worked before…and those aren't actually real threads, they're greenlets, so we should be able to figure something out.

@jamadden jamadden commented Jan 8, 2020

I'm able to fix the test under Python 3 by letting the loop know that the FD is about to be closed and feeding an event through to wake up any unrelated watchers. It's not perfect (it works for the test because we immediately close the watchers), and it could be extended to cover more cases: keep track of what we're closing for one iteration of the callbacks, and if we get an event for it, raise an exception instead of returning the event (basically what we do for known watchers). The alternative is more expensive: keep track of every watcher created for the FD and invalidate them (exactly what we do for known watchers, but with much more bookkeeping). At this point that doesn't seem worth it.
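
The shape of that fix, with hypothetical names throughout (this sketches the ordering it enforces, not the actual API that landed in the closing commit):

```python
# Hypothetical sketch: before the OS-level fd actually dies, tell the
# loop, which feeds a synthetic event (ev_feed_fd_event, in libev
# terms) to any watcher still blocked on that fd. Each awakened
# callback can then stop and close its watcher while the fd is still
# valid, instead of tripping libev's verification later.
def close_fd_safely(sock, loop):
    fd = sock.fileno()
    loop.notify_closing(fd)  # hypothetical hook: wake watchers on fd
    sock.close()             # only now does the descriptor go away
```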

Python 2 doesn't have deterministic closing of sockets, relying instead on reference counting, so I can't trigger the event only when we're really going to close the socket. Worst-case scenario, though, it's a spurious wakeup that's detected and ignored during loops.

@jamadden jamadden closed this in 5d2801a Jan 8, 2020