-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-2743 Fix compatibility with gevent.Timeout #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tch BaseException here too. Perhaps this should be an empty except to guarentee release.
Thanks for reporting this issue. It reminds me of https://jira.mongodb.org/browse/PYTHON-2334 which you also reported. It's somewhat strange that gevent extends BaseException here. The Python docs say not to do this:
https://docs.python.org/3/library/exceptions.html#built-in-exceptions I'll need to think through this change a bit and research where else (and why) we catch Exception. I suspect it's so that we don't perform unneeded work for KeyboardInterrupt and SystemExit. |
pymongo/pool.py
Outdated
@@ -1415,7 +1415,7 @@ def _get_socket(self, all_credentials): | |||
self._pending -= 1 | |||
self._max_connecting_cond.notify() | |||
sock_info.check_auth(all_credentials) | |||
except Exception: | |||
except BaseException: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really make sense to me. Sure we now catch gevent.Timeout, but won't gevent again raise another timeout when we try to acquire the lock below on line 1422? In that case we're in the same boat and we might as well have not caught the exception to begin with. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what, I was on the old 3.11 branch...
The latest master uses contexts instead for acquiring / releasing, which should prevent the lock from remaining acquired... Let me try reproducing on master, and if it's fixed on master, we can just use an RC build from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(then we can just close this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as followup, it still causes problems on 3.11.4 because the except block isn't executed. BaseException
, or a bare except:
still fix this, because we don't nest timeouts currently.
Considering a worst case scenario where you have infinite timeouts waiting to fire, you would have a timeout fire at each:
- close_socket
- with self.size_cond
- self.size_cond.notify (not 100%, this may not block and therefore not be an eligible point to fire a timeout)
- event_listeners.publish_connection_check_out_failed, if the listener executes blocking code that would call the hub's switch() method
Given we only have 1 active at a time, this isn't an issue for us in particular, but would be if someone did use stacked timeouts and they all fired in a worst-case scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it might be necessary to subclass Pool as GeventPool, I see that custom Pool classes can be provided, so that's what I'll try next. Thanks for being responsive though :)
Ok, hopefully this makes sense.... Instead I simply sub-classed Pool, as GeventPool. When the exception handler is entered, I start a greenlet to run the exception handling. This makes the exception handling code un-interruptible from gevent's perspective so that even multiple successive Timeouts being raised in application code will not prevent releasing the resources properly. I don't know if you'd want to integrate this somehow... But since I can put this into our application as-is without updating pymongo using the If you don't, we can close this. Thanks! (although I would be interested if you have any concerns over this method... It has fixed losing our connections in load tests that raise Timeouts, however) |
As the name suggests, I would still like to improve our compatibly with gevent.Timeout's quirks. Would you be able to point me to the docs for how gevent.Timeout works? Also would you be able to write a test case or other reproducer for the original issue? You might get some ideas from the test I added in PYTHON-2334: c99254f |
https://www.gevent.org/api/gevent.timeout.html see the gevent works by patching the system libraries (socket, threading, sleep, locks etc) that execute I/O or blocking with libuv / libev functionality to run as coroutines. Instead of blocking when you call a socket method, gevent.switch() is called and execution, on the same thread, proceeds on a different coroutine (they call them greenlets). When the blocking is resolved, control is then returned to the coroutine. As such, gevent.Timeout may be raised on any function that blocks, e.g. acquiring a lock, reading a socket, joining a thread, etc. (This resolves a lot of the GIL contention with python) |
I could also see _pool_class being useful for implementing an asyncio pool to run queries from as well, which would be the more built-in way of running coroutines in python. |
Thanks for the link. While I do think the thread-cleanup idea is clever, I am not a fan of adding a special pool class just for Gevent support. Looking into this more, in PYTHON-1663 (PyMongo 3.8) the pool was changed from catching all exceptions using a bare-except to catching only Exception:
I am open to changing it back and adding a comment on why we have to use bare except so the next maintain doesn't accidentally make the same mistake. This would at least deal with the single Timeout issue while we consider how (and if) we want to support multiple timeouts. |
Tracked in: https://jira.mongodb.org/browse/PYTHON-2743 |
I added a unit test that will break if we add Exception back to the except clause. In order for it to function, the Timeout has to raise at a specific point, I used mock to do this, I don't know if you have a more standard "pymongo" way of doing that... It was previously raising at the condition check prior to entering the try, thus not incrementing / decrementing active_count at all. This requires reaching down the call tree though to patch at runtime because an actual connection is in use. I could also test this by instantiating a Pool object without connecting to mongo and executing _get_sockets directly, which would require much less invasive mocking, if you'd prefer. That would be less "real world" but more controlled. |
gevent.Timeout extends BaseException, not Exception.
gevent.Timeout extends BaseException, not Exception. (cherry picked from commit 9c1ff6a)
Thanks @TylerWilley! I've merged this fix in 9c1ff6a and a2d687b. |
Thank you @ShaneHarvey ! :) |
gevent.Timeout extends BaseException, so we still sometimes lose the lock.
Perhaps we should instead have this as an empty except?