Eventlet with pypy : Bad File Descriptor Error #318
Comments
@ysingh7 I've tried this synthetic example to reproduce the problem, but it didn't manifest. Do you have a better idea how to test it?
import eventlet
import sys
host = 'google.com'
if len(sys.argv) > 1:
    host = sys.argv[1]
s1 = eventlet.connect((host, 80))
f = s1.makefile()
f.write(b'GET / HTTP/1.0\r\nConnection: close\r\n\r\n')
r = f.read(8<<10)
f.close()
print(r)
s1.close()
import gc
gc.collect()
@temoto I have modified the above test case to reproduce the error:
This issue is similar to the one reported in issue #213.
We investigated the issue further and came up with this explanation:

Problem: an exception that crashes the application

While running under PyPy, driven by the Swift benchmark ssbench or a customized script simple_put_test.py, OpenStack Swift (the proxy processes on the proxy server) hit an unhandled “Bad file descriptor” exception, leading to application failure. This happens only with PyPy as the Python interpreter, and appears to occur at random. The same code runs without issue under CPython.

Root cause is in application code, not PyPy, though PyPy is not entirely out of the loop (it would be nice if PyPy could take care of this ref count itself, but that may be unlikely)

PyPy developers advise that additional application code must be written specifically for PyPy to deal with networking socket objects (customized ref counting). Eventlet, a standard Python library, was found to handle the ref count incorrectly, which leads to the exception.

Solution is to patch the eventlet library code

A patch has been created for eventlet (base.py) under PyPy. The code change is not expected to affect any application running under CPython.

Issue related to the way the networking socket is used

When the lower-level networking socket has been shut down unexpectedly early, a Python application such as OpenStack Swift that attempts to write to or read from that socket will crash with an exception.

CPython and PyPy handle object memory management differently:

Closing a socket requires two OS API calls in sequence:

Applications such as OpenStack Swift build multiple abstractions on top of the underlying OS socket, with multiple Python objects holding references to the same underlying OS socket. A “close” call from one Python object does not, or should not, have any impact on other objects that still hold a reference.
Ideally, however, as soon as all Python objects holding the socket are out of scope, the socket should be shut down and closed immediately, which is how CPython handles sockets. PyPy can’t do the same because of its delayed GC.

The solution proposed by PyPy developers is that the app owner should track socket usage by making a _reuse() call from application code any time a reference to a socket is made, and a _drop() call any time a reference is removed. These two methods are implemented internally by the PyPy interpreter, and the calls are synchronous. As soon as the last _drop() brings the ref count (maintained internally by PyPy) to zero, PyPy calls the OS APIs shutdown and close, effectively closing the connection. At the lower level, socket.c (the OS implementation) flushes any remaining data in the outgoing buffer and sends a “FIN” message to the peer, which leads the peer to close its side of the connection, while closing and shutting down the local socket. That said, according to the PyPy documentation, the application is responsible for ensuring the accounting is done correctly (an equal number of _reuse and _drop calls).

To give an example of where the error occurs: a “conn” object holds a ref to the socket. A duplicate of conn is made for conversion to a different type of Python object, and then the code creates a new Python object “res”, which is returned from “makefile”. The intermediate “dupped” (a temporary variable) makes a _drop call before going out of scope, after which “dupped” can be reclaimed. When GC kicks in (at any time), the destructor for “dupped” is called, and a “close” call is made on behalf of “dupped”. This triggers another call to _drop in the PyPy interpreter via the socket.py library. At this point, the socket is shut down and closed. The “res” object, however, still holds a ref to the same socket and is still trying to read from or write to it, so it fails with an exception.
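The shutdown-then-close sequence mentioned above can be observed with the standard socket module. This is only a minimal sketch: it assumes a local `socket.socketpair()` as a stand-in for the proxy and its peer (so no real TCP FIN is involved; the empty read on the peer side is the analogous end-of-stream signal), and all variable names are illustrative.

```python
import socket

# A connected pair of sockets stands in for the proxy and its peer (assumption).
local, peer = socket.socketpair()
local.sendall(b"last bytes")

# Step 1: shutdown() ends the conversation in both directions.
local.shutdown(socket.SHUT_RDWR)
# Step 2: close() releases the file descriptor.
local.close()

data = peer.recv(1024)       # data queued before the shutdown still arrives
eof = peer.recv(1024)        # an empty read then signals end of stream
closed_fd = local.fileno()   # a closed socket object reports -1

# Any further use of the closed socket object fails with an OSError.
try:
    local.send(b"too late")
    error = None
except OSError as exc:
    error = exc

peer.close()
```

Any Python object that still wraps `local` after this point is in the same position as the file object in the description: it has a handle whose descriptor is already gone.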
Our trace shows “fd=-1” when the exception occurs, which means the Python object using the socket has no idea the underlying socket was already closed. The patch sets dupped.close = None, so that dupped no longer has a callable “close”; this prevents a “close” call (which would trigger another _drop()) when its destructor is run by the PyPy GC. This corrects the ref accounting error and makes sure the ref count does not reach zero until the right time.
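The effect of assigning None to close can be illustrated in isolation. The sketch below is not the eventlet code: `Holder` and `run_destructor` are hypothetical names, and it assumes the destructor only calls close when the attribute is not None. The destructor logic is written as a plain function so the result does not depend on GC timing.

```python
class Holder(object):
    """Hypothetical wrapper whose close() releases one reference."""

    def __init__(self, drops):
        self.drops = drops

    def close(self):
        self.drops.append("drop")


def run_destructor(obj):
    """Assumed destructor behaviour: call close() only if it is still set."""
    close = getattr(obj, "close", None)
    if close is not None:
        close()


drops = []

# Unpatched: the destructor path performs a drop of its own.
plain = Holder(drops)
run_destructor(plain)
drops_unpatched = len(drops)

# Patched: an instance attribute set to None shadows the class method,
# so the destructor path does nothing.
patched = Holder(drops)
patched.close = None
run_destructor(patched)
drops_patched = len(drops) - drops_unpatched
```

The design relies on ordinary attribute lookup: the instance attribute wins over the method defined on the class, so only this one temporary object is affected.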
@ysingh7 can you reformat the patch as a pull request? Might make it easier for people to review and then merge.
@notmyname I have created a pull request for this issue: #326
@temoto I dug around through the revision history as best I could, trying to track down some sort of explanation as to why the eventlet implementation of makefile calls dup, and got lost. I left some comments on #326. @ysingh7 Could you test pypy + swift without calling Maybe if just getting rid of
@ysingh7 thank you very much for the very detailed explanation and for putting so much work into it. @notmyname patches are just fine, if not better, compared to pull requests. @clayg maybe
I've merged the Thanks.
Hi
I am running OpenStack Swift, the cloud storage solution from the OpenStack application suite. The application runs fine under CPython, but when I try to run it under PyPy I run into the following error:
I suspect that the implementation of the GreenSocket makefile function might have something to do with it.
PyPy maintains a reference count for the real socket object _sock, using the function _reuse to increment the count and _drop to decrement it. If the reference count reaches 0, the underlying _sock (the real socket) is closed, as described in this comment in socket.py:
"You make use (or a library you are using makes use) of the internal
classes '_socketobject' and '_fileobject' in socket.py, initializing
them with custom objects. On PyPy, these custom objects need two
extra methods, _reuse() and _drop(), that maintain an explicit
reference counter. When _drop() has been called as many times as
_reuse(), then the object should be freed."
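As a rough illustration of the contract in that comment, here is a minimal sketch of a custom object with an explicit counter. `CountedSock` and its attributes are hypothetical, and starting the counter at zero is an assumption taken from the wording "as many times as"; PyPy's own bookkeeping may differ in detail.

```python
class CountedSock(object):
    """Hypothetical custom object exposing the two extra methods PyPy expects."""

    def __init__(self):
        self._uses = 0      # assumption: nobody holds the object yet
        self.freed = False

    def _reuse(self):
        # Called each time another wrapper starts using this object.
        self._uses += 1

    def _drop(self):
        # Called each time a wrapper stops using this object.
        self._uses -= 1
        if self._uses <= 0:
            self.freed = True   # a real object would close the OS socket here


sock = CountedSock()
sock._reuse()                        # first holder
sock._reuse()                        # second holder
sock._drop()
freed_after_one_drop = sock.freed    # one holder remains
sock._drop()
freed_after_two_drops = sock.freed   # drops now match reuses
```

One extra _drop() anywhere in the program is enough to free the object while a holder still expects it to be open.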
I would like to discuss the implementation of the GreenSocket makefile function in /eventlet/greenio/base.py:
_drop is explicitly called on the dupped object, so the reference count for the underlying real socket _sock is decremented by 1. The dupped variable is a temporary and goes out of scope when this function returns. So when the garbage collector kicks in, it will reclaim this object by running its destructor. dupped is of type GreenSocket, so its destructor is called:
The destructor of dupped calls the close function, which is a reference to fd.close, which in turn points to the close function of the underlying _socketobject.
The close function of the _socketobject calls _drop again on the same socket _sock, decrementing the count by 1 more. Since the reference held by dupped was already decremented once by the explicit call to _drop in the makefile function, the second call decrements the refcount further, which leads to the socket being closed when the garbage collector runs. This causes the Bad File Descriptor error. If the reference count is already decremented by the destructor, then there is no need for the explicit call to _drop. If I run with this code, it runs fine for me:
Can anyone please help me with this issue?
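The accounting described above can be modelled without eventlet or PyPy. The sketch below is a toy model under stated assumptions: `FakeRealSock`, `FakeWrapper` and `makefile_like` are hypothetical names, the destructor is simulated by an explicit `dupped.close()` call so the outcome does not depend on GC timing, and the caller is assumed to close conn while still using the returned file-like object.

```python
class FakeRealSock(object):
    """Hypothetical model of the ref-counted real socket (_sock)."""

    def __init__(self):
        self.refs = 0
        self.fd = 7             # arbitrary number standing for an open descriptor

    def _reuse(self):
        self.refs += 1

    def _drop(self):
        self.refs -= 1
        if self.refs <= 0:
            self.fd = -1        # the socket is now closed


class FakeWrapper(object):
    """Hypothetical holder: takes a reference on creation, releases it on close."""

    def __init__(self, real):
        self.real = real
        real._reuse()

    def close(self):
        self.real._drop()


def makefile_like(conn, explicit_drop):
    dupped = FakeWrapper(conn.real)   # temporary duplicate
    res = FakeWrapper(dupped.real)    # file-like object handed to the caller
    if explicit_drop:
        dupped.real._drop()           # the explicit drop inside makefile
    dupped.close()                    # what the destructor does once GC runs
    return res


def fd_seen_by_file_object(explicit_drop):
    real = FakeRealSock()
    conn = FakeWrapper(real)
    res = makefile_like(conn, explicit_drop)
    conn.close()                      # caller is done with conn, keeps using res
    return res.real.fd


fd_with_explicit_drop = fd_seen_by_file_object(True)
fd_without_explicit_drop = fd_seen_by_file_object(False)
```

In this model the combination of the explicit drop and the destructor's drop releases the reference of dupped twice, so the descriptor is gone while res still holds it. Removing either one keeps the count balanced.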