Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select.get_fileno passed IOClosed object #154

Open
hardys opened this issue Oct 20, 2014 · 8 comments
Open

select.get_fileno passed IOClosed object #154

hardys opened this issue Oct 20, 2014 · 8 comments

Comments

@hardys
Copy link
Contributor

hardys commented Oct 20, 2014

Since the fix for #137, the infinite recursion is fixed, but we see the issue below.

This happens when you cause an RPC timeout on OpenStack, such that the select times out, then this error happens when you reconnect the network to resolve the RPC interruption.

File "/opt/stack/heat/heat/openstack/common/threadgroup.py", line 143, in wait
     x.wait()
   File "/opt/stack/heat/heat/openstack/common/threadgroup.py", line 47, in wait
     return self.thread.wait()
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/greenthread.py", line 174, in wait
     return self._exit_event.wait()
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/event.py", line 121, in wait
     return hubs.get_hub().switch()
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/hubs/hub.py", line 293, in switch
     return self.greenlet.switch()
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/hubs/hub.py", line 331, in run
     self.close_one()
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/hubs/hub.py", line 264, in close_one
     listener.tb(IOClosed(errno.ENOTCONN, "Operation on closed file"))
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/green/select.py", line 59, in on_error
     original = ds[get_fileno(d)]['error']
   File "/usr/lib/python2.7/site-packages/eventlet-0.16.0.dev-py2.7.egg/eventlet/green/select.py", line 20, in get_fileno
     raise TypeError("Expected int or long, got %s" % type(obj))    
 TypeError: Expected int or long, got <class 'eventlet.hubs.IOClosed'>
@yittg
Copy link

yittg commented Nov 30, 2014

Maybe it shouldn't pass 'on_error' as listener's throw back , how about just a 'lambda _: None';

Similarly, #137 'max recursion depth exceeded' is triggered because exception raised when calling 'mark_as_closed' in listener.defang which passed with 'lambda x:None', howerver it's a method takes 0 argument.

@temoto temoto added this to the v0.18 milestone Apr 3, 2015
@temoto temoto self-assigned this Apr 3, 2015
@temoto
Copy link
Member

temoto commented Apr 3, 2015

Just copying Jakub's comment here for ease of future debugging:

So it seems that it introduced a regression, those two lines add throwback that expects a fd as its argument but the listener.tb is always called with IOClosed instance

da87716#diff-0195c9ec76921f751bef650c49d902b1R257
8f98f8b#diff-a51187cea073d644e058c9e92c7a2ef9R79

@dongyanyang
Copy link

Hi, We have been using QPID as the RPC driver in our openstack environment, but we hit this eventlet error, just the same as you described in this bug's description, after a very time-comsuming job in the openstack-nova-compute service, an RPC timeout is reported and this triggers the eventlet type-error, first with a "TypeError: () takes exactly 1 argument (0 given)" and then "TypeError: cannot concatenate 'str' and 'type' objects" error. (the error log is pasted at the last)

this error would lead the openstack compute service down and never come back to up status unless restarted. is there any fix or workaround for this issue? Thanks in advance~

We are using the openstack Juno release which listed the eventlet version requirement as: eventlet>=0.15.1,<=0.15.2, So, in our env, the version we use of eventlet is 0.15.2.

openstack nova compute.log:
2015-11-17 02:15:49.761 59874 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'conductor': heartbeat timeout
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 582, in ensure
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid return method()
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 654, in _publisher_send
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid publisher = cls(self.conf, self.session, topic=topic, **kwargs)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 407, in init
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid super(TopicPublisher, self).init(conf, session, node_name)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 339, in init
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid self.reconnect(session)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 343, in reconnect
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid self.sender = session.sender(self.address)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "", line 6, in sender
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 622, in sender
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid sender._ewait(lambda: sender.linked)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 833, in _ewait
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid result = self.session._ewait(lambda: self.error or predicate(), timeout)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 596, in _ewait
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 235, in _ewait
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid self.check_error()
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 228, in check_error
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid raise e
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
2015-11-17 02:15:49.761 59874 TRACE oslo.messaging._drivers.impl_qpid
2015-11-17 02:15:49.793 59874 ERROR oslo.messaging._drivers.impl_qpid [-] Unable to connect to AMQP server on {'username': 'xxxxxx', 'host': 'x.xx.xx.xxx:5671', 'password': 'xxxxxx'}: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/qpid/messaging/driver.py", line 529, in dispatch
self.connect()
File "/usr/lib/python2.6/site-packages/qpid/messaging/driver.py", line 557, in connect
self._transport = trans(self.connection, host, port)
File "/usr/lib/python2.6/site-packages/qpid/messaging/transports.py", line 106, in init
SocketTransport.init(self, conn, host, port)
File "/usr/lib/python2.6/site-packages/qpid/messaging/transports.py", line 28, in init
self.socket = connect(host, port)
File "/usr/lib/python2.6/site-packages/qpid/util.py", line 66, in connect
sock = socket.socket(af, socktype, proto)
File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 130, in init
notify_opened(fd.fileno())
File "/usr/lib/python2.6/site-packages/eventlet/hubs/init.py", line 184, in notify_opened
hub.mark_as_reopened(fd)
File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 240, in mark_as_reopened
self._obsolete(fileno)
File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 205, in _obsolete
listener.defang()
File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 73, in defang
self.mark_as_closed()
TypeError: () takes exactly 1 argument (0 given)
. Sleeping 1 seconds
2015-11-17 02:15:49.795 59874 ERROR oslo.messaging._drivers.impl_qpid [req-aad04c03-4510-4291-89f8-c0cd13e65616 ] Failed to publish message to topic 'conductor': heartbeat timeout
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 582, in ensure
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid return method()
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 654, in _publisher_send
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid publisher = cls(self.conf, self.session, topic=topic, **kwargs)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 407, in init
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid super(TopicPublisher, self).init(conf, session, node_name)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 339, in init
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid self.reconnect(session)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 343, in reconnect
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid self.sender = session.sender(self.address)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "", line 6, in sender
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 622, in sender
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid sender._ewait(lambda: sender.linked)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 833, in _ewait
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid result = self.session._ewait(lambda: self.error or predicate(), timeout)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 596, in _ewait
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 235, in _ewait
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid self.check_error()
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 228, in check_error
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid raise e
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
2015-11-17 02:15:49.795 59874 TRACE oslo.messaging._drivers.impl_qpid
2015-11-17 02:15:49.798 59874 ERROR nova.openstack.common.threadgroup [-] cannot concatenate 'str' and 'type' objects
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py", line 125, in wait
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup x.wait()
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py", line 47, in wait
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/greenthread.py", line 173, in wait
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/event.py", line 121, in wait
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 293, in switch
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 331, in run
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup self.close_one()
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 264, in close_one
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup listener.tb(IOClosed(errno.ENOTCONN, "Operation on closed file"))
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/green/select.py", line 59, in on_error
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup original = ds[get_fileno(d)]['error']
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/green/select.py", line 20, in get_fileno
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup raise TypeError("Expected int or long, got " + type(obj))
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup TypeError: cannot concatenate 'str' and 'type' objects
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup

@dongyanyang
Copy link

The issue described in my last comments has been fixed by the following change:
/usr/lib/python2.6/site-packages/eventlet/green/select.py:

-- listeners.append(hub.add(hub.READ, k, on_read, on_error, lambda x: None))
-- listeners.append(hub.add(hub.WRITE, k, on_write, on_error, lambda x: None))
++ listeners.append(hub.add(hub.READ, k, on_read, lambda: None, lambda: None))
++ listeners.append(hub.add(hub.WRITE, k, on_write, lambda: None, lambda: None))

I only changed the last two arguments as lambda, this can fix our issue.
@yittg @temoto I did some simple test on my env, this works for our issue. But we are not sure whether any side-effect would be introduced by this. Since we are not farmiliar with eventlet, could you please take a look if this change is correct?

Here are some detailed analysis for the arguments error:

As for the first exception, since the lambda function does nothing but just returns none, it don't depend on any arguments, so I just remove the argument of lambda, this can fix the mark_as_closed TypeError listed in my last comments.
File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 73, in defang
self.mark_as_closed()
TypeError: () takes exactly 1 argument (0 given)

As for the second exception of TypeError: Expected int or long, got IOClosed object:
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 264, in close_one
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup listener.tb(IOClosed(errno.ENOTCONN, "Operation on closed file"))
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/green/select.py", line 59, in on_error
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup original = ds[get_fileno(d)]['error']
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/green/select.py", line 20, in get_fileno
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup raise TypeError("Expected int or long, got " + type(obj))
2015-11-17 02:15:49.798 59874 TRACE nova.openstack.common.threadgroup TypeError: cannot concatenate 'str' and 'type' objects

from the code, we can see the on_error function needs a fd as its argument, but the close_one function in hub.py called on_error function with an IOClosed object by: listener.tb(IOClosed(errno.ENOTCONN, "Operation on closed file"))
I don't know much about the machenism of eventlet, so I just tried the method as suggestted by @yittg in comment 2, registering lambda: None as the throwback function, not on_error.

This can fix our issue, the QPID can be reconnected successfully, and our openstack-nova-compute service resumed to up status automatically, so the problem gone.

@temoto temoto modified the milestones: v0.19, v0.18 Jan 23, 2016
u-quark added a commit to smarkets/eventlet that referenced this issue Jan 26, 2016
eventlet#154

This is a workaround and not a complete solution. In
eventlet#154 (comment)
it is mentioned that this fixes the issue.
@temoto temoto modified the milestones: v0.20, v0.19 May 2, 2016
@jstasiak
Copy link
Contributor

It's a shame to admit this - my Smarkets colleagues (pinging @u-quark) and I managed to produce a simple test case for this some time ago and came up with a way to make this code behave less confusingly but didn't have time to contribute the change properly yet.

For the record - this failure is most likely a result of a programmer error (fd reused while select using it is in progress if I remember correctly) so this is still gonna result in an error but with one that actually says something useful and points to the source of the issue.

lig pushed a commit to lig/eventlet that referenced this issue Jun 3, 2016
eventlet#154

This is a workaround and not a complete solution. In
eventlet#154 (comment)
it is mentioned that this fixes the issue.
@dalinhuang
Copy link

I've tried @jstasiak code fix, that works for me.

@temoto
Copy link
Member

temoto commented Jan 25, 2017

@hardys @yittg @dongyanyang @jstasiak @dalinhuang relevant patch was merged into master. It is expected that this issue should be fixed also. Please try and write back how you feel.

pip install https://github.com/eventlet/eventlet/archive/master.zip

@temoto temoto modified the milestones: v0.21, v0.22 Apr 17, 2017
@temoto temoto removed this from the v0.22 milestone Jan 12, 2018
@temoto temoto added this to the 0.23 milestone Jan 12, 2018
@temoto temoto removed this from the 0.23 milestone May 8, 2018
@temoto temoto removed their assignment Jul 13, 2019
@temoto
Copy link
Member

temoto commented Jul 13, 2019

Sorry to leave this open. Probably fixed, but not direct confirmation from OP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants