Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError in connectionpool? #2935

Closed
Kronuz opened this issue Dec 18, 2015 · 44 comments

Comments

Projects
None yet
4 participants
@Kronuz
Copy link

commented Dec 18, 2015

I'm getting a weird error during this call conn = old_pool.get(block=False) in connectionpool.py:410.
The Error in Sentry is: Queue in get, KeyError: (1, True)

And this is the traceback, which doesn't make sense, as requests/packages/urllib3/connectionpool.py catches that Empty exception; plus I have no idea what that KeyError is coming from ... any ideas?:

Stacktrace (most recent call last):

  File "pac/business/views.py", line 49, in get_eticket
    response = requests.get(url, timeout=timeout, verify=False)
  File "requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "requests/api.py", line 54, in request
    session.close()
  File "requests/sessions.py", line 649, in close
    v.close()
  File "requests/adapters.py", line 264, in close
    self.poolmanager.clear()
  File "requests/packages/urllib3/poolmanager.py", line 99, in clear
    self.pools.clear()
  File "requests/packages/urllib3/_collections.py", line 93, in clear
    self.dispose_func(value)
  File "requests/packages/urllib3/poolmanager.py", line 65, in <lambda>
    dispose_func=lambda p: p.close())
  File "requests/packages/urllib3/connectionpool.py", line 410, in close
    conn = old_pool.get(block=False)
  File "python2.7/Queue.py", line 165, in get
    raise Empty
@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 18, 2015

@Kronuz What version of requests are you using, please?

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 18, 2015

Sorry, forgot to say. I'm using v2.8.1, from PyPI.

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 18, 2015

Are you running Python 3?

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 18, 2015

Nope, I'm using Python 2.7.11

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 18, 2015

Ok, so I'm inclined to believe that the traceback you're showing me cannot be right. It just doesn't seem to make any sense at all. Are you able to see this live?

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 18, 2015

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 18, 2015

Well, that traceback looks impossible. Short of some insane monkeypatching, I'm not sure how this TB could be right.

@sigmavirus24

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2015

Does sentry always elide the full path of things as well? That's just frustrating. The path to an installed version of requests is not the same as the path to Queue.py.

@sigmavirus24

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2015

@Kronuz can you give us an idea of what other packages are installed on your system? Something else may be monkey patching the stdlib and causing this issue.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 18, 2015

It's Django and many other things. Celery, REST Framework... But searching around there doesn't seem to exist monkey patching of the standard libs. But it's true it seems "impossible"

@sigmavirus24

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2015

@Kronuz things like eventlet and gevent may be used in one of your dependencies and they tend to monkey-patch Queue. This is why we asked for your dependencies. Many packages do things without telling the end-user.

Further, it's absolutely bizarre that you're getting a KeyError when the error raised is Queue.Empty.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 18, 2015

Yes, it is really weird. And it has happened in our development boxes from time to time as well as in production. I'm not using fervent or eventlet, but I'll try to dig further to see if there's anything else that could be patching Queue.

@fqroot

This comment has been minimized.

Copy link

commented Dec 23, 2015

I got the same traceback, using python 2.7.5. so I changed into urllib2 to avoid it.

  File "Bee.git/src/agent/custom/article_parser.py", line 23, in get_html
    r = requests.get(url, timeout=10)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 350, in __exit__
    self.close()
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 649, in close
    v.close()
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 270, in close
    self.poolmanager.clear()
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.py", line 100, in clear
    self.pools.clear()
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/_collections.py", line 94, in clear
    self.dispose_func(value)
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.py", line 66, in <lambda>
    dispose_func=lambda p: p.close())
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 410, in close
    conn = old_pool.get(block=False)
  File "/usr/lib64/python2.7/Queue.py", line 165, in get
    raise Empty
@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 23, 2015

This traceback remains impossible. Can anyone get a regular reproduction scenario?

@fqroot

This comment has been minimized.

Copy link

commented Dec 24, 2015

if i run the function alone, it works good.

I change the "close" function in connectionpool.py. it seems this code can not catch Empty.

    def close(self):
        """
        Close all pooled connections and disable the pool.
        """
        # Disable access to the pool
        old_pool, self.pool = self.pool, None

        try:
            while True:
                print '-'*10
                conn = old_pool.get(block=False)
                print [conn]
                if conn:
                    print conn.__dict__
                    conn.close()

        except Empty:
            print 'queue empty!!!!!!!!'
            import traceback
            print traceback.format_exc()
            pass  # Done.

get follow message

----------
[<requests.packages.urllib3.connection.HTTPConnection object at 0x113e390>]
{'_HTTPConnection__state': 'Idle', '_buffer': [], '_tunnel_headers': {}, '_tunnel_host': None, 'sock': <socket._socketobject object at 0x10e8ec0>, 'port': 80, 'strict': True, 'host': 'news.bitauto.com', '_tunnel_port': None, 'timeout': 10, 'socket_options': [(6, 1, 1)], 'source_address': None, '_HTTPConnection__response': <httplib.HTTPResponse instance at 0x1143200>, '_method': 'GET'}
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
Traceback (most recent call last):
......

this line didn`t run print 'queue empty!!!!!!!!'

after i change "except Empty" to "except Exception", the problem does not show ,

----------
[<requests.packages.urllib3.connection.HTTPConnection object at 0x7f8e444fc390>]
{'_HTTPConnection__state': 'Idle', '_buffer': [], '_tunnel_headers': {}, '_tunnel_host': None, 'sock': <socket._socketobject object at 0x7f8e444a6e50>, 'port': 80, 'strict': True, 'host': 'news.bitauto.com', '_tunnel_port': None, 'timeout': 10, 'socket_options': [(6, 1, 1)], 'source_address': None, '_HTTPConnection__response': <httplib.HTTPResponse instance at 0x7f8e44501290>, '_method': 'GET'}
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
[None]
----------
queue empty!!!!!!!!
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 411, in close
    conn = old_pool.get(block=False)
  File "/usr/lib64/python2.7/Queue.py", line 165, in get
    raise Empty
Empty

I don`t know why...

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 24, 2015

@fqroot Are you using gevent or eventlet? Or anything that monkeypatches the Queue module?

@fqroot

This comment has been minimized.

Copy link

commented Dec 24, 2015

I have removed all the code about gevent.

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Dec 24, 2015

@fqroot So, to be clear, you are not using gevent? Because if anything monkeypatches the Queue module to change its idea of what Empty is then this code is at risk of failing.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 24, 2015

@fqroot, I haven't had the chance to try it myself yet, but why don't you catch Exception as exc And then inspect exc to see what the type of it is and where it's source code has been defined.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 24, 2015

I have also stumbled upon problems sort of like this one when imports fail at a certain time or during circular imports, which then leave closures with variables pointing to half constructed modules; such objects in turn generally become None and some times become a whole different type object from there on.

This could be the case of Empty in the exc variable in your except Exception as exc. Ceck where the exc object type's code was defined (to see if it was monkey patched) and compare the exc type's id() against Empty to see if it's any different, or even None and please comment about your findings

@fqroot

This comment has been minimized.

Copy link

commented Dec 24, 2015

@Kronuz OK, I will debug it tommrow. maybe there is a relationship with the module "importlib", I think

@fqroot

This comment has been minimized.

Copy link

commented Dec 25, 2015

@Kronuz

   def close(self):
        """
        Close all pooled connections and disable the pool.
        """
        # Disable access to the pool
        old_pool, self.pool = self.pool, None

        try:
            while True:
                conn = old_pool.get(block=False)
                if conn:
                    conn.close()

        # except Empty:
        except Exception as exc:
            print 'a', type(exc)
            print 'a', type(Empty) 
            print 'a', id(exc)
            print 'a', id(Empty)
            pass  # Done.

output

a <class 'Queue.Empty'>
a <class 'Queue.Empty'>  ###  it should be  <type 'type'>
a 140610585058592
a 140610585058592

and I have found where the err code lead to my problem.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 25, 2015

@fqroot, where? ...Wait, what? how?

It's really weird... why doesn't the except Empty catches it then?

@fqroot

This comment has been minimized.

Copy link

commented Dec 25, 2015

@Kronuz I noticed that there is an error code in other part of my project
simple like

def handle():
    try:
        pass # some code
    except Exception, Queue.Empty:  # may drink too much :)
        pass

the handle function was created by module importlib
so Queue.Empty is not <type "type"> any longer and become a instance of Queue.Empty
This explanation is my guess.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 25, 2015

@fqroot, I still don't get it :/

Could you please elaborate a bit more? I'm getting KeyError instead of Empty (or so the log in Sentry says)... What where you getting there?

@fqroot

This comment has been minimized.

Copy link

commented Dec 25, 2015

@Kronuz My situation may not suit you.
what type exception you get in that close function?

@Kronuz

This comment has been minimized.

Copy link
Author

commented Dec 26, 2015

@fqroot, I'm getting KeyError((1, True)) ...for some reason

@fqroot

This comment has been minimized.

Copy link

commented Dec 26, 2015

@Kronuz I don't know how to help you...

@sigmavirus24

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2015

To anyone who happens upon this thread in the future, a few things:

  1. @Kronuz and @fqroot are having similar (on the face) but very different problems

  2. @fqroot is using Python 2 in which you could do

    try:
        # ...
    except ExceptionClass, exc:
        # Use exc as the instance of ExceptionClass

    when they meant to use

    try:
        # ...
    except (ExceptionClass1, ExceptionClass2, ExceptionClass3):
        # Catch one of those exception classes but do not store it as a local

    this is why Queue.Empty was no longer of type type in @fqroot's case and why he was seeing the Queue.Empty exception as being unhandled by requests/urllib3.

  3. @Kronuz is getting an entirely different exception from code that should only ever raise a Queue.Empty exception. At the time of this writing, it is still unclear how this is either happening or even possible.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

I still keep getting this error (the original one I posted) ...and I honestly have no idea why or how to figure it out.

What are your best guesses?

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Feb 26, 2016

@Kronuz My best guess is that something is monkeypatching the Queue.Empty import, either in the Queue module or in requests.packages.urllib3.connectionpool. That's the only way I can see that we'd hit a problem where the implicit isinstance check in except fails to work.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

@Lukasa, but even if Queue.Empty is monkeyparched, how the message ends up being KeyError: (1, True) ?? And if the method LifoQueue.get() or LifoQueue was monkeyparched, the traceback would come with the monkeyparched version of it, wouldn't it? This is weird!

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Feb 26, 2016

@Kronuz Yeah, this is definitely one of the weirdest things I've seen. You're probably going to want to try to find a consistent reproduction scenario and then shrink it down to as little code as you can.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

This doesn't always happen either ...maybe it not always happening has to do with that old_pool in connectionpool.py and why/when the get() is called and under what circumstances the object could be empty or full or something. What does that old_pool is for? What is it doing there in that part of the code and why? How does that work?

I'll report my findings... if anyone can think of a way to track this down, please, please suggest.

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Feb 26, 2016

The purpose of old_pool is to ensure that everything gets cleaned up quickly.

Essentially, when close is called on the pool the old reference to the pool is moved to old_pool (which is only in scope in the close function) and the pool reference gets changed to None. We then grab all the connections from old_pool so that we can properly close them and throw them away.

Do you have any form of logging other than Sentry, that may contain a complete traceback?

@Kronuz

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

@Lukasa, Sentry provides the full raw log as well, it's the one I pasted above... But I'll try to log more things ...the bad news is it's not always happening either :/

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Feb 26, 2016

@Kronuz I haven't actually seen a full, Python-formatted traceback from you in this thread, which is why I asked. I'm just worried that somewhere along the way we're losing information.

@Kronuz

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

I'll check the server log files for a complete traceback as it comes from Python, hopefully is there.

Also, I don't even have gevent installed (in the whole system), but I'll also look for other things that might be monkeyparching stuff. e.g. do a regex search of\.(Empty|LifoQueue|get)\s*= and setattr\([^)]+["'](Empty|LifoQueue|get)["']\) ...suggestions here?

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Feb 26, 2016

On that front I have absolutely no suggestions. =(

@sigmavirus24

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2016

I would think the first regexp would be the better one to use. @Kronuz, are you still on 2.8.1 or have you changed versions of requests?

@Kronuz

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

@sigmavirus24, I'm still in 2.8.1

The search across all my files with those regexps didn't yield anything even remotely interesting...

@Kronuz

This comment has been minimized.

Copy link
Author

commented Mar 9, 2016

@Lukasa

This comment has been minimized.

Copy link
Collaborator

commented Mar 9, 2016

Hah, very nicely done @Kronuz! And a very subtle bug it was too.

@Lukasa Lukasa closed this Mar 9, 2016

@sigmavirus24

This comment has been minimized.

Copy link
Contributor

commented Mar 9, 2016

🍰 for @Kronuz and @Yhg1s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.