…has elapsed Earlier I submitted python-zk#250 to handle select being interrupted, which changed the behavior to return as if a timeout had occurred. Now that we've been running this for a while, we are seeing occasional issues with that patch. The main one is that the rest of the kazoo codebase assumes a timeout from select is an actual timeout (which is reasonable), but the previous patch broke that assumption, so callers would have to check the elapsed time in addition to the return value. Instead of doing that (which seems messy and error-prone), this patch simply retries until the given timeout has elapsed (assuming one was given). It accounts for the possibility of more than one interrupt in a single select() call by adjusting the timeout on each iteration.
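For readers following along, the retry-until-deadline idea described in this commit message can be sketched roughly as below. This is not the actual kazoo code: `select_with_retry` and its `_select` parameter are hypothetical names introduced here for illustration, and the sketch assumes Python 3.3+ (where `select.error` is `OSError` and `time.monotonic()` is available).

```python
import errno
import select
import time


def select_with_retry(rlist, wlist, xlist, timeout=None, _select=select.select):
    """Call select(), retrying on EINTR until the original timeout is used up.

    Hypothetical helper: `_select` is injectable only so the sketch can be
    exercised without real signals.
    """
    # Fix the deadline once so repeated interrupts cannot extend the wait.
    deadline = None if timeout is None else time.monotonic() + timeout
    remaining = timeout
    while True:
        try:
            # A timeout of None means "block until something is ready".
            return _select(rlist, wlist, xlist, remaining)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # only interrupted system calls are retried
            if deadline is not None:
                # Shrink the timeout by the time already spent waiting.
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # The budget is exhausted, so this is a genuine timeout.
                    return [], [], []
```

With this shape, a caller that sees three empty lists can treat the result as a real timeout, which is the assumption the commit message says the rest of kazoo relies on. On Python 3.5 and later the interpreter already retries interrupted `select()` calls itself (PEP 475), so a wrapper like this mainly matters on older versions.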
|
Seems that all the tests failed with some gzip error? Sounds unrelated-- esp. since all tests failed with the same error (and I don't get that error locally) |
|
This looks fine to me, but I'm also stumped by the Travis errors. Restarting Travis didn't seem to fix anything either. |
|
@bbangert do you know of a way to re-trigger the run? From the errors it looks like travis might have had an issue. |
|
@jacksontj yup, triggered it again just now |
|
This is really odd, when I run the tests locally they seem to pass just fine. Is there some way to get additional debugging info from travis? Or is there someone who maintains it that we could ping? |
|
@bbangert any ideas who we can talk to? |
|
@jacksontj not offhand. There might have been a few failures on master before this one, though, so I need to retry them as well to see when it broke for good |
|
@jacksontj ok, well, I went back to a prior travis test that only had 2 fail, and reran it.... and then they all failed. I'm guessing something on Travis has changed, such that all our tests are now insta-fail. |
|
I've found the issue. The mirror chosen no longer has 3.4.7, I reverted that so that the tests can run. Will be restarting this shortly. |
|
@jacksontj looks like there's 1 error reported. |
d9699ac to c8e3f22
This is in a similar vein to python-zk#250, but the root cause is an upstream Python issue (http://bugs.python.org/issue20611) that is not fixed in 2.x or <3.5. Python doesn't handle interrupted system calls there, so applications are responsible for doing so. This patch simply retries the create_connection call when a system call is interrupted, while honoring the original timeout. Note: although the overall timeout for `create_tcp_connection` is honored, after an interrupt each subsequent call to `create_connection` has less time to complete.
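A rough sketch of that connection-setup retry, under stated assumptions, might look like the following. The function name `connect_with_retry` and the injectable `_connect` parameter are hypothetical and not taken from the patch. Raising `socket.timeout` once the budget is exhausted is also an assumption, since the commit message only says the original timeout is honored.

```python
import errno
import socket
import time


def connect_with_retry(address, timeout=None, _connect=socket.create_connection):
    """Open a TCP connection, retrying when the system call is interrupted.

    Hypothetical helper: `_connect` is injectable only so the sketch can be
    exercised without a real server.
    """
    # The deadline is computed once, so the overall timeout is honored.
    deadline = None if timeout is None else time.monotonic() + timeout
    remaining = timeout
    while True:
        try:
            return _connect(address, remaining)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # refused connections, real timeouts, etc. propagate
            if deadline is not None:
                # Each retry gets only the time that is still left.
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # Assumption: surface an exhausted budget as a timeout.
                    raise socket.timeout("timed out")
```

Because the deadline is fixed up front, every interrupt leaves the next `create_connection` attempt with a smaller timeout, which is the trade-off the note above describes.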
c8e3f22 to e7aac2e
|
@bbangert fixed the issues :) now travis is happy. |
|
@jacksontj looks good! |
This is a followup to #250.
This PR includes 2 patches, both fixing more interrupt issues. The underlying problem is an upstream Python issue (http://bugs.python.org/issue20611).
The first patch changes my previous fix (#250) from reporting a timeout to retrying. The issue we see is that the rest of the kazoo code assumes that a timeout is a timeout (which is reasonable), so instead of reworking all timeouts within kazoo, we can simply retry.
The second patch handles the same problem during connection setup.