
Intermittent Travis failures on Python 3 #359

Closed
bitprophet opened this issue Jun 9, 2016 · 5 comments

Comments

@bitprophet
Member

I create far too many tickets with subjects like this :(

If it's anything like the others, the issue is probably some sort of race condition that is exacerbated on Python 3, since it tends to be slower than Python 2. When these tests fail, retriggering the run almost always succeeds.

Specifically, Travis builds of master have lately been dying on Python 3 (and sometimes also PyPy3, though that may be #358) during the "basic invocation" integration test. There's nothing useful in the captured stderr.

I'm running the main.py integration tests in a loop locally to see if I can reproduce it at any rate at all, but I'm not hopeful.
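A loop like this can be scripted rather than rerun by hand. The sketch below is a minimal version under stated assumptions: run_until_failure and run_test_command are hypothetical helpers, and the pytest command line is only a placeholder for whatever actually runs the integration tests in your checkout.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_until_failure(run_once: Callable[[], int], max_runs: int) -> Optional[int]:
    """Call run_once() up to max_runs times.

    Returns the 1-based number of the first run whose exit code is non-zero,
    or None if every run exited with 0.
    """
    for attempt in range(1, max_runs + 1):
        if run_once() != 0:
            return attempt
    return None


def run_test_command() -> int:
    # Hypothetical command line (assumption): substitute whatever actually
    # runs the integration tests in your checkout.
    cmd = [sys.executable, "-m", "pytest", "integration/main.py"]
    return subprocess.run(cmd).returncode
```

For example, run_until_failure(run_test_command, max_runs=300) reports which run failed first, or None if all 300 passed. Taking a callable instead of a hard-coded command keeps the loop itself easy to test without spawning real processes.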

@bitprophet
Member Author

bitprophet commented Jun 9, 2016

After 40 (!) runs, I got an error on the "Invocation with args" integration test. Same deal as on Travis: the exit code is None and there's no stderr. So something is fucky there, possibly a hidden exception =/

EDIT: Well, or something more subtle, because an exit code of None is enough to trip the if not result check (by default, Result objects are truthy only when the exit code is 0). Why we'd end up in this state is unclear; these tests do not use a pty, so the exited value should simply be self.process.returncode.
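To make the truthiness point concrete, here is a minimal stand-in for a result object that is truthy only when its exit code is exactly 0. The Result class and its ok property below are illustrative assumptions, not Invoke's actual implementation; the point is that exited=None lands in the failure branch just like a non-zero code does.

```python
from typing import Optional


class Result:
    """Minimal stand-in for a command result (illustrative, not Invoke's class)."""

    def __init__(self, exited: Optional[int], stderr: str = "") -> None:
        self.exited = exited  # exit code, or None if it was never captured
        self.stderr = stderr

    @property
    def ok(self) -> bool:
        # Only an exit code of exactly 0 counts as success, so None is a failure.
        return self.exited == 0

    def __bool__(self) -> bool:
        return self.ok


result = Result(exited=None)
if not result:
    # The branch the flaky test hit: no stderr, exit code None.
    print(f"failed: exited={result.exited!r}, stderr={result.stderr!r}")
```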

Perhaps it's a race condition between Popen.poll returning non-None and process.returncode actually being set? That seems kinda unlikely, but I'm not sure what else would cause this. Instrumented my loop with a call to pdb...
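One way to sanity-check the suspected race is to poll a real child process and compare the two values. This sketch assumes CPython's subprocess module, where Popen.poll() records the exit status in returncode before returning it, which makes a gap between the two unlikely at the Python level.

```python
import subprocess
import sys
import time

# Child process that exits with a known, non-zero code.
proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

# Poll in a loop, the way a runner might, instead of blocking in wait().
while True:
    code = proc.poll()
    if code is not None:
        break
    time.sleep(0.01)

# poll() has already stored the exit status in proc.returncode,
# so both values agree once poll() returns non-None.
print(code, proc.returncode)
```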

@bitprophet
Member Author

bitprophet commented Jun 9, 2016

199 runs to find out that either something got even more hung or pdb doesn't play nice with Invoke (hrm). Got another 125 runs in without error before I decided to move to my Debian VM, in case this is one of those issues that crops up more often on Linux. EDIT: sure enough, 13 runs until error.

@bitprophet
Member Author

The next run took 15 tries to error, but perplexingly the debug line I added to print the result of Popen.poll() when non-None didn't fire, implying it was still None. Confirming that now.

Wondering if this is back to ye olde "process finished or dead threads" crap from #351; I did always feel mildly uneasy about how open that was, so perhaps we need some more logic around that? E.g. maybe the timeout we use in thread joining needs to be applied higher up, so that we make sure to obtain the actual exit code if it's "coming soon".
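The "apply the timeout higher up" idea could look roughly like this: once the IO threads look finished, give the process a short grace period to report its exit code instead of reading returncode immediately. exit_code_with_grace and the grace value are hypothetical; the sketch relies only on Popen.wait(timeout=...) and subprocess.TimeoutExpired, which exist in Python 3.3+.

```python
import subprocess
import sys
from typing import Optional


def exit_code_with_grace(proc: subprocess.Popen, grace: float) -> Optional[int]:
    """Return proc's exit code, waiting up to `grace` seconds for it to appear.

    Returns None only if the process is still running after the grace period.
    """
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        return None


# A child that exits "soon": the grace period lets us pick up its real code.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
print(exit_code_with_grace(proc, grace=5.0))
```

Returning None only after the grace period expires keeps the "still running" case distinguishable from a real exit code, rather than conflating the two.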

Maybe we even need a tighter check on whether the threads raised exceptions, instead of simply whether they're "dead".

@bitprophet
Member Author

Yup, that's the issue: the threads are exiting cleanly, so we have to explicitly check whether the 'dead' threads actually stored exceptions. HERPADERP.
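A sketch of that tighter check: worker threads record any exception their target raises, and the wait loop only short-circuits on threads that actually died with an error, not on threads that simply finished. RecordingThread, died_with_error and should_stop_waiting are hypothetical names for illustration, not Invoke's actual classes or methods.

```python
import threading
from typing import Callable, List, Optional


class RecordingThread(threading.Thread):
    """Thread that stores any exception raised by its target (hypothetical helper)."""

    def __init__(self, target: Callable[[], object], name: Optional[str] = None) -> None:
        super().__init__(target=target, name=name, daemon=True)
        self.exception: Optional[BaseException] = None

    def run(self) -> None:
        try:
            super().run()
        except BaseException as exc:
            # Keep the exception so the parent thread can inspect it later.
            self.exception = exc

    @property
    def died_with_error(self) -> bool:
        # Stopped running *and* stopped because something was raised.
        return not self.is_alive() and self.exception is not None


def should_stop_waiting(process_finished: bool, threads: List[RecordingThread]) -> bool:
    # Looser check: `process_finished or any(not t.is_alive() for t in threads)`.
    # That also fires when the IO threads have simply finished cleanly while the
    # process has not reported an exit code yet, leaving the exit code as None.
    # Only threads that actually errored should cut the wait short.
    return process_finished or any(t.died_with_error for t in threads)
```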

@bitprophet
Member Author

bitprophet commented Jun 9, 2016

With that fix in place I got up to 300 runs with no race conditions surfacing (still on Linux + Python 3, where I'd previously been averaging ~13-15 runs before an error). Canceled the run; good enough for me.
