Intermittent Travis failures on Python 3 #359

bitprophet · 2016-06-09T18:29:06Z

I create far too many tickets with subjects like this :(

If it's anything like the others, the issue is probably some sort of race condition which is exacerbated on Python 3 as it tends to be slower than Python 2. When these tests fail, retriggering the run almost always passes cleanly.

Specifically, what is happening lately is Travis builds of master die on Python 3 (and sometimes also PyPy3, tho that may be #358) on the integration test for "basic invocation". There's nothing useful in the captured stderr.

Running integration main.py tests in a loop locally to see if I can recreate it any % of the time, but not hopeful.
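For reference, a minimal sketch of that kind of repro loop. The helper name and the max_runs default are made up for illustration, and the command you pass in is whatever runs the test in question; this is not the actual harness used here.

```python
import subprocess


def runs_until_failure(command, max_runs=300):
    """Run `command` repeatedly.

    Returns the 1-based number of the first run that exited non-zero,
    or None if all `max_runs` runs exited 0.
    """
    for attempt in range(1, max_runs + 1):
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        if completed.returncode != 0:
            return attempt
    return None
```

Calling it with the test command as an argument list gives a rough "runs until error" number that can be compared across platforms and interpreters.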

bitprophet · 2016-06-09T18:32:42Z

After 40 (!) runs, I got an error on the "Invocation with args" integration test. Same deal as on Travis, exit code is None and no stderr. So something is fucky there, possibly a hidden exception =/

EDIT: Well, or something more subtle, because an exit code of None is enough to trigger "if not result" (Result objects act truthy based on the exit value being 0 or not, by default). Why we'd end up in this state is unclear; these tests do not use a pty, so the exited value should be self.process.returncode.
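To illustrate why a None exit code trips that check, here is a simplified stand-in. It is a hypothetical class written for this example, not Invoke's real Result; it only assumes truthiness is tied to the exit value being 0.

```python
class Result:
    """Hypothetical, stripped-down stand-in for a command result object."""

    def __init__(self, exited):
        # Exit code of the command, or None if it was never obtained.
        self.exited = exited

    @property
    def ok(self):
        # Only an exit code of exactly 0 counts as success.
        return self.exited == 0

    def __bool__(self):
        return self.ok


def looks_failed(result):
    # Mirrors an "if not result" check: fires for non-zero AND for None.
    return not result
```

With this shape, a missing exit code is indistinguishable from a real failure at the "if not result" level, which matches the "exit code is None and no stderr" symptom.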

Perhaps it's a race condition between Popen.poll returning non-None and process.returncode being actually set? That seems kinda unlikely but not sure what else would make this occur. Instrumented my loop with a call to pdb...
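A quick way to sanity check that theory, sketched with a made-up helper name. Per the subprocess docs, Popen.poll() sets and returns the returncode attribute, so the two values should agree once poll() is non-None.

```python
import subprocess
import time


def wait_for_exit(process, timeout=5.0, interval=0.01):
    """Poll `process` until it exits or `timeout` seconds elapse.

    Returns (polled, attribute): the value poll() returned and the value
    of process.returncode read immediately afterwards.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        polled = process.poll()
        if polled is not None:
            return polled, process.returncode
        time.sleep(interval)
    # Timed out: the process had not exited as far as poll() could tell.
    return None, process.returncode
```

If the two values ever differed, that would point at the race described above; if they always match, the None is coming from somewhere else.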

bitprophet · 2016-06-09T19:02:32Z

199 runs to find out that either something got even more hung or pdb doesn't play nice with Invoke (hrm). Got another 125 runs w/o error in before I decided to move to my Debian VM in case it's one of those issues that crops up more often on Linux. EDIT: sure enough, 13 runs until error.

bitprophet · 2016-06-09T20:15:57Z

The next run took 15 to error, but perplexingly the debug line I added to print the result of Popen.poll() if non-None didn't fire, implying it was None. Confirming that now.

Wondering if this is back to ye olde "process finished or dead threads" crap from #351; I did always feel mildly uneasy about how open that was, so perhaps we need some more logic around that? E.g. maybe the timeout we use in thread joining needs to be applied higher up, so that we make sure to obtain the actual exit code if it's "coming soon".

Maybe we even need a tighter check on whether the threads excepted or not, instead of simply whether they're "dead".

bitprophet · 2016-06-09T20:25:19Z

Yup, that's the issue: the threads are dead because they exited cleanly, not because they raised, so we gotta explicitly check whether the 'dead' threads have stored exceptions. HERPADERP.
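Roughly the distinction in question, as a self-contained sketch. The class and function names are hypothetical and this is not the code from the actual fix; it only shows "dead" versus "dead with a stored exception".

```python
import sys
import threading


class ExceptionCapturingThread(threading.Thread):
    """Hypothetical helper thread that remembers an exception from its target."""

    def __init__(self, target, args=()):
        super().__init__()
        self.daemon = True
        self._wrapped_target = target
        self._wrapped_args = args
        # Stays None unless the target raises.
        self.exc_info = None

    def run(self):
        try:
            self._wrapped_target(*self._wrapped_args)
        except BaseException:
            # Store instead of propagating, so the caller can inspect it.
            self.exc_info = sys.exc_info()


def has_dead_threads(threads):
    # True as soon as any thread has finished, cleanly or not.
    return any(not t.is_alive() for t in threads)


def has_failed_threads(threads):
    # True only when a thread actually stored an exception.
    return any(t.exc_info is not None for t in threads)
```

A thread that finishes its work normally makes has_dead_threads() true while has_failed_threads() stays false, which is the case that was being misread as a failure.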

bitprophet · 2016-06-09T21:03:38Z

With that fix in place I got up to 300 runs with no race conditions surfacing (still on Linux + Python 3, where I was previously averaging ~13-15 runs before an error). Canceled the run, good enough for me.

bitprophet added the Support and Needs investigation labels Jun 9, 2016

bitprophet closed this as completed in 69180cd Jun 9, 2016
