Local terminal stdin detatched if ThreadingGroup run includes sleep #1812
Comments
See #1814 as possible second reproducible problem case. This sounds like a legit bug to me and I'm not sure what's causing it offhand. Smells like it could be a generic Unix issue with terminal pipes being attached to multiple subprocesses at once, or (esp going by #1814's example) a race condition around pipe state, or something like that. Will try to reproduce and muddle out a cause/solution. |
Also, this probably requires fixing at the Invoke level and may be purely in its domain (insofar as I simply haven't done much with threading in a pure-Invoke context yet; but see eg pyinvoke/invoke#194 - it's a thing that should happen there too). In that case I'll move this to a ticket there & the Fabric "fix" would be to upgrade one's Invoke once fix released. |
I was on Ubuntu 16.04.2 connecting to the same. |
Another report of same issue in #1829. This is in my next-bugfix milestone and I'll be focusing on that hopefully next OSS day (Mon). |
I just tried to reproduce this (2.0 branch, Python 3.6.4, macOS 10.12) and was unable to, unfortunately. First tried double-localhost, then two separate remote cloud instances, no dice either way; my terminal's fine afterwards. Going to try on a Linux container in a bit just in case that helps, but since the OP was also on macOS, not sure it'll make a difference. I'll also try running it in a loop to see if it's only an occasional repro. I'll also try it on 2.1 in case we somehow introduced it in 2.1, tho this seems very unlikely. @jensenak @nicktimko are you reproducing this 100% of the time? 50%? 5%? |
@bitprophet on 2.1.3 it was happening in my actual workflow fairly often (>80%, I was also going in parallel to 6 servers, not 2), though in my contrived example from #1814 it's much lower, maybe 20%. I can try to come up with a Docker setup, or failing that a Vagrant setup to repro. |
@bitprophet This has been 100% of the time for me. Just to be sure, I started a fresh virtualenv with only fabric installed. I tested 2.0, 2.1, and 2.2. The example code I pasted produced the described behavior each time. In all tests, I was connecting to Ubuntu 14.04 remotes. I am on a different version of OSX (10.13). Perhaps that's related? Though @nicktimko was not on OSX at all. In case another version is an issue, here's what
Seeing as all these were installed as dependencies of fabric 2.2, I'd expect your versions will look similar. If there's more I can do to help, I'm more than willing. Just not really sure where else to look. |
What commit should I test with; have you made any changes recently that might affect things? I'll try with the freeze above, you could also provide another frozen |
@nicktimko @jensenak Thanks for the extra info. I'll keep trying to repro it here; at 20% I'd definitely not have tried it enough to trigger. My remotes have been Mac and some older Debians, I can try Ubuntu Trusty in case it's somehow specific to that (which would be odd, but hey, this whole thing is odd.) Also, what's your local shell environments? Mine is zsh on the (again, macOS 10.12) builtin Terminal.app, inside tmux. I'll also try some permutations around that angle in a bit. |
AHA. This seems to be bash specific! Was still unable to repro under zsh outside of tmux, but the moment I try under bash, I immediately get the symptoms mentioned. Ditto inside of tmux, so tmux has no bearing - it's a shell thing. Why this would behave differently under bash vs zsh, I have no immediate idea. Could be specific to how they are implemented, or (seems more likely) maybe something in my zsh dotfiles is preventing the problem? Will have to dig...though identifying a solution on the Python side is necessary either way most likely. EDIT: also, the reproduction happens even when connecting to my localhost's sshd multiple times concurrently, which isn't too surprising. So the remote end seems to not matter. |
Also, I tried to verify the note about "previous run excepting prevents issue for next run only" but that did not occur for me; I get the behavior every time regardless. |
Moar: I removed the That's also good, the fewer weird-as-heck triggers, the better. This smells like it should be some basic, dumb threading mistake somewhere, which would typically not be impacted by anything specific on the remote or local end aside from length of time making a race condition (or w/e) more likely. |
Wondering if this is related to pyinvoke/invoke#552 which boils down to Invoke's exception handling thread subclass (used in ThreadingGroup here) possibly having screwed up thread-death detection. Gonna have to make sure I understand that (its potential fix, pyinvoke/invoke#553, wasn't an insta-merge since it seemed odd that we'd have gotten something apparently functional, so wrong) and then see whether applying it makes this symptom disappear. |
Sounds like the test case I had, where I needed to hit it a few times before it bugged out. Seems like you have a good handle on it |
I noticed today that I also couldn't reproduce the Exception behavior I had described a month ago... unfortunately I don't remember what I was doing then. :/ I am indeed running bash here. Good find. The fact that the problem is intermittent without the sleep makes me wonder if this is a race condition of some kind. |
You say that, but now I can't repro it again, or at least it's VERY intermittent. Putting the sleep back in makes it come up a lot more often. Gotta love race conditions. Looking at that Invoke issue, the reporter even mentions a messed-up terminal as a symptom; but strangely I cannot reproduce that symptom even under bash, with his code. Still would not be surprised if the root cause is the same (has to do with a few things around thread death and stdin being closed, or perhaps set back to linewise buffering, properly before exit). |
Checking the spots the other issue mentioned, against the repro case here:
Trying another tack...what exactly about the terminal environment after the bug shows up, has changed? Running
According to
All in all it feels like some higher level 'mode' is being applied to the terminal, similar to how we set stdin to character-buffered reading to let us read 1 byte at a time instead of waiting til the user mashes enter. Which sounds like the behavior under display (sort of...), and which I was wondering about earlier; but reading the code in question (because that Invoke patch mentions it as well, though re: thread death), the mode change is phrased as a contextmanager, so it should always be getting unset regardless of how we break out of that loop. But I'll need to triple check that now. |
Minor: simply saying |
OK! I think I figured it out, while staring at that contextmanager: it's probably because the contextmanager snapshots current terminal state for restore at close of block. But what are we doing in this case? We're running two separate high level threads, each of which is running its own copy of this context manager! And in Invoke, while we intend to be threadsafe, we don't currently test anything but our own low-level IO threads; 99% of the "thread-safety" is simply the use of self-contained object state instead of Fabric 1's horribad global module state. So this particular bit of state-keeping is not ever run concurrently with itself (partly because the "state" is literally the controlling terminal, of which there is only ever one, so...global state...). I haven't 100% proven it yet (about to) but there's no way this isn't it. The thread that runs second is highly likely to snapshot the controlling terminal attributes after the first thread has already set it into character-buffer mode; then, if that second thread also finishes second (again, likely but not certain) it "restores" the bad state, effectively undoing the first thread's restoration. |
Confirmed that the ECHO flag, for example, is definitely being snapshotted by the non-first contextmanager, then restored by same. Working on a solution, which I think will end up just being "try to figure out if setcbreak looks already applied, and no-op in that case instead of doing the snapshot-modify-restore dance". Should have intended effect, is marginally cleaner to boot (never runs setcbreak >1 time) and avoids a corner case where a naive fix might always just set ECHO, etc to "on" – which would break in situations where the stream in question is tty-like but was already set to not be echoing. (Unlikely, sure, but probably not impossible.) |
Since this is an Invoke-only issue I'm gonna give it a home on that tracker - I'm expecting to get a test and fix done for this soon but if y'all have anything else to add please pop on over to pyinvoke/invoke#559 To be crystal clear, once that's fixed it should be out in Invoke 1.0.2/1.1.1 (and possibly 1.2.0 if I get that out at the same time) and no Fabric upgrades should be necessary, only Invoke. |
@bitprophet Great! It works after upgrading Invoke :) |
I'm using a threading group to run shell commands. After running a script that includes a
sleep
, the local terminal is left withstdin
detached (keystrokes not visible on command line) and the terminal has to be reset.I've tried this quite a few times and find that it only happens with ThreadingGroups (SerialGroups are okay). The sleep command can be anywhere in a one-liner (first command, middle, last) and can be joined in the one liner with either semicolons, or double ampersand. All of the commands run as expected, but the terminal remains in a bad state.
Strangely, if the previous run exited with an uncaught exception, the terminal will not be affected.
To reproduce:
Run the script above. After it exits, type something on the command line. If you see no output,
<ctrl>+c
and typereset<enter>
. To see the post-exception behavior, uncomment theraise
line, run the code, comment the line, and run twice more. The first successful run will leave the terminal in a good state. The second will leavestdin
detached.I discovered this issue with
sleep
in my testing, but it's possible other commands may have the same effect. There's also a chance I'm just doing something wrong. If that's the case, my apologies.My setup:
python 3.6.4
fabric 2.1.3
OSX 10.13.5, connecting to Ubuntu 14.04
The text was updated successfully, but these errors were encountered: