New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent output freezing #352

Closed
bitprophet opened this Issue Aug 19, 2011 · 13 comments

Comments

Projects
None yet
1 participant
@bitprophet
Member

bitprophet commented Aug 19, 2011

Description

Sometimes fabric output just freezes, like when pressing Ctrl+S in linux console, and in order to unfreeze it I need to press any key (space for example). This happens approximately after interactive prompts (sudo or apt-get), but not exactly after them - I answer required question, fabric emits next one or two remote output lines and only then freezes. Overall it looks like race condition combined with blocking for input.

I can not reproduce this bug reliably and yet not figured out exact conditions when it happens. Environment: Linux, Python 2.6.6, Paramiko 1.7.6, git fabric snapshot (5d27faa, but happened in earlier versions too), local shell is zsh, remote is bash.


Originally submitted by Max Arnold (LwarX) on 2011-05-15 at 07:17am EDT

Relations

  • Related to #320: Issue with buffering input with sudo

Closed as Done on 2011-06-24 at 08:35pm EDT

@ghost ghost assigned bitprophet Aug 19, 2011

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


Catched this again, now with some debugging:

Fabric task:

def apt_update():
    sudo('apt-get -y update')
    sudo('apt-get dist-upgrade')
    sudo('apt-get --purge autoremove')
    sudo('apt-get autoclean')

Position in the output where it freezed:

...
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/main Sources
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/restricted Sources
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/universe Packages
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/universe Sources
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/multiverse Packages
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/multiverse Sources
[hostname] out: Reading package lists... Done
[hostname] out:

... OUTPUT FREEZED HERE ...

[hostname] sudo: apt-get dist-upgrade
[hostname] out: Reading package lists... Done
[hostname] out: Building dependency tree       
[hostname] out: Reading state information... Done
[hostname] out: Calculating upgrade... Done
[hostname] out: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
...

Process tree:

22533 /usr/bin/python2.6 /usr/bin/fab -H hostname:1223 apt_update
22553    /usr/bin/python2.6 /usr/bin/fab -H hostname:1223 apt_update
22550    /usr/bin/python2.6 /usr/bin/fab -H hostname:1223 apt_update

Attached strace to each pid:

% strace -p 22533

Process 22533 attached - interrupt to quit
futex(0x824f670, FUTEX_WAIT_PRIVATE, 0, NULL

... PRESSED SPACE TO UNFREEZE ...

) = 0
futex(0x82579c0, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "\n", 1)                       = 1
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
brk(0x829e000)                          = 0x829e000
brk(0x829a000)                          = 0x829a000
write(1, "[hostname:1223] s"..., 58) = 58
gettimeofday({1305459083, 491875}, NULL) = 0
read(15, "\10\244\26\204w\331mp", 8)    = 8
gettimeofday({1305459083, 492082}, NULL) = 0
times({tms_utime=125, tms_stime=12, tms_cutime=1, tms_cstime=0}) = 546282572
gettimeofday({1305459083, 492222}, NULL) = 0
read(16, "\3\376:\264W\362n\256\251\234!\\YN\272\266\34\357\345", 19) = 19
gettimeofday({1305459083, 492479}, NULL) = 0
poll([{fd=5, events=POLLOUT}], 1, 100)  = 1 ([{fd=5, revents=POLLOUT}])
send(5, "\377\205% ]\337Kv\367\325D\207\202T\340!\34\235\325\207X\213{\205E\2c\254\7\333\334\7"..., 68, 0) = 68
gettimeofday({1305459083, 493016}, NULL) = 0
gettimeofday({1305459083, 493063}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
gettimeofday({1305459083, 494268}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
gettimeofday({1305459083, 496458}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_CONTINUE or TCSETSF, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=39, ws_col=96, ws_xpixel=0, ws_ypixel=0}) = 0
...
% strace -p 22553

Process 22553 attached - interrupt to quit
read(0,

... PRESSED SPACE TO UNFREEZE ...

 " ", 1024)                      = 1
futex(0x824f670, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x82579c0, FUTEX_WAKE_PRIVATE, 1) = 0
madvise(0xb4c66000, 8372224, MADV_DONTNEED) = 0
_exit(0)                                = ?
Process 22553 detached
% strace -p 22550

Process 22550 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)

... PRESSED SPACE TO UNFREEZE ...

poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "I.\370\345u\n\364\31\244&c\332\211\310\272\336", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "t\233\336\23\1\16|'\273S\303\237\0217\257K\200\326\351\331\374^\262\2\363\231\266\t\0\362_\312"..., 36, 0) = 36
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\231\271yl\237\206d\377\300\21m\223\252\307\355@", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\22\203cZeqy[Y\276\7\1d\315\2273\24\260-\272", 20, 0) = 20
futex(0x8251180, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\10$\301\17\265j\17\263>tx/0\357\327\f", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\31l\254]P\245y\2\222\32\315\3468\20U\254\3120\352G\222\337\35\324E\337\245\235\277\307\337\n"..., 36, 0) = 36
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\2735LT\330\20\34\335\26l\301J |\204|", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\210\34\220\3720,\304^\30ob\326]LXi\\\4L]", 20, 0) = 20
futex(0x82599e8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\356\25\223Q\20\333\22\376\235\313\346\205\221\214\257\244", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "!\30\335@B\201\315\300Y\304\241\220MK\232dp\350K0\343D\255\367\26r\335\227\37\355\251\327"..., 324, 0) = 324
futex(0x8237e90, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x82579c0, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "A\323]\300\250o6\207\271\2207\260\267a\351A", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "x\256\3106|\321\306\320?,\230\2031A\26696\360l\342\6\357\301B\rJ5+MZ\335\264"..., 164, 0) = 164
futex(0x8278280, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x82579c0, FUTEX_WAKE_PRIVATE, 1) = 1
...

on 2011-05-15 at 08:04am EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


Catched this again, now with some debugging:

Fabric task:

def apt_update():
    sudo('apt-get -y update')
    sudo('apt-get dist-upgrade')
    sudo('apt-get --purge autoremove')
    sudo('apt-get autoclean')

Position in the output where it freezed:

...
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/main Sources
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/restricted Sources
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/universe Packages
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/universe Sources
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/multiverse Packages
[hostname] out: Hit http://archive.ubuntu.com lucid-updates/multiverse Sources
[hostname] out: Reading package lists... Done
[hostname] out:

... OUTPUT FREEZED HERE ...

[hostname] sudo: apt-get dist-upgrade
[hostname] out: Reading package lists... Done
[hostname] out: Building dependency tree       
[hostname] out: Reading state information... Done
[hostname] out: Calculating upgrade... Done
[hostname] out: 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
...

Process tree:

22533 /usr/bin/python2.6 /usr/bin/fab -H hostname:1223 apt_update
22553    /usr/bin/python2.6 /usr/bin/fab -H hostname:1223 apt_update
22550    /usr/bin/python2.6 /usr/bin/fab -H hostname:1223 apt_update

Attached strace to each pid:

% strace -p 22533

Process 22533 attached - interrupt to quit
futex(0x824f670, FUTEX_WAIT_PRIVATE, 0, NULL

... PRESSED SPACE TO UNFREEZE ...

) = 0
futex(0x82579c0, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "\n", 1)                       = 1
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
brk(0x829e000)                          = 0x829e000
brk(0x829a000)                          = 0x829a000
write(1, "[hostname:1223] s"..., 58) = 58
gettimeofday({1305459083, 491875}, NULL) = 0
read(15, "\10\244\26\204w\331mp", 8)    = 8
gettimeofday({1305459083, 492082}, NULL) = 0
times({tms_utime=125, tms_stime=12, tms_cutime=1, tms_cstime=0}) = 546282572
gettimeofday({1305459083, 492222}, NULL) = 0
read(16, "\3\376:\264W\362n\256\251\234!\\YN\272\266\34\357\345", 19) = 19
gettimeofday({1305459083, 492479}, NULL) = 0
poll([{fd=5, events=POLLOUT}], 1, 100)  = 1 ([{fd=5, revents=POLLOUT}])
send(5, "\377\205% ]\337Kv\367\325D\207\202T\340!\34\235\325\207X\213{\205E\2c\254\7\333\334\7"..., 68, 0) = 68
gettimeofday({1305459083, 493016}, NULL) = 0
gettimeofday({1305459083, 493063}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
gettimeofday({1305459083, 494268}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
gettimeofday({1305459083, 496458}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_CONTINUE or TCSETSF, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=39, ws_col=96, ws_xpixel=0, ws_ypixel=0}) = 0
...
% strace -p 22553

Process 22553 attached - interrupt to quit
read(0,

... PRESSED SPACE TO UNFREEZE ...

 " ", 1024)                      = 1
futex(0x824f670, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x82579c0, FUTEX_WAKE_PRIVATE, 1) = 0
madvise(0xb4c66000, 8372224, MADV_DONTNEED) = 0
_exit(0)                                = ?
Process 22553 detached
% strace -p 22550

Process 22550 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=5, events=POLLIN}], 1, 100)   = 0 (Timeout)

... PRESSED SPACE TO UNFREEZE ...

poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "I.\370\345u\n\364\31\244&c\332\211\310\272\336", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "t\233\336\23\1\16|'\273S\303\237\0217\257K\200\326\351\331\374^\262\2\363\231\266\t\0\362_\312"..., 36, 0) = 36
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\231\271yl\237\206d\377\300\21m\223\252\307\355@", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\22\203cZeqy[Y\276\7\1d\315\2273\24\260-\272", 20, 0) = 20
futex(0x8251180, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\10$\301\17\265j\17\263>tx/0\357\327\f", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\31l\254]P\245y\2\222\32\315\3468\20U\254\3120\352G\222\337\35\324E\337\245\235\277\307\337\n"..., 36, 0) = 36
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\2735LT\330\20\34\335\26l\301J |\204|", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\210\34\220\3720,\304^\30ob\326]LXi\\\4L]", 20, 0) = 20
futex(0x82599e8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "\356\25\223Q\20\333\22\376\235\313\346\205\221\214\257\244", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "!\30\335@B\201\315\300Y\304\241\220MK\232dp\350K0\343D\255\367\26r\335\227\37\355\251\327"..., 324, 0) = 324
futex(0x8237e90, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x82579c0, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "A\323]\300\250o6\207\271\2207\260\267a\351A", 16, 0) = 16
poll([{fd=5, events=POLLIN}], 1, 100)   = 1 ([{fd=5, revents=POLLIN}])
recv(5, "x\256\3106|\321\306\320?,\230\2031A\26696\360l\342\6\357\301B\rJ5+MZ\335\264"..., 164, 0) = 164
futex(0x8278280, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x82579c0, FUTEX_WAKE_PRIVATE, 1) = 1
...

on 2011-05-15 at 08:04am EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


So, it looks like freezing happens on the second sudo command (first one asks for password, second doesn't but fabric waits for any input). With pty=False problem still happens. Maybe related to #320?


on 2011-05-15 at 08:22am EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


So, it looks like freezing happens on the second sudo command (first one asks for password, second doesn't but fabric waits for any input). With pty=False problem still happens. Maybe related to #320?


on 2011-05-15 at 08:22am EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Jeff Forcier (bitprophet) posted:


Thanks for the ticket!

First question, are you intending to press Ctrl-S in order to freeze the program? That's unclear from your description. It's a Unix terminal escape sequence (whose intended behavior is to freeze terminal output, to be continued with Ctrl-Q) which Fabric definitely isn't designed to handle right now.

Secondly, if you were using that for its intended purpose and the problem is that it's not unfreezing correctly at times, yes, it could definitely be related to #320 or the interactive-input feature in general.


on 2011-05-15 at 12:21pm EDT

Member

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Thanks for the ticket!

First question, are you intending to press Ctrl-S in order to freeze the program? That's unclear from your description. It's a Unix terminal escape sequence (whose intended behavior is to freeze terminal output, to be continued with Ctrl-Q) which Fabric definitely isn't designed to handle right now.

Secondly, if you were using that for its intended purpose and the problem is that it's not unfreezing correctly at times, yes, it could definitely be related to #320 or the interactive-input feature in general.


on 2011-05-15 at 12:21pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


Sorry if I described this not very clearly (english is not my native language). What I tried to say with phrase "like when pressing Ctrl+S in linux console" is that it behaves similarly, not that I actually press this keyboard shortcut.


on 2011-05-15 at 02:09pm EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


Sorry if I described this not very clearly (english is not my native language). What I tried to say with phrase "like when pressing Ctrl+S in linux console" is that it behaves similarly, not that I actually press this keyboard shortcut.


on 2011-05-15 at 02:09pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Jeff Forcier (bitprophet) posted:


Thanks, that makes it clearer. I have not heard of this problem happening before and it will be hard to fix if we cannot find a reliable way to reproduce it.

You should definitely follow #320 and if we're lucky, the fix for that ticket may fix this for you as well. If it does not, then we will have to see if there are other obvious problems with this part of the code (re: input/output threads).


on 2011-05-15 at 02:14pm EDT

Member

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Thanks, that makes it clearer. I have not heard of this problem happening before and it will be hard to fix if we cannot find a reliable way to reproduce it.

You should definitely follow #320 and if we're lucky, the fix for that ticket may fix this for you as well. If it does not, then we will have to see if there are other obvious problems with this part of the code (re: input/output threads).


on 2011-05-15 at 02:14pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


Can you recommend any places in the code where I can add debugging output or logging (so it will be recorded next time when problem happens)?


on 2011-05-15 at 02:37pm EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


Can you recommend any places in the code where I can add debugging output or logging (so it will be recorded next time when problem happens)?


on 2011-05-15 at 02:37pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Jeff Forcier (bitprophet) posted:


Not really, since I'm not sure what could be causing such a thing. Perhaps start in fabric/io.py since IIRC that's where we moved most of the printing/waiting for input stuff.


on 2011-05-15 at 09:06pm EDT

Member

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Not really, since I'm not sure what could be causing such a thing. Perhaps start in fabric/io.py since IIRC that's where we moved most of the printing/waiting for input stuff.


on 2011-05-15 at 09:06pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


It freezes in operations.py _execute():

        # Wait for threads to exit so we aren't left with stale threads
        for worker in workers:
            worker.thread.join()

Probably it waits for input thread.


on 2011-05-16 at 12:56am EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


It freezes in operations.py _execute():

        # Wait for threads to exit so we aren't left with stale threads
        for worker in workers:
            worker.thread.join()

Probably it waits for input thread.


on 2011-05-16 at 12:56am EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


The exact line where blocking happens in io.py:

            # Send all local stdin to remote end's stdin
            byte = msvcrt.getch() if win32 else sys.stdin.read(1)

At freezing time remote command already finished (I do not see apt-get in ps -aux output).


on 2011-05-16 at 01:26am EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


The exact line where blocking happens in io.py:

            # Send all local stdin to remote end's stdin
            byte = msvcrt.getch() if win32 else sys.stdin.read(1)

At freezing time remote command already finished (I do not see apt-get in ps -aux output).


on 2011-05-16 at 01:26am EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Max Arnold (LwarX) posted:


def input_loop(chan, using_pty):
    while not chan.exit_status_ready():
        if win32:
            have_char = msvcrt.kbhit()
        else:
            r, w, x = select([sys.stdin], [], [], 0.0)
            have_char = (r and r[0] == sys.stdin)
        if have_char:
            print "[DEBUG] read"
            # Send all local stdin to remote end's stdin
            byte = msvcrt.getch() if win32 else sys.stdin.read(1)
            print "[DEBUG] send"
            chan.sendall(byte)
            # Optionally echo locally, if needed.
            if not using_pty and env.echo_stdin:
                # Not using fastprint() here -- it prints as 'user'
                # output level, don't want it to be accidentally hidden
                sys.stdout.write(byte)
                sys.stdout.flush()
        time.sleep(io_sleep)

After some debugging I can confirm race condition between input read thread and getpass.getpass(). There is small chance that "[DEBUG] read" will be emitted during sudo password input. Then this tread just waits for next character indefinitely (actual character is consumed by getpass).

Possible workaround is to either stop this tread before getpass(), or just pause it via additional variable which can be modified outside of the thread:

if have_char and not channel.stop_input:

Proper solution is to have only one input read process in each moment of time.


on 2011-05-18 at 12:37am EDT

Member

bitprophet commented Aug 19, 2011

Max Arnold (LwarX) posted:


def input_loop(chan, using_pty):
    while not chan.exit_status_ready():
        if win32:
            have_char = msvcrt.kbhit()
        else:
            r, w, x = select([sys.stdin], [], [], 0.0)
            have_char = (r and r[0] == sys.stdin)
        if have_char:
            print "[DEBUG] read"
            # Send all local stdin to remote end's stdin
            byte = msvcrt.getch() if win32 else sys.stdin.read(1)
            print "[DEBUG] send"
            chan.sendall(byte)
            # Optionally echo locally, if needed.
            if not using_pty and env.echo_stdin:
                # Not using fastprint() here -- it prints as 'user'
                # output level, don't want it to be accidentally hidden
                sys.stdout.write(byte)
                sys.stdout.flush()
        time.sleep(io_sleep)

After some debugging I can confirm race condition between input read thread and getpass.getpass(). There is small chance that "[DEBUG] read" will be emitted during sudo password input. Then this tread just waits for next character indefinitely (actual character is consumed by getpass).

Possible workaround is to either stop this tread before getpass(), or just pause it via additional variable which can be modified outside of the thread:

if have_char and not channel.stop_input:

Proper solution is to have only one input read process in each moment of time.


on 2011-05-18 at 12:37am EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Jeff Forcier (bitprophet) posted:


Yes, that sounds like an implementation of the solution proposed over in #320, glad to hear it seems to be working. I'll take a closer look and expand/integrate soon. Thanks!


on 2011-05-18 at 01:45pm EDT

Member

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Yes, that sounds like an implementation of the solution proposed over in #320, glad to hear it seems to be working. I'll take a closer look and expand/integrate soon. Thanks!


on 2011-05-18 at 01:45pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Jeff Forcier (bitprophet) posted:


Applied in changeset commit:efd9fef820dd6f2a593c172eac7e500c5e8f0602.


on 2011-06-23 at 11:09pm EDT

Member

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Applied in changeset commit:efd9fef820dd6f2a593c172eac7e500c5e8f0602.


on 2011-06-23 at 11:09pm EDT

@bitprophet

This comment has been minimized.

Show comment
Hide comment
@bitprophet

bitprophet Aug 19, 2011

Member

Jeff Forcier (bitprophet) posted:


1.0.2 just went out -- includes the patch -- let me know if it somehow doesn't fix, thanks!


on 2011-06-24 at 08:35pm EDT

Member

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


1.0.2 just went out -- includes the patch -- let me know if it somehow doesn't fix, thanks!


on 2011-06-24 at 08:35pm EDT

@bitprophet bitprophet closed this Aug 19, 2011

ramonvanalteren pushed a commit to ramonvanalteren/fabric that referenced this issue Aug 31, 2011

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment