Sleep for 0.1s while waiting for command output (fixes ProxyCommand CPU usage) #414


@jasondunsmore

Avoid using 100% CPU by sleeping for 0.1 seconds while waiting for command
output in ProxyCommand.recv().

@jasondunsmore

This is a fix for #413
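A minimal sketch of the pattern this PR applies: `ProxyCommand.recv()` polls the subprocess pipe and, when no data is ready, sleeps briefly instead of spinning. The function and variable names below are illustrative, not paramiko's actual code.

```python
import os
import select
import time

def recv_polling(fd, size, sleep_interval=0.1):
    """Read up to `size` bytes from `fd`, sleeping briefly when no data
    is ready so the loop does not spin at 100% CPU. Illustrative only."""
    chunks = []
    received = 0
    while received < size:
        readable, _, _ = select.select([fd], [], [], 0)
        if not readable:
            # No data yet: without this sleep the loop burns 100% CPU.
            time.sleep(sleep_interval)
            continue
        b = os.read(fd, 1)  # the original code reads one byte at a time
        if not b:  # EOF
            break
        chunks.append(b)
        received += 1
    return b"".join(chunks)
```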

@lndbrg
Contributor
lndbrg commented Oct 5, 2014

Could you add some comments in the code documenting why we are doing this?

@lndbrg lndbrg added the Bug label Oct 5, 2014
@jasondunsmore

@lndbrg Done. Please see my latest patch.

@lndbrg lndbrg added the Reviewed label Oct 6, 2014
@bitprophet
Member

Does it make sense to use the built-in IO sleep setting (paramiko.common.io_sleep) instead of a hardcoded value?

That setting is used for nearly identical purposes (polling) in an upstream lib (Fabric) as well as the agent module. When we set it up for Fabric it was found that the current default (0.01s, vs your 0.1s) had decent behavior re: combining responsiveness with efficiency. (I.e. 0.01s was a significant CPU savings over having no sleep, but 0.1s or so didn't bring CPU usage down much further and was noticeably less responsive.)

At the very least it feels worth trying with 0.01s and comparing for this specific use case.
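For illustration, a hedged sketch of polling with the shared interval bitprophet mentions; the `ImportError` fallback and the `wait_until` helper are assumptions so the snippet runs standalone, not paramiko API.

```python
import time

try:
    from paramiko.common import io_sleep  # shared polling interval
except ImportError:
    io_sleep = 0.01  # fallback so this sketch runs without paramiko installed

def wait_until(ready, timeout=1.0):
    """Poll ready() every io_sleep seconds until it returns True or the
    timeout elapses. Hypothetical helper, for illustration only."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ready():
            return True
        time.sleep(io_sleep)
    return False
```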

@bitprophet bitprophet added this to the 1.15.2 et al milestone Oct 7, 2014
@lndbrg
Contributor
lndbrg commented Oct 7, 2014

It certainly makes sense to use that instead! Totally slipped my mind.

@lndbrg
Contributor
lndbrg commented Oct 7, 2014

@jasondunsmore could you patch it up according to @bitprophet's comment? :)

@jasondunsmore

@lndbrg Yep. I have a patch. Just need to test it. Will submit it as soon as I'm done.

@jasondunsmore

Yappi report with sleep time of 0.1:

Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg      
..ket.py:333 Packetizer.read_message  6/5    0.000295  0.054750  0.009125
../packet.py:185 Packetizer.read_all  12/11  0.004959  0.054132  0.004511
..miko/proxy.py:74 ProxyCommand.recv  313..  0.018856  0.048483  0.000155
..:1450 Transport._send_user_message  4      0.000074  0.003174  0.000793
..nnel.py:1039 Channel._handle_close  1      0.000035  0.003031  0.003031
..rt.py:1447 Transport._send_message  4      0.000021  0.003029  0.000757
..ket.py:289 Packetizer.send_message  4      0.000163  0.003008  0.000752
..paramiko/packet.py:39 compute_hmac  10     0.002102  0.002995  0.000299
..ient.py:322 SSHClient.exec_command  1      0.000067  0.002396  0.002396
..port.py:590 Transport.open_session  1      0.000011  0.001712  0.001712
..port.py:657 Transport.open_channel  1      0.000153  0.001701  0.001701
..util.py:315 constant_time_bytes_eq  6      0.000543  0.000953  0.000159
[...]

Yappi report with sleep time of 0.01:

Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg      
..ket.py:333 Packetizer.read_message  6/5    0.000253  0.177254  0.029542
../packet.py:185 Packetizer.read_all  12/11  0.004654  0.176926  0.014744
..miko/proxy.py:74 ProxyCommand.recv  307..  0.055892  0.171120  0.000557
..ient.py:322 SSHClient.exec_command  1      0.000063  0.002291  0.002291
..port.py:590 Transport.open_session  1      0.000012  0.001539  0.001539
..port.py:657 Transport.open_channel  1      0.000161  0.001526  0.001526
..:1450 Transport._send_user_message  4      0.000071  0.001124  0.000281
..rt.py:1447 Transport._send_message  4      0.000022  0.000983  0.000246
..ket.py:289 Packetizer.send_message  4      0.000144  0.000961  0.000240
..paramiko/packet.py:39 compute_hmac  10     0.000099  0.000913  0.000091
..nnel.py:1039 Channel._handle_close  1      0.000024  0.000798  0.000798
..util.py:315 constant_time_bytes_eq  6      0.000430  0.000783  0.000131
..py:420 Packetizer._check_keepalive  295    0.000652  0.000652  0.000002
..python2.7/hmac.py:27 HMAC.__init__  10     0.000304  0.000599  0.000060
..amiko/channel.py:51 Channel._check  1      0.000022  0.000547  0.000547
[...]
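A profiling harness like the snippet referenced from #413 might look roughly like the following. The yappi calls are from its documented API, but the exact script used for these reports is not shown here, so treat this as a hedged sketch.

```python
try:
    import yappi
except ImportError:
    yappi = None  # let the sketch run even without yappi installed

def profile_cpu(fn):
    """Run fn under yappi's CPU clock, then print functions sorted by
    total time (the ttot column in the reports above)."""
    if yappi is None:
        return fn()
    yappi.set_clock_type("cpu")
    yappi.start()
    try:
        return fn()
    finally:
        yappi.stop()
        yappi.get_func_stats().sort("ttot").print_all()
```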
@bitprophet bitprophet and 1 other commented on an outdated diff Oct 8, 2014
paramiko/proxy.py
@@ -93,6 +95,11 @@ def recv(self, size):
# timeouts; this makes us act more like a real socket
# (where timeouts don't actually drop data.)
self.buffer.append(b)
+ # If we haven't recieved any data yet, wait 0.1
@bitprophet
bitprophet Oct 8, 2014 Member

Comment now disagrees with the actual value! Maybe just say "wait a short period of time" to avoid the problem entirely?

@jasondunsmore
jasondunsmore Oct 9, 2014

Fixed that. Thanks.

@bitprophet
Member

Not familiar with Yappi, does that bear out my memory? Assuming 'yes' since you updated the PR? :) Left a brief note but otherwise this looks good, thanks!

@jasondunsmore jasondunsmore Sleep for 0.1s while waiting for command output
Avoid using 100% CPU by waiting for 0.01 seconds while waiting for command
output in ProxyCommand.recv().
f56f242
@jasondunsmore

@bitprophet Yep, it seemed to be worth the lower wait time. 0.17s CPU time with 0.01s wait vs 0.05s CPU time with a 0.1s wait vs 30s CPU time with no wait.

@bitprophet bitprophet changed the title from Sleep for 0.1s while waiting for command output to Sleep for 0.1s while waiting for command output (fixes ProxyCommand CPU usage) Oct 15, 2014
@acdha
acdha commented Nov 7, 2014

This doesn't seem to be particularly effective in my testing – it went from 100% to 98% CPU usage because it's essentially a select() + single byte read() loop. I think fixing that would require making a blocking read with a timeout so you could set the maximum delay while reading considerably more than one byte when more data is available.
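A sketch of the alternative @acdha describes (and which #454 later took): block inside select() with a timeout, then read a whole chunk in one call rather than one byte per loop iteration. This is illustrative, not #454's actual code.

```python
import os
import select

def recv_block(fd, size, timeout=0.1):
    """Block up to `timeout` seconds for data, then read up to `size`
    bytes at once. Avoids both the busy-wait and per-byte syscalls."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return b""  # caller retries; no CPU burned while waiting
    return os.read(fd, size)
```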

@bitprophet
Member

I still need to integrate & test this on my end but I'll probably ask you for a test patch if my testing bears out your assertion, @acdha :) it sounded like the overall "rub some sleep on it" approach had big dividends for @jasondunsmore though.

@bitprophet
Member

@acdha's notes seem to have turned into #454, which is similar to this but uses the select + timeout option, plus a few other orthogonal changes which we should fold into this one if we decide to go with the sleep option.

@bitprophet
Member

Doing my own tests on this with both the sleep option & the timeout option, using the sample yappi-using snippet provided by @jasondunsmore in the desc of #413.

Hitting one of my prod servers via my prod bastion system from my localhost, with no changes applied, bears out the "all 30 wallclock seconds are spent in recv":

name                                  ncall  tsub      ttot      tavg    
..miko/proxy.py:73 ProxyCommand.recv  321..  13.68937  30.96013  0.096449
..ket.py:333 Packetizer.read_message  6/5    0.000225  30.36787  5.061312
../packet.py:185 Packetizer.read_all  12/11  0.000000  30.36775  2.530646
..ient.py:322 SSHClient.exec_command  1      0.000071  0.003540  0.003540

With this PR (#414) applied, using the 0.01s io_sleep value, I confirm Jason's results, with recv still tops but only active for 0.46 seconds instead of 30ish:

name                                  ncall  tsub      ttot      tavg      
..miko/proxy.py:75 ProxyCommand.recv  303..  0.189383  0.464352  0.001533
..ket.py:333 Packetizer.read_message  6/5    0.000359  0.462989  0.077165
../packet.py:185 Packetizer.read_all  12/11  0.000000  0.462476  0.038540
..ient.py:322 SSHClient.exec_command  1      0.000074  0.003831  0.003831

With #454 applied (this includes not only the select timeout but also a more efficient timekeeping function call & reading >1 byte at a time) things are another order of magnitude faster, only active for 0.057 seconds:

name                                  ncall  tsub      ttot      tavg    
..ket.py:333 Packetizer.read_message  6/5    0.000349  0.057388  0.009565
../packet.py:185 Packetizer.read_all  12/11  0.008463  0.055538  0.004628
..miko/proxy.py:74 ProxyCommand.recv  318..  0.022239  0.044484  0.000140
..ient.py:322 SSHClient.exec_command  1      0.000067  0.003249  0.003249

If I steal the orthogonal improvements from #454 and apply them to #414, we see an improvement, but not nearly as marked as with the select-timeout approach ("only" 0.36 seconds), implying the select-timeout approach is significantly better than sleeping (though we could potentially bump the sleep around, I doubt it'd ever get that good?):

name                                  ncall  tsub      ttot      tavg    
..ket.py:333 Packetizer.read_message  6/5    0.000381  0.363533  0.060589
../packet.py:185 Packetizer.read_all  12/11  0.000000  0.361403  0.030117
..miko/proxy.py:75 ProxyCommand.recv  300..  0.151985  0.360740  0.001202
..ient.py:322 SSHClient.exec_command  1      0.000089  0.004060  0.004060

I went back and ran the test a few more times for each setup to confirm things:

  • #414 by itself: 0.44, 0.46, 0.49
  • #414 w/ cherry-picked #454 changes: 0.36, 0.38, 0.40
  • #454: 0.06, 0.06, 0.06

So while either approach is a huge improvement, at this point the select-timeout approach looks best, so I'll merge that (closing this) and give credit to all involved.

@bitprophet bitprophet closed this Dec 18, 2014