Performance of paramiko is pretty slow (with possible solution) #175

Closed
sturmf opened this Issue Jun 13, 2013 · 11 comments


@sturmf
sturmf commented Jun 13, 2013

Hi,
actually I wanted to post the text below to a mailing list, but it seems paramiko no longer has one?

I am using paramiko and doing a sftp file transfer from a twisted sftp server.
Unfortunately it turned out that paramiko is about 20 times slower than e.g. the putty sftp client for large files (100MB in my test case).
So I started investigating and found that in the putty case every requested packet is immediately written to the outgoing buffer, while in the paramiko case it is almost always queued waiting for a window advertisement (twisted/conch/ssh/channel.py line 180, method write).
The reason is that the remote window size in paramiko's case was never big enough!

So I checked which remote window size each of the tools set
(twisted/conch/ssh/connection.py line 116 method ssh_CHANNEL_OPEN).

Putty uses: 2147483647
Paramiko uses: 65536

So for a first test I patched the paramiko transport window size to be the same as putty's

self.trans = paramiko.Transport((hostname, port))
self.trans.window_size = 2147483647

and then I reduced the window size back down until I saw the first performance degradation.

Here are some (non-scientific) performance measurements with client and server on the same localhost.
OS: Win7 64Bit, Python 2.7.3, Twisted-13.0.0, pycrypto-2.6, paramiko-1.10.1, Putty 0.62

Putty psftp: 10.00 MB/sec
Paramiko ws=default: 0.54 MB/sec
Paramiko ws=2147483647: 9.09 MB/sec
Paramiko ws=134217727: 9.09 MB/sec
Paramiko ws=67108863: 5.55 MB/sec
Paramiko ws=33554431: 2.44 MB/sec
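A pattern worth noting in the numbers above: once the transfer is window-limited, throughput scales roughly linearly with the window size, so each halving of the window roughly halves the measured rate. A quick check of the ratios from the table:

```python
# MB/s from the measurements above, keyed by window size
measurements = {
    134217727: 9.09,
    67108863: 5.55,
    33554431: 2.44,
}

# Compare each window size against the next smaller one
sizes = sorted(measurements, reverse=True)
for big, small in zip(sizes, sizes[1:]):
    ratio = measurements[big] / measurements[small]
    print(round(ratio, 1))  # roughly 2 in both cases
```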

So I think the default window size of paramiko should be raised to at least 134217727,
but maybe even higher. Since this window size is distinct from the TCP window size, raising it seems harmless. As a side note, I was now for the first time able to transmit a 1GB file in a reasonable time, and ran into the twisted bug http://twistedmatrix.com/trac/ticket/4395 that rekeying is broken.
Unfortunately it is still not fixed in twisted 13.0.0! To work around the bug, set the following:

self.trans = paramiko.Transport((hostname, port))
self.trans.packetizer.REKEY_BYTES = pow(2, 40) # 1TB max, this is a security degradation
self.trans.packetizer.REKEY_PACKETS = pow(2, 40) # 1TB max, this is a security degradation

Finally I enabled compression in paramiko by calling self.trans.use_compression(), which increased the download performance to 52.63 MB/sec for a file containing only zeros and decreased it to 6.88 MB/sec for a file containing completely random data.
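That asymmetry is expected: use_compression() enables zlib compression on the transport, and a file of zeros compresses to almost nothing, while random data is incompressible and only adds CPU overhead. A rough illustration with zlib directly:

```python
import os
import zlib

zeros = b"\x00" * 1048576          # 1 MB of zeros
random_data = os.urandom(1048576)  # 1 MB of random bytes

compressed_zeros = len(zlib.compress(zeros))
compressed_random = len(zlib.compress(random_data))

# Zeros shrink by orders of magnitude; random data doesn't shrink at all
print(compressed_zeros < 10000)      # True
print(compressed_random >= 1048576)  # True: zlib only adds framing overhead
```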

For reference, this is the unmodified paramiko profile. Somehow, even in the optimized case, a lot of time is spent in the acquire method. I don't understand this, since this measurement of acquire and release on Win7 shows much higher performance:

python -m timeit -s "from threading import Lock; l=Lock(); a=l.acquire; r=l.release" "a(); r()"
10000000 loops, best of 3: 0.159 usec per loop
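The timeit result measures an uncontended lock, though. Paramiko's reader thread blocks on locks and condition variables while waiting for data or window updates, and a profiler charges that blocked wall-clock time to acquire, so the huge acquire figure is most likely wait time rather than lock overhead. A minimal sketch of the effect:

```python
import threading
import time

lock = threading.Lock()
lock.acquire()                              # hold the lock
threading.Timer(0.2, lock.release).start()  # release it from another thread

start = time.time()
lock.acquire()   # blocks ~0.2 s; a profiler charges this to 'acquire'
elapsed = time.time() - start
lock.release()

print(elapsed >= 0.15)  # True: wall-clock wait time, not lock overhead
```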

before:

746002 function calls in 170.389 seconds

Ordered by: call count

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
145033    0.015    0.000    0.015    0.000 {len}
48405  168.886    0.003  168.886    0.003 {method 'acquire' of 'thread.lock' objects}
37137    0.007    0.000    0.007    0.000 {method 'release' of 'thread.lock' objects}
17103    0.011    0.000    0.011    0.000 {_struct.pack}
17091    0.014    0.000    0.014    0.000 {hasattr}
17076    0.010    0.000    0.010    0.000 {_hashlib.openssl_sha1}
17076    0.022    0.000    0.068    0.000 C:\Program Files (x86)\Python27\lib\site-packages\Crypto\Hash\SHA.py:73(__init__)
17076    0.022    0.000    0.046    0.000 C:\Program Files (x86)\Python27\lib\site-packages\Crypto\Hash\hashalgo.py:34(__init__)
17076    0.015    0.000    0.015    0.000 {method 'update' of '_hashlib.HASH' objects}
12841    0.006    0.000    0.006    0.000 {method 'write' of 'cStringIO.StringO' objects}
12814    0.004    0.000    0.004    0.000 {chr}
12807    0.011    0.000    0.024    0.000 C:\Program Files (x86)\Python27\lib\site-packages\Crypto\Hash\hashalgo.py:53(update)
12305    0.005    0.000    0.005    0.000 C:\Program Files (x86)\Python27\lib\threading.py:62(_note)
12298    0.009    0.000    0.009    0.000 {time.time}
10334    0.009    0.000    0.009    0.000 {isinstance}
10277    0.031    0.000    0.082    0.000 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\channel.py:1171(_check_add_window)
10277    0.019    0.000    0.019    0.000 {method 'tostring' of 'array.array' objects}
10277    0.118    0.000  167.202    0.016 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\buffered_pipe.py:101(read)
10277    0.055    0.000  169.884    0.017 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\channel.py:601(recv)
 9198    0.004    0.000    0.004    0.000 {_struct.unpack}
 9195    0.018    0.000    0.018    0.000 {method 'read' of 'cStringIO.StringI' objects}
[...]

and this with window_size = 134217727

298643 function calls in 11.527 seconds

Ordered by: call count

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
88806    0.008    0.000    0.008    0.000 {len}
24538   10.980    0.000   10.980    0.000 {method 'acquire' of 'thread.lock' objects}
15358    0.002    0.000    0.002    0.000 {method 'release' of 'thread.lock' objects}
 9198    0.003    0.000    0.003    0.000 {_struct.unpack}
 9195    0.012    0.000    0.012    0.000 {method 'read' of 'cStringIO.StringI' objects}
 9195    0.015    0.000    0.029    0.000 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\message.py:103(get_bytes)
 6175    0.004    0.000    0.004    0.000 {isinstance}
 6136    0.016    0.000    0.029    0.000 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\message.py:139(get_int)
 6118    0.021    0.000    0.072    0.000 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\channel.py:1171(_check_add_window)
 6118    0.064    0.000   11.044    0.002 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\buffered_pipe.py:101(read)
 6118    0.017    0.000   11.134    0.002 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\channel.py:601(recv)
 6118    0.022    0.000   11.162    0.002 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\sftp.py:144(_read_all)
 6118    0.013    0.000    0.013    0.000 {method 'tostring' of 'array.array' objects}
 6112    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
 6106    0.002    0.000    0.002    0.000 {min}
 6105    0.022    0.000    0.026    0.000 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\sftp_file.py:108(_data_in_prefetch_buffers)
 6105    0.002    0.000    0.002    0.000 {method 'keys' of 'dict' objects}
 3099    0.001    0.000    0.001    0.000 C:\Program Files (x86)\Python27\lib\threading.py:62(_note)
 3092    0.002    0.000    0.002    0.000 {time.time}
 3082    0.001    0.000    0.001    0.000 {cStringIO.StringIO}
 3080    0.005    0.000    0.006    0.000 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\message.py:40(__init__)
 3073    0.000    0.000    0.000    0.000 {ord}
 3062    0.008    0.000    0.008    0.000 {thread.allocate_lock}
 3060    0.002    0.000    0.002    0.000 C:\Program Files (x86)\Python27\lib\threading.py:219(_release_save)
 3060    0.016    0.000   10.935    0.004 C:\Program Files (x86)\Python27\lib\threading.py:234(wait)
 3060    0.002    0.000    0.009    0.000 C:\Program Files (x86)\Python27\lib\threading.py:225(_is_owned)
 3060    0.002    0.000    0.012    0.000 C:\Program Files (x86)\Python27\lib\threading.py:222(_acquire_restore)
 3059    0.029    0.000   11.303    0.004 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\sftp_client.py:716(_read_response)
 3059    0.025    0.000   11.189    0.004 C:\Program Files (x86)\Python27\lib\site-packages\paramiko\sftp.py:174(_read_packet)
[...]
@kilink
kilink commented Aug 4, 2013

Great detective work! I've run into this issue recently as well, where downloading a 33MB file is taking almost 5 minutes (with scp it takes 15 seconds).

Any chance of fixing this?

@kilink
kilink commented Aug 5, 2013

For what it's worth, the default window size in OpenSSH is 2097152.

(see CHAN_TCP_WINDOW_DEFAULT definition in http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/channels.h?rev=1.113)

@antoncohen

The following worked for me, performance went from 0.55 MB/sec to 4.5 MB/sec:

class FastTransport(paramiko.Transport):
    def __init__(self, sock):
        super(FastTransport, self).__init__(sock)
        self.window_size = 2147483647
        self.packetizer.REKEY_BYTES = pow(2, 40)
        self.packetizer.REKEY_PACKETS = pow(2, 40)

ssh_conn = FastTransport(('host.example.com', 22))
ssh_conn.connect(username='username', password='password')
sftp = paramiko.SFTPClient.from_transport(ssh_conn)
@hqhoang
hqhoang commented Sep 23, 2013

This definitely should be reviewed and patched. I have a script that pulls Apache log files, and it took 5 minutes to download a 300MB file. Using a window_size of 2147483647 reduced the download time to 21 seconds.

@lndbrg
Contributor
lndbrg commented Jan 20, 2014

The specification says that both sides are allowed to send a message to adjust the window size (sizes up to 2^32-1, i.e. 4294967295 as an int, are ok):
http://www.ietf.org/rfc/rfc4254.txt

The message to send is SSH_MSG_CHANNEL_WINDOW_ADJUST; in paramiko it is called MSG_CHANNEL_WINDOW_ADJUST. If I understand the code correctly, we are only listening for the server sending this message.
To fix the slow file transfers while still keeping smaller actions (e.g. standard ssh) speedy, we could keep the small default window size but let the sftp subsystem renegotiate the window size for file transfers.

We should probably also look into how we can modify the packet size, if needed.

@jzwinck
jzwinck commented Feb 12, 2014

This is an important bug to fix. Adding this tiny line of code cut the time to transfer one file from 60 seconds to 9 seconds:

# do this right after ssh.connect()
ssh.get_transport().window_size = 3 * 1024 * 1024

2 MB (the OpenSSH default) worked very well too; 3 MB gave me a slight boost (5%). This is on a link with RTT around 250 ms.
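These numbers are consistent with the bandwidth-delay product: to keep the pipe full, the receive window must be at least throughput × RTT. At a 250 ms RTT, a 3 MB window caps throughput around 12 MB/s, while the old 64 KB default caps it at 256 KB/s, which matches the slow figures reported above. A quick sanity check:

```python
def max_throughput(window_bytes, rtt_seconds):
    """Window-limited ceiling: at most one full window per round trip."""
    return window_bytes / rtt_seconds

rtt = 0.25  # ~250 ms round-trip time
print(int(max_throughput(3 * 1024 * 1024, rtt) / 1024))  # 12288 KB/s (~12 MB/s)
print(int(max_throughput(64 * 1024, rtt) / 1024))        # 256 KB/s
```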

@lndbrg
Contributor
lndbrg commented Aug 14, 2014

I have a patchset at #372. Could @jzwinck, @hqhoang, @antoncohen, @sturmf and @kilink try it out and see if it works any better?

@lndbrg
Contributor
lndbrg commented Aug 14, 2014

That patchset modifies paramiko to use the same default as openssh.

That patchset also allows you to modify the window and packet size per session/channel/sftp transfer; how to do that is documented in the sphinx docs. :)

@bitprophet
Member

Closing this given the lack of feedback since Olle posted his changeset note. Please leave comments if you've tried the 1.15 release (out today/tomorrow) and you're still experiencing speed issues. Thanks!

@bitprophet bitprophet closed this Sep 8, 2014
@ataki
ataki commented Jun 30, 2015

Not sure if this is exactly related, but I'm experiencing speed issues with put() not closing the connection when writing a 300MB file to an sftp server:

[screenshot: terminal output from the transfer]

The file is written correctly (correct contents and modified date), but the connection lags for 10 min before closing.

I haven't tried with larger file sizes and don't know if this lag time scales linearly.

@gavinwahl

I am still seeing this bug with paramiko 1.15.2 (and paramiko master). Reading the strace output, it looks like paramiko is reading 8 bytes at a time?!

read(3, "...", 8) = 8
poll([{fd=8, events=POLLOUT}], 1, 100)  = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "..."..., 68, 0, NULL, 0) = 68
futex(0x1f2fb20, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
read(3, "...", 8)     = 8
poll([{fd=8, events=POLLOUT}], 1, 100)  = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "..."..., 68, 0, NULL, 0) = 68
futex(0x1f2fb20, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
...

EDIT: It looks like fd 3, the one it keeps reading 8 bytes from, is /dev/urandom

EDIT: I was stracing the main thread, not the one actually doing the download. Here's the strace of the download thread:

poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 16, 0, NULL, NULL) = 16
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 32804, 0, NULL, NULL) = 2904
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 29900, 0, NULL, NULL) = 2896
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 27004, 0, NULL, NULL) = 1448
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 25556, 0, NULL, NULL) = 1448
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 24108, 0, NULL, NULL) = 1448
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 22660, 0, NULL, NULL) = 1448
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 21212, 0, NULL, NULL) = 1448
poll([{fd=8, events=POLLIN}], 1, 100)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "...", 19764, 0, NULL, NULL) = 1448
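The 1448-byte reads themselves are normal: that is one TCP segment on a standard Ethernet link (a 1500-byte MTU minus 20 bytes of IPv4 header, 20 bytes of TCP header, and 12 bytes of TCP timestamp options, which are on by default on Linux). The cost here is the poll/recvfrom round trip per segment, not the read size:

```python
# One TCP segment's payload on Ethernet with common Linux TCP options
mtu = 1500
ipv4_header = 20
tcp_header = 20
tcp_timestamp_option = 12  # TCP timestamps, enabled by default on Linux
mss = mtu - ipv4_header - tcp_header - tcp_timestamp_option
print(mss)  # 1448, matching the recvfrom sizes in the strace above
```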