Performance of paramiko is pretty slow (with possible solution) #175
Comments
Great detective work! I've run into this issue recently as well, where downloading a 33 MB file takes almost 5 minutes (with scp it takes 15 seconds). Any chance of fixing this?
For what it's worth, the default window size in OpenSSH is 2097152. (See the CHAN_TCP_WINDOW_DEFAULT definition in http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/channels.h?rev=1.113)
The following worked for me, performance went from 0.55 MB/sec to 4.5 MB/sec:

```python
import paramiko

class FastTransport(paramiko.Transport):
    def __init__(self, sock):
        super(FastTransport, self).__init__(sock)
        self.window_size = 2147483647             # 2 GB window instead of 64 KB
        self.packetizer.REKEY_BYTES = pow(2, 40)  # 1 TB before rekeying
        self.packetizer.REKEY_PACKETS = pow(2, 40)

ssh_conn = FastTransport(('host.example.com', 22))
ssh_conn.connect(username='username', password='password')
sftp = paramiko.SFTPClient.from_transport(ssh_conn)
```
This definitely should be reviewed and patched. I have a script that pulls Apache log files, and it took 5 minutes to download a 300 MB file. Using a window_size of 2147483647 reduced the download time to 21 seconds.
The specification says that both sides are allowed to send a message to adjust the window size (sizes up to 2^32-1, i.e. 4294967295, are allowed). The message to send is SSH_MSG_CHANNEL_WINDOW_ADJUST; in paramiko it is called MSG_CHANNEL_WINDOW_ADJUST. If I understand the code correctly, we are only listening for the server sending this message. We should probably also look into how we can modify the packet size, if needed.
This is an important bug to fix. Adding this tiny line of code cut the time to transfer one file from 60 seconds to 9 seconds:
2 MB (the OpenSSH default) worked very well too; 3 MB gave me a slight boost (5%). This is on a link with RTT around 250 ms.
That patchset modifies paramiko to use the same default as OpenSSH. It also allows you to modify the window and packet size per session/channel/sftp-transfer; how to do that is documented in the Sphinx docs. :)
Closing this given the lack of feedback since Olle posted his changeset note. Please leave comments if you've tried the 1.15 release (out today/tomorrow) and you're still experiencing speed issues. Thanks!
Not sure if this is exactly related, but I'm experiencing speed issues as well. The file is written correctly (correct contents and modified date), but the connection lags for 10 minutes before closing. I haven't tried larger file sizes and don't know whether this lag scales linearly.
I am still having this bug with paramiko 1.15.2 (and paramiko master). I was reading the strace output, and it looks like paramiko is reading 8 bytes at a time?!
EDIT: It looks like fd 3, the one it keeps reading 8 bytes from, is /dev/urandom.
EDIT: I was stracing the main thread, not the one actually doing the download. Here's the strace of that thread:
As an FYI to anyone else who might run into speed issues when writing/uploading files. Here's an example:

```python
import paramiko

ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_client.connect('host', username='username', password='password')
sftp_client = ssh_client.open_sftp()

with sftp_client.file('/path/to/file', mode='w') as file:
    file.set_pipelined()  # don't wait for a server response after each write
    file.writelines([line + '\n' for line in lines])  # lines: an iterable of strings
```
Hello, does the window_size solution work with SSHClient? I'm new to paramiko, but it looks like it's very slow compared to executing a command from the shell client. I'm using paramiko to execute Hive queries on a Hive host from another host, roughly as below (`ssh = paramiko.SSHClient()`, ...). The last line consumes much more time compared to the same command run from a shell window.
If the command outputs a lot of data, window_size might be an issue. There is another issue that also mentions window size.
@antoncohen I tried some combinations. For large data output I did get the hanging error, but window_size does not seem to resolve it. My actual problem, posted above, is with commands that return even tiny data: any command I execute takes at least 5 seconds to finish. Because I run several commands one after another, it amounts to a bigger delay overall.
@sdj12 thanks, that helped me as well. |
I'm running into similar issues running a Python Lambda function (code below). It runs for a few minutes and then Lambda times out. The logs show about 14% of the 15 MB file transferred. Increasing the Lambda execution time and/or memory has no impact. This happens when connecting to our internal dev SFTP; when I connect to the client prod SFTP, the file downloads within seconds. Any ideas?

```python
import paramiko

def printTotals(transferred, toBeTransferred):
    ...

class FastTransport(paramiko.Transport):
    ...

def lambda_handler(event, context):
    ...
```
another related ticket: #1141 |
I was having a similar issue and could not afford to copy the file locally for security reasons. I solved it with a combination of prefetching and BytesIO:

```python
import io

def fetch_file_as_bytesIO(sftp, path):
    """
    Using the sftp client it retrieves the file on the given path by pre-fetching.

    :param sftp: the sftp client
    :param path: path of the file to retrieve
    :return: BytesIO with the file content
    """
    with sftp.file(path, mode='rb') as file:
        file_size = file.stat().st_size
        file.prefetch(file_size)
        file.set_pipelined()
        return io.BytesIO(file.read(file_size))
```
Still seeing problems on the current paramiko version. I've tried all the changes listed above (including the FastTransport detailed by @antoncohen). For my test, I'm downloading a 190 MB file from an SFTP server. Context: I really need the chunked version because the end goal is to download a bunch of 20-30 GB files from a third-party server once a month, as fast as possible, and push them to AWS S3 using a multipart upload (a total of maybe 400 GB; it's OK, if not ideal, if it takes 8 or 12 hours total). I won't be using local storage in the context of the executing script, so 64 MB chunks streaming through memory to S3 is what the ideal physical implementation looks like.
After tracing into paramiko, the problem I'm seeing is that in paramiko/file.py the read method is iterating in Python in 32 KB chunks, and it's just too slow. One of the CPUs on the box is pegged at 99% while running.
To get decent performance, I ended up using the SFTPClient.getfo method - it doesn't have the same performance issues that the sftp file does, but I had to set up a write thread and a read thread - the write thread just uses SFTPClient.getfo to write to the FIFO, and then I pull 64 MB chunks from the FIFO and use multipart upload to S3. The full transfer for the 190MB test file from the sftp server to s3 takes about 8 seconds, which is just fine for our needs. |
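Not the author's original code, but the getfo-plus-threads pattern described here can be sketched with a bounded queue standing in for the FIFO; everything SFTP- and S3-specific is left to the caller as a `source` callable (e.g. `lambda sink: sftp.getfo(remote_path, sink)`, hypothetical):

```python
import queue
import threading

class ChunkQueue:
    """File-like sink that re-slices incoming writes into fixed-size chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.buf = bytearray()
        self.chunks = queue.Queue(maxsize=4)  # bounded: caps memory in flight

    def write(self, data):
        self.buf.extend(data)
        while len(self.buf) >= self.chunk_size:
            self.chunks.put(bytes(self.buf[:self.chunk_size]))
            del self.buf[:self.chunk_size]
        return len(data)

    def close(self):
        if self.buf:
            self.chunks.put(bytes(self.buf))  # flush the short final chunk
        self.chunks.put(None)                 # sentinel: end of stream

def iter_chunks(source, chunk_size=64 * 1024 * 1024):
    """Run source(sink) in a writer thread; yield chunk_size pieces."""
    sink = ChunkQueue(chunk_size)

    def writer():
        try:
            source(sink)
        finally:
            sink.close()

    threading.Thread(target=writer, daemon=True).start()
    while (chunk := sink.chunks.get()) is not None:
        yield chunk
```

Each yielded chunk can then be fed to an S3 multipart-upload part; with a 64 MB chunk size, only a bounded number of chunks is ever held in memory.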
@dawsonlp do you mind sharing some sample code? I'm running into the same issue trying to solve exactly the same problem, and I'm sure many others can benefit from it as well.
Happy to - the code below writes via multipart upload to S3, but it is easy to change the destination by using a different sink function. The entry point is the function at the bottom, sftp_to_s3.
By the way, I'm not sure the FastTransport makes any difference, you might just use the paramiko transport directly. Also, it seems that when it runs, each chunk is getting read in one go, so it is possible that the read/write in execute_sink could be simplified too. However it is working well as is. |
The FastTransport may be needed (I'm still doing some testing to confirm), but without it I was seeing occasional errors.
For downloading/transferring large files, paramiko is causing more headaches than it's worth; it's slow and unreliable.
Snippet from my code: this sped up my file reading from the remote server from around 20 Mb/s to 35-40 Mb/s using paramiko (with a few spikes of 60-75 Mb/s); pure sftp is still around gigabit speed though, lol.
Hi,
actually I wanted to post the text below on a mailing list, but it seems paramiko no longer has one?
I am using paramiko and doing a sftp file transfer from a twisted sftp server.
Unfortunately it turned out that paramiko is about 20 times slower than e.g. the putty sftp client for large files (100 MB in my test case).
So I started investigating and figured out that in the putty case every requested packet is immediately written to the outgoing buffer and in the paramiko case almost always queued waiting for a window advertisement (twisted/conch/ssh/channel.py line 180 method write).
The reason is that the remote window size in paramiko's case was never big enough!
So I checked which remote window size each of the tools set
(twisted/conch/ssh/connection.py line 116 method ssh_CHANNEL_OPEN).
Putty uses: 2147483647
Paramiko uses: 65536
So for a first test I patched the paramiko transport window size to be the same as putty's
and then I reduced the window size back until I saw the first performance degradations.
Here are some (non scientific) performance measurements with client and server on the same localhost
OS: Win7 64Bit, Python 2.7.3, Twisted-13.0.0, pycrypto-2.6, paramiko-1.10.1, Putty 0.62
Putty psftp: 10.00 MB/sec
Paramiko ws=default: 0.54 MB/sec
Paramiko ws=2147483647: 9.09 MB/sec
Paramiko ws=134217727: 9.09 MB/sec
Paramiko ws=67108863: 5.55 MB/sec
Paramiko ws=33554431: 2.44 MB/sec
So I think the default window size of paramiko should be raised to at least 134217727, but maybe even higher. Since this window size appears to be distinct from the TCP window size, this seems harmless. As a side note, I was now for the first time able to transmit a 1 GB file in a reasonable time, and ran into the twisted bug http://twistedmatrix.com/trac/ticket/4395 that rekeying is broken.
Unfortunately it is still not fixed in Twisted 13.0.0! To work around the bug, set the following:
Finally I enabled compression in paramiko by calling self.trans.use_compression(), which increased the download performance to 52.63 MB/sec for a file containing only zeros and decreased it to 6.88 MB/sec for a file of completely random data.
For reference, this was the unmodified paramiko profile; somehow, even in the optimized case, a lot of time is spent in the acquire method. I don't understand this, since this measurement of acquire and release on Win7 shows much higher performance:
before:
and this with window_size = 134217727