# FTP Cheatsheet

Covered so far:
* Connect and check the connection
* List files in a directory
* Read a file into memory
* Download a large file

## ftplib

My preferred way to access resources online is `requests`, but `requests` only handles HTTP(S). For FTP access I use `ftplib` from the standard library. Why? It seems to be the most commonly used option and it covers the basics. If I run into problems and need a better library for FTP downloads, I will add it here as well.

_Don't forget to close the connection!!!_
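One way to make sure the connection is always closed is to use `ftplib.FTP` as a context manager (supported since Python 3.3). The sketch below is a minimal illustration, not the only way to do it. The function name `list_directory` and the `ftp_factory` parameter are hypothetical; the factory is there only so the function can be tried without a network connection.

```python
import ftplib


def list_directory(host='ftp.ncbi.nlm.nih.gov', path='blast/db/',
                   ftp_factory=ftplib.FTP):
    """Return the names in `path`; the connection is closed on exit.

    `ftp_factory` is a hypothetical hook for swapping in a stand-in class.
    """
    # Leaving the `with` block sends QUIT and closes the socket,
    # even when an exception is raised inside the block.
    with ftp_factory(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling `list_directory()` with the defaults connects to the NCBI server used throughout this cheatsheet.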

### Connect and check the connection

In [33]:
import ftplib

connection_successful = False

try:
    ftp = ftplib.FTP('ftp.ncbi.nlm.nih.gov')
    ftp.login(user='anonymous', passwd='anonymous')
    ftp.cwd('blast/db/')

    # NOOP does nothing on the server; it only confirms the connection is alive
    response = ftp.voidcmd('NOOP')
    
    # Compare the reply code only: the text after "200" differs between servers
    if response.startswith('200'):
        connection_successful = True
    
    ftp.quit()
except IOError as e:
    # Get here if there is:
    ##  no connection (turned off wifi)
    ##  time out: when I use vpn from home and ncbi is not accessible
    print('FTP access error. Error: {}'.format(e))
except ftplib.error_temp as e:
    # Get here if there is:
    ##  idle timeout - ftp was closed for not being used
    print('FTP connection error. Error: {}'.format(e))

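except ftplib.socket.timeout as e:
    # Never reached: socket.timeout is a subclass of OSError, and IOError is an
    # alias of OSError in Python 3, so the IOError handler above catches timeouts first
    print("Request timed out. Error: {}".format(e))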
except ftplib.error_perm as e:
    # Get here if :
    ##  ftp directory does not exist  (programmer's error)
    ## command sent is not recognized (programmer's error)
    print("FTP error. {}".format(e))
    ftp.quit()
except Exception:
    # Get here on any other error, e.g.:
    ##  a NameError from a stray line like "blah" in the try block (programmer's error)
    print("Something went wrong when connecting to ftp.")
    ftp.quit()
    
connection_successful

Something went wrong when connecting to ftp.


True
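The handlers above can also be collapsed into one. A minimal sketch, assuming a plain yes/no answer is enough: `ftplib.all_errors` is a tuple of the exceptions ftplib can raise (`ftplib.Error`, `OSError`, `EOFError`). The function name `ftp_is_reachable` and the `ftp_factory` parameter are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
import ftplib


def ftp_is_reachable(host, directory='', timeout=10, ftp_factory=ftplib.FTP):
    """Return True if we can log in, enter `directory` and get a NOOP reply."""
    try:
        with ftp_factory(host, timeout=timeout) as ftp:
            ftp.login()  # anonymous login
            if directory:
                ftp.cwd(directory)
            # voidcmd raises ftplib.error_reply unless the reply code starts with 2
            ftp.voidcmd('NOOP')
        return True
    except ftplib.all_errors as e:
        # Covers no connection, timeouts, idle disconnects and bad directories
        print('FTP check failed: {!r}'.format(e))
        return False
```

For the server used here the call would be `ftp_is_reachable('ftp.ncbi.nlm.nih.gov', 'blast/db/')`. The trade-off is that the different failure causes are no longer reported separately.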

### List files in ftp directory

If this raises exceptions, see the exception handling in the connection check above.

In [28]:
import ftplib

ftp = ftplib.FTP('ftp.ncbi.nlm.nih.gov')
ftp.login(user='anonymous', passwd='anonymous')
ftp.cwd('blast/db/')

# Alternative: ftp.dir() passes each line of the detailed listing (LIST output) to a callback
#data = []
#ftp.dir(data.append)
#ftp.quit()
#data

files = ftp.nlst()
ftp.quit()
files

['README',
 'FASTA',
 'human_genomic.00.tar.gz.md5',
 'env_nr.00.tar.gz',
 'env_nr.00.tar.gz.md5',
 'env_nr.01.tar.gz',
 'env_nr.01.tar.gz.md5',
 'env_nt.00.tar.gz',
 'env_nt.00.tar.gz.md5',
 'env_nt.01.tar.gz',
 'env_nt.01.tar.gz.md5',
 'env_nt.02.tar.gz',
 'env_nt.02.tar.gz.md5',
 'est.tar.gz',
 'est.tar.gz.md5',
 'est_human.00.tar.gz',
 'est_human.00.tar.gz.md5',
 'est_human.01.tar.gz',
 'est_human.01.tar.gz.md5',
 'est_mouse.tar.gz',
 'est_mouse.tar.gz.md5',
 'est_others.00.tar.gz',
 'est_others.00.tar.gz.md5',
 'est_others.01.tar.gz',
 'est_others.01.tar.gz.md5',
 'est_others.02.tar.gz',
 'est_others.02.tar.gz.md5',
 'est_others.03.tar.gz',
 'est_others.03.tar.gz.md5',
 'est_others.04.tar.gz',
 'est_others.04.tar.gz.md5',
 'est_others.05.tar.gz',
 'est_others.05.tar.gz.md5',
 'est_others.06.tar.gz',
 'est_others.06.tar.gz.md5',
 'est_others.07.tar.gz',
 'est_others.07.tar.gz.md5',
 'est_others.08.tar.gz',
 'est_others.08.tar.gz.md5',
 'est_others.09.tar.gz',
 'est_others.09.tar.gz
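Since `nlst()` returns plain strings, picking out the archives of one database is ordinary list filtering. A small sketch, assuming the naming pattern visible in the listing above (`<name>.tar.gz` or `<name>.<number>.tar.gz`, each with a matching `.md5` file); `database_volumes` is a hypothetical helper name.

```python
import re


def database_volumes(files, db_name):
    """Return the sorted .tar.gz archives of `db_name`, skipping .md5 files."""
    # Matches "est.tar.gz" as well as numbered volumes like "est_others.03.tar.gz"
    pattern = re.compile(r'^{}(\.\d+)?\.tar\.gz$'.format(re.escape(db_name)))
    return sorted(name for name in files if pattern.match(name))
```

With the `files` list from the cell above, `database_volumes(files, 'est_others')` would keep only the `est_others.NN.tar.gz` entries.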

### Read ftp file into memory

In [29]:
import ftplib

ftp = ftplib.FTP('ftp.ncbi.nlm.nih.gov')
ftp.login(user='anonymous', passwd='anonymous')
ftp.cwd('blast/db/')

# Just to check the connection. Does nothing
ftp.voidcmd('NOOP')

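# retrbinary calls the callback once per received block of bytes,
# so data ends up as a list of bytes objects
data = []

ftp.retrbinary('RETR nr.00.tar.gz.md5', data.append)
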
ftp.quit()
data

[b'4f5b6e7d134ccd3e3645c5675cd3e9a3  nr.00.tar.gz\n']
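A larger file may arrive in several blocks, so the list can contain more than one element. A minimal sketch of collecting the blocks in an `io.BytesIO` buffer and splitting the checksum line; `read_ftp_file` and `parse_md5_line` are hypothetical helper names, and the `<digest>  <file name>` layout is assumed from the output above.

```python
import io


def read_ftp_file(ftp, name):
    """Return the whole remote file as one bytes object."""
    buffer = io.BytesIO()
    # buffer.write is called once per block, in order
    ftp.retrbinary('RETR {}'.format(name), buffer.write)
    return buffer.getvalue()


def parse_md5_line(raw):
    """Split b'<hex digest>  <file name>\\n' into (digest, file_name)."""
    digest, file_name = raw.decode('ascii').split()
    return digest, file_name
```

`ftp` is an open, logged-in `ftplib.FTP` object, as in the cell above.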

### Download a large file

Helpful post:
https://stackoverflow.com/questions/8323607/download-big-files-via-ftp-with-python

#### Connect

#### Download

Downloading one __nt.03.tar.gz__ file in 6*1024-byte chunks:
* CPU times: user 1.2 s, sys: 1.99 s, total: 3.19 s
* Wall time: 3min 8s

Same file, but without setting a chunk size:
* CPU times: user 1.02 s, sys: 1.71 s, total: 2.73 s
* Wall time: 2min 15s
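In ftplib the chunk size is the `blocksize` argument of `retrbinary`, which defaults to 8192 bytes. The timings above were presumably taken by changing that argument. A minimal sketch of a download that exposes it; `download_file` is a hypothetical helper name.

```python
def download_file(ftp, remote_name, local_path, blocksize=8192):
    """Stream `remote_name` to `local_path`, one block at a time."""
    with open(local_path, 'wb') as f:
        # f.write is called once per received block, so the whole
        # file is never held in memory
        ftp.retrbinary('RETR {}'.format(remote_name), f.write,
                       blocksize=blocksize)
    return local_path
```

For example, `download_file(ftp, 'nr.35.tar.gz', 'out/nr.35.tar.gz', blocksize=6 * 1024)` would use 6*1024-byte chunks, assuming `ftp` is an open connection and the `out/` directory exists.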


In [35]:
%%time

import ftplib
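def ncbi_connect():
    """Open an anonymous session on the NCBI FTP server and cd into blast/db/."""
    ftp = None
    try:
        ftp = ftplib.FTP('ftp.ncbi.nlm.nih.gov')
        ftp.login(user='anonymous', passwd='anonymous')
        ftp.cwd('blast/db/')
    except IOError as e:
        print('Error connecting to ftp: {}'.format(e))
        raise  # there is no usable connection to return
    except ftplib.error_perm as e:
        print("FTP error. {}".format(e))
        if ftp is not None:
            ftp.quit()
        raise  # do not hand back a closed connection
        
    return ftp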


#file_name = 'nt.03.tar.gz'  # 3 mins
file_name = 'nr.35.tar.gz'  # 12 sec

max_download_attempts = 3
ftp = ncbi_connect()
ftp_file_size = ftp.size(file_name)
with open('out/'+file_name, 'wb') as f:
    while ftp_file_size != f.tell():
        try:
            print('Original ftp file size:     {}'.format(ftp.size(file_name)))
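            if f.tell() != 0:
                # Resume at the current offset. rest must be passed by keyword:
                # the third positional argument of retrbinary is blocksize
                ftp.retrbinary('RETR {}'.format(file_name), f.write, rest=f.tell())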
            else:    
                ftp.retrbinary('RETR {}'.format(file_name), f.write)
            print('Downloaded local file size: {}'.format(f.tell()))
        except (ftplib.error_temp, IOError) as e:
            print('Problems with ftp connection. Error: {}'.format(e))
            if max_download_attempts != 0:
                print('Re-trying the download of file: {}'.format(file_name))
                ftp = ncbi_connect()
                max_download_attempts -= 1
            else:
                print('Failed to download file: {}'.format(file_name))
                break
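        except Exception as e:
            print('Something went wrong with the download of file: {} Re-download will not be attempted. Error: {}'.format(file_name, e))
            # Leave the loop; otherwise the while condition would keep retrying
            break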

ftp.quit()

Original ftp file size:     119934159
Downloaded local file size: 119934159
CPU times: user 86.3 ms, sys: 187 ms, total: 274 ms
Wall time: 6.81 s
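Matching sizes show that all bytes arrived, not that they are correct. Since the server publishes an `.md5` file next to each archive (see the listing above), comparing checksums is a possible extra check. A minimal sketch; `md5_of_file` is a hypothetical helper name and the 1 MiB read size is an arbitrary choice.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until f.read returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The result can be compared with the digest stored in the matching `.md5` file, for example the first field of `nr.35.tar.gz.md5`.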
