<a href="https://colab.research.google.com/github/paulynamagana/AFDB_notebooks/blob/main/AFDB_FTP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<img src = "https://www.embl.org/about/info/communications/wp-content/uploads/2017/09/Ebi_official_logo.png"
 height="100" align="right">

# Access structures from AlphaFold DB via FTP

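FTP (File Transfer Protocol) is a standard network protocol for transferring files between computers. AlphaFold DB bulk downloads are hosted on EMBL-EBI's public FTP server, `ftp.ebi.ac.uk`, under `/pub/databases/alphafold/`.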



<br>

As of September 2023, EMBL-EBI's FTP area hosts TAR archives of predicted structures for the proteomes of 48 organisms, including model organisms and WHO pathogens of interest.

We document every data version update in our [CHANGELOG](https://ftp.ebi.ac.uk/pub/databases/alphafold/CHANGELOG.txt).


<br>

Each proteome archive is named following a structured convention: four elements separated by underscores, followed by the `.tar` extension. For example, `UP000000429_85962_HELPY_v4.tar` consists of:

- Reference proteome ID (UPID): UP000000429
- Taxonomy ID: 85962
- Organism mnemonic: HELPY (here derived from the first three letters of the genus, "Helicobacter", and the first two letters of the species, "pylori")
- Database version: v4
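Because the archive names follow a fixed pattern, they can be split programmatically, for example to pick out one organism from a directory listing. The sketch below is a minimal illustration that assumes every proteome archive matches the `UPID_TaxID_MNEMONIC_vN.tar` pattern seen in the `v4` listing; `parse_archive_name` and `ProteomeArchive` are hypothetical helper names, not part of any AlphaFold DB tooling.

```python
import re
from typing import NamedTuple


class ProteomeArchive(NamedTuple):
    """Components of a proteome archive name (hypothetical helper type)."""
    upid: str
    taxonomy_id: int
    organism: str
    version: int


# Pattern assumed from names such as UP000000429_85962_HELPY_v4.tar:
# UPID, taxonomy ID, organism mnemonic and database version, joined by underscores.
_ARCHIVE_RE = re.compile(
    r"^(?P<upid>UP\d+)_(?P<taxid>\d+)_(?P<organism>[A-Z0-9]+)_v(?P<version>\d+)\.tar$"
)


def parse_archive_name(name: str) -> ProteomeArchive:
    """Split a proteome archive file name into its components."""
    match = _ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised archive name: {name!r}")
    return ProteomeArchive(
        upid=match["upid"],
        taxonomy_id=int(match["taxid"]),
        organism=match["organism"],
        version=int(match["version"]),
    )


print(parse_archive_name("UP000000429_85962_HELPY_v4.tar"))
# → ProteomeArchive(upid='UP000000429', taxonomy_id=85962, organism='HELPY', version=4)
```

The mnemonic group allows digits because some organism codes in the listing, such as `NEIG1`, contain them.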

<br>

To find out what each archive contains, see the [Downloads tab](https://alphafold.ebi.ac.uk/download) on the AlphaFold DB website.

The FTP area also hosts TAR archives covering Swiss-Prot, which contains 542,378 predicted structures:

|File type|File name|Size (MB)|
|---------|---------|--------:|
|Swiss-Prot (CIF files)|`swissprot_cif_v4.tar`|37,643|
|Swiss-Prot (PDB files)|`swissprot_pdb_v4.tar`|26,935|
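At roughly 27 to 38 GB, the Swiss-Prot archives are large enough that an interrupted transfer is costly. `ftplib`'s `FTP.retrbinary` accepts a `rest` argument that sends an FTP `REST` command, asking the server to restart the transfer at a byte offset. The sketch below uses it to resume a partial download. It assumes the server honours `REST` and `SIZE` for these files (not verified here against ftp.ebi.ac.uk), and `resume_download` is a hypothetical helper name.

```python
import os
from ftplib import FTP


def resume_download(ftp: FTP, remote_path: str, local_path: str,
                    block_size: int = 1024 * 1024) -> int:
    """Download remote_path to local_path, resuming any partial local copy.

    Returns the number of bytes written during this call.
    """
    # Bytes already on disk from an earlier, interrupted attempt (0 if none)
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0

    # Many servers only answer SIZE reliably in binary (image) mode
    ftp.voidcmd("TYPE I")
    remote_size = ftp.size(remote_path)
    if remote_size is not None and offset >= remote_size:
        return 0  # the local copy is already complete

    written = 0
    # Append mode keeps the bytes that were already downloaded
    with open(local_path, "ab") as local_file:
        def write_block(block: bytes) -> None:
            nonlocal written
            local_file.write(block)
            written += len(block)

        # rest= makes ftplib send "REST <offset>" before RETR, so the server
        # starts sending from that byte position (no REST when offset is 0)
        ftp.retrbinary(f"RETR {remote_path}", write_block,
                       blocksize=block_size, rest=offset or None)
    return written
```

With a logged-in connection such as `ftp_server` below, rerunning the call with the same arguments after a dropped connection should continue from the end of the local file rather than starting again.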




```python
import ftplib

ftp_server = ftplib.FTP("ftp.ebi.ac.uk")

# Log in as an anonymous user
ftp_server.login("anonymous", "anonymous@")

# Navigate to the AlphaFold DB directory
ftp_server.cwd("/pub/databases/alphafold/")

# List the contents of the directory
ftp_server.retrlines('LIST')
```

```
-rw-r--r--    1 ftp      ftp          3557 Oct 26  2022 CHANGELOG.txt
-rw-r--r--    1 ftp      ftp          3044 Oct 26  2022 README.txt
-rw-r--r--    1 ftp      ftp      7502502459 Oct 20  2022 accession_ids.csv
-rw-r--r--    1 ftp      ftp         13342 Oct 26  2022 download_metadata.json
lrwxrwxrwx    1 ftp      ftp             2 Oct 24  2022 latest -> v4
-rw-r--r--    1 ftp      ftp      99265486907 Oct 20  2022 sequences.fasta
drwxr-xr-x    2 ftp      ftp          1003 Oct 24  2022 v1
drwxr-xr-x    2 ftp      ftp          2394 Oct 24  2022 v2
drwxr-xr-x    2 ftp      ftp          2475 Oct 27  2022 v3
drwxr-xr-x    3 ftp      ftp          2507 Oct 27  2022 v4

'226 Directory send OK.'
```

```python
#@title Navigate to the "v4" directory
ftp_server.cwd("v4")
# List the contents of the directory
ftp_server.retrlines('LIST')
```

```
-rw-r--r--    1 ftp      ftp      174073856 Oct 20  2022 UP000000429_85962_HELPY_v4.tar
-rw-r--r--    1 ftp      ftp      4428440576 Oct 20  2022 UP000000437_7955_DANRE_v4.tar
-rw-r--r--    1 ftp      ftp      205083136 Oct 20  2022 UP000000535_242231_NEIG1_v4.tar
-rw-r--r--    1 ftp      ftp      1031859200 Oct 20  2022 UP000000559_237561_CANAL_v4.tar
-rw-r--r--    1 ftp      ftp      183609856 Oct 20  2022 UP000000579_71421_HAEIN_v4.tar
-rw-r--r--    1 ftp      ftp      212448768 Oct 20  2022 UP000000586_171101_STRR6_v4.tar
-rw-r--r--    1 ftp      ftp      3794122752 Oct 20  2022 UP000000589_10090_MOUSE_v4.tar
-rw-r--r--    1 ftp      ftp      480047104 Oct 20  2022 UP000000625_83333_ECOLI_v4.tar
-rw-r--r--    1 ftp      ftp      183460864 Oct 20  2022 UP000000799_192222_CAMJE_v4.tar
-rw-r--r--    1 ftp      ftp      2324951552 Oct 20  2022 UP000000803_7227_DROME_v4.tar
-rw-r--r--    1 ftp      ftp      182347264 Oct 20  2022 UP000000805_243232_METJA_v4.tar
-rw-r--r--    1 ftp      

'226 Directory send OK.'
```
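Printing the listing is convenient interactively, but to search it (for example, for one organism's archive and its size) the lines can be collected and parsed instead. A minimal sketch, assuming the Unix-style `LIST` format shown above (nine whitespace-separated fields with the name last); `parse_list_lines` and `ListingEntry` are hypothetical names. Where a server supports it, `FTP.mlsd()` returns machine-readable facts and avoids parsing this format altogether.

```python
from typing import Iterable, Iterator, NamedTuple


class ListingEntry(NamedTuple):
    name: str
    size_bytes: int
    is_dir: bool


def parse_list_lines(lines: Iterable[str]) -> Iterator[ListingEntry]:
    """Parse Unix-style LIST lines: perms, links, owner, group, size, month, day, year/time, name."""
    for line in lines:
        # maxsplit=8 keeps names with spaces intact; symlinks keep their " -> target" suffix
        parts = line.split(maxsplit=8)
        if len(parts) < 9:
            continue  # skip blank or truncated lines
        perms, size, name = parts[0], parts[4], parts[8]
        yield ListingEntry(name=name, size_bytes=int(size), is_dir=perms.startswith("d"))


sample = [
    "-rw-r--r--    1 ftp      ftp      174073856 Oct 20  2022 UP000000429_85962_HELPY_v4.tar",
    "drwxr-xr-x    2 ftp      ftp          1003 Oct 24  2022 v1",
]
for entry in parse_list_lines(sample):
    print(entry.name, f"{entry.size_bytes / 1e6:,.0f} MB", "(dir)" if entry.is_dir else "")
```

In the notebook, `lines = []` followed by `ftp_server.retrlines("LIST", lines.append)` collects the lines into a list instead of printing them.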

```python
from ftplib import FTP

from google.colab import files  # only available inside Google Colab


def download_file_from_ftp(remote_file_path, local_file_path):
    """Download a file from the EMBL-EBI FTP server and offer it for download in Colab."""
    # Connect to the FTP server; the context manager closes the connection even on errors
    with FTP("ftp.ebi.ac.uk") as ftp:
        ftp.login(user="anonymous", passwd="anonymous")

        # Use passive mode (already the default in Python 3)
        ftp.set_pasv(True)

        # Read in 1 MiB blocks instead of ftplib's default of 8192 bytes
        block_size = 1024 * 1024

        # Download the file in binary mode
        with open(local_file_path, "wb") as local_file:
            ftp.retrbinary(f"RETR {remote_file_path}", local_file.write, block_size)

    # Trigger a browser download of the local file (Colab only)
    files.download(local_file_path)
    print(f"File downloaded to {local_file_path}")


# Call the function to download the files
download_file_from_ftp("/pub/databases/alphafold/CHANGELOG.txt", "CHANGELOG.txt")
download_file_from_ftp("/pub/databases/alphafold/README.txt", "README.txt")
download_file_from_ftp("/pub/databases/alphafold/v4/UP000000429_85962_HELPY_v4.tar", "UP000000429_85962_HELPY_v4.tar")
```


File downloaded to CHANGELOG.txt

File downloaded to README.txt

File downloaded to UP000000429_85962_HELPY_v4.tar
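The downloaded proteome archive is expected to be a plain TAR file whose members are individually gzip-compressed model files (an assumption worth checking with `tarfile.open(path).getnames()`). The sketch below extracts and decompresses only one format, assuming member names end in `.cif.gz` or `.pdb.gz`; `extract_structures` is a hypothetical helper, not part of any AlphaFold DB tooling.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def extract_structures(tar_path, out_dir, suffix=".cif.gz"):
    """Extract and gunzip every archive member whose name ends in `suffix`.

    Returns the paths of the decompressed files.
    """
    if not suffix.endswith(".gz"):
        raise ValueError("suffix must name gzip-compressed members, e.g. '.pdb.gz'")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # The default mode "r" reads both plain and compressed TAR files
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            # Keep only the base name (no directories) and drop the ".gz" suffix
            target = out / Path(member.name).name[: -len(".gz")]
            with archive.extractfile(member) as compressed, \
                    gzip.open(compressed) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For the archive downloaded above, `extract_structures("UP000000429_85962_HELPY_v4.tar", "HELPY_cif")` would write the decompressed mmCIF files into a `HELPY_cif` folder; passing `suffix=".pdb.gz"` selects the PDB-format files instead. Using only the base name of each member avoids writing outside `out_dir` if an archive contains unexpected paths.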
