## Download Chothia-renumbered PDB files from SAbDab

SAbDab provides a web service for downloading Chothia-renumbered PDB files at the following URL:

In [1]:
PDB_URL = "https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/pdb/{}/?scheme=chothia"

### Create a function to download a PDB file

Create the URL to download the PDB file for pdb_id '9ds2':

In [2]:
pdb_id = '9ds2'

url = PDB_URL.format(pdb_id)

url

'https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/pdb/9ds2/?scheme=chothia'
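As a minimal sketch, the URL construction could be wrapped in a small helper that normalizes and checks the ID before formatting. The name `build_pdb_url` is hypothetical. The check assumes the classic four-character PDB ID format (a digit followed by three alphanumeric characters). Lowercasing mirrors the example above and is an assumption about what the service expects.

```python
import re

PDB_URL = "https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/pdb/{}/?scheme=chothia"

def build_pdb_url(pdb_id):
    """Return the download URL for a PDB ID (hypothetical helper)."""
    # Assumption: IDs are passed to the service in lowercase, as in '9ds2'.
    normalized = pdb_id.strip().lower()
    # Assumption: classic 4-character IDs only (digit + 3 alphanumerics).
    if not re.fullmatch(r"[0-9][a-z0-9]{3}", normalized):
        raise ValueError(f"Not a 4-character PDB ID: {pdb_id!r}")
    return PDB_URL.format(normalized)

print(build_pdb_url("9DS2"))
# https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/pdb/9ds2/?scheme=chothia
```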

To download files from the internet, the third-party `requests` library is a common choice in Python.

The function `requests.get(URL)` returns a response object whose attributes include
- status_code
- content

Many things can go wrong when downloading files from the internet, so it is important to check the `status_code`. A value of 200 means the request succeeded. Any other value is treated as a failure here.
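To make failed requests easier to diagnose, a numeric status code can be turned into a readable label with the standard library's `http.HTTPStatus`. This is only an illustrative sketch, and the helper name `describe_status` is hypothetical. The `requests` response object also offers `response.ok` and `response.raise_for_status()` for the same purpose.

```python
from http import HTTPStatus

def describe_status(status_code):
    """Return a short label such as '404 Not Found' (hypothetical helper)."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        # Code is not one of the registered HTTP status codes.
        return f"{status_code} (unknown status code)"
    return f"{status.value} {status.phrase}"

for code in (200, 404, 500):
    print(describe_status(code))
# 200 OK
# 404 Not Found
# 500 Internal Server Error
```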

In [3]:
import requests

response = requests.get(url)

response.status_code

200

Look at the first 100 bytes of the `content` attribute (`content` holds the raw response body as a `bytes` object):

In [4]:
response.content[0:100]

b'REMARK   5 CHOTHIA RENUMBERED STRUCTURE 9DS2 GENERATED BY SABDAB\nREMARK   5 ANTIBODY CHAINS ARE RENU'
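Because `content` is a `bytes` object, it has to be decoded before it can be handled as text. The sketch below reuses the 100 bytes shown above as sample data and adds a rough plausibility check. The helper name `looks_like_pdb` and its list of record names are assumptions for illustration, not part of SAbDab or `requests`.

```python
# Sample data: the first 100 bytes shown in the output above.
sample = (b'REMARK   5 CHOTHIA RENUMBERED STRUCTURE 9DS2 GENERATED BY SABDAB\n'
          b'REMARK   5 ANTIBODY CHAINS ARE RENU')

def looks_like_pdb(content):
    """Heuristic: does the first line start with a common PDB record name?"""
    first_line = content.split(b"\n", 1)[0]
    # Assumption: a valid file begins with one of these record types.
    return first_line.startswith((b"REMARK", b"HEADER", b"TITLE", b"CRYST1", b"ATOM"))

# Decode the bytes to str and print the first line only.
print(sample.decode("ascii").splitlines()[0])
# REMARK   5 CHOTHIA RENUMBERED STRUCTURE 9DS2 GENERATED BY SABDAB

print(looks_like_pdb(sample))                     # True
print(looks_like_pdb(b"<html>Not found</html>"))  # False
```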

Create the filename using the pattern '{PDB_DIR}/{pdb_id}_chothia.pdb' and write the contents to that file.

The directory must exist before the file can be written.

In [9]:
import os

PDB_DIR = '../data/pdbs'

# create the target directory if it does not exist yet
os.makedirs(PDB_DIR, exist_ok=True)

filename = f'{PDB_DIR}/{pdb_id}_chothia.pdb'

# write contents to file
with open(filename, "wb") as fh:
    fh.write(response.content)

Define a function `download_pdb(pdb_id)` that does all the steps above:

- Create the url from the PDB_URL template
- Create the filename
- If the file exists, return the filename (this allows us to rerun to add failed files)
- Download the data
- Check the status_code. If status_code == 200 and the content is longer than 100 bytes (a guard against near-empty responses),
    - write the content
    - return the filename
- Otherwise,
    - print an error message stating that the pdb_id could not be downloaded
    - return None (to indicate something went wrong)


In [26]:
import os.path

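def download_pdb(pdb_id):
    """Download the Chothia-renumbered PDB file for pdb_id.

    Returns the filename on success, or None if the download failed.
    """
    url = PDB_URL.format(pdb_id)
    filename = f'{PDB_DIR}/{pdb_id}_chothia.pdb'
    # skip the download if the file is already present (allows reruns)
    if os.path.exists(filename):
        return filename
    response = requests.get(url)
    # require a successful status and a non-trivial amount of content
    if response.status_code == 200 and len(response.content) > 100:
        with open(filename, "wb") as fh:
            fh.write(response.content)
        return filename
    else:
        print(f"Failed to download {pdb_id}")
        return None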
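Because `download_pdb` skips any file that already exists, a download interrupted halfway through writing could leave a partial file that later runs treat as complete. One possible safeguard, sketched here with the hypothetical helper `write_atomically`, is to write to a temporary file in the same directory and rename it only once writing has finished. This is an optional variation, not part of the function above.

```python
import os
import tempfile

def write_atomically(filename, data):
    """Write bytes to a temp file, then rename it to `filename` (hypothetical helper)."""
    directory = os.path.dirname(filename) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        # os.replace overwrites the destination if it already exists.
        os.replace(tmp_path, filename)
    except BaseException:
        # Clean up the partial file before re-raising the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return filename
```

Inside `download_pdb`, the `with open(...)` block could then be replaced by a call such as `write_atomically(filename, response.content)`.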



In [27]:
download_pdb('7six')

Failed to download 7six


### Download PDB files for unique pdb_ids

Note: This download will be a few GB.

- Load 'ab_ag.tsv' into a pandas DataFrame
- Retrieve the unique pdb_ids
- Loop over the unique pdb_ids and download the PDB files

In [28]:
import pandas as pd

SUMMARY_FILE = '../data/ab_ag.tsv'

df = pd.read_csv(SUMMARY_FILE, sep='\t')
df

unique_pdb_ids = df['pdb'].unique()

for pdb_id in unique_pdb_ids:
    download_pdb(pdb_id)


Failed to download 7six
Failed to download 7cu4
Failed to download 6erx
Failed to download 7ny5
Failed to download 7evw
Failed to download 5usi
Failed to download 7ctu
Failed to download 5uwe
Failed to download 3l5y
Failed to download 3sm5
Failed to download 6mf7
Failed to download 7n01
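Since failures are only printed, it can help to collect the failed IDs in a list so they can be inspected or retried later. The sketch below is one way to do that. The names `download_all` and `fake_download` are hypothetical, and `fake_download` merely stands in for `download_pdb` so the example runs without network access.

```python
def download_all(pdb_ids, download):
    """Call `download` for each ID; return (downloaded filenames, failed IDs)."""
    downloaded, failed = [], []
    for pdb_id in pdb_ids:
        filename = download(pdb_id)
        if filename is None:
            failed.append(pdb_id)
        else:
            downloaded.append(filename)
    return downloaded, failed

# Stand-in for download_pdb: pretends that '7six' cannot be downloaded.
def fake_download(pdb_id):
    return None if pdb_id == "7six" else f"../data/pdbs/{pdb_id}_chothia.pdb"

downloaded, failed = download_all(["9ds2", "7six"], fake_download)
print(downloaded)  # ['../data/pdbs/9ds2_chothia.pdb']
print(failed)      # ['7six']
```

In the notebook this would be called as `download_all(unique_pdb_ids, download_pdb)`.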
