## Modelling multiple protein structures together


We use a combination of homology modelling and AlphaFold (AI-based) models available in SWISS-MODEL, the automated protein structure homology-modelling server, accessible via the Expasy web server.

The workflow is taken directly from the SWISS-MODEL modelling API documentation:

https://swissmodel.expasy.org/docs/help#modelling_api



Steps:

1. Obtain an API token: download it from your SWISS-MODEL account and store it in `~/.config/swissmodel/apikey.txt`.
2. Collect all PDB IDs and download the corresponding FASTA sequences from RCSB.
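Step 1 can be sketched as a small helper that reads the stored token and builds the request header. This is a minimal sketch: the function names `load_token` and `auth_headers` are hypothetical, and the `Token <value>` authorization scheme is assumed from the modelling API documentation linked above.

```python
from pathlib import Path


def load_token(path="~/.config/swissmodel/apikey.txt"):
    """Read the API token from disk, stripping surrounding whitespace."""
    token = Path(path).expanduser().read_text().strip()
    if not token:
        raise ValueError(f"No token found in {path}")
    return token


def auth_headers(token):
    """Build the header dictionary (assumed scheme: 'Token <value>')."""
    return {"Authorization": f"Token {token}"}
```

The resulting dictionary can be passed as `headers=` to `requests` calls against the modelling API, which keeps the token out of the notebook itself.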


In [33]:
import requests
from tqdm.notebook import tqdm

pdb_ids = ["7WSM", "7MYJ", "6AE3", "2HFP", "2M76", "4EYW", "6KSI", "7NYK", "1TNF"]
pdb_fasta_dict = {}
for pdb_id in tqdm(pdb_ids):
    fasta_url = f"https://www.rcsb.org/fasta/entry/{pdb_id}"
    r = requests.get(fasta_url)
    if r.status_code != 200:
        print(f"Failed to fetch FASTA for {pdb_id}: status code {r.status_code}")
        continue
    # Each record starts with '>'; drop its header line and join the sequence lines.
    records = r.text.strip().split('>')
    sequences = ["".join(record.splitlines()[1:]) for record in records if record]
    pdb_fasta_dict[pdb_id] = sequences

# Write all sequences once, after every entry has been fetched.
with open("all_sequences.fasta", "w") as f:
    for pdb_id, sequences in pdb_fasta_dict.items():
        for seq in sequences:
            f.write(f">{pdb_id}\n{seq}\n")
    

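The cell above writes every sequence of an entry under the same `>{pdb_id}` header, so entries with several entities produce duplicate FASTA identifiers. If the entity information is needed later, a parser that keeps the original header is one option. The following is a minimal sketch: `parse_fasta` is a hypothetical helper, and the example text is made up to resemble a pipe-delimited FASTA header rather than copied from RCSB.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example input, not real RCSB data.
example = (
    ">1ABC_1|Chains A, B|Example protein|Homo sapiens\n"
    "MKT\nAYI\n"
    ">1ABC_2|Chains C|Other protein|Homo sapiens\n"
    "GGS\n"
)
for header, seq in parse_fasta(example):
    entity_id = header.split("|")[0]  # first field, assumed to identify the entity
    print(entity_id, seq)
```

Using the first header field as the record name would keep the identifiers in `all_sequences.fasta` unique, at the cost of changing the names that later steps see.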

In [35]:
import os

output_dir = "downloaded_models"
os.makedirs(output_dir, exist_ok=True)

for pdb in tqdm(pdb_ids, desc="Downloading PDB files"):
    url = f"https://files.rcsb.org/download/{pdb}.pdb"
    r = requests.get(url)
    if r.status_code == 200:
        file_path = os.path.join(output_dir, f"{pdb}_coordinates.pdb")
        with open(file_path, "w") as f:
            f.write(r.text)
    else:
        print(f"Failed to download {pdb}: status code {r.status_code}")


Failed to download 7NYK: status code 404
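A 404 for a single entry may mean that the structure is not distributed in the legacy PDB format; this is an assumption and has not been verified for 7NYK. One way to handle it is to fall back to the mmCIF file served from the same download location. The sketch below uses a hypothetical helper, `download_structure`, whose `get` parameter exists only so the logic can be exercised without network access.

```python
def download_structure(pdb_id, get=None):
    """Try the legacy PDB file first, then mmCIF.

    Returns (extension, text) for the first format that downloads
    successfully, or None if neither is available.
    """
    if get is None:
        import requests  # imported lazily so the helper can be tested offline
        get = requests.get
    for ext in ("pdb", "cif"):
        url = f"https://files.rcsb.org/download/{pdb_id}.{ext}"
        r = get(url, timeout=30)
        if r.status_code == 200:
            return ext, r.text
    return None
```

In the download loop, the returned extension could be used to name the output file, for example `f"{pdb}_coordinates.{ext}"`, so that downstream steps can tell the two formats apart.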
