<a href="https://colab.research.google.com/github/potterton48/Notebooks/blob/main/TPD_Workshop2_Ternary_Structure_Property_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Workshop 2: Ternary Structure Property Analysis

The idea of this workshop is to calculate properties of molecular glue/bifunctional ternary complexes from the PDB.

We will do this in four steps of increasing difficulty:
1. Rerun the existing code to calculate interface surface area of a molecular glue interface (7LPS - CRBN-IKZF2).
2. Add some code to calculate surface area without the molecular glue (purely protein-protein interface area).
3. Rerun code on your own protein of interest.
4. (Extension) Calculate more properties (such as hydrogen bond count) using your own code.

Before we start, we need to install two Python packages: Biopython for analysing PDB structures and py3Dmol for visualising them.

In [None]:
# Package Installation
!pip install biopython
!pip install py3Dmol

Import the relevant python packages and functions.

In [None]:
# Package imports
import warnings

import py3Dmol
from Bio.PDB import PDBParser, PDBList, Structure, Select, PDBIO, Chain
from Bio.PDB.SASA import ShrakeRupley
from Bio.PDB.PDBExceptions import PDBConstructionWarning

# enable py3Dmol in colab notebook (skipped when running outside Colab)
try:
    from google.colab import output
    output.enable_custom_widget_manager()
except ImportError:
    pass

We will now download a crystallographic structure (PDB ID 7LPS) of a protein-protein complex (CRBN-IKZF2) that is stabilised by a molecular glue.

In [None]:
# Download PDBs
pdb_id = '7LPS'
pdbl = PDBList()
fname = pdbl.retrieve_pdb_file(pdb_id, file_format='pdb')

Load this structure as a Biopython structure.

In [None]:
# load pdb
def read_structure(fname: str) -> Structure:
    """ Reads PDB file and returns biopython structure object
    Args:
        fname: path and filename of pdb file
    Returns:
        Biopython structure object
    """
    parser = PDBParser()
    # silence irrelevant warnings
    warnings.simplefilter("ignore", PDBConstructionWarning)
    return parser.get_structure("struc", fname)

structure = read_structure(fname)

We can visualise the 3D structure using py3Dmol.

In [None]:
def visualise_protein(fname: str, structure: Structure, ligand_resname: str) -> None:
    """ Show a 3D view of the structure: protein as cartoon (coloured by chain), ligand as orange sticks
    Args:
        fname: path to the PDB file to display
        structure: Biopython structure object parsed from the same file (used to list the chains)
        ligand_resname: 3-letter HET residue name to draw as sticks; may be left as an empty string
    """
    viewer = py3Dmol.view(width=1000, height=800)
    # read the whole PDB file into a single string for py3Dmol
    with open(fname) as ifile:
        system = ifile.read()
    viewer.addModelsAsFrames(system)
    viewer.setBackgroundColor('white')
    colours = [
        "#FF5733", "#33FF57", "#3357FF", "#F0A202", "#029FAD", "#9C33FF", "#FF33A8", "#33FFD7", "#FFB533", "#FF5733",
        "#73FF33", "#3333FF", "#FFD700", "#FF4500", "#DA70D6", "#ADFF2F", "#7FFF00", "#6495ED", "#8A2BE2", "#FF69B4",
        "#FFDAB9", "#40E0D0", "#FF6347", "#F08080", "#9370DB", "#4169E1", "#D2691E", "#20B2AA", "#708090", "#FF1493",
        "#228B22"
    ]

    for idx, chain in enumerate(structure.get_chains()):
        chain_id = chain.id
        # wrap around the palette so structures with many chains do not raise an IndexError
        chain_colour = colours[idx % len(colours)]
        viewer.setStyle({'chain': chain_id}, {'cartoon': {'color': chain_colour}})
        viewer.addLabel(chain_id, {'fontColor': 'black', 'backgroundColor': 'lightgray'}, {'chain': chain_id})

    # colour the molecular glue/PROTAC and select by distance to the glue
    selection = {'resn': ligand_resname, 'byres': 'true', 'expand': 2}  # 2 ang, so anything covalently bound to it
    # set styles
    viewer.setStyle(selection,{'stick': {'colorscheme': 'orangeCarbon'}})

    viewer.zoomTo()
    viewer.animate({'loop': "forward"})
    viewer.show()

visualise_protein(fname, structure, ligand_resname='RN9')
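
Alongside the rendering, a text summary of each chain can make it easier to see what the structure contains. The sketch below is one possible helper; the name `summarise_chains` and the layout of its return value are illustrative choices, not part of Biopython. It only relies on the standard Biopython accessors `get_chains()`, `get_residues()`, `residue.id` and `get_resname()`, and it assumes a single-model structure, as is usual for crystal structures.

```python
from collections import Counter


def summarise_chains(structure):
    """Count polymer residues, waters and other hetero residues per chain.

    Hypothetical helper: returns {chain_id: {"n_polymer", "n_water", "hetero"}}.
    """
    summary = {}
    for chain in structure.get_chains():
        n_polymer = 0
        n_water = 0
        hetero = Counter()
        for residue in chain.get_residues():
            # residue.id[0] is the hetero flag:
            # " " for standard residues, "W" for water, "H_<name>" for other HETATM groups
            het_flag = residue.id[0]
            if het_flag == " ":
                n_polymer += 1
            elif het_flag == "W":
                n_water += 1
            else:
                hetero[residue.get_resname()] += 1
        summary[chain.id] = {
            "n_polymer": n_polymer,
            "n_water": n_water,
            "hetero": hetero,
        }
    return summary
```

In this notebook it could be called as `summarise_chains(structure)`; a ligand such as RN9 would then be listed under `"hetero"` for whichever chain it is assigned to in the PDB file.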

**Several of the chains are not part of the molecular-glue-induced protein-protein interface (PPI). From the rendering above, decide which chains to keep.**

In [None]:
def write_pdb(struct: Structure, pdb_fname: str, select_class: Select | None = None) -> None:
    """
    Writes a PDB file from a Biopython Structure object

    Args:
        struct (Bio.PDB.Structure): input Biopython Structure object
        pdb_fname (str): path to the output PDB file
        select_class: optional instance of a Biopython Select subclass that filters what is written
    """
    io = PDBIO()
    io.set_structure(struct)
    if select_class:
        io.save(pdb_fname, select=select_class)
    else:
        io.save(pdb_fname)


class ChainSelect(Select):
    def __init__(self, selected_chains: list[str]):
        self.selected_chains = selected_chains

    def accept_chain(self, chain):
        if chain.id in self.selected_chains:
            return 1
        else:
            return 0

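def chain_remover(structure: Structure, chains_to_keep: list[str]) -> str:
    """ Keeps only the specified chains and writes them to a new PDB file.
    Args:
        structure: Input Biopython Structure object
        chains_to_keep: List of chain IDs to keep, e.g. ['B', 'C']
    Returns:
        Name of the PDB file containing only the selected chains (named using the global pdb_id).
    """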
    chain_saver = ChainSelect(selected_chains=chains_to_keep)
    filename = f'{pdb_id}_selected_chains_{"".join(chains_to_keep)}.pdb'
    write_pdb(
        struct=structure, pdb_fname=filename,
        select_class=chain_saver,
    )
    return filename

chain_ids_to_keep = ['B', 'C']
selected_structure_fname = chain_remover(structure, chain_ids_to_keep)
selected_structure = read_structure(selected_structure_fname)

**Looking at the protein again, you can see that only two chains are present.**

In [None]:
visualise_protein(selected_structure_fname, selected_structure, 'RN9')

**Next, remove the water molecules from the structure so that they do not count towards the surface area calculation.**

In [None]:
class ResidueRemover(Select):
    def __init__(self, selected_resnames: list[str]):
        self.selected_resnames = selected_resnames

    def accept_residue(self, residue):
        if residue.get_resname() in self.selected_resnames:
            return 0
        else:
            return 1

def remove_residues(structure: Structure, residues_to_delete: list[str]) -> str:
    """ Removes specified residues (by residue names) and creates a new PDB file without those residues.
    Args:
        structure: Input Biopython Structure Object
        residues_to_delete: List of residue name strings to delete. E.g. ['HOH', 'CLR']
    Returns:
        Name of PDB file without specified residues.
    """
    residue_remover = ResidueRemover(selected_resnames=residues_to_delete)
    filename = f'{pdb_id}_removed_{"".join(residues_to_delete)}.pdb'
    write_pdb(
        struct=structure, pdb_fname=filename,
        select_class=residue_remover,
    )
    return filename

# pass a list of residue names; a bare string would be matched by substring
no_water_fname = remove_residues(selected_structure, ['HOH'])
no_water_structure = read_structure(no_water_fname)

**You can work out the interface surface area by subtracting a chain's SASA within the complex from the SASA of the same chain on its own.**

In [None]:
def calculate_sasa(structure: Structure) -> dict[str, float]:
    """ Calculate the solvent accessible surface area (SASA) of each chain in the complex using the Shrake-Rupley algorithm
    Args:
        structure: biopython structure object of protein
    Returns:
        Dictionary of results, key Chain, value SASA (A^2)
    """
    sr = ShrakeRupley()
    sr.compute(structure, level="C")
    sasa_dict = {}  # chain_id : sasa
    for chain in structure.get_chains():
        n_res = len(list(chain.get_residues()))
        sasa_dict[chain.id] = chain.sasa
        print(f"Chain {chain.id} ({n_res} residues), with overall SASA {int(chain.sasa)} A^2")
    return sasa_dict

print('Complex SASA:')
complex_sasa_dict = calculate_sasa(no_water_structure)
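
To make the Shrake-Rupley idea concrete, here is a minimal pure-Python sketch of the algorithm. It is not Biopython's implementation: the function names, the golden-spiral point placement, and the atom format `(x, y, z, vdw_radius)` are all assumptions made for illustration. Each atom's sphere is expanded by the probe radius and covered with test points; a point counts as accessible if it does not fall inside any other atom's expanded sphere.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n  # evenly spaced slices from top to bottom
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(atoms, probe_radius=1.4, n_points=100):
    """Per-atom SASA (A^2) for atoms given as (x, y, z, vdw_radius) tuples."""
    unit_sphere = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded_i = ri + probe_radius
        accessible = 0
        for ux, uy, uz in unit_sphere:
            # test point on the probe-expanded surface of atom i
            px = xi + expanded_i * ux
            py = yi + expanded_i * uy
            pz = zi + expanded_i * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                dist_sq = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                if dist_sq < (rj + probe_radius) ** 2:
                    buried = True
                    break
            if not buried:
                accessible += 1
        # the accessible fraction of test points scales the full sphere area
        areas.append(4.0 * math.pi * expanded_i ** 2 * accessible / n_points)
    return areas


# Two illustrative carbon-like atoms (radius 1.7 A) placed 3 A apart
pair_areas = shrake_rupley([(0.0, 0.0, 0.0, 1.7), (3.0, 0.0, 0.0, 1.7)])
print([round(a, 1) for a in pair_areas])
```

This brute-force version compares every test point against every atom, so it is only suitable for toy inputs; library implementations typically use a neighbour search to limit the comparisons.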

**Calculate the solvent accessible surface area (SASA) of one of the chains on its own (i.e. not in the complex), again using the Shrake-Rupley algorithm.**

In [None]:
one_chain = 'C' # pick one of the two chains
# start from the water-free structure so the isolated chain matches the complex used above
one_chain_fname = chain_remover(no_water_structure, [one_chain])
single_chain_structure = read_structure(one_chain_fname)
print('Non Complex SASA:')
sasa_dict = calculate_sasa(single_chain_structure)

In [None]:
print(f'Interface surface area is {round(sasa_dict[one_chain] - complex_sasa_dict[one_chain], 2)} A^2')
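
The subtraction above gives the area buried on one chain only. As a sketch, the same arithmetic can be applied to both chains using dictionaries shaped like the ones `calculate_sasa` returns. The helper names below are hypothetical and the numbers are made up for illustration; they are not results for 7LPS.

```python
def buried_area_per_chain(isolated_sasa, complex_sasa):
    """Per-chain buried area: SASA of the chain alone minus its SASA in the complex."""
    return {
        chain_id: isolated_sasa[chain_id] - complex_sasa[chain_id]
        for chain_id in isolated_sasa
        if chain_id in complex_sasa
    }


def total_buried_area(isolated_sasa, complex_sasa):
    """Sum of the per-chain buried areas, i.e. both sides of the interface."""
    return sum(buried_area_per_chain(isolated_sasa, complex_sasa).values())


# Made-up SASA values in A^2, purely to show the arithmetic
example_isolated = {"B": 5000.0, "C": 3000.0}
example_complex = {"B": 4400.0, "C": 2450.0}

print(buried_area_per_chain(example_isolated, example_complex))
print(total_buried_area(example_isolated, example_complex))
```

Conventions differ between tools: some report the total buried area summed over both chains, others report half of that value as the interface area, so it is worth stating which one you use when comparing structures.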

## Self-guided exercises

Try calculating a few more metrics yourself:
1. Recalculate the interface area of the CRBN-IKZF2 complex without the glue.
2. Repeat the analysis above with another molecular glue/PROTAC PDB entry.
3. Build a contact map between the two proteins, or between the two proteins and the molecular glue.
4. Count the number of hydrogen bonds.

Hints on how to do this can be found in the Biopython documentation.
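
As one possible starting point for the contact map exercise, the sketch below works on plain coordinate lists rather than Biopython objects. The function names and the 4.0 Å cutoff are assumptions chosen for illustration. With Biopython, the coordinates could be taken from the `coord` attribute of each atom in a chain, and `Bio.PDB.NeighborSearch` is likely to be faster on full-size structures.

```python
def contact_map(coords_a, coords_b, cutoff=4.0):
    """Boolean matrix: entry [i][j] is True when point i of A is within cutoff of point j of B."""
    cutoff_sq = cutoff ** 2
    return [
        [
            sum((p - q) ** 2 for p, q in zip(a, b)) <= cutoff_sq
            for b in coords_b
        ]
        for a in coords_a
    ]


def count_contacts(cmap):
    """Number of True entries in a contact map."""
    return sum(sum(1 for in_contact in row if in_contact) for row in cmap)


# Toy coordinates in A, standing in for atoms of two chains
toy_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 3.0), (10.0, 0.0, 5.0), (0.0, 3.0, 0.0)]

toy_map = contact_map(toy_a, toy_b)
print(toy_map)
print(count_contacts(toy_map))
```

Comparing squared distances avoids a square root per pair, and the same function accepts tuples, lists or NumPy arrays of three coordinates.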