# ABL1 ATP modeling

This notebook generates a structure of ABL1 in complex with ATP and two Mg2+ ions. The presented workflow could be used as a template to implement an automated pipeline in KinoML. The complexes were generated with functionality from [OpenEye toolkit 2020.2.0](https://docs.eyesopen.com/toolkits/python/index.html) and [KinoML](https://github.com/openkinome/kinoml/tree/10a1937fa4f4fb48267451e90670f1f29ab37ce0).

A visual analysis of ATP bound structures annotated in KLIFS revealed a common binding mode for ATP and two Mg2+ ions ([notebook for statistics](https://github.com/openkinome/study-abl-resistance/blob/master/notebooks/atp_kinase_conformations.ipynb)). Kinases that lack residues critical for Mg2+ complexation (e.g., a missing D of the DFG motif) show either no bound magnesium or a likely less stable coordination. Thus, all kinase ATP complexes could be modeled in the same fashion, while kinases lacking critical residues could be modeled with or without magnesium. This is also supported by others ([Roskoski 2015](https://doi.org/10.1016/j.phrs.2015.07.010)).

A prototypical structure that will be used as the template to model kinase ATP complexes is the 1.26 Å resolution PDB entry [1RDQ](https://www.rcsb.org/structure/1RDQ), which consists of the [PRKACA](https://en.wikipedia.org/wiki/PRKACA) kinase, ATP, ADP, a phosphate and a substrate-mimicking peptide inhibitor. It is the structure with the best resolution (i.e., the lowest value in Å) and, interestingly, shows electron density for both the ATP and the ADP-phosphate bound states. It also contains a mutation (T204A) distant from the active site that was shown to affect catalysis but has no apparent effect on the static 3D structure ([article](http://dx.doi.org/10.1016/j.jmb.2003.11.044)).

## Content

- prepare ATP template structure
- determine critical sidechain dihedrals
- select ABL1 structure
- prepare ABL1 structure
- adjust critical sidechain dihedrals in ABL1
- superpose to ATP template structure
- assemble complex

## Code

In [1]:
from appdirs import user_cache_dir
from opencadd.databases.klifs import setup_remote
from openeye import oechem, oespruce

from kinoml.features.complexes import OEHybridDockingFeaturizer
from kinoml.modeling.OEModeling import (
    read_molecules, 
    write_molecules, 
    prepare_complex,
    update_residue_identifiers,
    select_chain,
    superpose_proteins,
    assign_caps,
    remove_non_protein
)
from kinoml.utils import LocalFileStorage, FileDownloader

### Prepare ATP template structure

First, the ATP bound structure 1RDQ will be prepared. This requires a few manual steps, since 1RDQ contains two phosphorylated residues that prevent OESpruce from building missing loops. The chain and alternate location are selected to pick the ATP bound conformation and to remove the substrate-mimicking peptide inhibitor. The two phosphorylated residues are mutated back to the standard residues, and a missing loop is built. The real termini will not be capped. The prepared structure will be saved and can later be used for comparing the behavior in MD simulations.

In [2]:
# get details for 1RDQ from KLIFS
remote = setup_remote()
atp_template = remote.structures.by_structure_pdb_id("1rdq").iloc[0]
atp_template["structure.pdb_id"]

'1rdq'

In [3]:
# download and read ATP template structure
FileDownloader.rcsb_structure_pdb(atp_template["structure.pdb_id"])
atp_structure = read_molecules(LocalFileStorage.rcsb_structure_pdb(atp_template["structure.pdb_id"]))[0]

In [4]:
# loop_db is a machine-specific path; adjust it to the local OESpruce loop database
design_unit = prepare_complex(
    protein_ligand_complex=atp_structure,
    loop_db="/home/david/.OpenEye/rcsb_spruce.loop_db",
    ligand_name="ATP",
    cap_termini=True,
    real_termini=[1, 350])

In [5]:
# extract components
kinase_atp_complex = oechem.OEGraphMol()
design_unit.GetComponents(kinase_atp_complex, oechem.OEDesignUnitComponents_Default)

True

In [6]:
# keep kinase chain E only, which deletes the irrelevant chain I
kinase_atp_complex = select_chain(kinase_atp_complex, "E")

In [7]:
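def mutate_residue(protein, old, resid, chain, new):
    """Mutate residue `old` with number `resid` in `chain` to `new` (three-letter codes), in place."""
    hier_view = oechem.OEHierView(protein)
    hier_residue = hier_view.GetResidue(chain, old, resid)
    oe_residue = hier_residue.GetOEResidue()
    oespruce.OEMutateResidue(protein, oe_residue, new)
    # the mutated residue is a standard amino acid, so clear the HETATM flag
    # note: atoms are matched by residue number only, irrespective of chain
    for atom in protein.GetAtoms(oechem.OEHasResidueNumber(resid)):
        oe_residue = oechem.OEAtomGetResidue(atom)
        oe_residue.SetHetAtom(False)
        oechem.OEAtomSetResidue(atom, oe_residue)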

In [8]:
# mutate phospho residues to standard amino acids
mutate_residue(kinase_atp_complex, "TPO", 197, "E", "THR")
mutate_residue(kinase_atp_complex, "SEP", 338, "E", "SER")
oechem.OEPlaceHydrogens(kinase_atp_complex)

True

In [9]:
# get everything except ATP (protein, Mg2+, water) for the pdb file
prepared_atp_kinase = kinase_atp_complex.CreateCopy()
for atom in prepared_atp_kinase.GetAtoms():
    residue = oechem.OEAtomGetResidue(atom)
    if residue.GetName() == "ATP":
        prepared_atp_kinase.DeleteAtom(atom)

In [10]:
# get ATP for sdf file and later transfer
prepared_atp_ligand = kinase_atp_complex.CreateCopy()
for atom in prepared_atp_ligand.GetAtoms():
    residue = oechem.OEAtomGetResidue(atom)
    if residue.GetName() != "ATP":
        prepared_atp_ligand.DeleteAtom(atom)
prepared_atp_ligand.SetTitle("ATP")

True

In [11]:
# get MG ions for later transfer
mg_ions = kinase_atp_complex.CreateCopy()
for atom in mg_ions.GetAtoms():
    residue = oechem.OEAtomGetResidue(atom)
    if residue.GetName().strip() != "MG":
        mg_ions.DeleteAtom(atom)

In [12]:
# update atom indices etc
prepared_atp_kinase = update_residue_identifiers(prepared_atp_kinase)

In [13]:
# write structure
write_molecules([prepared_atp_kinase], user_cache_dir() + f"/{atp_template['structure.pdb_id']}_mg_water.pdb")
write_molecules([prepared_atp_ligand], user_cache_dir() + f"/{atp_template['structure.pdb_id']}_atp.sdf")
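
The chain and alternate location selection described at the start of this section is handled by the OpenEye and KinoML functions above. As a toolkit-independent illustration of the idea, the following minimal sketch filters PDB-style records by chain and alternate location using the fixed-column layout of the PDB format (column 17 for the alternate location, column 22 for the chain). The helper names, residue numbers and alternate location labels are hypothetical and only serve the toy example; they are not taken from 1RDQ.

```python
def make_record(record, serial, name, altloc, resname, chain, resseq):
    """Build a minimal PDB-style record (identifier columns only, no coordinates)."""
    return f"{record:<6}{serial:>5} {name:<4}{altloc}{resname:>3} {chain}{resseq:>4}"


def select_chain_and_altloc(pdb_lines, chain_id, altloc):
    """Keep ATOM/HETATM records of one chain and one alternate location."""
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        line_altloc = line[16]  # column 17: alternate location indicator
        line_chain = line[21]   # column 22: chain identifier
        if line_chain != chain_id:
            continue
        # a blank indicator means the atom has a single conformation
        if line_altloc not in (" ", altloc):
            continue
        kept.append(line)
    return kept


# hypothetical toy records; residue numbers and altloc labels are made up
records = [
    make_record("ATOM", 1, "CA", " ", "LYS", "E", 72),
    make_record("HETATM", 2, "PG", "A", "ATP", "E", 400),
    make_record("HETATM", 3, "PB", "B", "ADP", "E", 401),
    make_record("ATOM", 4, "CA", " ", "GLY", "I", 10),
]
kept = select_chain_and_altloc(records, chain_id="E", altloc="A")
for line in kept:
    print(line)
```

In this toy input, the record with the other alternate location and the record from chain I are dropped, which mirrors the intent of picking the ATP bound conformation and removing the peptide inhibitor chain.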

### Determine critical sidechain dihedrals

The template for ATP modeling will next be used to analyze the dihedral angles of sidechains involved in Mg2+ complexation. Later, these angles will be used to adjust sidechains of the ABL1 structure to allow a more ideal complexation of transferred ATP and Mg2+ ions. Determining those residues is possible via the KLIFS pocket residue numbering scheme.

In [14]:
# retrieve pocket residues for 1RDQ from KLIFS
atp_pocket_resids = remote.pockets.by_structure_klifs_id(atp_template["structure.klifs_id"])

In [15]:
# collect sidechain dihedrals for KLIFS pocket residues 17, 24, 75, 81
optimal_sidechain_dihedrals = {}
klifs_pocket_numbers = [17, 24, 75, 81]
hierview = oechem.OEHierView(prepared_atp_kinase)
for hier_residue in hierview.GetResidues():
    residue_number = hier_residue.GetResidueNumber()
    if str(residue_number) in atp_pocket_resids["residue.id"].to_list():
        klifs_pocket_number = atp_pocket_resids[atp_pocket_resids["residue.id"] == str(residue_number)]["residue.klifs_id"].iloc[0]
    else:
        klifs_pocket_number = None
    if klifs_pocket_number in klifs_pocket_numbers:
        dihedrals = []
        for chi_id in range(5):
            if chi_id == 0:
                dihedral = oechem.OEGetTorsion(hier_residue, oechem.OEProtTorType_Chi1)
            elif chi_id == 1:
                dihedral = oechem.OEGetTorsion(hier_residue, oechem.OEProtTorType_Chi2)
            elif chi_id == 2:
                dihedral = oechem.OEGetTorsion(hier_residue, oechem.OEProtTorType_Chi3)
            elif chi_id == 3:
                dihedral = oechem.OEGetTorsion(hier_residue, oechem.OEProtTorType_Chi4)
            else:
                dihedral = oechem.OEGetTorsion(hier_residue, oechem.OEProtTorType_Chi5)
            if dihedral == -100:
                break
            else:
                dihedrals.append(dihedral)
        optimal_sidechain_dihedrals[klifs_pocket_number] = dihedrals

In [16]:
optimal_sidechain_dihedrals

{17: [-3.0894843496272166,
  2.9511493094050247,
  -3.056783538767938,
  3.0444681535015468],
 24: [3.0754933404733698, 3.0389703051828976, 0.09488050760391245],
 75: [-1.2411798655255941, -0.4145175148625677],
 81: [-2.8715014240639856, -0.1262769908588875]}
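
The values above all fall within [-π, π], i.e., they are given in radians. To make clear what such a sidechain dihedral represents, here is a minimal sketch that computes a signed dihedral from four points with the standard library only and converts two of the stored values to degrees. The function `dihedral` and the toy coordinates are illustrative assumptions and not part of OpenEye or KinoML.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians (range -pi..pi) defined by four 3D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# toy coordinates: a planar trans and a planar cis arrangement
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))

# two entries copied from the dictionary above, converted to degrees
template_dihedrals = {
    75: [-1.2411798655255941, -0.4145175148625677],
    81: [-2.8715014240639856, -0.1262769908588875],
}
template_degrees = {
    klifs_number: [math.degrees(angle) for angle in angles]
    for klifs_number, angles in template_dihedrals.items()
}
print(template_degrees)
```

Working in degrees can make it easier to compare the template rotamers with rotamer libraries, while the radian values remain what is passed back to the toolkit later.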

### Select ABL1 structure

The PDB entry used for modeling the ABL1 ATP complex is determined by filtering KLIFS entries for the active kinase conformation, i.e., DFG in and aC helix in, and by selecting the highest quality structure in terms of resolution and KLIFS quality score. 

In [17]:
abl1_klifs_kinase_id = 392

In [18]:
# retrieve ABL1 kinase structures
abl1_df = remote.structures.by_kinase_klifs_id(abl1_klifs_kinase_id)
# remove NMR
abl1_df = abl1_df[abl1_df["structure.resolution"].notnull()]
# filter for DFG in/aC helix in conformation
abl1_df = abl1_df[abl1_df["structure.dfg"] == "in"]
abl1_df = abl1_df[abl1_df["structure.ac_helix"] == 'in']
# pick highest quality structure
abl1_df = abl1_df.sort_values(
    by=[
        "structure.qualityscore", 
        "structure.resolution", 
        "structure.chain", 
        "structure.alternate_model"
    ],
    ascending=[False, True, True, True])
abl1_template = abl1_df.iloc[0]
abl1_template["structure.pdb_id"]

'2f4j'

However, [2F4J](https://www.rcsb.org/structure/2F4J) shows a P-loop conformation that is unlikely to make favorable interactions with the phosphate groups of ATP, and such interactions are crucial for proper ATP binding and catalysis ([Roskoski 2015](https://doi.org/10.1016/j.phrs.2015.07.010)). A manual inspection of the P-loop conformations of human ABL1 structures in the DFG in/aC helix in conformation found in [KLIFS](https://klifs.net/) identified [2V7A](https://www.rcsb.org/structure/2V7A), which will be used to generate the ABL1 ATP complex.

In [19]:
abl1_template = abl1_df[abl1_df["structure.pdb_id"] == "2v7a"].iloc[0]
abl1_template["structure.pdb_id"]

'2v7a'
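
The selection logic of this section (filter for the active conformation, rank by quality score and resolution, then allow a manual override) can be sketched without access to KLIFS. The following example uses plain Python dictionaries instead of a DataFrame; the entries, their PDB IDs and all numeric values are hypothetical placeholders and not real KLIFS records.

```python
# hypothetical toy entries; none of these are real KLIFS records
structures = [
    {"pdb_id": "aaaa", "resolution": 1.8, "qualityscore": 8.0, "dfg": "in", "ac_helix": "in"},
    {"pdb_id": "bbbb", "resolution": 2.2, "qualityscore": 9.2, "dfg": "in", "ac_helix": "in"},
    {"pdb_id": "cccc", "resolution": 1.5, "qualityscore": 9.2, "dfg": "out", "ac_helix": "in"},
    {"pdb_id": "dddd", "resolution": None, "qualityscore": 9.6, "dfg": "in", "ac_helix": "in"},
    {"pdb_id": "eeee", "resolution": 2.0, "qualityscore": 9.2, "dfg": "in", "ac_helix": "in"},
]


def rank_active_structures(entries):
    """Filter for DFG in/aC helix in entries with a resolution and rank them."""
    candidates = [
        entry
        for entry in entries
        if entry["resolution"] is not None  # e.g., NMR structures have no resolution
        and entry["dfg"] == "in"
        and entry["ac_helix"] == "in"
    ]
    # highest quality score first, ties broken by the lowest resolution value
    return sorted(candidates, key=lambda entry: (-entry["qualityscore"], entry["resolution"]))


ranked = rank_active_structures(structures)
automatic_choice = ranked[0]

# manual override after visual inspection, analogous to picking 2V7A over 2F4J
manual_pdb_id = "bbbb"
chosen = next(entry for entry in ranked if entry["pdb_id"] == manual_pdb_id)
print(automatic_choice["pdb_id"], chosen["pdb_id"])
```

Keeping the override restricted to the already filtered list ensures that a manually selected structure still satisfies the conformational criteria.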

### Prepare ABL1 structure

In [20]:
# download and read ABL1 template structure
FileDownloader.rcsb_structure_pdb(abl1_template["structure.pdb_id"])
abl1_structure = read_molecules(LocalFileStorage.rcsb_structure_pdb(abl1_template["structure.pdb_id"]))[0]

In [21]:
# prepare ABL1 structure
abl1_structure_du = prepare_complex(
    abl1_structure, 
    loop_db="/home/david/.OpenEye/rcsb_spruce.loop_db", 
    cap_termini=True,
    chain_id=abl1_template["structure.chain"]
)

In [22]:
# extract components
abl1_complex = oechem.OEGraphMol()
abl1_structure_du.GetComponents(abl1_complex, oechem.OEDesignUnitComponents_Default)

True

In [23]:
# select relevant chain
abl1_complex = select_chain(abl1_complex, abl1_template["structure.chain"])

In [24]:
# mutate ILE 315 to the wild-type THR and phosphotyrosine 393 to TYR
mutate_residue(abl1_complex, "ILE", 315, abl1_template["structure.chain"], "THR")
mutate_residue(abl1_complex, "PTR", 393, abl1_template["structure.chain"], "TYR")

In [25]:
# OESpruce has problems capping the C terminus -> revisit later
for atom in abl1_complex.GetAtoms(oechem.OEAtomMatchResidue(["ILE:502:.*:.*:.*"])):
    abl1_complex.DeleteAtom(atom)
abl1_complex = assign_caps(abl1_complex)

### Adjust critical sidechain dihedrals

Next, the critical sidechain dihedrals of the ABL1 structure will be adjusted to match those of the ATP template structure. **Note:** Residues are selected for adjustment based on residue number only. This could lead to unexpected behavior if the structure contains multiple residues with the same residue number.

In [26]:
# retrieve pocket residues for ABL1 structure from KLIFS
abl1_pocket_resids = remote.pockets.by_structure_klifs_id(abl1_template["structure.klifs_id"])

In [27]:
# adjust sidechain dihedral for KLIFS pocket residues 17, 24, 75, 81
klifs_pocket_numbers = [17, 24, 75, 81]
hierview = oechem.OEHierView(abl1_complex)
for hier_residue in hierview.GetResidues():
    residue_number = hier_residue.GetResidueNumber()
    if str(residue_number) in abl1_pocket_resids["residue.id"].to_list():
        klifs_pocket_number = abl1_pocket_resids[abl1_pocket_resids["residue.id"] == str(residue_number)]["residue.klifs_id"].iloc[0]
    else:
        klifs_pocket_number = None
    if klifs_pocket_number in klifs_pocket_numbers:
        for index, dihedral in enumerate(optimal_sidechain_dihedrals[klifs_pocket_number]):
            if index == 0:
                oechem.OESetTorsion(hier_residue, oechem.OEProtTorType_Chi1, dihedral)
            elif index == 1:
                oechem.OESetTorsion(hier_residue, oechem.OEProtTorType_Chi2, dihedral)
            elif index == 2:
                oechem.OESetTorsion(hier_residue, oechem.OEProtTorType_Chi3, dihedral)
            elif index == 3:
                oechem.OESetTorsion(hier_residue, oechem.OEProtTorType_Chi4, dihedral)
            else:
                oechem.OESetTorsion(hier_residue, oechem.OEProtTorType_Chi5, dihedral)
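
The transfer above relies on the KLIFS pocket numbering to find equivalent residues in both kinases. The following minimal sketch separates that mapping step from the toolkit calls and adds a guard against the duplicate residue number problem mentioned in the note. The function `plan_dihedral_transfer`, the residue numbers of the toy target pocket and the rounded angles are hypothetical and only illustrate the idea.

```python
def plan_dihedral_transfer(target_pocket, template_dihedrals, klifs_positions):
    """Map target residue numbers to the template dihedrals of the same KLIFS position.

    target_pocket: iterable of (residue number as string, KLIFS pocket number)
    template_dihedrals: {KLIFS pocket number: list of chi angles in radians}
    klifs_positions: KLIFS pocket numbers whose sidechains should be adjusted
    """
    seen = set()
    plan = {}
    for residue_id, klifs_number in target_pocket:
        # selecting by residue number alone is ambiguous if a number repeats
        if residue_id in seen:
            raise ValueError(f"residue number {residue_id} occurs more than once")
        seen.add(residue_id)
        if klifs_number in klifs_positions:
            plan[residue_id] = list(template_dihedrals[klifs_number])
    return plan


# angles rounded from the template dictionary, for brevity
template_dihedrals = {17: [-3.09, 2.95, -3.06, 3.04], 75: [-1.24, -0.41]}
# hypothetical target pocket; these residue numbers are made up
target_pocket = [("501", 16), ("502", 17), ("560", 75)]

plan = plan_dihedral_transfer(target_pocket, template_dihedrals, {17, 75})
print(plan)
```

Each entry of such a plan lists the chi angles in order (chi1, chi2, ...), so it could be applied with one torsion call per angle, as done in the cell above.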

### Superpose to ATP template structure

Next, the ABL1 structure will be superimposed with the ATP bound structure 1RDQ to allow later transfer of ATP and Mg2+ ions.

In [28]:
# retrieve pocket residues for 1RDQ from KLIFS
pocket = remote.coordinates.to_dataframe(atp_template["structure.klifs_id"], entity="pocket")
pocket_residues = set(pocket["residue.name"] + pocket["residue.id"])

In [29]:
# superpose proteins
abl1_complex_superposed = superpose_proteins(
    prepared_atp_kinase, 
    abl1_complex, 
    pocket_residues, 
    "A"
)

### Assemble complex

Finally, we can assemble all components and store information in the PDB header. Solvent molecules will only be added if they do not clash with ATP or Mg2+ ions. ATP will be saved separately, which makes setting up MD simulations easier.

In [30]:
# get abl1 protein and solvent
abl1_protein = remove_non_protein(abl1_complex_superposed, remove_water=True)
abl1_solvent = abl1_complex_superposed.CreateCopy()
for atom in abl1_solvent.GetAtoms():
    residue = oechem.OEAtomGetResidue(atom)
    if residue.GetName().strip() != "HOH":
        abl1_solvent.DeleteAtom(atom)

In [31]:
# assemble complex
abl1_atp_complex = oechem.OEGraphMol()
mg_atp = oechem.OEGraphMol()
# add protein
oechem.OEAddMols(abl1_atp_complex, abl1_protein)
# add atp
oechem.OEAddMols(abl1_atp_complex, prepared_atp_ligand)
oechem.OEAddMols(mg_atp, prepared_atp_ligand)
# add MG
oechem.OEAddMols(abl1_atp_complex, mg_ions)
oechem.OEAddMols(mg_atp, mg_ions)
# check water molecules for clashes with protein, ATP or Mg2+
filtered_solvent = OEHybridDockingFeaturizer._remove_clashing_water(abl1_solvent, mg_atp, abl1_protein)
oechem.OEAddMols(abl1_atp_complex, filtered_solvent)
print(f"Number of molecules before filtering: {int(abl1_solvent.NumAtoms() / 3)}")
print(f"Number of molecules after filtering: {int(filtered_solvent.NumAtoms() / 3)}")

Number of molecules before filtering: 117
Number of molecules after filtering: 101
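
The criterion applied by `_remove_clashing_water` is not shown in this notebook. As a rough illustration of how such a filter could work, the following sketch keeps only water oxygens that are farther than a distance cutoff from every atom of the other components. The function name, the cutoff of 2.5 Å and the coordinates are assumptions made for this example and do not reflect the KinoML implementation.

```python
import math


def remove_clashing_waters(water_oxygens, other_atoms, cutoff=2.5):
    """Keep water oxygens farther than `cutoff` (in Å) from every other atom.

    The default cutoff is an arbitrary assumption for this sketch.
    """
    kept = []
    for water in water_oxygens:
        if all(math.dist(water, atom) > cutoff for atom in other_atoms):
            kept.append(water)
    return kept


# hypothetical toy coordinates in Å
water_oxygens = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand_and_ion_atoms = [(0.0, 1.0, 0.0), (10.0, 0.0, 3.0)]

filtered_waters = remove_clashing_waters(water_oxygens, ligand_and_ion_atoms)
print(f"Number of waters before filtering: {len(water_oxygens)}")
print(f"Number of waters after filtering: {len(filtered_waters)}")
```

For full structures, a spatial grid or neighbor search would scale better than this all-pairs comparison, which is kept here for readability.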


In [32]:
# adjust protonation, don't flip the ASN that is important for coordination of Mg2+ ions
options = oechem.OEPlaceHydrogensOptions()
options.SetBypassPredicate(oechem.OEAtomMatchResidue(["ASN:368:.*:.*:.*"]))
oechem.OEPlaceHydrogens(abl1_atp_complex, options)

True

In [33]:
# split complex for separate saving
abl1_mg_water = abl1_atp_complex.CreateCopy()
abl1_atp = abl1_atp_complex.CreateCopy()
for atom1, atom2 in zip(abl1_mg_water.GetAtoms(), abl1_atp.GetAtoms()):
    residue = oechem.OEAtomGetResidue(atom1)
    if residue.GetName().strip() == "ATP":
        abl1_mg_water.DeleteAtom(atom1)
    else:
        abl1_atp.DeleteAtom(atom2)
abl1_atp.SetTitle("ATP")

True

In [34]:
# update atom indices etc
abl1_mg_water = update_residue_identifiers(abl1_mg_water)

In [35]:
# store info in PDB header
oechem.OEClearPDBData(abl1_mg_water)
oechem.OESetPDBData(abl1_mg_water, "COMPND", "\tProtein: ABL1")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", "\tLigand: MG,MG")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", f"\tKinase template: {abl1_template['structure.pdb_id']}")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", f"\tLigand template: {atp_template['structure.pdb_id']}")

True

In [36]:
# write ABL1 ATP complex
write_molecules([abl1_mg_water], user_cache_dir() + f"/{abl1_template['structure.pdb_id']}_mg_water.pdb")
write_molecules([abl1_atp], user_cache_dir() + f"/{abl1_template['structure.pdb_id']}_atp.sdf")