# ABL1 ATP modeling

This notebook aims to generate a structure of ABL1 in complex with ATP and two Mg2+ ions. The presented workflow could be used as a template to implement an automated pipeline in KinoML.

A visual analysis of ATP-bound structures annotated in KLIFS revealed a common binding mode for ATP and two Mg2+ ions ([notebook for statistics](https://github.com/openkinome/study-abl-resistance/blob/master/notebooks/atp_kinase_conformations.ipynb)). Kinases that lack critical residues for Mg2+ complexation (e.g., a missing D of the DFG motif) show either no bound magnesium or a likely less stable coordination. Thus, all kinase ATP complexes could be modeled in the same fashion. Kinases lacking critical residues could be modeled with or without magnesium.

A prototypical structure that will be used as the template to model kinase ATP complexes is the 1.26 Å resolution PDB entry [1RDQ](https://www.rcsb.org/structure/1RDQ), which consists of the [PRKACA](https://en.wikipedia.org/wiki/PRKACA) kinase, ATP, ADP, a phosphate and a substrate-mimicking peptide inhibitor. It represents the structure with the best (numerically lowest) resolution and interestingly shows electron density for both the ATP and the ADP-phosphate bound states. It also contains a mutation (Y204A) distant from the active site that was shown to affect catalysis but does not show an effect on the static 3D structure ([article](http://dx.doi.org/10.1016/j.jmb.2003.11.044)).

## Content

- select ABL1 structure
- superpose ATP template structure
- prepare ATP template structure
- determine critical sidechain dihedrals
- adjust critical sidechain dihedrals in ABL1
- prepare ABL1 structure
- assemble complex

## Code

In [1]:
from appdirs import user_cache_dir
import klifs_utils
from openeye import oechem, oespruce

from kinoml.features.complexes import OEKLIFSKinaseHybridDockingFeaturizer
from kinoml.modeling.OEModeling import read_molecules, write_molecules, select_altloc, select_chain, split_molecule_components, clashing_atoms, update_residue_identifiers, prepare_complex



### Select ABL1 structure

The PDB entry used for modeling the ABL1 ATP complex is determined by filtering KLIFS entries for the active kinase conformation, i.e., DFG in and aC helix in, and by selecting the highest quality structure in terms of resolution and KLIFS quality score. 

In [2]:
abl1_klifs_kinase_id = 392

In [3]:
# retrieve ABL1 kinase structures
abl1_df = klifs_utils.remote.structures.structures_from_kinase_ids(abl1_klifs_kinase_id)
# remove NMR
abl1_df = abl1_df[abl1_df.resolution > 0]
# filter for DFG in/aC helix in conformation
abl1_df = abl1_df[abl1_df.DFG == 'in']
abl1_df = abl1_df[abl1_df.aC_helix == 'in']
# pick highest quality structure
abl1_df = abl1_df.sort_values(
    by=['quality_score', 'resolution', 'chain', 'alt'],
    ascending=[False, True, True, True],
)
abl1_template = abl1_df.iloc[0]
abl1_template.pdb

'2f4j'
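The ranking in the cell above can be reproduced without KLIFS access. The following is a minimal sketch using only the standard library. The records, including the PDB IDs and scores, are made up for illustration and are not KLIFS data. The helper `pick_template` is hypothetical and only mirrors the filter and sort order used above.

```python
# Hypothetical records standing in for rows of the KLIFS structure table.
structures = [
    {"pdb": "aaaa", "DFG": "in", "aC_helix": "in", "quality_score": 8.0,
     "resolution": 1.9, "chain": "A", "alt": ""},
    {"pdb": "bbbb", "DFG": "out", "aC_helix": "in", "quality_score": 9.6,
     "resolution": 1.5, "chain": "A", "alt": ""},
    {"pdb": "cccc", "DFG": "in", "aC_helix": "in", "quality_score": 8.0,
     "resolution": 1.7, "chain": "B", "alt": "A"},
    # resolution 0 marks an entry without a resolution (treated as NMR above)
    {"pdb": "dddd", "DFG": "in", "aC_helix": "in", "quality_score": 9.0,
     "resolution": 0.0, "chain": "A", "alt": ""},
]


def pick_template(records):
    """Filter for DFG in/aC helix in and return the best-ranked record."""
    candidates = [
        r for r in records
        if r["resolution"] > 0 and r["DFG"] == "in" and r["aC_helix"] == "in"
    ]
    # highest quality score first, then lowest resolution value, chain, altloc
    candidates.sort(
        key=lambda r: (-r["quality_score"], r["resolution"], r["chain"], r["alt"])
    )
    return candidates[0]


best = pick_template(structures)
print(best["pdb"])
```

In this toy data, "bbbb" is dropped for its DFG out conformation and "dddd" for its missing resolution, so the tie in quality score between the remaining two entries is broken by resolution.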

### Superpose ATP template structure

Next, the ATP-bound structure 1RDQ will be superposed onto the ABL1 structure to allow later transfer of ATP and Mg2+ ions. The `superpose_proteins` function from KinoML is slightly modified to allow a more accurate superposition based on the ATP binding pocket residues. **Note:** If the ABL1 structure contained multiple chains or alternate locations, a selection would be necessary before superposition.

In [4]:
# altered KinoML function
def superpose_protein_sites(
    reference_protein: oechem.OEGraphMol, 
    fit_protein: oechem.OEGraphMol, 
    residues,
    chain_id,
    insertion_code=" "
) -> oechem.OEGraphMol:
    """
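    Superpose a protein structure onto a reference protein based on selected site residues.

    Parameters
    ----------
    reference_protein: oechem.OEGraphMol
        An OpenEye molecule holding a protein structure which will be used as reference during superposition.
    fit_protein: oechem.OEGraphMol
        An OpenEye molecule holding a protein structure which will be superposed onto the reference protein.
    residues: iterable of str
        Residues defining the site used for superposition, each given as the three-letter residue name
        directly followed by the residue number, e.g., "LEU248".
    chain_id: str
        Chain identifier of the site residues.
    insertion_code: str
        Insertion code of the site residues.

    Returns
    -------
    superposed_protein: oechem.OEGraphMol
        An OpenEye molecule holding the superposed protein structure.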
    """
    # do not modify input
    superposed_protein = fit_protein.CreateCopy()

    # set superposition method
    options = oespruce.OESuperpositionOptions()
    options.SetSuperpositionType(oespruce.OESuperpositionType_Site)
    for residue in residues:
        options.AddSiteResidue(f"{residue[:3]}:{residue[3:]}:{insertion_code}:{chain_id}")

    # perform superposition
    superposition = oespruce.OEStructuralSuperposition(
        reference_protein, superposed_protein, options
    )
    superposition.Transform(superposed_protein)

    return superposed_protein
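The function above slices each residue label into a three-letter name and a residue number and joins them with the insertion code and chain ID. A minimal standalone sketch of that string construction follows. The helper name `site_residue_key` and the label "LEU248" are illustrative assumptions. The format itself is taken from the f-string in the function above.

```python
def site_residue_key(subst_name: str, chain_id: str, insertion_code: str = " ") -> str:
    """Build 'NAME:NUMBER:ICODE:CHAIN' from a label such as 'LEU248'."""
    residue_name, residue_number = subst_name[:3], subst_name[3:]
    # guard against labels that do not follow the assumed name+number layout
    if not residue_number.lstrip("-").isdigit():
        raise ValueError(f"Unexpected residue label: {subst_name!r}")
    return f"{residue_name}:{residue_number}:{insertion_code}:{chain_id}"


print(site_residue_key("LEU248", "A"))
```

The slicing silently assumes three-letter residue names. A check like the one above makes labels with a different layout fail loudly instead of producing a malformed site definition.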

In [5]:
# get details for 1RDQ from KLIFS
atp_template = klifs_utils.remote.structures.structures_from_structure_ids(5927).iloc[0]
atp_template.pdb

'1rdq'

In [6]:
# load structures
abl1_structure = read_molecules(user_cache_dir() + f"/rcsb_{abl1_template.pdb}.pdb")[0]
atp_structure = read_molecules(user_cache_dir() + f"/rcsb_{atp_template.pdb}.pdb")[0]

In [7]:
# retrieve pocket residues for 2F4J from KLIFS
pocket_residues = klifs_utils.remote.coordinates.pocket.mol2_to_dataframe(abl1_template.structure_ID).subst_name.unique()

In [8]:
# superpose proteins
atp_structure_superposed = superpose_protein_sites(abl1_structure, atp_structure, pocket_residues, "A")

### Prepare ATP template structure

After superposition, the ATP bound structure 1RDQ will be prepared. Chain and alternate location need to be selected to pick the ATP bound conformation and to remove the substrate mimicking peptide inhibitor. The real termini will not be capped. The prepared structure will be saved and can later be used for comparing the behavior in MD simulations.

In [9]:
# select alternate location B, which contains ATP
atp_structure_superposed = select_altloc(atp_structure_superposed, atp_template.alt)

In [10]:
# select chain E, since chain I contains the substrate-mimicking inhibitor
atp_structure_superposed = select_chain(atp_structure_superposed, atp_template.chain)
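Conceptually, the two selections above keep one chain and one alternate location. The sketch below shows the same idea on plain PDB records using only the standard library. It is not the KinoML implementation of `select_altloc` or `select_chain`. The helper names and the toy records are assumptions made for illustration and are not taken from 1RDQ. The column positions follow the fixed-width PDB format (alternate location in column 17, chain ID in column 22).

```python
def pdb_atom_line(record, serial, name, altloc, resname, chain, resseq):
    """Format the leading fixed-width columns of a PDB ATOM/HETATM record."""
    return f"{record:<6}{serial:>5} {name:<4}{altloc}{resname:>3} {chain}{resseq:>4}"


def select_records(pdb_lines, chain_id, altloc):
    """Keep atoms of one chain that have no altloc or the requested altloc."""
    selected = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        line_altloc, line_chain = line[16], line[21]  # columns 17 and 22
        if line_chain == chain_id and line_altloc in (" ", altloc):
            selected.append(line)
    return selected


# toy records: protein atom, two ligand states in different altlocs, peptide atom
toy_lines = [
    pdb_atom_line("ATOM", 1, "CA", " ", "LYS", "E", 72),
    pdb_atom_line("HETATM", 2, "PG", "B", "ATP", "E", 400),
    pdb_atom_line("HETATM", 3, "PB", "A", "ADP", "E", 401),
    pdb_atom_line("ATOM", 4, "CA", " ", "ALA", "I", 17),
]
kept = select_records(toy_lines, chain_id="E", altloc="B")
print([line[17:20] for line in kept])
```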

In [11]:
# prepare complex
atp_structure_du = prepare_complex(atp_structure_superposed, loop_db="~/.OpenEye/rcsb_spruce.loop_db", real_termini=[1, 350])

In [12]:
# extract components
atp_protein, atp_solvent, atp_ligand, atp_mg = oechem.OEGraphMol(), oechem.OEGraphMol(), oechem.OEGraphMol(), oechem.OEGraphMol()
atp_structure_du.GetProtein(atp_protein)
atp_structure_du.GetSolvent(atp_solvent)
atp_structure_du.GetLigand(atp_ligand)
atp_ligand.SetTitle("ATP")
atp_structure_du.GetComponents(atp_mg, oechem.OEDesignUnitComponents_Cofactors)

True

In [13]:
# combine components
reference_mg_water = oechem.OEGraphMol()
oechem.OEAddMols(reference_mg_water, atp_protein)
oechem.OEAddMols(reference_mg_water, atp_solvent)
oechem.OEAddMols(reference_mg_water, atp_mg)

([<oechem.OEAtomBase; proxy of <Swig Object of type 'OEChem::OEAtomBase *' at 0x7f11f3be9ba0> >,
  <oechem.OEAtomBase; proxy of <Swig Object of type 'OEChem::OEAtomBase *' at 0x7f11f3be9510> >],
 [])

In [14]:
# update atom indices etc.
reference_mg_water = update_residue_identifiers(reference_mg_water)

In [15]:
# write structure
write_molecules([reference_mg_water], user_cache_dir() + f"/{atp_template.pdb}_mg_water.pdb")
write_molecules([atp_ligand], user_cache_dir() + f"/{atp_template.pdb}_atp.sdf")

### Determine critical sidechain dihedrals

The template for ATP modeling will next be used to analyze the dihedral angles of sidechains involved in Mg2+ complexation. Later, these angles will be used to adjust sidechains of the ABL1 structure to allow a more ideal complexation of transferred ATP and Mg2+ ions. Determining those residues is possible via the KLIFS pocket residue numbering scheme.

In [16]:
# retrieve pocket residues for 1RDQ from KLIFS
atp_pocket_resids = klifs_utils.remote.interactions.klifs_pocket_numbering_from_structure_id(atp_template.structure_ID)

In [17]:
# collect sidechain dihedrals for KLIFS pocket residues 17, 75, 81
optimal_sidechain_dihedrals = {}
klifs_pocket_numbers = [17, 75, 81]
hierview = oechem.OEHierView(atp_protein)
for hier_residue in hierview.GetResidues():
    residue_number = hier_residue.GetResidueNumber()
    if str(residue_number) in atp_pocket_resids.Xray_position.to_list():
        klifs_pocket_number = atp_pocket_resids[atp_pocket_resids.Xray_position == str(residue_number)]["index"].iloc[0]
    else:
        klifs_pocket_number = None
    if klifs_pocket_number in klifs_pocket_numbers:
        dihedrals = []
        # OEGetTorsion returns the angle in radians, or -100 if the torsion is not defined
        chi_types = (
            oechem.OEProtTorType_Chi1,
            oechem.OEProtTorType_Chi2,
            oechem.OEProtTorType_Chi3,
            oechem.OEProtTorType_Chi4,
            oechem.OEProtTorType_Chi5,
        )
        for chi_type in chi_types:
            dihedral = oechem.OEGetTorsion(hier_residue, chi_type)
            if dihedral == -100:
                break
            dihedrals.append(dihedral)
        optimal_sidechain_dihedrals[klifs_pocket_number] = dihedrals

In [18]:
optimal_sidechain_dihedrals

{17: [-3.0894830477344044,
  2.9511492791080447,
  -3.056785538585207,
  3.044467978268652],
 75: [-1.241179197859294, -0.414521490477942],
 81: [-2.8715000197845297, -0.12627632517047363]}
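The values above lie within ±π, i.e., they are given in radians. As a reminder of what such a sidechain dihedral measures, here is a minimal standard-library sketch that computes a signed dihedral from four points and converts the first chi angle of pocket residue 17 to degrees. The function `dihedral` and the test coordinates are illustrative assumptions. The sign convention of this sketch is not claimed to match the one used by OpenEye.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians around the p1-p2 axis."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # project the outer bonds onto the plane perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.atan2(dot(cross(b1, v), w), dot(v, w))


# trans arrangement: the outer atoms point in opposite directions
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(abs(math.degrees(trans))))

# chi1 of KLIFS pocket residue 17 from the output above, converted to degrees
chi1_degrees = math.degrees(-3.0894830477344044)
print(round(chi1_degrees))
```

Read this way, the first chi angle of pocket residue 17 is close to a trans arrangement, about -177 degrees.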

### Adjust critical sidechain dihedrals

Next, the critical dihedrals of the ABL1 structure will be adjusted to match the ATP template structure; the structure will be prepared afterwards. **Note:** The selection for adjusting dihedrals is only based on residue numbers. This could lead to unexpected behavior if the structure contains multiple residues with the same residue number.

In [19]:
# retrieve pocket residues for ABL1 structure from KLIFS
abl1_pocket_resids = klifs_utils.remote.interactions.klifs_pocket_numbering_from_structure_id(abl1_template.structure_ID)

In [20]:
# adjust sidechain dihedral for KLIFS pocket residues 17, 75, 81
klifs_pocket_numbers = [17, 75, 81]
hierview = oechem.OEHierView(abl1_structure)
for hier_residue in hierview.GetResidues():
    residue_number = hier_residue.GetResidueNumber()
    if str(residue_number) in abl1_pocket_resids.Xray_position.to_list():
        klifs_pocket_number = abl1_pocket_resids[abl1_pocket_resids.Xray_position == str(residue_number)]["index"].iloc[0]
    else:
        klifs_pocket_number = None
    if klifs_pocket_number in klifs_pocket_numbers:
        chi_types = (
            oechem.OEProtTorType_Chi1,
            oechem.OEProtTorType_Chi2,
            oechem.OEProtTorType_Chi3,
            oechem.OEProtTorType_Chi4,
            oechem.OEProtTorType_Chi5,
        )
        # zip stops at the shorter sequence, so only the stored dihedrals are set
        for chi_type, dihedral in zip(chi_types, optimal_sidechain_dihedrals[klifs_pocket_number]):
            oechem.OESetTorsion(hier_residue, chi_type, dihedral)
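Because the loop above matches residues by number only, it can help to check beforehand whether any residue number is shared by several residues, for example across chains or insertion codes. The following is a minimal sketch of such a check on plain tuples. The helper `ambiguous_residue_numbers` and the identifiers in the example are hypothetical. In practice, the tuples would have to be collected from the structure first.

```python
from collections import Counter


def ambiguous_residue_numbers(residue_ids):
    """Return residue numbers used by more than one (chain, number, icode) identity."""
    # a set removes repeated mentions of the same residue, e.g., one per atom
    unique_ids = set(residue_ids)
    counts = Counter(number for _chain, number, _icode in unique_ids)
    return sorted(number for number, count in counts.items() if count > 1)


# hypothetical identifiers: residue 271 occurs in chains A and B
residue_ids = [
    ("A", 271, " "),
    ("B", 271, " "),
    ("A", 381, " "),
    ("A", 381, " "),  # same residue listed twice, not ambiguous
]
print(ambiguous_residue_numbers(residue_ids))
```

An empty result would indicate that selecting by residue number alone is unambiguous for the given structure.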

### Prepare ABL1 structure

In [21]:
# prepare ABL1 structure
abl1_structure_du = prepare_complex(abl1_structure, loop_db="~/.OpenEye/rcsb_spruce.loop_db", cap_termini=False)

In [22]:
# get relevant components
abl1_protein, abl1_solvent = oechem.OEGraphMol(), oechem.OEGraphMol()
abl1_structure_du.GetProtein(abl1_protein)
abl1_structure_du.GetSolvent(abl1_solvent)
abl1_solvent_molecules = split_molecule_components(abl1_solvent)

In [23]:
# process kinase domain
uniprot_id = "P00519"
featurizer = OEKLIFSKinaseHybridDockingFeaturizer()
abl1_protein = featurizer._process_kinase_domain(abl1_protein, uniprot_id)

### Assemble complex

Finally, we can assemble all components and store information in the PDB header. Solvent molecules will only be added if they do not clash with ATP or Mg2+ ions. ATP will be saved separately, which makes setting up MD simulations easier.

In [24]:
# assemble complex
abl1_atp_complex = oechem.OEGraphMol()
# add protein
oechem.OEAddMols(abl1_atp_complex, abl1_protein)
# add atp
oechem.OEAddMols(abl1_atp_complex, atp_ligand)
# add MG
oechem.OEAddMols(abl1_atp_complex, atp_mg)
# add water if not clashing with ATP or Mg2+
for solvent_molecule in abl1_solvent_molecules:
    if not clashing_atoms(atp_ligand, solvent_molecule):
        if not clashing_atoms(atp_mg, solvent_molecule):
            oechem.OEAddMols(abl1_atp_complex, solvent_molecule)
        else:
            print("Found clashing water!")
    else:
        print("Found clashing water!")

Found clashing water!
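The water filtering above relies on KinoML's `clashing_atoms`. To illustrate the general idea of a distance-based clash check, here is a minimal standard-library sketch. It is not the KinoML implementation. The function `has_clash`, the 2.0 Å cutoff, and all coordinates are assumptions chosen for illustration.

```python
import math


def has_clash(coords_a, coords_b, cutoff=2.0):
    """Return True if any atom pair between the two sets is closer than cutoff (in Å)."""
    return any(math.dist(a, b) < cutoff for a in coords_a for b in coords_b)


# hypothetical coordinates in Å
ligand_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
waters = {
    "HOH1": [(0.5, 0.0, 0.0)],  # overlaps with the ligand
    "HOH2": [(6.0, 0.0, 0.0)],  # well separated
}

kept_waters = [
    name for name, coords in waters.items()
    if not has_clash(ligand_coords, coords)
]
print(kept_waters)
```

The all-pairs loop is quadratic in the number of atoms, which is unproblematic for a single ligand and a few hundred waters.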


In [25]:
# adjust protonation
oechem.OEPlaceHydrogens(abl1_atp_complex)

True

In [26]:
# split complex for separate saving
abl1_mg_water = abl1_atp_complex.CreateCopy()
abl1_atp = abl1_atp_complex.CreateCopy()
for atom1, atom2 in zip(abl1_mg_water.GetAtoms(), abl1_atp.GetAtoms()):
    residue = oechem.OEAtomGetResidue(atom1)
    if residue.GetName().strip() == "ATP":
        abl1_mg_water.DeleteAtom(atom1)
    else:
        abl1_atp.DeleteAtom(atom2)
abl1_atp.SetTitle("ATP")

True

In [27]:
# update atom indices etc.
abl1_mg_water = update_residue_identifiers(abl1_mg_water)

In [28]:
# store info in PDB header
oechem.OEClearPDBData(abl1_mg_water)
oechem.OESetPDBData(abl1_mg_water, "COMPND", "\tProtein: ABL1")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", "\tSolvent: Removed water clashing with ATP,MG,MG")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", "\tLigand: MG,MG")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", f"\tKinase template: {abl1_template.pdb}")
oechem.OEAddPDBData(abl1_mg_water, "COMPND", f"\tLigand template: {atp_template.pdb}")

True

In [29]:
# write ABL1 ATP complex
write_molecules([abl1_mg_water], user_cache_dir() + "/ABL1_mg_water.pdb")
write_molecules([abl1_atp], user_cache_dir() + "/ABL1_atp.sdf")

The complex looks quite good. Only three minor clashes between ATP and protein atoms were observed.