# ABL1 ATP modeling

This notebook generates a model of ABL1 in complex with ATP using functionality from KinoML ([b252ff6](https://github.com/openkinome/kinoml/commit/b252ff67a35b9149e704d51b7c838dcf1e9fb10f)). A first analysis of ATP-bound kinase structures ([notebook](https://github.com/openkinome/study-abl-resistance/blob/master/notebooks/atp_kinase_conformations.ipynb)) revealed 17 PDB entries in the DFG in/aC helix in conformation with exactly two complexed MG2+ ions (4din, 6byr, 4wb6, 3q53, 4xbr, 4dh3, 3qam, 6no7, 3qal, 4wb8, 4wb5, 5dnr, 3x2w, 4x6r, 3x2v, 3x2u, 1rdq). This arrangement represents the physiologically most relevant active state of kinases. This notebook is a first attempt at automating kinase-ATP complex generation, and the resulting model still requires manual inspection. In particular, adjusting certain side chain rotamers could improve the model.

## Content

- Select ABL1 protein structure
- Select ATP bound structure
  - Sequence identity
  - Sequence similarity
- Generate complex
- Prepare reference structure

In [1]:
from appdirs import user_cache_dir
from Bio import pairwise2
from Bio.Align import substitution_matrices
import klifs_utils
from openeye import oechem, oespruce

from kinoml.features.complexes import OEKLIFSKinaseHybridDockingFeaturizer
from kinoml.modeling.OEModeling import (
    read_molecules,
    write_molecules,
    split_molecule_components,
    clashing_atoms,
    update_residue_identifiers,
    string_similarity,
    prepare_complex,
)



## Select ABL1 protein structure

First we need to select an ABL1 structure resolved in the DFG in/aC helix in conformation.

In [2]:
# retrieve kinase structures
kinase_ids = klifs_utils.remote.kinases.kinase_names().kinase_ID.to_list()
kinase_df = klifs_utils.remote.structures.structures_from_kinase_ids(kinase_ids)
kinase_df = kinase_df[kinase_df.resolution > 0]  # remove NMR
print('Number of PDB entries:', len(set(kinase_df['pdb'])))
print('Number of KLIFS entries:', len(kinase_df))

Number of PDB entries: 5294
Number of KLIFS entries: 11408


In [3]:
# filter for ABL1 structures
abl1_klifs_kinase_id = 392
abl1_df = kinase_df[kinase_df.kinase_ID == abl1_klifs_kinase_id]
print('Number of PDB entries:', len(set(abl1_df['pdb'])))
print('Number of KLIFS entries:', len(abl1_df))

Number of PDB entries: 32
Number of KLIFS entries: 90


In [4]:
# filter for DFG in/aC helix in conformation
abl1_df = abl1_df[abl1_df.DFG == 'in']
abl1_df = abl1_df[abl1_df.aC_helix == 'in']
print('Number of PDB entries:', len(set(abl1_df['pdb'])))
print('Number of KLIFS entries:', len(abl1_df))

Number of PDB entries: 6
Number of KLIFS entries: 14


In [5]:
# pick highest quality structure
abl1_df = abl1_df.sort_values(
    by=['quality_score', 'resolution', 'chain', 'alt'],
    ascending=[False, True, True, True],
)
abl1_template = abl1_df.iloc[0]
abl1_template.pdb

'2f4j'

The PDB structure [2F4J](https://www.rcsb.org/structure/2f4j) is picked for modeling the ABL1 protein in the proper active conformation.

## Select ATP bound structure

Next, we need to select an ATP-bound kinase structure from which to *transfer* the ATP and the MG2+ ions to the ABL1 structure selected above. This ATP-bound structure should be as similar as possible to ABL1, which can be estimated via sequence identity or similarity.

In [6]:
# filter for structures in the DFG in/aC helix in conformation with two MG2+ ions
valid_codes = ['4din', '6byr', '4wb6', '3q53', '4xbr', '4dh3', '3qam', '6no7', '3qal', '4wb8', '4wb5', '5dnr', '3x2w', '4x6r', '3x2v', '3x2u', '1rdq']
atp_df = kinase_df[kinase_df.pdb.isin(valid_codes)] 
print('Number of PDB entries:', len(set(atp_df['pdb'])))
print('Number of KLIFS entries:', len(atp_df))

Number of PDB entries: 17
Number of KLIFS entries: 30


In [7]:
# sort by quality to pick representative structure in next step
atp_df = atp_df.sort_values(
    by=['quality_score', 'resolution', 'chain', 'alt'],
    ascending=[False, True, True, True],
)
# keep entry with highest quality per PDB code
atp_df = atp_df.groupby('pdb').head(1)
print('Number of PDB entries:', len(set(atp_df['pdb'])))
print('Number of KLIFS entries:', len(atp_df))

Number of PDB entries: 17
Number of KLIFS entries: 17
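
The ranking and per-PDB selection above can be sketched in plain Python, without KLIFS or pandas. The records, their values, and the helper names `quality_key` and `best_entry_per_pdb` are hypothetical and only illustrate the idea: sort by descending quality score, break ties by ascending resolution, chain and alternate location, then keep the first entry per PDB code.

```python
# Hypothetical KLIFS-like records; all values are made up for illustration.
entries = [
    {"pdb": "1abc", "chain": "B", "alt": "", "quality_score": 8.0, "resolution": 2.1},
    {"pdb": "1abc", "chain": "A", "alt": "", "quality_score": 9.2, "resolution": 2.1},
    {"pdb": "2xyz", "chain": "A", "alt": "B", "quality_score": 7.5, "resolution": 1.8},
    {"pdb": "2xyz", "chain": "A", "alt": "A", "quality_score": 7.5, "resolution": 1.8},
]


def quality_key(entry):
    # Negate the quality score so that higher scores sort first, while
    # resolution, chain and alternate location sort in ascending order.
    return (-entry["quality_score"], entry["resolution"], entry["chain"], entry["alt"])


def best_entry_per_pdb(records):
    best = {}
    for entry in sorted(records, key=quality_key):
        # setdefault keeps only the first, i.e. best ranked, entry per PDB code.
        best.setdefault(entry["pdb"], entry)
    return best


best = best_entry_per_pdb(entries)
for pdb, entry in best.items():
    print(pdb, entry["chain"], entry["alt"], entry["quality_score"])
```

This mirrors what `sort_values` followed by `groupby('pdb').head(1)` does in the cell above, under the assumption that a higher `quality_score` is better and a lower `resolution` value is better.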


### Sequence identity

In [8]:
# retrieve ABL1 information from KLIFS
abl1_details = klifs_utils.remote.kinases.kinases_from_kinase_ids(
    [abl1_klifs_kinase_id]
).iloc[0]

In [9]:
# calculate sequence identities
sequence_identities = {}
for index, kinase in atp_df.iterrows():
    sequence_identities[kinase.pdb] = string_similarity(kinase.pocket, abl1_details.pocket)

In [10]:
# pick the ATP bound structure with the highest pocket sequence identity
highest_identity = max(sequence_identities.values())
most_identicals = [key for key, value in sequence_identities.items() if value == highest_identity]
atp_template = atp_df[atp_df.pdb.isin(most_identicals)].iloc[0]
print(f"Highest identity: {highest_identity}")
print(f"PDB entry: {atp_template.pdb}")

Highest identity: 0.3764705882352941
PDB entry: 5dnr
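
To make the identity measure concrete, here is a minimal sketch that assumes identity means the fraction of identical characters at matching positions of two equal-length, pre-aligned pocket sequences. `positional_identity` is a hypothetical helper and the sequences are made up; `string_similarity` from KinoML may be implemented differently.

```python
def positional_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions with identical characters (assumes pre-aligned input)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have the same length.")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


# Made-up pocket fragments, not real KLIFS pocket sequences.
identity = positional_identity("KLGGGQYGEV", "KIGEGTYGVV")
print(f"Identity: {identity:.2f}")
```

KLIFS pockets comprise 85 aligned positions, so under this definition the identity reported above would correspond to 32 identical positions out of 85.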


### Sequence similarity

In [11]:
blosum62 = substitution_matrices.load("BLOSUM62")

In [12]:
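# score global alignments of the pocket sequences with BLOSUM62
# (gap open and gap extension penalties are both set to -10)
sequence_similarities = {}
for index, kinase in atp_df.iterrows():
    sequence_similarities[kinase.pdb] = pairwise2.align.globalds(
        kinase.pocket, abl1_details.pocket, blosum62, -10, -10, score_only=True
    )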

In [13]:
# pick the ATP bound structure with the highest pocket sequence similarity
highest_similarity = max(sequence_similarities.values())
most_similars = [key for key, value in sequence_similarities.items() if value == highest_similarity]
atp_template = atp_df[atp_df.pdb.isin(most_similars)].iloc[0]
print(f"Highest similarity: {highest_similarity}")
print(f"PDB entry: {atp_template.pdb}")

Highest similarity: 179.0
PDB entry: 5dnr
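
The similarity score above comes from a global alignment. The following sketch shows how such a score can be computed with a Needleman-Wunsch recursion, assuming a linear gap cost of -10 per gap position, which is how equal open and extension penalties are understood here. `global_alignment_score` and `toy_substitution` are hypothetical helpers, and the toy scoring scheme stands in for BLOSUM62, so the numbers are not comparable to the output above.

```python
def global_alignment_score(seq_a, seq_b, substitution, gap=-10.0):
    """Score of the best global alignment with a linear gap penalty."""
    # Row 0 of the dynamic programming matrix: seq_b prefixes aligned to gaps only.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        # Column 0: seq_a prefix aligned to gaps only.
        current = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            current.append(
                max(
                    previous[j - 1] + substitution(a, b),  # align a with b
                    previous[j] + gap,  # gap in seq_b
                    current[j - 1] + gap,  # gap in seq_a
                )
            )
        previous = current
    return previous[-1]


def toy_substitution(a, b):
    # Toy scoring scheme standing in for a real substitution matrix.
    return 4.0 if a == b else -1.0


print(global_alignment_score("ACD", "ACD", toy_substitution))
print(global_alignment_score("ACD", "AD", toy_substitution))
```

Because the pocket sequences compared in this notebook have the same length and gaps are penalized heavily, the best alignment is likely to be gap-free, in which case the score reduces to the sum of the substitution scores per position.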


Both sequence identity and sequence similarity suggest [5DNR](https://www.rcsb.org/structure/5DNR) (Aurora A) as the ligand template for transferring ATP into the ABL1 structure.

## Generate complex

Next, both the protein and water from 2F4J will be combined with the ATP and MG2+ ions from 5DNR. Both proteins need to be superposed to allow the transfer. The `superpose_proteins` function from KinoML was slightly altered to allow for a more accurate superposition based on the ATP binding pockets.

In [14]:
def superpose_protein_sites(
    reference_protein: oechem.OEGraphMol, 
    fit_protein: oechem.OEGraphMol, 
    residues,
    chain_id,
    insertion_code=" "
) -> oechem.OEGraphMol:
    """
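    Superpose a protein structure onto a reference protein based on selected site residues.

    Parameters
    ----------
    reference_protein: oechem.OEGraphMol
        An OpenEye molecule holding a protein structure which will be used as reference during superposition.
    fit_protein: oechem.OEGraphMol
        An OpenEye molecule holding a protein structure which will be superposed onto the reference protein.
    residues: iterable of str
        Residues defining the site used for superposition, each given as a three letter residue name
        directly followed by the residue number.
    chain_id: str
        Chain identifier of the site residues.
    insertion_code: str
        Insertion code of the site residues.

    Returns
    -------
    superposed_protein: oechem.OEGraphMol
        An OpenEye molecule holding the superposed protein structure.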
    """
    # do not modify input
    superposed_protein = fit_protein.CreateCopy()

    # set superposition method
    options = oespruce.OESuperpositionOptions()
    options.SetSuperpositionType(oespruce.OESuperpositionType_Site)
    for residue in residues:
        options.AddSiteResidue(f"{residue[:3]}:{residue[3:]}:{insertion_code}:{chain_id}")

    # perform superposition
    superposition = oespruce.OEStructuralSuperposition(
        reference_protein, superposed_protein, options
    )
    superposition.Transform(superposed_protein)

    return superposed_protein
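
The function above builds one site residue string per pocket residue. The sketch below isolates that formatting step so it can be checked without an OpenEye license. `format_site_residue` is a hypothetical helper, the residue labels are made up, and the sketch assumes labels that consist of a three letter residue name directly followed by the residue number.

```python
def format_site_residue(residue: str, chain_id: str, insertion_code: str = " ") -> str:
    """Build a 'NAME:NUMBER:INSERTION_CODE:CHAIN' string from a label like 'LEU248'."""
    name, number = residue[:3], residue[3:]
    # Guard against labels that do not follow the assumed pattern.
    if not number.lstrip("-").isdigit():
        raise ValueError(f"Unexpected residue label: {residue!r}")
    return f"{name}:{number}:{insertion_code}:{chain_id}"


# Made-up residue labels for illustration.
for label in ["LEU248", "GLY249", "ASP381"]:
    print(format_site_residue(label, "A"))
```

A check like this is useful before superposition, since labels that do not match the assumed pattern would otherwise be passed on silently as malformed site residues.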

In [15]:
# load both structures
abl1_structure = read_molecules(user_cache_dir() + "/rcsb_2f4j.pdb")[0]
atp_structure = read_molecules(user_cache_dir() + "/rcsb_5dnr.pdb")[0]

In [16]:
# retrieve pocket residues for 2F4J from KLIFS
pocket_residues = klifs_utils.remote.coordinates.pocket.mol2_to_dataframe(abl1_template.structure_ID).subst_name.unique()

In [17]:
# superpose proteins
atp_structure_aligned = superpose_protein_sites(abl1_structure, atp_structure, pocket_residues, "A")

In [18]:
# write superposed structure for manual validation
write_molecules([atp_structure_aligned], user_cache_dir() + "/5dnr_superposed_2f4j.pdb")

In [19]:
# prepare superposed ATP bound structure
atp_structure_aligned_du = prepare_complex(atp_structure_aligned)

In [20]:
# get ATP
atp = oechem.OEGraphMol()
atp_structure_aligned_du.GetLigand(atp)

True

In [21]:
# get magnesium ions by deleting all other atoms from a copy of the superposed structure
mg = atp_structure_aligned.CreateCopy()
for atom in mg.GetAtoms():
    residue = oechem.OEAtomGetResidue(atom)
    if residue.GetName().strip() != "MG":
        mg.DeleteAtom(atom)
mg.NumAtoms()

2

In [22]:
# prepare ABL1 structure
abl1_structure_du = prepare_complex(abl1_structure, cap_termini=False)

In [23]:
# get ABL1 protein and solvent 
abl1, solvent = oechem.OEGraphMol(), oechem.OEGraphMol()
abl1_structure_du.GetProtein(abl1)
abl1_structure_du.GetSolvent(solvent)
solvent_molecules = split_molecule_components(solvent)

In [24]:
# process kinase domain
uniprot_id = "P00519"
featurizer = OEKLIFSKinaseHybridDockingFeaturizer()
abl1 = featurizer._process_kinase_domain(abl1, uniprot_id)

In [25]:
# assemble complex
abl1_atp_mg_water = oechem.OEGraphMol()
# add protein
oechem.OEAddMols(abl1_atp_mg_water, abl1)
# add atp
oechem.OEAddMols(abl1_atp_mg_water, atp)
# add MG
oechem.OEAddMols(abl1_atp_mg_water, mg)
# add water if not clashing with MG or ATP
for solvent_molecule in solvent_molecules:
    if clashing_atoms(mg, solvent_molecule) or clashing_atoms(atp, solvent_molecule):
        print("Found clashing water!")
    else:
        oechem.OEAddMols(abl1_atp_mg_water, solvent_molecule)
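
The water filter above relies on a clash check between two sets of atoms. A minimal sketch of such a check is given below, assuming that two atoms clash when their distance falls below a fixed cutoff. `has_clash`, the 2.0 Å cutoff and all coordinates are hypothetical; the criterion used by `clashing_atoms` in KinoML may differ.

```python
import math


def has_clash(coords_a, coords_b, cutoff=2.0):
    """Return True if any atom pair from the two sets is closer than the cutoff (in Å)."""
    return any(math.dist(a, b) < cutoff for a in coords_a for b in coords_b)


# Made-up coordinates: two ligand atoms and three water oxygens.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
waters = [(0.5, 0.5, 0.0), (6.0, 0.0, 0.0), (1.5, 1.0, 0.0)]

# Keep only waters that do not clash with the ligand.
kept_waters = [water for water in waters if not has_clash(ligand, [water])]
print(f"Kept {len(kept_waters)} of {len(waters)} waters")
```

Comparing all atom pairs is quadratic in the number of atoms, which is acceptable for a ligand and single water molecules but would call for a spatial search structure on larger systems.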

In [26]:
# store info in PDB header
oechem.OEClearPDBData(abl1_atp_mg_water)
oechem.OESetPDBData(abl1_atp_mg_water, "COMPND", "\tProtein: ABL1")
oechem.OEAddPDBData(abl1_atp_mg_water, "COMPND", "\tSolvent: Removed water clashing with ATP,MG,MG")
oechem.OEAddPDBData(abl1_atp_mg_water, "COMPND", "\tLigand: ATP,MG,MG")
oechem.OEAddPDBData(abl1_atp_mg_water, "COMPND", f"\tKinase template: {abl1_template.pdb}")
oechem.OEAddPDBData(abl1_atp_mg_water, "COMPND", f"\tLigand template: {atp_template.pdb}")

True

In [27]:
# adjust protonation
oechem.OEPlaceHydrogens(abl1_atp_mg_water)

True

In [28]:
# update atom indices etc
abl1_atp_mg_water = update_residue_identifiers(abl1_atp_mg_water)

In [29]:
# write ABL1 ATP complex
write_molecules([abl1_atp_mg_water], user_cache_dir() + "/ABL1_ATP_MG.pdb")

Visual inspection of the generated complex reveals several minor atom clashes between the newly placed ATP and the protein. However, the overall quality looks good, and these clashes may relax during energy minimization and MD simulation.

## Prepare reference structure

To have a realistic reference, we will also prepare the ATP template 5DNR. This way, we can compare the behavior of both complexes in MD simulations.

In [30]:
# prepare complex
atp_structure_du = prepare_complex(atp_structure, loop_db="~/.OpenEye/rcsb_spruce.loop_db")

In [31]:
# retrieve relevant components
protein, solvent, ligand, mg = oechem.OEGraphMol(), oechem.OEGraphMol(), oechem.OEGraphMol(), oechem.OEGraphMol()
atp_structure_du.GetProtein(protein)
atp_structure_du.GetSolvent(solvent)
atp_structure_du.GetLigand(ligand)
atp_structure_du.GetComponents(mg, oechem.OEDesignUnitComponents_Cofactors)

True

In [32]:
# combine components
prepared_complex = oechem.OEGraphMol()
oechem.OEAddMols(prepared_complex, protein)
oechem.OEAddMols(prepared_complex, ligand)
oechem.OEAddMols(prepared_complex, solvent)
oechem.OEAddMols(prepared_complex, mg)

([<oechem.OEAtomBase; proxy of <Swig Object of type 'OEChem::OEAtomBase *' at 0x7eff67903de0> >,
  <oechem.OEAtomBase; proxy of <Swig Object of type 'OEChem::OEAtomBase *' at 0x7eff67903600> >],
 [])

In [33]:
# update atom indices etc
prepared_complex = update_residue_identifiers(prepared_complex)

In [34]:
# write reference ATP complex
prepared_complex_path = user_cache_dir() + "/5dnr_prep.pdb"
write_molecules([prepared_complex], prepared_complex_path)