# Kinase domain modeling

This notebook shows how to generate a complex of a ligand and a kinase of interest from a SMILES string and a UniProt ID.

## Content

1. Select KLIFS structure
2. Prepare structure
3. Dock ligand

In [1]:
%load_ext autoreload
%autoreload 2

## 1. Select KLIFS structure

The following cells query the KLIFS database for kinase structures matching the specified UniProt ID.

### 1.1 Workflow

- [x] query KLIFS by UniProt ID 
  - takes quite long, since there is no direct way to retrieve this information
- [x] search for orthosteric ligands  
- [x] if no orthosteric ligand:  
  - [x] return the structure with the best quality
- [x] else:  
  - [x] retrieve conformations of co-crystallized ligands
  - [x] prepare ligand of interest and generate conformations
  - [x] score shape overlay of ligand conformations with co-crystallized ligands
  - [x] select structures with highest TanimotoCombo of overlay
    - acceptable scores are >= best_TanimotoScore - 0.1
  - [x] return the structure with the best quality
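
The selection logic in the workflow above can be sketched in plain Python. This is a minimal illustration, not the implementation behind `select_structure`: the `Candidate` record, the `pick_structure` helper, and the example PDB IDs and scores are all hypothetical, and "best quality" is assumed to mean the highest `quality_score`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record for one KLIFS structure (not a kinoml class)."""
    pdb: str
    quality_score: float
    # None means the structure has no orthosteric ligand to overlay against
    tanimoto_combo: Optional[float] = None


def pick_structure(candidates: List[Candidate], tolerance: float = 0.1) -> Candidate:
    """Pick one structure following the workflow listed above."""
    scored = [c for c in candidates if c.tanimoto_combo is not None]
    if not scored:
        # no orthosteric ligand in any structure: decide by quality alone
        return max(candidates, key=lambda c: c.quality_score)
    best = max(c.tanimoto_combo for c in scored)
    # keep every structure whose overlay score is within `tolerance` of the best
    shortlist = [c for c in scored if c.tanimoto_combo >= best - tolerance]
    # among similarly good overlays, prefer the best structure quality
    return max(shortlist, key=lambda c: c.quality_score)


# made-up candidates, for illustration only
candidates = [
    Candidate("aaaa", quality_score=6.0, tanimoto_combo=1.45),
    Candidate("bbbb", quality_score=8.0, tanimoto_combo=1.40),
    Candidate("cccc", quality_score=9.5, tanimoto_combo=0.90),
]
print(pick_structure(candidates).pdb)
```

The tolerance keeps structures with near-identical overlays in play, so that a slightly lower overlay score does not rule out a clearly better resolved structure.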

In this example, larotrectinib will be docked into NTRK1.

In [2]:
from kinoml.modeling.OpenEyeModeling import select_structure

In [3]:
smiles = "O=C(Nc1cnn2ccc(N3CCC[C@@H]3c3cc(F)ccc3F)nc12)N1CC[C@H](O)C1"
uniprot_id = "P04629"

In [4]:
kinase = select_structure(uniprot_id, smiles)
kinase

structure_ID                                                      3620
kinase                                                            TRKA
species                                                          Human
kinase_ID                                                          480
pdb                                                               4yne
alt                                                                  A
chain                                                                A
rmsd1                                                            0.834
rmsd2                                                            2.203
pocket               WELGEGAFGKVFLVAVKALDFQREAELLTMLQQHIVRFFGVLMVFE...
resolution                                                        2.02
quality_score                                                        8
missing_residues                                                     0
missing_atoms                                                        0
ligand

This structure contains a substructure that matches larotrectinib and was also picked in an earlier manual selection.

## 2. Prepare structure

The following cells prepare a kinase structure by building missing loops, cropping sequence outside the kinase domain, mutating back to the wild-type sequence, and renumbering residues.

### 2.1 Workflow

1. Prepare a structure from PDB
- [x] retrieve structure    
- [ ] loop modeling
  - OESpruce sometimes fails to add hydrogens to backbone nitrogens (workaround)
  - sp2-like hybridization of carbon atoms with 4 partners and fewer than 2 hydrogen atoms 
  - doesn't properly connect modeled loops with the resolved structure (workaround)
- [x] protonation  
- [x] no capping  (workaround for missing OXT atoms, which are not added in OESpruce 1.1.0)
- [x] select design unit based on Iridium  
- [x] retrieve kinase domain sequence from UniProt
- [x] mutate or delete wrong residues according to the UniProt kinase domain sequence
  - needs another protonation step, which adds hydrogens to backbone atoms that were not properly connected (workaround)
- [x] renumbering  
- [ ] cap termini with OESpruce if not real termini  
  - N terminus has only 1 hydrogen if not capped
- [x] write prepared structure

In [5]:
pdb_id = kinase.pdb
pdb_id

'4yne'

In [6]:
from appdirs import user_cache_dir
from Bio import pairwise2
from kinoml.core.sequences import KinaseDomainAminoAcidSequence
from kinoml.modeling.OpenEyeModeling import has_ligand, read_molecules, read_electron_density, prepare_complex, prepare_protein, write_molecules
from kinoml.utils import download_file
from openeye import oechem, oespruce

In [7]:
# download files
download_file(f"https://files.rcsb.org/download/{pdb_id}.pdb", f"{user_cache_dir()}/{pdb_id}.pdb")
structure = read_molecules(f"{user_cache_dir()}/{pdb_id}.pdb")[0]
download_file(f"https://edmaps.rcsb.org/coefficients/{pdb_id}.mtz", f"{user_cache_dir()}/{pdb_id}.mtz")
electron_density = read_electron_density(f"{user_cache_dir()}/{pdb_id}.mtz")

In [8]:
# loop building, protonation, no capping at this point
protein, ligand = prepare_complex(structure, 
                                  electron_density=electron_density, 
                                  loop_db="~/.OpenEye/rcsb_spruce.loop_db",
                                  cap_termini=False)

In [9]:
write_molecules([protein], f"{user_cache_dir()}/{pdb_id}_prep.pdb")

Bugs in OESpruce lead to missing hydrogens, flat geometries, and unconnected loops. With the following code, we can check whether the backbone 'N' and 'C' atoms next to the modeled loops have fewer than three bonds.

In [10]:
def count_bonds_of_N_and_C(protein):
    """Print every backbone N or C atom that has fewer than three bonds."""
    hv = oechem.OEHierView(protein)
    for residue in hv.GetResidues():
        resname = residue.GetResidueName()
        resid = residue.GetResidueNumber()
        if resname not in ["ACE", "NME"]:  # exclude ACE and NME cap residues
            for atom in residue.GetAtoms():
                atomname = atom.GetName().strip()
                if atomname in ['N', 'C']:
                    bond_number = len(list(atom.GetBonds()))
                    if bond_number < 3:  # neither atom should ever have fewer than three bonds
                        print(f"Wrong number of bonds for residue {resname}{resid} atom {atomname}")

In [11]:
count_bonds_of_N_and_C(protein)

Interestingly, after saving and reloading the structure, the 'C' and 'N' atoms before and after the modeled loop are missing a bond.

In [12]:
protein = read_molecules(f"{user_cache_dir()}/{pdb_id}_prep.pdb")[0]
count_bonds_of_N_and_C(protein)

Wrong number of bonds for residue ASP793 atom C
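
As a cross-check that does not depend on how bonds are perceived when the file is read, one could measure the distance between the backbone 'C' of each residue and the backbone 'N' of the next residue directly in the written PDB file. The sketch below is a plain-Python illustration under several assumptions: the helper names are hypothetical, the file uses standard fixed-width ATOM records, insertion codes and alternate locations are ignored, and the 1.5 Å cutoff is an arbitrary threshold somewhat above the usual peptide bond length of about 1.33 Å.

```python
import math
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int]  # (chain ID, residue number)
Xyz = Tuple[float, float, float]


def backbone_coordinates(pdb_lines: List[str]) -> Dict[ResidueKey, Dict[str, Xyz]]:
    """Collect backbone N and C coordinates per residue from ATOM records."""
    coords: Dict[ResidueKey, Dict[str, Xyz]] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()  # atom name, columns 13-16
        if name not in ("N", "C"):
            continue
        key = (line[21], int(line[22:26]))  # chain ID and residue number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(key, {})[name] = xyz
    return coords


def find_chain_breaks(pdb_lines: List[str], max_distance: float = 1.5):
    """Return (residue, next residue, distance) wherever the C-N distance is too long."""
    coords = backbone_coordinates(pdb_lines)
    breaks = []
    for (chain, resid), atoms in sorted(coords.items()):
        following = coords.get((chain, resid + 1))
        if following is None or "C" not in atoms or "N" not in following:
            continue
        distance = math.dist(atoms["C"], following["N"])
        if distance > max_distance:
            breaks.append(((chain, resid), (chain, resid + 1), round(distance, 2)))
    return breaks


def atom_line(serial, name, resname, chain, resid, x, y, z):
    """Format a minimal fixed-width ATOM record (synthetic demo data only)."""
    return (f"ATOM  {serial:5d} {name:^4s} {resname:>3s} {chain}{resid:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# synthetic backbone fragment: 793-794 is connected, 794-795 is not
demo_lines = [
    atom_line(1, "C", "ASP", "A", 793, 0.0, 0.0, 0.0),
    atom_line(2, "N", "GLY", "A", 794, 1.33, 0.0, 0.0),
    atom_line(3, "C", "GLY", "A", 794, 3.0, 0.0, 0.0),
    atom_line(4, "N", "ALA", "A", 795, 7.2, 0.0, 0.0),
]
print(find_chain_breaks(demo_lines))
```

On a real file, the lines could be read with `open(path).read().splitlines()` and passed to `find_chain_breaks`. Gaps in residue numbering are skipped rather than reported, so the check only flags residues that are numbered consecutively but sit too far apart.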


These errors lead to wrong structures in the following steps. However, the functions do not fail and may produce good results once OESpruce is fixed.

The kinase domain sequence can be retrieved from UniProt. Information about the termini and the residue numbers is retrieved and stored as well.

In [13]:
# retrieve the kinase domain sequence from UniProt
kinase_domain_sequence = KinaseDomainAminoAcidSequence.from_uniprot(uniprot_id)
print(kinase_domain_sequence.name)
print(kinase_domain_sequence.metadata)
print(kinase_domain_sequence)

NTRK1_HUMAN
{'uniprot_id': 'P04629', 'begin': 510, 'end': 781, 'true_N_terminus': False, 'true_C_terminus': False}
IVLKWELGEGAFGKVFLAECHNLLPEQDKMLVAVKALKEASESARQDFQREAELLTMLQHQHIVRFFGVCTEGRPLLMVFEYMRHGDLNRFLRSHGPDAKLLAGGEDVAPGPLGLGQLLAVASQVAAGMVYLAGLHFVHRDLATRNCLVGQGLVVKIGDFGMSRDIYSTDYYRVGGRTMLPIRWMPPESILYRKFTTESDVWSFGVVLWEIFTYGKQPWYQLSNTEAIDCITQGRELERPRACPPEVYAIMRGCWQREPQQRHSIKDVHARL


In [14]:
from kinoml.modeling.OpenEyeModeling import mutate_structure, renumber_structure, get_sequence

In this case we will introduce a mutation into the wild-type sequence retrieved from UniProt, so that we can demonstrate the `mutate_structure` function, which mutates residues and deletes all residues that are not covered by the provided sequence.

In [15]:
kinase_domain_sequence_edit = kinase_domain_sequence[:2] + "I" + kinase_domain_sequence[3:]
kinase_domain_sequence_edit

'IVIKWELGEGAFGKVFLAECHNLLPEQDKMLVAVKALKEASESARQDFQREAELLTMLQHQHIVRFFGVCTEGRPLLMVFEYMRHGDLNRFLRSHGPDAKLLAGGEDVAPGPLGLGQLLAVASQVAAGMVYLAGLHFVHRDLATRNCLVGQGLVVKIGDFGMSRDIYSTDYYRVGGRTMLPIRWMPPESILYRKFTTESDVWSFGVVLWEIFTYGKQPWYQLSNTEAIDCITQGRELERPRACPPEVYAIMRGCWQREPQQRHSIKDVHARL'
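
To confirm which residue an edit like this touches, the two sequences can be compared position by position. The helper below is a hypothetical, plain-Python sketch (it is not part of kinoml) and assumes both sequences have the same length. The offset of 510 is the `begin` value from the metadata printed above.

```python
from typing import List


def list_point_mutations(reference: str, edited: str, first_residue_number: int = 1) -> List[str]:
    """Describe substitutions between two equal-length sequences, e.g. 'L512I'."""
    if len(reference) != len(edited):
        raise ValueError("sequences must have the same length")
    return [
        f"{ref}{first_residue_number + index}{new}"
        for index, (ref, new) in enumerate(zip(reference, edited))
        if ref != new
    ]


wild_type = "IVLKWELGEG"  # first ten residues of the kinase domain sequence
edited = wild_type[:2] + "I" + wild_type[3:]  # same slicing as in the cell above
print(list_point_mutations(wild_type, edited, first_residue_number=510))
```

With UniProt numbering, the substitution at Python index 2 corresponds to residue 512.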

In [16]:
mutated_structure = mutate_structure(protein, kinase_domain_sequence_edit)
write_molecules([mutated_structure], f"{user_cache_dir()}/{pdb_id}_mutated.pdb")

Next, the residues can be renumbered to match the numbering provided by UniProt. In this case the numbering is already correct, so we will simply renumber starting from 1.

In [17]:
renumbered_structure = renumber_structure(mutated_structure, list(range(1, len(get_sequence(mutated_structure)) + 1)))
write_molecules([renumbered_structure], f"{user_cache_dir()}/{pdb_id}_renumbered.pdb")
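
The list of residue numbers is easy to get wrong by one, so it can help to build it explicitly. The sketch below uses a hypothetical helper and the `begin` and `end` values from the metadata printed earlier. It assumes the structure covers the whole kinase domain without gaps, which may not hold for every structure.

```python
from typing import List


def residue_numbers(n_residues: int, start: int = 1) -> List[int]:
    """Consecutive residue numbers for n_residues residues, beginning at start."""
    return list(range(start, start + n_residues))


begin, end = 510, 781  # 'begin' and 'end' from the UniProt metadata above
n_residues = end - begin + 1  # both ends are inclusive

from_one = residue_numbers(n_residues)  # numbering that starts at 1
uniprot_numbering = residue_numbers(n_residues, start=begin)  # UniProt numbering

print(from_one[0], from_one[-1])
print(uniprot_numbering[0], uniprot_numbering[-1])
```

Either list has one entry per residue, so its length should equal the length of the sequence returned by `get_sequence` before it is passed to `renumber_structure`.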

In the last step, one can use the `prepare_protein` function to add caps where desired. This step currently adds caps to all modeled loops, which is likely due to the bond problem discussed above.

In [19]:
# neither terminus is a true terminus, so both should eventually be capped; capping is disabled here (cap_termini=False)
# TODO: if the N terminus is not capped, it only has one hydrogen
kinase_domain = prepare_protein(renumbered_structure, cap_termini=False)
write_molecules([kinase_domain], f"{user_cache_dir()}/{pdb_id}_kinase_domain.pdb")

## 3. Dock ligand

Dock the ligand of interest into the prepared protein structure. This generally works (see docking.py).