# Docking into NTRKs

This notebook docks the inhibitors larotrectinib, selitrectinib and repotrectinib into NTRK1-3.

## 1. Selecting structures for docking

NTRK structures for docking were chosen by searching the [KLIFS database](https://klifs.vu-compmedchem.nl/) for NTRK entries in complex with ligands similar to the substructure `C1CC(c2ccccc2)N(C1)c1ccn2c(ccn2)n1`, which is the core structure of larotrectinib and closely resembles the cores of selitrectinib and repotrectinib. This search identified the NTRK1 structure [4YNE](https://www.rcsb.org/structure/4YNE), which was solved in a DFG in/$\alpha$C helix out conformation according to the KLIFS conformation annotation.
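The substructure criterion can be illustrated with RDKit. This is only a minimal sketch of a substructure check, not the KLIFS search itself: the core SMILES is the one quoted above, while the two query molecules and the helper `contains_core` are hypothetical and were written for this example.

```python
from rdkit import Chem

# Core substructure quoted in the text above.
core = Chem.MolFromSmiles("C1CC(c2ccccc2)N(C1)c1ccn2c(ccn2)n1")

# Hypothetical query ligands (not taken from KLIFS): the first decorates the
# core with two fluorine atoms, the second is plain 2-phenylpyrrolidine.
queries = {
    "difluoro_analog": "Fc1ccc(F)c(c1)C1CCCN1c1ccn2nccc2n1",
    "phenylpyrrolidine": "C1CC(c2ccccc2)NC1",
}


def contains_core(smiles, core_mol=core):
    """Return True if the molecule parsed from `smiles` contains the core."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    return mol.HasSubstructMatch(core_mol)


matches = {name: contains_core(smi) for name, smi in queries.items()}
print(matches)
```

A plain substructure match is stricter than a similarity search, so ligands with a modified core would need a fingerprint-based comparison instead.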

Assuming that larotrectinib, selitrectinib and repotrectinib prefer the same conformation when binding to NTRK2 and NTRK3, the KLIFS database was searched for structures in the DFG in/$\alpha$C helix out conformation. Additionally, co-crystallized ligands were checked for their similarity to the larotrectinib core structure. This procedure led to the selection of [4AT3](https://www.rcsb.org/structure/4AT3) for NTRK2. However, no NTRK3 structure in a DFG in/$\alpha$C helix out conformation had been released in the PDB yet. Hence, we used the NTRK3 structure whose co-crystallized ligand is most similar to the larotrectinib core structure, i.e., [6KZD](https://www.rcsb.org/structure/6KZD), which adopts a DFG out/$\alpha$C helix out conformation.
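The selection rule described above (prefer the desired conformation, otherwise fall back to the most similar co-crystallized ligand) can be sketched as follows. The `Candidate` class, the `select_structure` helper, the placeholder identifiers and the similarity values are all hypothetical and serve only to illustrate the logic; they are not KLIFS data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pdb_id: str
    dfg: str           # "in" or "out"
    ac_helix: str      # "in" or "out"
    similarity: float  # ligand similarity to the larotrectinib core (0..1)


def select_structure(candidates, dfg="in", ac_helix="out"):
    """Prefer candidates in the requested conformation; if there are none,
    fall back to the whole pool. Within the pool, pick the highest similarity."""
    if not candidates:
        raise ValueError("no candidate structures given")
    matching = [c for c in candidates if c.dfg == dfg and c.ac_helix == ac_helix]
    pool = matching or candidates
    return max(pool, key=lambda c: c.similarity)


# Placeholder entries with made-up similarity values, for illustration only.
with_match = [
    Candidate("AAAA", "in", "out", 0.40),
    Candidate("BBBB", "out", "out", 0.70),
]
without_match = [
    Candidate("CCCC", "out", "out", 0.55),
    Candidate("DDDD", "out", "in", 0.30),
]

chosen_with_match = select_structure(with_match).pdb_id
chosen_without_match = select_structure(without_match).pdb_id
print(chosen_with_match, chosen_without_match)
```

In the first pool the conformation filter wins over the higher similarity of the second entry; in the second pool no entry matches, so similarity alone decides, which mirrors the NTRK3 case.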

Relevant data about preferred chain and alternate locations for PDB entries 4YNE, 4AT3 and 6KZD were retrieved from KLIFS and stored in `data/complexes.csv`.

## 2. Read data

In [1]:
# imports
from dockin.oe_docking import get_structure_from_pdb, select_chain, select_altloc, \
    select_ligand, prepare_complex, create_hybrid_receptor, create_hint_receptor, \
    hybrid_docking, chemgauss_docking
from openeye import oechem
import pandas as pd
from rdkit import Chem

In [2]:
# read activity data
activities = pd.read_csv('../../data/activities.csv')
activities

| | compound | smiles | NTRK1 WT | NTRK1 G595R | NTRK1 G667C | NTRK2 WT | NTRK2 G639R | NTRK3 WT | NTRK3 G623R | NTRK3 G696A | DOI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | larotrectinib | `O=C(Nc1cnn2ccc(N3CCC[C@@H]3c3cc(F)ccc3F)nc12)N...` | 0.9 | 69.0 | 45.5 | | | 2.8 | 48.0 | 4.5 | 10.1158/2159-8290.CD-17-0507 |
| 1 | selitrectinib | `C[C@@H]1CCc2ncc(F)cc2[C@H]2CCCN2c2ccn3ncc(c3n2...` | 0.6 | 2.0 | 9.8 | | | 2.5 | 2.3 | 2.5 | 10.1158/2159-8290.CD-17-0507 |
| 2 | repotrectinib | `C[C@H]1CNC(=O)c2cnn3ccc(nc23)N[C@H](C)c2cc(F)c...` | 0.533 | 2.67 | | 0.297 | 2.66 | 0.211 | 4.46 | | 10.1158/2159-8290.CD-18-0484 |


In [3]:
# convert SMILES to sdf and keep name of molecules
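mols = []
# iterate over the DataFrame loaded above instead of splitting raw CSV lines
for name, smiles in zip(activities['compound'], activities['smiles']):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # fail early on SMILES that RDKit cannot parse
        raise ValueError(f'Could not parse SMILES for {name}: {smiles}')
    mol.SetProp('_Name', str(name))
    mol = Chem.AddHs(mol)
    mols.append(mol)
wf = Chem.SDWriter('data/ligands.sdf')
for mol in mols:
    wf.write(mol)
wf.close()  # flush the file before it is read in the next cell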

In [4]:
# read molecules as openeye mols
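mols = []
ligand_file = 'data/ligands.sdf'
ifs = oechem.oemolistream()
if ifs.open(ligand_file):
    for mol in ifs.GetOEGraphMols():
        # copy each molecule, since the stream reuses the same object
        mols.append(oechem.OEGraphMol(mol))
    ifs.close()
else:
    oechem.OEThrow.Fatal(f'Unable to open {ligand_file}')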

In [5]:
# read structural data
structure_entries = pd.read_csv('data/complexes.csv')
structure_entries

| | structure | pdb | ligand | chain | alt |
|---|---|---|---|---|---|
| 0 | NTRK1 | 4YNE | 4EK | A | A |
| 1 | NTRK2 | 4AT3 | LTI | A | |
| 2 | NTRK3 | 6KZD | DZ6 | A | |
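Only 4YNE has an alternate location specified; the other entries leave the `alt` field empty. As a minimal sketch without pandas, the table can be parsed with the standard library, mapping empty `alt` fields to `None` so that alternate-location selection can be skipped for those entries. The helper `read_complexes` is hypothetical, and the inline CSV assumes that the file contains the columns shown above without the index column.

```python
import csv
import io

# Table contents as displayed above (assumed layout of data/complexes.csv).
COMPLEXES_CSV = """structure,pdb,ligand,chain,alt
NTRK1,4YNE,4EK,A,A
NTRK2,4AT3,LTI,A,
NTRK3,6KZD,DZ6,A,
"""


def read_complexes(handle):
    """Read complex entries and map empty alternate locations to None."""
    entries = []
    for row in csv.DictReader(handle):
        row["alt"] = row["alt"] or None
        entries.append(row)
    return entries


entries = read_complexes(io.StringIO(COMPLEXES_CSV))
needs_altloc = [entry["pdb"] for entry in entries if entry["alt"] is not None]
print(needs_altloc)
```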


## 3. Docking

In [6]:
# hybrid docking
for index, entry in structure_entries.iterrows():
    structure = get_structure_from_pdb(entry['pdb'])
    structure = select_chain(structure, entry['chain'])
    if not pd.isna(entry['alt']):
        structure = select_altloc(structure, entry['alt'])
    structure = select_ligand(structure, entry['ligand'])
    protein, ligand = prepare_complex(structure, protein_save_path=f'data/{entry["pdb"]}_protein.pdb')
    hybrid_receptor = create_hybrid_receptor(protein, ligand)
    docking_poses = hybrid_docking(hybrid_receptor, mols, 
                                   docking_poses_save_path=f'data/{entry["pdb"]}_hybrid_docking.sdf')

Re-optimizing hydrogen positions...
Identifying design units...
Re-optimizing hydrogen positions...
Identifying design units...
Re-optimizing hydrogen positions...
Identifying design units...


The docking poses for NTRK1 and NTRK2 compared well with the binding modes shown in Figure 1 of [Drilon et al. 2017](https://www.doi.org/10.1158/2159-8290.CD-17-0507). Docking into NTRK3, for which only a structure in a different conformation (DFG out/$\alpha$C helix out) was available, gave worse results. Hence, a second approach was tried, using the coordinates of the nitrogen atom named `NAN` of the co-crystallized ligand in [6KZD](https://www.rcsb.org/structure/6KZD) as hint coordinates for Chemgauss docking.

In [7]:
# select structure
ntrk3 = structure_entries.iloc[2]
ntrk3

structure    NTRK3
pdb           6KZD
ligand         DZ6
chain            A
alt            NaN
Name: 2, dtype: object

In [8]:
# perform chemgauss docking with hint coordinate
structure = get_structure_from_pdb(ntrk3['pdb'])
structure = select_chain(structure, ntrk3['chain'])
structure = select_ligand(structure, ntrk3['ligand'])
protein, ligand = prepare_complex(structure)
hintx, hinty, hintz = -4.232, -18.096, 11.762 # coordinates of atom named NAN from ligand DZ6 in 6KZD
hint_receptor = create_hint_receptor(protein, hintx, hinty, hintz)
docking_poses = chemgauss_docking(hint_receptor, mols, 
                                  docking_poses_save_path=f'data/{ntrk3["pdb"]}_chemgauss_docking.sdf')

Re-optimizing hydrogen positions...
Identifying design units...
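
The hint coordinates in the cell above were entered by hand. A minimal sketch of how such coordinates could be looked up from PDB-formatted text is given below, relying on the fixed-column layout of `HETATM` records. The helper `find_atom_coordinates` is hypothetical, and the sample record is constructed for illustration from the coordinates quoted above; its serial and residue numbers are placeholders, not values from 6KZD.

```python
def find_atom_coordinates(pdb_lines, resname, atom_name, chain=None):
    """Return (x, y, z) of the first HETATM atom matching the residue name,
    atom name and, optionally, chain ID. Raise LookupError if none matches."""
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        # PDB columns: atom name 13-16, residue name 18-20, chain ID 22
        if line[12:16].strip() != atom_name or line[17:20].strip() != resname:
            continue
        if chain is not None and line[21] != chain:
            continue
        # PDB columns: x 31-38, y 39-46, z 47-54
        return float(line[30:38]), float(line[38:46]), float(line[46:54])
    raise LookupError(f"atom {atom_name} of residue {resname} not found")


# Illustrative record built from the coordinates quoted above; the serial
# number (1) and residue number (1) are placeholders.
sample_line = "HETATM{:5d} {:<4s} {:>3s} {}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " NAN", "DZ6", "A", 1, -4.232, -18.096, 11.762
)

hint = find_atom_coordinates([sample_line], "DZ6", "NAN", chain="A")
print(hint)
```

With a real file, the lines of the downloaded PDB entry would be passed instead of the sample record.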


However, these docking results were not very promising either, which underlines the importance of docking into the correct conformation.

Importantly, OESpruce was not able to model all missing residues. Even relatively short chain breaks were sometimes left unmodeled, which might affect the behavior of the systems in MD simulations.
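
Remaining chain breaks can be spotted before starting a simulation by looking for gaps in the residue numbering of a prepared protein. The following is a rough sketch under the assumption that residues are numbered consecutively, which does not hold for every PDB entry (for example, when insertion codes are used). The helper `find_chain_breaks` and the residue numbers are hypothetical.

```python
def find_chain_breaks(residue_numbers):
    """Return (last_present, next_present, n_missing) for each numbering gap."""
    numbers = sorted(set(residue_numbers))
    breaks = []
    for previous, current in zip(numbers, numbers[1:]):
        n_missing = current - previous - 1
        if n_missing > 0:
            breaks.append((previous, current, n_missing))
    return breaks


# Hypothetical residue numbering with two gaps, for illustration only.
resolved = list(range(10, 21)) + list(range(24, 31)) + list(range(40, 46))

breaks = find_chain_breaks(resolved)
# Short gaps are the ones one would usually expect loop modeling to close.
short_breaks = [b for b in breaks if b[2] <= 5]
print(breaks, short_breaks)
```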

## 4. Generated data

- prepared proteins
  - `data/4YNE_protein.pdb`
  - `data/4AT3_protein.pdb`
  - `data/6KZD_protein.pdb`
- docking poses
  - `data/4YNE_hybrid_docking.sdf`
  - `data/4AT3_hybrid_docking.sdf`
  - `data/6KZD_hybrid_docking.sdf`
  - `data/6KZD_chemgauss_docking.sdf`