## MD Simulations of NTRKs

This notebook analyses results from MD simulations of the inhibitors larotrectinib, selitrectinib, and repotrectinib docked into NTRK1-3.

### 1. System preparation

### 1.1 Docked structures

#### Data can be found in `../docking/docking.ipynb`

PDB structures `4YNE`, `4AT3`, and `6KZD` were used as structures for NTRK1, NTRK2, and NTRK3 respectively. 

A short summary of how structures were produced is shown below (check `../docking/docking.ipynb` for a full description):

* NTRK structures for docking were chosen by searching the KLIFS database for NTRK entries in complex with ligands similar to the core structure of larotrectinib.
* Both `4YNE` and `4AT3` are in the DFG in/$\alpha$C helix out conformation, whilst `6KZD` is in the DFG out/$\alpha$C helix out conformation.
* OESpruce was not able to model all missing residues.
* Docking poses for NTRK1 and NTRK2 agreed well with the binding modes shown in Figure 1 of [Drilon et al. 2017](https://cancerdiscovery.aacrjournals.org/content/7/9/963).
* Docking into NTRK3, whose structure is in the wrong conformation, gave worse results. As an alternative, the coordinates of the nitrogen atom named NAN of the co-crystallized ligand in `6KZD` were used as hint coordinates for Chemgauss docking.

### 1.2 Adding missing residues

#### Data can be found in: `./data/add_missing_loops/`

All models created via docking contained missing loop residues that were not modelled by OESpruce.

* `4YNE` contained 4 missing loops (2-5 residues missing in each loop).
* `4AT3` contained 4 missing loops (2-4 residues missing in each loop).
* `6KZD` contained 2 missing loops (12 and 16 residues missing, respectively).
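Missing loops like those listed above can be located by looking for jumps in the residue numbering of a chain. The snippet below is a minimal sketch of that idea, not the procedure used in this notebook: the function names `find_missing_loops` and `ca_line` are hypothetical, the input is a synthetic set of CA records, and insertion codes and `HETATM` records are ignored.

```python
def find_missing_loops(pdb_lines, chain_id="A"):
    """Return (first_missing, last_missing, n_missing) for each numbering gap."""
    residues = []
    for line in pdb_lines:
        # Fixed-column PDB fields: atom name 13-16, chain 22, residue number 23-26.
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        resnum = int(line[22:26])
        if not residues or residues[-1] != resnum:
            residues.append(resnum)

    gaps = []
    for prev, curr in zip(residues, residues[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1, curr - prev - 1))
    return gaps


def ca_line(serial, resnum, chain_id="A"):
    """Build a minimal synthetic CA record (illustration only)."""
    return "ATOM  {:5d}  CA  ALA {}{:4d}".format(serial, chain_id, resnum)


# Synthetic chain with residues 4-6 and 9-19 absent.
demo = [ca_line(i, r) for i, r in enumerate([1, 2, 3, 7, 8, 20], start=1)]
print(find_missing_loops(demo))  # [(4, 6, 3), (9, 19, 11)]
```

For a real structure, the lines would be read from the PDB file, and the reported gap sizes could be compared against the loop lengths listed above.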

[MODELLER](https://salilab.org/modeller/) was used to add the short missing loops in both `4YNE` and `4AT3`. Because `6KZD` contained two long missing loops (12 and 16 residues, respectively), these loops were not modelled.

Alignment files were created manually using the full protein sequence and the PDB file sequence (extracted via [PyMOL](https://pymol.org/2/)). Any artifacts (e.g. the expression tag `GSGIR` in `4YNE`) were removed from the structural sequence before creating models.
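The alignment files here were written by hand, but their layout can be sketched in code. The example below assumes the PIR format that MODELLER reads, with the structure sequence carrying `-` at missing positions and the full sequence as the target. The helper names `gapped_template` and `pir_alignment`, the placeholder code `XXXX`, and the toy sequence are all hypothetical.

```python
def gapped_template(full_seq, missing_ranges):
    """Replace residues in missing_ranges (0-based, end-exclusive) with '-'."""
    chars = list(full_seq)
    for start, end in missing_ranges:
        for i in range(start, end):
            chars[i] = "-"
    return "".join(chars)


def pir_alignment(code, chain_id, first_res, last_res,
                  full_seq, template_seq, width=75):
    """Return PIR text pairing a gapped structure with its full sequence."""
    if len(full_seq) != len(template_seq):
        raise ValueError("aligned sequences must have equal length")

    def wrap(seq):
        # PIR sequences end with '*'; split into fixed-width lines.
        seq += "*"
        return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))

    # Ten colon-separated fields; the last four are left empty.
    template_header = "structureX:{0}:{1}:{2}:{3}:{2}::::".format(
        code, first_res, chain_id, last_res)
    return "\n".join([
        ">P1;" + code,
        template_header,
        wrap(template_seq),
        ">P1;" + code + "_fill",
        "sequence:::::::::",
        wrap(full_seq),
    ]) + "\n"


# Toy 10-residue sequence with residues at indices 3-5 unresolved.
full = "MKTAYIAKQR"
template = gapped_template(full, [(3, 6)])
print(pir_alignment("XXXX", "A", 1, 10, full, template))
```

The residue numbers in the `structureX` line should match the first and last residues actually present in the PDB file, so they need checking against each structure.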

A total of 200 models were created and scored using the [Discrete Optimized Protein Energy](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2242414/) (DOPE). The best-scoring DOPE models (most negative value) for each structure were:

* `4YNE`: 4YNE_fill.BL01040001.pdb (DOPE score -4225.328)
* `4AT3`: 4AT3_fill.BL00850001.pdb (DOPE score -3242.686)
* `6KZD`: N/A
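Since a more negative DOPE score is better, picking the best model amounts to taking the minimum over the scored models. The sketch below shows this selection on placeholder data. The function name `best_dope_model`, the file names, and the scores are hypothetical and are not results from this work.

```python
def best_dope_model(scores):
    """Return (model_name, score) for the most negative DOPE score."""
    if not scores:
        raise ValueError("no models were scored")
    return min(scores.items(), key=lambda item: item[1])


# Placeholder scores for illustration only.
example_scores = {
    "model_a.pdb": -3900.5,
    "model_b.pdb": -4100.0,
    "model_c.pdb": -4050.2,
}
name, score = best_dope_model(example_scores)
print(name, score)  # model_b.pdb -4100.0
```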

### 1.3 Equilibration

#### Data can be found in: `./data/md_equil/`

All models were capped with ACE and NME residues at their N- and C-termini. `TER` cards were added at the end of each chain and the final model was saved as `<pdb_code>_prepped.pdb` (e.g. `./data/md_equil/4YNE/4YNE_prepped.pdb`).

The equilibration protocol for each system can be found in `./data/md_equil/<pdb_code>/md_equil.py`. A general overview is given below:

* Each protein:ligand system was prepared using the `amberff14sb`:`openff-1.1.0` forcefields and energy minimised.
* Temperature = 300 K, Pressure = 1 bar, Ionic strength = 150 mM NaCl. 
* Each system was equilibrated in the NPT ensemble for 5 ns.
* The final equilibrated structure was written out to a PDB file named `equilibrated_state_5ns.pdb` (e.g. `./data/md_equil/output/larotrectinib/equilibrated_state_5ns.pdb`) for visual inspection.
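To give a feel for what the protocol parameters imply, the sketch below converts the 150 mM NaCl concentration into an approximate number of ion pairs for a given solvent volume, and the 5 ns run length into a step count. It is a back-of-the-envelope estimate under stated assumptions: the box size and the 2 fs timestep are hypothetical (neither is given here), and simulation packages may place ions by a different rule, for example one based on the number of water molecules.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITRES_PER_NM3 = 1e-24     # 1 nm^3 expressed in litres


def nacl_pairs(volume_nm3, conc_molar=0.150):
    """Estimate the number of NaCl ion pairs for a given solvent volume."""
    return round(conc_molar * AVOGADRO * volume_nm3 * LITRES_PER_NM3)


def n_steps(total_ns, timestep_fs):
    """Number of integration steps needed to cover total_ns."""
    return round(total_ns * 1e6 / timestep_fs)


# Hypothetical 8 x 8 x 8 nm box and 2 fs timestep (assumptions).
print(nacl_pairs(8 * 8 * 8))  # 46
print(n_steps(5, 2))          # 2500000
```

The values actually used for each system are defined in the corresponding `md_equil.py` script.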



### 2. Production runs