# 3. Protein-Ligand Unbinding

## Introduction

Besides binding thermodynamics, binding kinetics are an important aspect of host-guest complexation. An ideal ligand has not only a favourable binding free energy but also a considerable residence time, which implies a reaction barrier high enough to prevent frequent unbinding events.

Unlike thermodynamics, studying kinetics requires knowledge of the entire path of the process. In this section, we present an adaptive biasing algorithm, designed to acquire unbinding paths of protein-ligand complexes.



<center>
<img src="static/unbinding_chart.jpeg" alt="flowchart" width="600"/>
</center>

Details of the algorithm are available in [this](https://pubs.acs.org/doi/full/10.1021/acs.jctc.1c00924) publication, and the code is available [online](https://github.com/rostaresearch/unbinding).

This notebook uses a modified and simplified version of the code to showcase its logic.

### Notes about the unbinding trajectory
These trajectories were stripped of ions and water molecules for this demonstrative analysis.

The example system is a trypsin-benzamidine complex, PDB ID [3ATL](https://www.rcsb.org/structure/3ATL). We used CHARMM36m for the protein, TIP3 water, and standard CGenFF parametrisation of the benzamidine.

<center>
<img src="static/bound.png" alt="bound benzamidine" width="600"/>
</center>

## Standard Usage

In [None]:
import os
from main import Arguments, run

We will set the ligand name to 'BEN', and in order to read the `dcd` trajectories, we will use the file `topology_clean.pdb`.

We have to fetch the example from [here](https://www.dropbox.com/sh/wy7rbqxrofaq946/AABFGGXAxWz7LquJnISeVYQca?dl=0).

In [None]:
os.chdir("example")

### Important arguments
| Argument    | Type     | Default |
|-------------|:---------|--------:|
| lig         | string   |     LIG |
| top         | string   |  "find" |
| cutoff      | float(Å) |     3.5 |
| maxdist     | float(Å) |     9.0 |

We have control over what is considered a contact (`cutoff`) and over the distance beyond which a contact no longer needs to be biased (`maxdist`). We found the defaults fairly universal, but you are free to adapt them to your needs.
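The interplay of the two thresholds can be sketched in a few lines. This is a minimal illustration and not the code from the repository: the helper `update_contacts`, the pair labels, and the use of a mean distance per trajectory as the test statistic are all assumptions made here for clarity.

```python
CUTOFF = 3.5   # Å, default from the table above
MAXDIST = 9.0  # Å, default from the table above


def update_contacts(tracked, mean_dist, cutoff=CUTOFF, maxdist=MAXDIST):
    """Return the updated set of biased contacts (hypothetical helper).

    tracked   -- set of pair labels that are biased so far
    mean_dist -- dict mapping pair label -> mean distance (Å) in the latest trajectory
    """
    # A pair closer than the cutoff counts as a contact and starts being tracked.
    added = {pair for pair, d in mean_dist.items() if d < cutoff}
    # A tracked pair that has moved beyond maxdist no longer needs a bias.
    # Tracked pairs without a measurement default to 0.0 and are therefore kept.
    released = {pair for pair in tracked if mean_dist.get(pair, 0.0) > maxdist}
    return (tracked | added) - released


# Made-up labels and distances: "a" has left, "b" is a new contact, "c" is neither.
print(update_contacts({"a"}, {"a": 9.8, "b": 3.1, "c": 5.0}))  # {'b'}
```

Note the gap between the two thresholds: a pair between 3.5 Å and 9.0 Å is not picked up as a new contact, but it stays biased if it was already tracked.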

You may notice there is a third criterion for selecting a distance: its variance. Biasing a highly flexible contact may distort the structure instead of helping the unbinding. However, this issue primarily emerges when no chemical moiety clustering is employed.
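A variance filter of this kind might look like the sketch below. The function `stable_contacts`, the threshold of 1.0 Å² and the example numbers are hypothetical; the tutorial does not state how the variance criterion is implemented or which threshold is used.

```python
import numpy as np


def stable_contacts(distances, labels, max_variance=1.0):
    """Keep contacts whose distance fluctuates little (hypothetical helper).

    distances    -- array of shape (n_frames, n_contacts), in Å
    labels       -- one label per contact column
    max_variance -- assumed threshold in Å^2, chosen only for illustration
    """
    variances = np.var(np.asarray(distances, dtype=float), axis=0)
    return [label for label, v in zip(labels, variances) if v < max_variance]


# Made-up time series: the first contact is rigid, the second one flaps open and closed.
series = np.array([[3.0, 3.0],
                   [3.1, 6.0],
                   [2.9, 2.5],
                   [3.0, 7.0]])
print(stable_contacts(series, ["rigid", "flexible"]))  # ['rigid']
```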

### Clustering
We use a structure-based atom grouping, or clustering, to deal with molecular symmetries. Take a look at the figure above and the contact between the benzamidine and Asp189. Considering the Lewis structure (which manifests in the atom names, not the parameters), there are rotations the system should be invariant to:

<center>
<img src="static/clustering_example.png" alt="clustering" width="400"/>
</center>

Therefore, by default, contacts are biased between the centres of mass of such groups. For protein residues, the clustering is embedded in the code; for ligands, you can define it in `toppar/LIG_clusters.dat`. (You will see the three heavy atoms of the amidine group on one line.)

In [None]:
with open("toppar/LIG_clusters.dat", "r") as f:
    for line in f:
        print(line.strip())
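To see why grouping helps, the sketch below compares an atom-atom distance with a distance between centres of mass when the two equivalent nitrogens of an amidine-like group swap places. The coordinates are made up for illustration, and `centre_of_mass` and `group_distance` are hypothetical helpers, not functions of the unbinding code.

```python
import numpy as np


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def group_distance(coords_a, masses_a, coords_b, masses_b):
    """Distance between the centres of mass of two atom groups."""
    delta = centre_of_mass(coords_a, masses_a) - centre_of_mass(coords_b, masses_b)
    return float(np.linalg.norm(delta))


# Made-up planar geometries in Å; atom order is C, N1, N2 and C, O1, O2.
amidine = np.array([[0.0, 0.0, 0.0], [1.1, 0.7, 0.0], [-1.1, 0.7, 0.0]])
amidine_masses = np.array([12.011, 14.007, 14.007])
carboxylate = np.array([[0.0, 4.0, 0.0], [1.1, 3.3, 0.0], [-1.1, 3.3, 0.0]])
carboxylate_masses = np.array([12.011, 15.999, 15.999])

# A 180° flip of the amidine exchanges the positions of N1 and N2.
flipped = amidine[[0, 2, 1]]

d_group = group_distance(amidine, amidine_masses, carboxylate, carboxylate_masses)
d_group_flipped = group_distance(flipped, amidine_masses, carboxylate, carboxylate_masses)

# The N1-O1 atom pair, in contrast, changes its distance after the flip.
d_atom = float(np.linalg.norm(amidine[1] - carboxylate[1]))
d_atom_flipped = float(np.linalg.norm(flipped[1] - carboxylate[1]))

print(np.isclose(d_group, d_group_flipped))  # True
print(np.isclose(d_atom, d_atom_flipped))    # False
```

The group distance is unchanged by the flip, so a bias on it does not favour one of the two chemically equivalent orientations.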

Now we are ready to analyse the unbiased run, `traj_0`.

In [None]:
args = Arguments(lig="BEN", top="topology_clean.pdb", processonly=True)
run(args)
# processonly is necessary to suppress writing inputs for the next iteration

The distances are displayed between individual atoms, but they are in fact defined between atom groups in the colvar definition used in NAMD. Furthermore, the distances are not biased individually; the bias acts on their sum.
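As a rough illustration of a single collective variable built from several contacts, the sketch below sums the distances between paired group centres. The function name and the coordinates are assumptions for this example; in the actual workflow the sum is evaluated by the colvars module inside NAMD.

```python
import numpy as np


def sum_of_distances(centres_a, centres_b):
    """Sum of pairwise distances between matching rows of two (n_contacts, 3) arrays."""
    a = np.asarray(centres_a, dtype=float)
    b = np.asarray(centres_b, dtype=float)
    return float(np.linalg.norm(a - b, axis=1).sum())


# Made-up group centres (Å) giving three contacts of length 3, 4 and 5 Å.
ligand_groups = np.zeros((3, 3))
protein_groups = np.array([[3.0, 0.0, 0.0],
                           [0.0, 4.0, 0.0],
                           [3.0, 4.0, 0.0]])
print(sum_of_distances(ligand_groups, protein_groups))  # 12.0
```

Because only the total is restrained, the system is free to decide which contact lengthens first.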

<center>
<img src="static/bias1.png" alt="initial contacts" width="600"/>
</center>

This has now written a binary checkpoint file in your `example` directory. It is used automatically in further iterations.

In [None]:
args = Arguments(processonly=True)
for _ in range(3):
    run(args)

After processing `traj_3`, you notice two new distances.
<center>
<img src="static/bias4.png" alt="first new contact" width="600"/>
</center>

The colvars are recorded during the iterations, and they are summarised in `distances_tracked.csv`.

In [None]:
with open("distances_tracked.csv", "r") as f:
    for line in f:
        print(line.strip())

In [None]:
run(args)   # trajectory 5 is coming up

Now we can see the first distance being excluded.

In [None]:
for _ in range(2): run(args)

The process goes on similarly, sometimes adding contacts to the colvar and sometimes removing them. At any point, you can check the current status according to the checkpoint (the interesting parts are at the top; the output finishes with the NAMD input):

In [None]:
run(Arguments(report=True))

In [None]:
for _ in range(4): run(args)

We have finished processing the trajectories saved in this example. Let us inspect the summary.

In [None]:
with open("distances_tracked.csv", "r") as f:
    for line in f:
        print(line.strip())

At this point, the ligand is outside the pocket, exposed to the bulk water. It may wander around the protein, so depending on the unbinding settings, it may take further iterations to eliminate all contacts. Nevertheless, running an unbiased simulation from this point will also likely result in a free-roaming ligand.

<center>
<img src="static/bias11.png" alt="after 11 iterations" width="600"/>
</center>

To better understand the biasing, run another step, but this time without `processonly`.

In [None]:
run(Arguments())

This should create the folder `traj_12` with a NAMD input file `traj_12.inp`, based on the template, and the colvar file `sum_12.col`. In the latter, you will see the groups defined by indices (same as in VMD) and that the target for the sum of the distances is progressively shifted from 32.79 to 36.79.

In [None]:
with open("traj_12/sum_12.col", "r") as f:
    for line in f:
        print(line.strip())
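The progressive shift can be pictured as a harmonic restraint whose centre moves during the run. The sketch below assumes a linear schedule and a plain harmonic form; the number of steps and the force constant are placeholders and are not read from `sum_12.col`. Only the two end points, 32.79 and 36.79, come from the text above.

```python
START, END = 32.79, 36.79  # Å, restraint centres quoted above


def target_centre(step, n_steps, start=START, end=END):
    """Restraint centre at a given step, assuming linear interpolation."""
    fraction = min(max(step / n_steps, 0.0), 1.0)
    return start + fraction * (end - start)


def harmonic_energy(cv_value, centre, force_constant=1.0):
    """Energy of a harmonic restraint; the force constant is a placeholder."""
    return 0.5 * force_constant * (cv_value - centre) ** 2


n_steps = 1000  # hypothetical run length
for step in (0, 500, 1000):
    centre = target_centre(step, n_steps)
    # If the sum of distances lags behind at its initial value, the penalty grows.
    print(step, round(centre, 2), round(harmonic_energy(START, centre), 2))
```

The growing penalty is what pushes the sum of the distances, and with it the ligand, outwards over the course of the iteration.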

## Additional Options

The complete list of options is available in the [public repository of the unbinding method](https://github.com/rostaresearch/unbinding). Here we look at a few more that let you experiment with the results.

You can always rerun any existing step without giving up your checkpoint with `nosave`.

In [None]:
args = Arguments(processonly=True, trajectory=7, nosave=True)
run(args)

Should you corrupt your checkpoint, there is also an option to catch up from scratch. This is especially useful if you want to play with the contact definition.

In [None]:
# with cumulative, you will always need the initial setup parameters as well, as it does not use the checkpoint
args = Arguments(cutoff=3.3, maxdist=6, lig="BEN", top="topology_clean.pdb", processonly=True, trajectory=5, nosave=True, cumulative=True)
run(args)

## Explore the Unbinding

Here I leave two cells for you to try things. Feel free to dig in and ask questions.

In [None]:
args = Arguments(
    trajectory=5,
    cumulative=True,
    cutoff=3,
    maxdist=5,
    processonly=True,
    nosave=True,
    lig="BEN",
    top="topology_clean.pdb",
)
run(args)

In [None]:
with open("distances_tracked.csv", "r") as f:
    for line in f:
        print(line.strip())