# Umbrella sampling in molecular dynamics to enhance conformational sampling

Sometimes we want to ensure sampling of a particular region of phase space. For example, we may want to sample the transition state of a reaction, or the folded state of a protein. Other times, we may struggle to sample a particular region of phase space, and want to enhance sampling there. Umbrella sampling is a technique that can be used to achieve both of these goals.

Briefly, umbrella sampling works by adding a harmonic biasing potential to the potential energy of the system, tethering the bias to a "point" in phase space. We define this "point" using some known order parameter, usually called a collective variable (CV), that is useful in our context (as we will show below).

For example, if we want to sample a loop motion in a protein, we might define some distances as our CVs, and then set up a biasing potential centered on some specific pair of distances.

The biasing potential is essentially a "spring" (a harmonic potential) acting on the distance CVs: it is high ("pulling harder") when the distances are far from our desired point, and lower as we approach the center. This "pulling" is what biases the system to sample the region of phase space around the point. The harmonic nature of this potential is what makes it an "umbrella" sampling potential: it acts as an umbrella centered on a point of interest.
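To make the spring picture concrete, here is a minimal sketch in plain Python of a harmonic bias acting on a single distance. It assumes the convention used by OpenMM's `HarmonicBondForce`, where the energy is `0.5 * k * (r - r0)**2`, with distances in nm and `k` in kJ/mol/nm^2. The function names `harmonic_bias` and `harmonic_force` and the numbers in the loop are illustrative only and are not part of any library.

```python
def harmonic_bias(r, r0, k):
    """Bias energy (kJ/mol) for a distance r (nm) restrained around r0 (nm)."""
    return 0.5 * k * (r - r0) ** 2


def harmonic_force(r, r0, k):
    """Force along r (kJ/mol/nm); a negative value pushes r toward smaller distances."""
    return -k * (r - r0)


# Illustrative values only: an umbrella centered at 0.3 nm with k = 1000 kJ/mol/nm^2
for r in (0.3, 0.4, 0.5):
    print(r, harmonic_bias(r, 0.3, 1000.0), harmonic_force(r, 0.3, 1000.0))
```

The energy is zero at the center and grows quadratically with the displacement, while the force always points back toward the center.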

In this notebook, we will implement umbrella sampling on a protein using two distance CVs (i.e. a 2D bias). We will also show how to quickly track the progress of the simulation over time with a single umbrella. Other analysis methods will be briefly discussed but are [implemented](https://pymbar.readthedocs.io/en/master/fes_with_pymbar.html#fes-with-pymbar) [elsewhere](https://fastmbar.readthedocs.io/en/latest/butane_PMF.html).

Okay, first let's import the basic libraries and set up our system.

In [57]:
from openmm.app import *
from openmm import *
from openmm.unit import *

# Load an already solvated PDB file and set up the system + state
pdb = PDBFile("../villin.pdb")
omm_forcefield = ForceField("amber/ff14SB.xml", "amber14/tip3p.xml")
system = omm_forcefield.createSystem(pdb.topology,
                                     nonbondedMethod=PME,
                                     nonbondedCutoff=10.0 * angstrom,
                                     constraints=HBonds,
                                     rigidWater=True,
                                     hydrogenMass=4.0 * amu)


Okay now let's define our CVs. We will use the distances between the alpha carbons of three residues as our CVs (forming a triangle). First let's define the relevant atom indices (these are system specific, so you will need to change them for your system).

In [58]:
# Define four atom indices - these will be used to define distance CVs
d1_atom1_ind = 83
d1_atom2_ind = 151
d2_atom1_ind = 83
d2_atom2_ind = 254

Now it's time to actually add the harmonic potential to our system. We will use the `HarmonicBondForce` class to do this. This class allows us to define a harmonic bond that will represent our biasing potential. Here, the `HarmonicBondForce` defines a harmonic potential on each of the two distances between the alpha carbons of the three residues we defined above.

It's worth noting that in our `addBond` call, we set the center of the harmonic potential (i.e. the target distance between the alpha carbons) to 0.3. OpenMM interprets bare numbers in its default units, so this is 0.3 nm. This value is simply chosen for our example system and should change depending on your system of interest. As a result, our simulation will be biased to sample around this center distance (0.3 nm).

If we wanted to bias the system toward a different point, we would simply change the equilibrium length (the third argument to `addBond`) to the desired distance.
We are also setting a force constant of 1.0 (again, system specific; in OpenMM's default units this is kJ/mol/nm^2), which is the spring constant of the harmonic potential. This is the "strength" of the biasing potential, and determines how strongly the system is biased to sample around the center point.

In [59]:
# Create the harmonic force that operates on the CVs
bias = HarmonicBondForce()

# Add the first distance - we'll call it D1
bias.addBond(d1_atom1_ind,
             d1_atom2_ind,
             0.3,  # Equilibrium distance (nm) and center of the umbrella
             1.0)  # Force constant (kJ/mol/nm^2)

# Add the second distance - we'll call it D2
bias.addBond(d2_atom1_ind,
             d2_atom2_ind,
             0.3,  # Equilibrium distance (nm) and center of the umbrella
             1.0)  # Force constant (kJ/mol/nm^2)

1

Then we add the `bias` that we created to the system.

In [60]:
# Add force to the system
system.addForce(bias)

5

Now we can run the simulation. We will use a `LangevinMiddleIntegrator` at 300 K with a timestep of 2 fs and a friction coefficient (collision rate) of 1/ps.

In [61]:
integrator = LangevinMiddleIntegrator(300*kelvin,
                                      1.0/picosecond,
                                      0.002 * picosecond)

In [62]:
simulation = Simulation(pdb.topology, system, integrator)

Alright - we now have our `simulation` object, which contains our single umbrella. We can now run the simulation and see how the distances change over time.

It is useful to be able to track these distances as we run the simulation. To track D1 and D2, we could use a `Reporter` object to save the whole trajectory and measure the distances afterwards.

In our case, we will track it live by simply tracking the state every `n` steps of the simulation.
We will track this `n` using the `recording_frequency` variable, and print out the values of D1 and D2 every `recording_frequency` steps.

First we set our initial positions:

In [63]:
simulation.context.setPositions(pdb.positions)

For both D1 and D2, we also know the biasing potential that was applied (through the force constant `k` and the center `r0`). From these we can compute the reduced bias energy (the bias energy in units of the thermal energy) for each frame in each window.

Note that here we have set `recording_frequency` to 10 steps, and the whole simulation lasts only 100 steps.\
This is _incredibly_ frequent, and honestly unnecessary and inefficient. For production runs on larger systems, it is usually worth recording these kinds of values only every 100 picoseconds to 1 nanosecond.
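As a quick piece of arithmetic, the sketch below converts a recording interval given in picoseconds into a number of integration steps, assuming the 2 fs timestep used by the integrator in this notebook. The helper name `steps_per_interval` is hypothetical and only serves the illustration.

```python
TIMESTEP_FS = 2.0  # matches the 0.002 ps integrator timestep used above


def steps_per_interval(interval_ps, timestep_fs=TIMESTEP_FS):
    """Number of integration steps that span interval_ps picoseconds."""
    return round(interval_ps * 1000.0 / timestep_fs)


# Recording every 100 ps and every 1 ns, respectively
print(steps_per_interval(100.0))
print(steps_per_interval(1000.0))
```

With a 2 fs timestep, a 100 ps interval corresponds to 50,000 steps, which is several orders of magnitude sparser than the 10-step interval used in this demonstration.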

In [64]:
recording_frequency = 10
total_steps = 100
for _ in range(total_steps // recording_frequency):
    # Advance one recording interval, then query the state once
    simulation.step(recording_frequency)
    state = simulation.context.getState(getEnergy=True, getPositions=True)
    n_steps = simulation.context.getStepCount()
    v0 = state.getPotentialEnergy()
    positions = state.getPositions(asNumpy=True)
    # `norm` comes from openmm.unit and keeps the nanometer units
    d1 = norm(positions[d1_atom1_ind] - positions[d1_atom2_ind])
    d2 = norm(positions[d2_atom1_ind] - positions[d2_atom2_ind])
    print(n_steps, v0, d1, d2)

10 -125015.32561972504 kJ/mol 0.4559841483272196 nm 0.41995640009332197 nm
20 -125566.60686972504 kJ/mol 0.4596610267954368 nm 0.42130091281210436 nm
30 -125996.41936972504 kJ/mol 0.448091928472435 nm 0.40907275245966596 nm
40 -126575.20061972504 kJ/mol 0.451266223841282 nm 0.41691610152012887 nm
50 -126753.91936972504 kJ/mol 0.44552891148232104 nm 0.42611134470872025 nm
60 -126954.98186972504 kJ/mol 0.4461977338169795 nm 0.42951389447410804 nm
70 -127114.32561972504 kJ/mol 0.45394686580624755 nm 0.4392666442667449 nm
80 -127074.38811972504 kJ/mol 0.46161737579219086 nm 0.44696347913249496 nm
90 -126817.70061972504 kJ/mol 0.4742899827232068 nm 0.4398387987847776 nm
100 -126489.13811972504 kJ/mol 0.4687859500133253 nm 0.43901035396907867 nm
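The recorded distances can be turned into a reduced energy for each frame. The sketch below takes "reduced" to mean the bias energy divided by the thermal energy k_B·T, which is an assumption about the intended definition. It reuses the center (0.3 nm) and force constant (1.0 kJ/mol/nm^2) from the `addBond` calls and the 300 K integrator temperature. The function name `reduced_bias` is hypothetical, and the two example frames are rounded from the first two lines of the printed output.

```python
KB = 0.0083144626                # Boltzmann constant in kJ/(mol*K)
TEMPERATURE = 300.0              # K, same as the integrator temperature
BETA = 1.0 / (KB * TEMPERATURE)  # inverse thermal energy in mol/kJ


def reduced_bias(d1, d2, r0=0.3, k=1.0):
    """Dimensionless bias energy beta * (U_d1 + U_d2) for one frame."""
    u1 = 0.5 * k * (d1 - r0) ** 2
    u2 = 0.5 * k * (d2 - r0) ** 2
    return BETA * (u1 + u2)


# (d1, d2) in nm, rounded from the first two recorded frames
frames = [(0.456, 0.420), (0.460, 0.421)]
for d1, d2 in frames:
    print(d1, d2, reduced_bias(d1, d2))
```

Reweighting methods such as MBAR work with reduced energies of this kind, evaluated for every frame under the bias of every window.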


Briefly, to generate a free energy profile (or a potential of mean force - PMF) from our simulations, we need 4 things:
1. A set of features that each biasing potential will be projected upon (D1 and D2)
2. A biasing potential centered on a specific CV value for each window (our harmonic constants)
3. The actual measured D1 and D2 values for each frame, for each window
4. For each frame, the calculated total potential energy

Note that the above simulation is done for a _single_ window. To compute the PMF, we need to run multiple simulations, each with a different biasing potential applied (ie a different window). See the [tutorial](https://openmm.github.io/openmm-cookbook/latest/notebooks/tutorials/umbrella_sampling.html#Step-3---Analysis---compute-the-PMF) page for further details! 

The easiest way to set up multiple windows is to set up multiple `system` objects.\
For each `system`, you'll define a set of umbrella centers with something like:

```
import numpy as np

M = 20  # number of windows
r0_range = np.linspace(0.3, 2.0, M, endpoint=False)  # window centers in nm
```

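Then, for each window `m`, you can set its center after the `context` has been constructed. Note that `setParameter` only works for global parameters, so this assumes the bias was defined with a custom force (for example a `CustomBondForce` with a global parameter named `r0_d1`) rather than the `HarmonicBondForce` used above: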
```
simulation.context.setParameter('r0_d1', r0_range[m])
```
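When choosing window centers and force constants, it can help to compare the spacing between neighboring centers with how widely a distance fluctuates inside one harmonic well. The sketch below is a rough heuristic only: it assumes the underlying free energy is flat, so the distance is Gaussian with standard deviation sqrt(k_B·T / k). The helpers `window_centers` and `thermal_width` are hypothetical names, and the second force constant (1000 kJ/mol/nm^2) is an illustrative value that does not come from this notebook.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def window_centers(start, stop, n):
    """Evenly spaced centers excluding stop, like np.linspace(..., endpoint=False)."""
    step = (stop - start) / n
    return [start + i * step for i in range(n)]


def thermal_width(k, temperature=300.0):
    """Std. dev. (nm) of a distance in a pure harmonic well of stiffness k (kJ/mol/nm^2)."""
    return math.sqrt(KB * temperature / k)


centers = window_centers(0.3, 2.0, 20)
spacing = centers[1] - centers[0]

# Compare the window spacing with the thermal width for two force constants
for k in (1.0, 1000.0):
    print(k, spacing, thermal_width(k))
```

If the thermal width is much larger than the spacing, neighboring windows overlap heavily but each bias localizes the system only weakly. If it is much smaller, neighboring windows may not overlap enough for the PMF to be stitched together.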