# Umbrella sampling in molecular dynamics to enhance conformational sampling

Sometimes we want to ensure sampling of a particular region of phase space. For example, we may want to sample the transition state of a reaction, or the folded state of a protein. Other times, we may struggle to sample a particular region of phase space, and want to enhance sampling there. Umbrella sampling is a technique that can be used to achieve both of these goals.

Briefly, umbrella sampling works by adding a harmonic biasing potential to the potential energy of the system, tethering the system to a "point" in phase space. We define this "point" using a known order parameter, or collective variable (CV), that is useful in our context (as we will show below).

For example, if we want to sample a loop motion in a protein, we might define some distances as our CVs, and then set up a biasing potential centered on a specific pair of distance values.

The biasing potential is essentially a "spring" (a harmonic potential) acting on the distance CVs: it is high ("pulling harder") when the distances are far from our desired point, and falls toward zero as we approach the center. This "pulling" is what biases the system to sample the region of phase space around the point. This harmonic potential is the "umbrella": a bias centered on a point of interest.
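As a rough illustration of the "spring" described above, the sketch below evaluates a harmonic bias on two distance CVs in plain Python. The function names, the center of 0.3 and the force constant of 1.0 are illustrative assumptions chosen to match the values used later in this notebook; the factor of 1/2 follows the convention of OpenMM's `HarmonicBondForce`.

```python
def harmonic_bias(distance, center, k):
    """Harmonic restraint energy 0.5 * k * (distance - center)**2."""
    return 0.5 * k * (distance - center) ** 2


def umbrella_energy(d1, d2, center=(0.3, 0.3), k=1.0):
    """Total bias for two distance CVs that share one force constant.

    The default center and k are illustrative values, not recommendations.
    """
    return harmonic_bias(d1, center[0], k) + harmonic_bias(d2, center[1], k)


# The bias is zero at the center and grows quadratically away from it.
for d in (0.3, 0.5, 1.0, 2.0):
    print(f"d1 = d2 = {d:.1f} -> bias = {umbrella_energy(d, d):.3f}")
```

The bias vanishes exactly at the center, so the restraint only acts once the system drifts away from it.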

In this notebook, we will implement umbrella sampling for a protein using two distance CVs (a 2D umbrella). We will also show how to quickly track the progress of the simulation over time with a single umbrella. Other analysis methods will be briefly discussed but are [implemented](https://pymbar.readthedocs.io/en/master/fes_with_pymbar.html#fes-with-pymbar) [elsewhere](https://fastmbar.readthedocs.io/en/latest/butane_PMF.html).

Okay, first let's import the basic libraries and set up our system.

In [1]:
from openmm.app import *
from openmm import *
from openmm.unit import *

# Load an already solvated PDB file and set up the system + state
pdb = PDBFile("../villin.pdb")
omm_forcefield = ForceField("amber/ff14SB.xml", "amber14/tip3p.xml")
system = omm_forcefield.createSystem(pdb.topology,
                                     nonbondedMethod=PME,
                                     nonbondedCutoff=10.0 * angstrom,
                                     constraints=HBonds,
                                     rigidWater=True,
                                     hydrogenMass=4.0 * amu)


Okay now let's define our CVs. We will use the distances between the alpha carbons of three residues as our CVs (forming a triangle). First let's define the relevant atom indices (these are system specific, so you will need to change them for your system).

In [2]:
# Define three atom indices - these will be used to define distance CVs
d1_atom1_ind = 83
d1_atom2_ind = 151
d2_atom2_ind = 254

Now it's time to actually add the harmonic potential to our system. We will use the `HarmonicBondForce` class to do this. This class allows us to define a harmonic bond that will represent our biasing potential. Here, the `HarmonicBondForce` defines a harmonic potential on the distances between the alpha carbons of the three residues we defined above.

It's worth noting that in our `addBond` call, we set the center point of our harmonic potential (i.e. the target distance between the alpha carbons) to 0.3. Bare numbers are interpreted in OpenMM's default units, so this is 0.3 nm. The value is chosen for our example system and should change depending on your system of interest. As a result, our simulation will be biased to sample around this center distance (0.3 nm).

If we wanted to bias the system toward a different point, we would simply change the third argument of `addBond` (the equilibrium length) to the desired distance.
We are also setting the fourth argument, the force constant, to 1.0 (again system specific; in default units this is 1.0 kJ/mol/nm^2). This spring constant is the "strength" of the biasing potential, and determines how strongly the system is biased to sample around the center point.
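For reference, the bias built from two such bonds can be written out explicitly. The first expression below uses the functional form of OpenMM's `HarmonicBondForce`, with the symbols $d_1$, $d_2$, $r_0$ and $k$ introduced here for the two distances, the center and the force constant. The second expression is a rough estimate, assuming a temperature of 300 K and ignoring the underlying free energy surface, of how wide a distribution the spring alone would allow:

```latex
E_{\text{bias}}(d_1, d_2) = \tfrac{1}{2} k \,(d_1 - r_0)^2 + \tfrac{1}{2} k \,(d_2 - r_0)^2,
\qquad r_0 = 0.3\ \text{nm}, \quad k = 1.0\ \text{kJ mol}^{-1}\,\text{nm}^{-2}

\sigma = \sqrt{\frac{k_B T}{k}}
       \approx \sqrt{\frac{2.49\ \text{kJ mol}^{-1}}{1.0\ \text{kJ mol}^{-1}\,\text{nm}^{-2}}}
       \approx 1.58\ \text{nm} \quad (T = 300\ \text{K})
```

Under these assumptions, a force constant of 1.0 gives a very loose restraint, so a stiffer spring is likely needed if each window should stay confined to a narrow range of distances.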

In [3]:
# Create a harmonic force that operates on the CVs
bias = HarmonicBondForce()

# Add the first distance - we'll call it D1
bias.addBond(d1_atom1_ind,
             d1_atom2_ind,
             0.3,  # Equilibrium distance (nm) and center of the umbrella
             1.0)  # Force constant (kJ/mol/nm^2)

# Add the second distance - we'll call it D2
bias.addBond(d1_atom1_ind,
             d2_atom2_ind,
             0.3,  # Equilibrium distance (nm) and center of the umbrella
             1.0)  # Force constant (kJ/mol/nm^2)

1

Then we add the `bias` that we created to the system.

In [4]:
# Add force to the system
system.addForce(bias)

5

It is also useful to be able to track these distances as we run the simulation. To track D1 and D2, we could use a `Reporter` object to save the whole trajectory and measure the distances afterwards.

A lightweight way of doing this without storing the whole trajectory is to use a `CustomCVForce` whose energy expression is set to 0 (i.e. no bias). It contributes nothing to the energy or the forces, but it lets us query the distances between the alpha carbons of the three residues defined above whenever we need them, so we can see how the distances change as the simulation progresses.

First, we define our distance-measuring variable `dist_measurer`:

In [5]:
# define a distance measurer
dist_measurer = CustomCVForce("0")

Next, we will create our two distances as separate bond forces and add them to the system.
Note: each collective variable in a `CustomCVForce` is a single scalar (the energy of the force it wraps), so a single bond force holding both D1 and D2 would not report the two distances separately through `dist_measurer.getCollectiveVariableValues`. We therefore add them as separate `CustomBondForce` objects.


In [6]:
# Define our measuring BondForces
D1 = CustomBondForce("r")
D1.addBond(d1_atom2_ind, d1_atom1_ind)
D2 = CustomBondForce("r")
D2.addBond(d2_atom2_ind, d1_atom1_ind)

# Add each BondForce as CVs into the dist_measurer
dist_measurer.addCollectiveVariable("D1", D1)
dist_measurer.addCollectiveVariable("D2", D2)
system.addForce(dist_measurer)

6

Now we can run the simulation. We will use a `LangevinMiddleIntegrator` at 300 K with a timestep of 2 fs and a friction coefficient of 1/ps.

In [7]:
integrator = LangevinMiddleIntegrator(300*kelvin,
                                      1.0/picosecond,
                                      0.002 * picosecond)

In [8]:
simulation = Simulation(pdb.topology, system, integrator)

Alright - we now have our `simulation` object, which contains our single umbrella. We can now run the simulation and see how the distances change over time.

First we set our initial positions:

In [9]:
simulation.context.setPositions(pdb.positions)

Using our `dist_measurer`, we can track D1 and D2 at regular intervals during the simulation.
For both D1 and D2, we also know the biasing potential that was applied (through its force constant `k` and center `r0`). From this we can compute the reduced bias energy of each recorded frame in each window.

In [10]:
for i in range(5):
    simulation.step(10)
    n_steps = str(simulation.context.getStepCount())
    v0 = simulation.context.getState(getEnergy=True).getPotentialEnergy()
    d1,d2 = dist_measurer.getCollectiveVariableValues(simulation.context)
    print(n_steps, v0, d1, d2)

10 -125075.91936972504 kJ/mol 0.4543943703174591 0.4209526777267456
20 -125568.32561972504 kJ/mol 0.4560399055480957 0.41610270738601685
30 -126105.88811972504 kJ/mol 0.4422377943992615 0.4061473608016968
40 -126663.73186972504 kJ/mol 0.4465791583061218 0.413379043340683
50 -126888.26311972504 kJ/mol 0.4487186670303345 0.42601025104522705
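
As a minimal sketch of that last step, the snippet below recomputes the bias energy for the five frames printed above and divides it by `k_B * T` to obtain a reduced (dimensionless) bias energy. The helper name `reduced_bias` is hypothetical, the distances are rounded from the output above, and `k = 1.0`, `r0 = 0.3` and 300 K are taken from the earlier cells.

```python
K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def reduced_bias(d1, d2, r0=0.3, k=1.0, temperature=300.0):
    """Bias energy of one frame divided by k_B * T (dimensionless)."""
    bias = 0.5 * k * ((d1 - r0) ** 2 + (d2 - r0) ** 2)  # kJ/mol
    return bias / (K_B * temperature)


# (D1, D2) in nm, rounded from the printed output of the single window
frames = [(0.4544, 0.4210), (0.4560, 0.4161), (0.4422, 0.4061),
          (0.4466, 0.4134), (0.4487, 0.4260)]

for step, (d1, d2) in zip(range(10, 60, 10), frames):
    print(step, round(reduced_bias(d1, d2), 5))
```

Note that the potential energy printed by the simulation loop already includes this bias, since the bias force is part of the `system`.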


Briefly, to generate a free energy profile (or potential of mean force, PMF) from our simulations, we need four things:
1. A set of features that each biasing potential is projected onto (D1 and D2)
2. A biasing potential centered on a specific CV value for each window (defined by our harmonic constants)
3. The actual measured D1 and D2 values for each frame, for each window
4. For each frame, the calculated total potential energy

Note that the above simulation is done for a _single_ window. To compute the PMF, we need to run multiple simulations, each with a different biasing potential applied (i.e. a different window).
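To make the ingredients listed above more concrete, here is a hedged sketch of the kind of table that reweighting estimators such as MBAR typically take as input: every window's bias evaluated on every pooled frame, in units of `k_B * T`. The function names and the sample values are hypothetical. When all windows share one temperature, the unbiased potential energy adds the same term to every row, so only the bias term differs between rows.

```python
K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def bias_energy(sample, center, k):
    """Harmonic bias summed over all CVs of one frame (kJ/mol)."""
    return sum(0.5 * k * (x - c) ** 2 for x, c in zip(sample, center))


def reduced_bias_matrix(samples, centers, k, temperature):
    """Return u[window][frame]: each window's bias on each frame, in k_B*T."""
    beta = 1.0 / (K_B * temperature)
    return [[beta * bias_energy(s, c, k) for s in samples] for c in centers]


# Hypothetical data: two window centers and three pooled (D1, D2) frames
centers = [(0.3, 0.3), (0.4, 0.4)]
samples = [(0.45, 0.42), (0.44, 0.41), (0.38, 0.40)]

u_kn = reduced_bias_matrix(samples, centers, k=1.0, temperature=300.0)
for row in u_kn:
    print([round(u, 5) for u in row])
```

Each row corresponds to one window and each column to one frame, regardless of which window the frame was sampled from.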

The easiest way to set up multiple windows is to create one `system` object per window.
For each `system`, you'll define a set of umbrella centers with something like:

```
import numpy as np

M = 20  # number of windows
r0_range = np.linspace(0.3, 2.0, M, endpoint=False)  # umbrella centers
```

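Then, for each window `m`, you can set its center after the `context` has been constructed. Note that `setParameter` only works on global parameters, so this assumes the bias is defined by a force that declares `r0_d1` as a global parameter (for example a `CustomBondForce`); the `HarmonicBondForce` used above has no such parameter: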
```
simulation.context.setParameter('r0_d1', r0_range[m])
```
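Because this notebook biases two distances, a full set of windows would need a center for D2 as well as D1. The sketch below builds such a grid in plain Python; `window_centers` is a hypothetical helper that mirrors `np.linspace(..., endpoint=False)`, and pairing every D1 center with every D2 center is an assumption about how one might lay out the windows, not the only option.

```python
from itertools import product


def window_centers(start, stop, n):
    """Evenly spaced centers from start up to, but not including, stop."""
    step = (stop - start) / n
    return [start + i * step for i in range(n)]


M = 20
r0_range = window_centers(0.3, 2.0, M)

# One (r0_d1, r0_d2) pair per window: the count grows as M * M
grid = list(product(r0_range, repeat=2))

print(len(r0_range), "centers per CV,", len(grid), "windows in total")
print("spacing between neighbouring centers:", r0_range[1] - r0_range[0])
```

The quadratic growth in the number of windows is worth keeping in mind; in practice one might restrict the grid to the region of CV space that is actually of interest.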