# The Weighted Ensemble Method

## Propanol Membrane Permeation Kinetics, with OpenMM


---------

### Part 1: Building the OpenMM system
We begin by importing the packages required to build the simulation system in OpenMM: 

In [None]:
import openmm.app as omm_app
import openmm as omm
import openmm.unit as unit

from matplotlib import pyplot as plt
%matplotlib inline

import json

Now we create the `system`, and then a `simulation` object:

In [None]:
prmtop = omm_app.AmberPrmtopFile('pol_dppc.prmtop')
inpcrd = omm_app.AmberInpcrdFile('pol_dppc_md6.ncrst')
system = prmtop.createSystem(nonbondedMethod=omm_app.PME, nonbondedCutoff=10.0*unit.angstrom,
        constraints=omm_app.HBonds)

T = 300.0 * unit.kelvin  ## temperature
fricCoef = 1.0 / unit.picoseconds ## friction coefficient 
stepsize = 0.002 * unit.picoseconds ## integration step size
integrator = omm.LangevinIntegrator(T, fricCoef, stepsize)

simulation = omm_app.Simulation(prmtop.topology, system, integrator)
simulation.context.setPositions(inpcrd.positions)
if inpcrd.boxVectors is not None:
    simulation.context.setPeriodicBoxVectors(*inpcrd.boxVectors)
    
print(f'OpenMM will use the {simulation.context.getPlatform().getName()} platform')

### Part 2: Building the WE workflow
Now we import WElib and other utilities that will be useful. Many are the same as those used for the simple double well potential (DWP) example, but here the `Stepper` and `ProgressCoordinator` are built from functions that work on OpenMM states:

In [None]:
import mdtraj as mdt
import numpy as np
import time
from WElib import Walker, FunctionStepper, FunctionProgressCoordinator, Recycler, StaticBinner, SplitMerger

Create some walkers, each beginning in the initial, dissociated state:

In [None]:
initial_state = simulation.context.getState(getPositions=True, enforcePeriodicBox=True)

n_reps = 5
walkers = [Walker(initial_state, 1.0/n_reps) for i in range(n_reps)]
for w in walkers:
    print(w)

The first progress coordinate will be the Z-distance between the centre of the propanol molecule (approximated in the code by the mean of its oxygen and terminal carbon positions) and the centre of the lipid bilayer (the mean position of the lipid head group P atoms). The second will be the orientation of the propanol (the angle between the z-axis and the vector between the oxygen and terminal C-atom positions). We need to create a function that returns both, given the `state` of a walker:

In [None]:
def ztpc(state, ligand_head_atom, ligand_tail_atom, membrane_atoms):
    """
    2D progress coordinates for molecule permeation
    """
    crds = state.getPositions(asNumpy=True) / unit.nanometers
    lig_com = crds[[ligand_head_atom, ligand_tail_atom]].mean(axis=0)
    memb_com = crds[membrane_atoms].mean(axis=0)
    lig_v = crds[ligand_head_atom] - crds[ligand_tail_atom]
    lig_v /= np.sqrt((lig_v*lig_v).sum())
    cos_t = np.dot(lig_v, np.array([0., 0., 1.]))
    return ((lig_com - memb_com)[2], np.arccos(cos_t) * 180.0 / np.pi)

top = mdt.load_topology('pol_dppc.prmtop')
membrane_atoms = top.select('name P31')
ligand_head_atom = top.select('resname POL and name O1')[0]
ligand_tail_atom = top.select('resname POL and name C3')[0]

progress_coordinator = FunctionProgressCoordinator(ztpc, ligand_head_atom, ligand_tail_atom, membrane_atoms)
walkers = progress_coordinator.run(walkers)
for w in walkers:
    print(w)
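
To check the geometry independently of OpenMM, here is a minimal sketch of the same two quantities computed from hand-made coordinates. The function name `z_offset_and_tilt` and the toy coordinates are hypothetical and are not part of WElib; the clamp on the cosine is an extra safeguard added for this sketch.

```python
import math

def z_offset_and_tilt(head, tail, membrane_points):
    """Return (Z offset in nm, tilt angle in degrees) for toy coordinates.

    head, tail: (x, y, z) tuples for the two ligand atoms.
    membrane_points: list of (x, y, z) tuples standing in for the P atoms.
    """
    # Ligand position: plain average of the two atoms (not mass-weighted).
    lig_z = (head[2] + tail[2]) / 2.0
    # Membrane centre: plain average of the reference atoms.
    memb_z = sum(p[2] for p in membrane_points) / len(membrane_points)
    # Vector pointing from the tail atom to the head atom.
    v = [h - t for h, t in zip(head, tail)]
    norm = math.sqrt(sum(c * c for c in v))
    cos_t = v[2] / norm  # dot product of the unit vector with (0, 0, 1)
    # Clamp so rounding error cannot push acos outside its domain.
    cos_t = max(-1.0, min(1.0, cos_t))
    return lig_z - memb_z, math.degrees(math.acos(cos_t))

membrane = [(0.0, 0.0, -2.0), (1.0, 1.0, 2.0)]  # two leaflets, centre at z = 0
upright = z_offset_and_tilt((0.0, 0.0, 3.2), (0.0, 0.0, 3.0), membrane)
sideways = z_offset_and_tilt((0.2, 0.0, 3.0), (0.0, 0.0, 3.0), membrane)
flipped = z_offset_and_tilt((0.0, 0.0, 3.0), (0.0, 0.0, 3.2), membrane)
print(upright, sideways, flipped)
```

A head atom directly above the tail atom gives an angle near 0 degrees, a ligand lying in the membrane plane gives 90 degrees, and the reversed orientation gives 180 degrees.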

We bin along Z in 0.2 nm intervals, but divide theta into just three bins: 0-45 degrees, 45-135 degrees, and > 135 degrees. This is probably not optimal, but will do for now:

In [None]:
binner = StaticBinner([[-3.6, -3.4, -3.2, -3.0, -2.8, -2.6, -2.4, -2.2, -2.0, -1.8, -1.6, -1.4, -1.2, -1.0, 
                       -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 
                       0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6],
                       [45.0, 135.0]])
walkers = binner.run(walkers)
for w in walkers:
    print(w)
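
As an illustration of what a static binner does with these edges, the sketch below maps a (Z, theta) pair to a pair of bin indices using `bisect`. The indexing convention (0 means below the first edge) is an assumption made for this sketch and may differ from the one `StaticBinner` uses internally.

```python
from bisect import bisect_right

# Edges as in the text: Z every 0.2 nm from -3.6 to 3.6, theta at 45 and 135.
z_edges = [round(-3.6 + 0.2 * i, 1) for i in range(37)]
theta_edges = [45.0, 135.0]

def bin_index(pc, edges_per_dimension):
    """Map a progress coordinate tuple to a tuple of per-dimension bin indices.

    Index 0 means "below the first edge"; index len(edges) means "at or
    above the last edge". This convention is an assumption of the sketch.
    """
    return tuple(bisect_right(edges, value)
                 for value, edges in zip(pc, edges_per_dimension))

all_edges = [z_edges, theta_edges]
outside_low = bin_index((-3.7, 10.0), all_edges)   # below every Z edge, theta < 45
mid_membrane = bin_index((0.05, 90.0), all_edges)  # just above Z = 0, theta in the middle bin
outside_high = bin_index((3.7, 170.0), all_edges)  # beyond the last Z edge, theta >= 135
print(outside_low, mid_membrane, outside_high)
```

With 37 edges along Z and 2 along theta, this scheme gives 38 x 3 possible bins, of which only a few are occupied at any one time.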

We will recycle walkers when the propanol Z-coordinate exceeds 3.6 nm. There is no target value for theta (any value will do):

In [None]:
recycler = Recycler(initial_state, [3.6, None])
walkers = recycler.run(walkers)
for w in walkers:
    print(w)
print('recycled flux = ',recycler.flux)

There is an issue we need to deal with: due to periodic boundary conditions, the ligand could take a "short cut" to the other side of the membrane by _decreasing_ its Z-coordinate sufficiently. To avoid this, we add a supplementary recycler:

In [None]:
pbc_recycler = Recycler(initial_state, [-3.8, None], retrograde=True)
walkers = pbc_recycler.run(walkers)
for w in walkers:
    print(w)
print('recycled flux = ', pbc_recycler.flux) # just to check - we will not be concerned with this during the simulations
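
The two recyclers can be pictured with a toy model: a walker that crosses the target is sent back to the start, keeps its weight, and that weight is counted as flux. The sketch below is not WElib's implementation; `ToyWalker`, `recycle`, the starting value of -3.0 nm, and the inclusive threshold tests are all assumptions made for illustration.

```python
class ToyWalker:
    """Stand-in walker holding one progress coordinate and a weight."""
    def __init__(self, z, weight):
        self.z = z            # Z progress coordinate (nm)
        self.weight = weight

def recycle(walkers, start_z, target_z, retrograde=False):
    """Return (walkers, flux) after sending finished walkers back to start_z.

    A walker has finished when z >= target_z, or when z <= target_z if
    retrograde is True. Its weight is kept and added to the flux.
    """
    flux = 0.0
    result = []
    for w in walkers:
        done = w.z <= target_z if retrograde else w.z >= target_z
        if done:
            flux += w.weight
            result.append(ToyWalker(start_z, w.weight))
        else:
            result.append(w)
    return result, flux

toy = [ToyWalker(3.7, 0.2), ToyWalker(-3.9, 0.3), ToyWalker(0.0, 0.5)]
toy, forward_flux = recycle(toy, start_z=-3.0, target_z=3.6)
toy, retro_flux = recycle(toy, start_z=-3.0, target_z=-3.8, retrograde=True)
total_weight = sum(w.weight for w in toy)
print(forward_flux, retro_flux, total_weight)
```

Only the forward flux measures permeation; the retrograde flux merely records walkers that tried to leave through the periodic boundary. In both cases the total weight is conserved.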

The SplitMerger is just the same as that used for the DWP example. We create it and run it, even though we know that at this time it will have nothing to do:

In [None]:
splitmerger = SplitMerger(n_reps)
walkers = splitmerger.run(walkers)
for w in walkers:
    print(w)
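
For readers who have not seen the DWP example, the following sketch shows one simple way to resample a single bin to a target number of walkers while conserving weight. It is a generic illustration, not necessarily the scheme `SplitMerger` uses: the choice of which walkers to split or merge, and the function name `split_merge`, are assumptions.

```python
import random

def split_merge(weights_and_labels, target, rng):
    """Resample one bin to `target` walkers, conserving total weight.

    weights_and_labels: list of (weight, label) pairs for walkers in one bin.
    rng: a random.Random instance used to pick merge survivors.
    """
    walkers = sorted(weights_and_labels)
    # Split: halve the heaviest walker until there are enough walkers.
    while len(walkers) < target:
        weight, label = walkers.pop()  # heaviest is last after sorting
        walkers += [(weight / 2, label), (weight / 2, label)]
        walkers.sort()
    # Merge: combine the two lightest walkers until there are few enough.
    while len(walkers) > target:
        (w1, l1), (w2, l2) = walkers[0], walkers[1]
        # The survivor is picked with probability proportional to its weight.
        survivor = l1 if rng.random() < w1 / (w1 + w2) else l2
        walkers = sorted([(w1 + w2, survivor)] + walkers[2:])
    return walkers

rng = random.Random(0)
split_result = split_merge([(1.0, 'a')], target=4, rng=rng)
merge_result = split_merge([(0.1, 'a'), (0.2, 'b'), (0.3, 'c'), (0.4, 'd')],
                           target=2, rng=rng)
print(split_result)
print(merge_result)
```

Splitting one walker of weight 1.0 into four gives four copies of weight 0.25; merging four walkers down to two leaves the summed weight unchanged.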

Create a function that will run an OpenMM simulation. The function needs to take the current state of the system as its first argument, and return the final state at the end of the MD. Then use this function to initialise a `FunctionStepper`, as was done for the DWP example.

In [None]:
def OMMSim(state, simulation, nsteps):
    simulation.context.setPositions(state.getPositions())
    simulation.context.setPeriodicBoxVectors(*state.getPeriodicBoxVectors())
    simulation.step(nsteps)
    return simulation.context.getState(getPositions=True, enforcePeriodicBox=False) # don't try to wrap coordinates

stepper = FunctionStepper(OMMSim, simulation, 500)

Now we will apply the stepper. Note that this will take somewhat longer to run than in the DWP example; exactly how long will depend on the power of your laptop/workstation:

In [None]:
start_time = time.time()
new_walkers = stepper.run(walkers) # this is where the MD happens
end_time = time.time()
print(f'{len(walkers)} simulations completed in {end_time-start_time:6.1f} seconds')

Let's see where those MD steps have moved each walker to:

In [None]:
new_walkers = progress_coordinator.run(new_walkers)
print('before recycling adjustments:')
for w in new_walkers:
    print(w)
new_walkers = pbc_recycler.run(new_walkers)
if pbc_recycler.flux > 0.0:
    new_walkers = progress_coordinator.run(new_walkers)
new_walkers = recycler.run(new_walkers)
if recycler.flux > 0.0:
    new_walkers = progress_coordinator.run(new_walkers)
new_walkers = binner.run(new_walkers)

print('recycled flux = ', recycler.flux)
for w in new_walkers:
    print(w)

Apply the SplitMerger to the list of walkers:

In [None]:
new_walkers = splitmerger.run(new_walkers)
for w in new_walkers:
    print(w)

### Part 3: Iterating the WE workflow
OK, that's all the components in place; they have been tested individually and seem to be behaving. Time to run a few cycles:

In [None]:
n_cycles=10
results = []
print(' cycle    n_walkers   left-most bin  right-most bin   flux')
for i in range(n_cycles):
    new_walkers = stepper.run(new_walkers)
    new_walkers = progress_coordinator.run(new_walkers)
    new_walkers = pbc_recycler.run(new_walkers)
    if pbc_recycler.flux > 0.0:
        new_walkers = progress_coordinator.run(new_walkers)
    new_walkers = recycler.run(new_walkers)
    if recycler.flux > 0.0:
        new_walkers = progress_coordinator.run(new_walkers)
    new_walkers = binner.run(new_walkers)
    new_walkers = splitmerger.run(new_walkers)
    occupied_bins = list(binner.bin_weights.keys())
    print(f' {i:3d} {len(new_walkers):10d} {str(min(occupied_bins)):^14s} {str(max(occupied_bins)):^12s} {recycler.flux:20.8f}')
    result = {'cycle': i, 'n_walkers': len(new_walkers), 'pcdata': [w.pc for w in new_walkers]}
    results.append(result)
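
Once recycling has reached a steady state, the mean recycled flux per unit of simulated time gives an estimate of the rate (the Hill relation). The sketch below shows the arithmetic, assuming that `recycler.flux` reports the weight recycled in a single cycle and that the fluxes have been collected into a list. The function `rate_estimate` and the flux values are made up for illustration and are not simulation results.

```python
steps_per_cycle = 500   # as passed to FunctionStepper above
timestep_ps = 0.002     # integrator step size in picoseconds
cycle_time_ps = steps_per_cycle * timestep_ps  # simulated time per WE cycle

def rate_estimate(fluxes, cycle_time_ps, discard=0):
    """Mean flux per unit time, ignoring the first `discard` cycles."""
    kept = fluxes[discard:]
    if not kept:
        raise ValueError("no cycles left after discarding")
    return sum(kept) / len(kept) / cycle_time_ps

# Made-up per-cycle fluxes, for illustration only.
example_fluxes = [0.0, 0.0, 1e-9, 3e-9, 2e-9, 2e-9]
rate = rate_estimate(example_fluxes, cycle_time_ps, discard=2)
print(f'{rate:.2e} per ps')
```

The `discard` argument reflects the need to ignore the initial transient before the flux settles; ten cycles of this system are far too few for that.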

Plot the current position of each walker in the (Z, theta) plane:

In [None]:
pcdata = np.array(results[-1]['pcdata'])
plt.plot(pcdata[:, 0], pcdata[:, 1], 'go')
plt.xlabel('Z (nm)')
plt.ylabel('theta (degrees)')

### Analysis of a longer simulation

Clearly this is not an example you are going to be able to complete in a notebook. We have provided you with the log file, `pol_dppc.json`, obtained when this simulation was run for 100 cycles with 6 walkers per bin.

**NB**: as it happens, in this simulation the z-coordinate is effectively 'flipped', so the propanol starts on the *positive* Z-side of the membrane.

In [None]:
with open('pol_dppc.json') as f:
    results = json.load(f)
# convert progress coordinate data to numpy:
for result in results:
    result['pcdata'] = np.array(result['pcdata'])

See how the number of walkers increases over time, and how the 'pathfinder' (most advanced walker) advances through the membrane:

In [None]:
plt.figure(figsize=(14, 6))
plt.subplot(121)
plt.plot([r['n_walkers'] for r in results])
plt.xlabel('cycle')
plt.ylabel('n_walkers')
plt.title('number of walkers')
plt.subplot(122)
plt.plot([r['pcdata'][:, 0].min() for r in results])
plt.xlabel('cycle')
plt.ylabel('z_min (nm)')
plt.title('pathfinder Z-coordinate')

By the time simulations have percolated all the way through the membrane, we are having to run 500 simulations per cycle - but at least each of these is short! The data we have provided is in no way converged, so we will not attempt to calculate a permeation rate from it, but we can still see interesting details:

In [None]:
plt.figure(figsize=(14, 4))
plt.subplot(131)
pcdata_all = np.concatenate([ r['pcdata'] for r in results])
plt.plot(pcdata_all[:,0], pcdata_all[:, 1], 'r.')
plt.title('walker positions')
plt.xlabel('Z (nm)')
plt.ylabel('theta (degrees)')
plt.subplot(132)
H, x_edges, y_edges, b = plt.hist2d(pcdata_all[:, 0], pcdata_all[:, 1], bins=30)
plt.title('walker distribution')
plt.xlabel('Z (nm)')
plt.ylabel('theta (degrees)')
plt.subplot(133)
# bin centres are the midpoints of neighbouring edges (note the parentheses)
xcen = (x_edges[:-1] + x_edges[1:]) / 2
ycen = (y_edges[:-1] + y_edges[1:]) / 2
out = plt.contour(xcen, ycen, H.T)
plt.title('walker distribution')
plt.xlabel('Z (nm)')
plt.ylabel('theta (degrees)')
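
A histogram with N bins has N + 1 edges, whereas a contour plot needs one coordinate per bin, so the edges must be converted to bin centres. A small pure-Python sketch of that conversion follows; the helper name `bin_centres` and the example edges are hypothetical.

```python
def bin_centres(edges):
    """Midpoint of each pair of neighbouring bin edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

edges = [0.0, 1.0, 2.0, 4.0]   # four edges define three bins
centres = bin_centres(edges)
print(centres)
```

Each centre lies between its two edges, which is a quick check that the sum has been divided as a whole rather than just its second term.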

The plots show data for all walkers over all simulation cycles. We see that when walkers are in the aqueous compartments on either side of the membrane, the propanol ligands freely adopt any orientation with respect to the membrane plane, but that there are strong - and opposite - orientation preferences at the membrane interfaces themselves. The ligand must 'flip' as it enters the bilayer, and again as it passes through it. If we had not included theta as a progress coordinate, sampling this process would have been much less efficient. It's important to realise that this would not change the (eventually converged) value of the permeation rate, but it would significantly increase the time needed to reach it.