# Running an MSTIS simulation

This notebook gives a run-through of how to set up a multiple state transition interface sampling (MSTIS) simulation. Here, we will use the toy model engine that is built into OPS. However, the process of setting up a path sampling simulation is the same for any engine: set up the engine, then the CVs, then the states (and interfaces for TIS), then the sampling network, then the move scheme. After that, you're ready to run your simulation!

This notebook does not include any "YOUR TURN" steps; it can just be run directly.

Tasks covered in this notebook:
* Loading OPS objects from storage
* Ways of assigning initial trajectories to initial samples
* Setting up a path sampling simulation with various move schemes
* Visualizing trajectories while the path sampling is running

In [None]:
from __future__ import print_function
%matplotlib inline
import sys
import openpathsampling as paths
import numpy as np


# toy_plot_helpers.py has conveniences for plotting the 2D contour plots, etc.
%run ./toy_plot_helpers.py

## Setting up the engine

First we set up our system: for the toy dynamics, this involves defining a potential energy surface (PES), setting up an integrator, and giving the simulation an initial configuration. In real MD systems, the PES is handled by the combination of a topology file and a force field definition, and the initial configuration would come from a file instead of being described by hand.

With biomolecular systems, the system is often described by an initial PDB structure and a choice of force field. For the toy model, we need to give a snapshot as a template, as well as a potential energy surface. The template snapshot also includes a pointer to the topology information (which is relatively simple for the toy systems).

In [None]:
# convenience for the toy dynamics
import openpathsampling.engines.toy as toys

# Toy_PES supports adding/subtracting various PESs. 
# The OuterWalls PES type gives an x^6+y^6 boundary to the system.
pes = (
    toys.OuterWalls(sigma=[1.0, 1.0], x0=[0.0, 0.0])
    + toys.Gaussian(A=-0.7, alpha=[12.0, 12.0], x0=[0.0, 0.4])
    + toys.Gaussian(A=-0.7, alpha=[12.0, 12.0], x0=[-0.5, -0.5]) 
    + toys.Gaussian(A=-0.7, alpha=[12.0, 12.0], x0=[0.5, -0.5])
)

topology = toys.Topology(
    n_spatial=2,
    masses=[1.0, 1.0],
    pes=pes
)

integ = toys.LangevinBAOABIntegrator(dt=0.02, temperature=0.1, gamma=2.5)

options = {
    'integ': integ,
    'n_frames_max': 5000,
    'n_steps_per_frame': 1
}

engine = toys.Engine(
    options=options,
    topology=topology
)

Now let's look at the potential energy surface we've created:

In [None]:
plot = ToyPlot()
plot.contour_range = np.arange(-1.5, 1.0, 0.1)
plot.add_pes(pes)
fig = plot.plot()
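
To make the shape of this surface concrete, here is a minimal pure-Python sketch that evaluates the same combination of terms at a few points. It assumes the functional forms suggested by the comments above (a sixth-power wall and a product of Gaussians in x and y); the helper names `outer_walls`, `gaussian`, and `toy_potential` are hypothetical and are not part of OPS.

```python
import math

def outer_walls(x, y, sigma=(1.0, 1.0), x0=(0.0, 0.0)):
    # Assumed form: sigma_x * (x - x0)**6 + sigma_y * (y - y0)**6
    return sigma[0] * (x - x0[0]) ** 6 + sigma[1] * (y - x0[1]) ** 6

def gaussian(x, y, A, alpha, x0):
    # Assumed form: A * exp(-alpha_x * (x - x0)**2 - alpha_y * (y - y0)**2)
    dx, dy = x - x0[0], y - x0[1]
    return A * math.exp(-alpha[0] * dx ** 2 - alpha[1] * dy ** 2)

# Centers of the three wells, copied from the PES definition above
WELLS = [(0.0, 0.4), (-0.5, -0.5), (0.5, -0.5)]

def toy_potential(x, y):
    # Sum of the wall term and the three attractive Gaussians
    total = outer_walls(x, y)
    for center in WELLS:
        total += gaussian(x, y, A=-0.7, alpha=(12.0, 12.0), x0=center)
    return total

well_energies = [toy_potential(cx, cy) for cx, cy in WELLS]
between_wells = toy_potential(0.0, 0.0)
far_out = toy_potential(1.5, 0.0)
print(well_energies, between_wells, far_out)
```

Under these assumptions, the three well centers are the low-energy regions, the origin sits noticeably higher, and the energy rises steeply beyond |x| = 1, which is what the contour plot shows.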

## Defining collective variables

TIS methods usually require that you define states and interfaces before starting the simulation. States and interfaces are both defined in terms of `Volume` objects. The most common type of `Volume` is one based on some set of collective variables, so the first thing we have to do is to define the collective variables.

For this system, we'll define the collective variables as circles centered on the middle of the state. OPS allows us to define one function for the circle, which is parameterized by different centers. Note that each collective variable is in fact a separate function.

In [None]:
def circle(snapshot, center):
    import math
    return math.sqrt((snapshot.xyz[0][0]-center[0])**2 + (snapshot.xyz[0][1]-center[1])**2)
    
opA = paths.CoordinateFunctionCV(name="opA", f=circle, center=[-0.5, -0.5])
opB = paths.CoordinateFunctionCV(name="opB", f=circle, center=[0.5, -0.5])
opC = paths.CoordinateFunctionCV(name="opC", f=circle, center=[0.0, 0.4])
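
As a quick sanity check of what these CVs measure, the sketch below evaluates the same `circle` function on a hand-made point, without OPS. `FakeSnapshot` is a hypothetical stand-in that only provides the `xyz` attribute the function reads, and `functools.partial` plays the role of binding a different `center` to each CV.

```python
import math
from collections import namedtuple
from functools import partial

# Hypothetical stand-in for a toy snapshot: one particle, 2D coordinates
FakeSnapshot = namedtuple("FakeSnapshot", ["xyz"])

def circle(snapshot, center):
    # Distance of the particle from the given center
    return math.sqrt((snapshot.xyz[0][0] - center[0]) ** 2
                     + (snapshot.xyz[0][1] - center[1]) ** 2)

# One function, three parameterizations: each one acts as a separate CV
cv_a = partial(circle, center=[-0.5, -0.5])
cv_b = partial(circle, center=[0.5, -0.5])
cv_c = partial(circle, center=[0.0, 0.4])

snap = FakeSnapshot(xyz=[[-0.5, -0.4]])
cv_values = {"opA": cv_a(snap), "opB": cv_b(snap), "opC": cv_c(snap)}
print(cv_values)
```

The point is close to the center of state A, so `opA` is small while `opB` and `opC` are both much larger.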

## Defining states and interfaces

Next we'll use those collective variables to define both states and interfaces. In this example, the innermost interface is the same as the state definition; this does not have to be the case, but when it is not, you should make sure that all frames in the state definition are also inside the innermost interface. The `VolumeInterfaceSet` gives a shortcut to create the full set of volume objects using the same collective variable.

In [None]:
stateA = paths.CVDefinedVolume(collectivevariable=opA, lambda_min=0.0, lambda_max=0.2)
stateB = paths.CVDefinedVolume(collectivevariable=opB, lambda_min=0.0, lambda_max=0.2)
stateC = paths.CVDefinedVolume(collectivevariable=opC, lambda_min=0.0, lambda_max=0.2)

interfacesA = paths.VolumeInterfaceSet(cv=opA, minvals=0.0, maxvals=[0.2, 0.3, 0.4])
interfacesB = paths.VolumeInterfaceSet(cv=opB, minvals=0.0, maxvals=[0.2, 0.3, 0.4])
interfacesC = paths.VolumeInterfaceSet(cv=opC, minvals=0.0, maxvals=[0.2, 0.3, 0.4])
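
The nesting of the state inside its interfaces can be checked with a few lines of plain Python. This is only a sketch: it assumes a volume contains a CV value when `lambda_min <= value < lambda_max` (the exact boundary convention in OPS may differ), and `in_volume` and `volumes_containing` are hypothetical helpers.

```python
def in_volume(cv_value, lambda_min, lambda_max):
    # Assumption: half-open interval, lower bound included
    return lambda_min <= cv_value < lambda_max

STATE_MAX = 0.2
INTERFACE_MAXVALS = [0.2, 0.3, 0.4]

def volumes_containing(cv_value):
    # Which interface volumes (labeled by their maxval) contain this CV value?
    return [m for m in INTERFACE_MAXVALS if in_volume(cv_value, 0.0, m)]

inside_state = volumes_containing(0.1)     # inside the state and every interface
between = volumes_containing(0.25)         # outside the innermost interface only
outside_all = volumes_containing(0.45)     # beyond the outermost listed interface

# Every sampled value inside the state is also inside the innermost interface
nested = all(
    in_volume(v, 0.0, INTERFACE_MAXVALS[0])
    for v in [0.0, 0.05, 0.1, 0.15, 0.199]
    if in_volume(v, 0.0, STATE_MAX)
)
print(inside_state, between, outside_all, nested)
```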

## Build the MSTIS transition network

Once we have the collective variables, states, and interfaces defined, we can create the entire transition network. In this one small piece of code, we create all the path ensembles needed for the simulation, organized into structures to assist with later analysis.

In [None]:
ms_outers = paths.MSOuterTISInterface.from_lambdas(
    {ifaces: 0.5
     for ifaces in [interfacesA, interfacesB, interfacesC]}
)
mstis = paths.MSTISNetwork(
    [(stateA, interfacesA),
     (stateB, interfacesB),
     (stateC, interfacesC)],
    ms_outers=ms_outers
).named('mstis')

## Equilibration: Setting up the move scheme

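In the following, we will first do a (very) brief equilibration, and then run the main simulation. For the equilibration, we use a move scheme that only includes one-way shooting moves.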

In [None]:
equil_scheme = paths.OneWayShootingMoveScheme(mstis, engine=engine)

## Equilibration: loading the samples

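Loading from storage is very easy. Each store behaves like a list. We take the 0th snapshot as a template (it doesn't actually matter which one) for the next storage we'll create, and the 0th sample set as the source of our initial trajectories. There's only one engine stored in the file, but here we keep using the engine we defined above.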

In [None]:
# some aspects of storage depend on Python version
if sys.version_info < (3,):
    filename = "./inputs/mstis_bootstrap_py2.nc"
elif (3, 6) <= sys.version_info < (3, 8):
    filename = "./inputs/mstis_bootstrap_py3.nc"
elif (3, 8) <= sys.version_info < (3, 10):
    filename = "./inputs/mstis_bootstrap_py38.nc"
else:
    raise RuntimeError(
        "Uh oh! Looks like we don't have an input file for your Python version: "
        + ".".join(str(x) for x in sys.version_info[:3])
    )

In [None]:
old_store = paths.AnalysisStorage(filename)

In [None]:
template = old_store.snapshots[0]
old_sampleset = old_store.samplesets[0]

In [None]:
sset = equil_scheme.initial_conditions_from_trajectories(
    trajectories=[s.trajectory for s in old_sampleset.samples]
)

In [None]:
print(len(sset))

At this point, we've loaded 9 samples: one for each ensemble sampled by the one-way shooting scheme we use for equilibration (three interfaces for each of the three states).
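
The bookkeeping behind that number is simple enough to write out. The sketch below also counts the extra ensembles that the full simulation later in this notebook needs, assuming one minus ensemble per state and a single shared multiple state outer ensemble; the variable names are purely illustrative.

```python
n_states = 3               # A, B, C
interfaces_per_state = 3   # maxvals=[0.2, 0.3, 0.4]

# Ensembles sampled by the one-way shooting scheme used for equilibration
n_interface_ensembles = n_states * interfaces_per_state

# Extra ensembles for the full scheme (assumption: one minus ensemble
# per state, one multiple state outer ensemble shared by all states)
n_minus_ensembles = n_states
n_ms_outer_ensembles = 1

n_total = n_interface_ensembles + n_minus_ensembles + n_ms_outer_ensembles
print(n_interface_ensembles, n_total)
```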

## Equilibration

In molecular dynamics, you need to equilibrate if you don't start with an equilibrium frame (e.g., if you start with solvent molecules on a grid, your system should equilibrate before you start taking statistics). Similarly, if you start with a set of paths which are far from the path ensemble equilibrium, you need to equilibrate. This could either be because your trajectories are not from the real dynamics (generated with metadynamics, high temperature, etc.) or because your trajectories are not representative of the path ensemble (e.g., if you put transition trajectories into all interfaces).

As with MD, running equilibration can be the same process as running the total simulation. However, in path sampling, it doesn't have to be: we can equilibrate without replica exchange moves or path reversal moves, for example. In the example below, we create a `MoveScheme` that only includes shooting movers.

In [None]:
equilibration = paths.PathSampling(
    storage=None,
    sample_set=sset,
    move_scheme=equil_scheme
)

In [None]:
equilibration.run(5)

To continue with the equilibrated results (without having used a storage to save them), we extract the `sample_set` at the end of the calculation.

In [None]:
sset = equilibration.sample_set

## Main simulation

Again we set up a scheme. This time we use `DefaultScheme`, which includes shooting moves, replica exchange, path reversals, minus moves, and shooting in the multiple state outer interface (if one exists). Note that we use the same network, just a different move scheme.

In [None]:
scheme = paths.DefaultScheme(mstis, engine=engine).named("full TIS")

### Setting up additional initial conditions

Unlike the equilibration scheme we used, `DefaultScheme` involves the multiple state outer interface and the minus interfaces. Our current sample set doesn't have them:

In [None]:
print(scheme.initial_conditions_report(sset))

#### Minus interface ensemble

The minus interface ensembles do not yet have a trajectory. We will generate them by starting with same-state trajectories (A-to-A, B-to-B, C-to-C) in each interface, and extending into the minus ensemble. For each state, this takes two steps:

* Check whether the trajectory in the innermost ensemble is a same-state trajectory (e.g., A-to-A).
* Extend that trajectory into the minus ensemble.

First we need to make sure that the trajectory in the innermost ensemble of each state also ends in that state. This is necessary so that when we extend the trajectory, it can extend into the minus ensemble.

If the trajectory isn't right, we run a shooting move on it until it is.

In [None]:
# this first part is only really important when not working interactively
# interactively, you can probably find an appropriate trajectory on your own
for transition in mstis.sampling_transitions:
    innermost_ensemble = transition.ensembles[0]
    shooter = None
    if not transition.stateA(sset[innermost_ensemble].trajectory[-1]):
        shooter = paths.OneWayShootingMover(ensemble=innermost_ensemble,
                                            selector=paths.UniformSelector(),
                                            engine=engine)
        pseudoscheme = paths.LockedMoveScheme(root_mover=shooter)
        pseudosim = paths.PathSampling(storage=None, 
                                       move_scheme=pseudoscheme, 
                                       sample_set=sset,
                                      )
    while not transition.stateA(sset[innermost_ensemble].trajectory[-1]):
        pseudosim.run(1)
        sset = pseudosim.sample_set
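
The condition tested in that loop, whether the last frame of a trajectory lies in the initial state, can be illustrated without OPS. In this sketch a trajectory is just a list of `(x, y)` points and the state is a circle of radius 0.2, matching the state definitions above; `in_state` and `ends_in_state` are hypothetical helpers.

```python
import math

def in_state(point, center, radius=0.2):
    # A point is in the state if it lies within `radius` of the state center
    return math.hypot(point[0] - center[0], point[1] - center[1]) < radius

def ends_in_state(trajectory, center):
    # Mirrors the check on trajectory[-1] in the loop above
    return in_state(trajectory[-1], center)

CENTER_A = (-0.5, -0.5)

# Leaves A, crosses the innermost interface, and returns to A
a_to_a = [(-0.5, -0.45), (-0.4, -0.2), (-0.5, -0.4)]
# Leaves A and ends near the center of B instead
a_to_b = [(-0.5, -0.45), (0.0, -0.5), (0.5, -0.45)]

usable = ends_in_state(a_to_a, CENTER_A)
needs_shooting = not ends_in_state(a_to_b, CENTER_A)
print(usable, needs_shooting)
```

Only the first trajectory could be extended into the minus ensemble of A directly; the second would first need shooting moves until it ends in A.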

Now that all the innermost ensembles are safe to use for extending into a minus interface, we extend them into a minus interface:

In [None]:
minus_samples = []
for transition in mstis.sampling_transitions:
    minus_samples.append(transition.minus_ensemble.extend_sample_from_trajectories(
        sset[transition.ensembles[0]].trajectory,
        replica=-len(minus_samples)-1,
        engine=engine
    ))
sset = sset.apply_samples(minus_samples)
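
The expression `replica=-len(minus_samples)-1` gives each minus sample its own negative replica ID, because the length of the list is read before the new sample is appended. A stripped-down sketch of the same counting pattern, with plain integers in place of samples:

```python
replica_ids = []
for _state in ["A", "B", "C"]:
    # len() is evaluated before the append, so the IDs count down from -1
    replica_ids.append(-len(replica_ids) - 1)

print(replica_ids)  # [-1, -2, -3]
```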

In [None]:
print(scheme.initial_conditions_report(sset))

#### Multiple state outer ensemble (`UnionEnsemble`)

The missing ensemble is the multiple state outer ensemble. As it happens, there's actually a trajectory from the initial file that will satisfy it:

In [None]:
sset = scheme.initial_conditions_from_trajectories(
    trajectories=[s.trajectory for s in old_sampleset.samples],
    sample_set=sset
)

Note the differences between this call and the one we used to fill the equilibration sample set. First, the ensembles that get filled by `MoveScheme.initial_conditions_from_trajectories` depend on which ensembles are used by the move scheme. The network may define ensembles that are unused. Second, here we want to keep most of the sample set unchanged, so we give an initial sample set. In this case, the method only appends new samples.

## Running RETIS

Now we have our initial conditions, so we'll make the storage and run the full calculation. 

Up to here, we haven't been storing any of our results. This time, we'll start a storage object, and we'll save the network we've created. Then we'll run a new `PathSampling` calculation object.

In [None]:
storage = paths.storage.Storage("mstis.nc", "w", template=template)

In [None]:
mstis_calc = paths.PathSampling(
    storage=storage,
    sample_set=sset,
    move_scheme=scheme
)
mstis_calc.save_frequency = 50

The next block sets up a live visualization. This is optional, and only recommended if you're using OPS interactively (which would only be for very small systems). Some of the same tools can be used to play back the behavior after the fact if you want to see the behavior for more complicated systems. You can create a background (here we use the PES contours), and the visualization will plot the trajectories.

In [None]:
xval = paths.FunctionCV("xval", lambda snap : snap.xyz[0][0])
yval = paths.FunctionCV("yval", lambda snap : snap.xyz[0][1])
mstis_calc.live_visualizer = paths.StepVisualizer2D(
    network=mstis, 
    cv_x=xval, 
    cv_y=yval,
    xlim=[-1.0, 1.0],
    ylim=[-1.0, 1.0]
)
background = ToyPlot()
background.contour_range = np.arange(-1.5, 1.0, 0.1)
background.add_pes(engine.pes)
mstis_calc.live_visualizer.background = background.plot()
mstis_calc.status_update_frequency = 1 # increasing this number speeds things up, but isn't as pretty

The next question is, how many steps do we want to run? You can just choose an arbitrary number, but often we want to think in terms of how many shooting moves per ensemble. Assuming each ensemble has the same probability of doing a shooting move (as is the case in the `DefaultScheme`), we can select an arbitrary shooting mover as `scheme.movers['shooting'][0]`. Let's say we want to run enough moves that, on average, each shooting mover would do 10 steps. (In practice, this number would be more like 1000.)

In [None]:
n_steps = scheme.n_steps_for_trials(scheme.movers['shooting'][0], 10)
print(n_steps)

Another important consideration is how many times other movers will be called. For example, we wouldn't want to have any mover completely left out because the probability of running it is too low. In the `DefaultScheme`, the minus movers are the least likely to run. So we check how many trials we expect for an arbitrary minus mover (all have same probability):

In [None]:
print(scheme.n_trials_for_steps(mover=scheme.movers['minus'][0], n_steps=n_steps))
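
The arithmetic behind these two estimates is just an expectation value: if a given mover is selected with probability `p` at each step, it is tried `n_steps * p` times on average. The sketch below uses made-up probabilities, not the ones `DefaultScheme` actually assigns, and the function names `steps_needed` and `expected_trials` are hypothetical.

```python
def steps_needed(p_mover, n_trials):
    # Steps required so that the mover is tried n_trials times on average
    return n_trials / p_mover

def expected_trials(p_mover, n_steps):
    # Average number of times the mover is tried in n_steps
    return n_steps * p_mover

p_one_shooter = 0.05    # hypothetical per-step probability of one shooting mover
p_one_minus = 0.002     # hypothetical per-step probability of one minus mover

est_steps = steps_needed(p_one_shooter, 10)
est_minus_trials = expected_trials(p_one_minus, est_steps)
print(est_steps, est_minus_trials)
```

With these illustrative numbers, the minus mover would be expected to run less than once, which is exactly the kind of situation the check above is meant to catch. The division also shows why the step count comes back as a `float`.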

Now everything is ready: let's run the simulation!

Note that the `n_steps` defined above is a `float`, so we need to turn it into an `int` first:

In [None]:
mstis_calc.run(int(n_steps))

In [None]:
storage.close()