# (Embarrassingly) Parallel TIS vs. RETIS: Analysis

Now that you've run your embarrassingly parallel simulation and your RETIS simulation, let's compare the results. We'll check whether the simulations seem consistent by looking at the acceptance of the shooting moves in each ensemble. (For a more thorough check, you could also compare path length distributions or make per-interface path density plots.) Then we'll look at the crossing probabilities to see whether they seem to be converged, and perhaps whether one seems closer to converged than the other.

In [None]:
import matplotlib.pyplot as plt
import openpathsampling as paths
from openpathsampling.analysis import tis

We'll start out by loading the output files we generated, and from each file we load in the move scheme that it used.

In [None]:
storage_0 = paths.Storage('scheme_0.nc', mode='r')
storage_1 = paths.Storage('scheme_1.nc', mode='r')
storage_2 = paths.Storage('scheme_2.nc', mode='r')
storage_3 = paths.Storage('scheme_3.nc', mode='r')
storage_4 = paths.Storage('scheme_4.nc', mode='r')
storage_retis = paths.Storage('retis.nc', mode='r')

In [None]:
scheme_0 = storage_0.schemes['scheme_0']
scheme_1 = storage_1.schemes['scheme_1']
scheme_2 = storage_2.schemes['scheme_2']
scheme_3 = storage_3.schemes['scheme_3']
scheme_4 = storage_4.schemes['scheme_4']
scheme_retis = storage_retis.schemes['retis']

## Comparing the acceptance rates

Recall that we can get per-mover acceptance by passing the appropriate string to the `movers` keyword of the `move_summary` method:

In [None]:
scheme_retis.move_summary(storage_retis.steps)

In [None]:
scheme_retis.move_summary(movers='shooting')

In [None]:
scheme_retis.move_summary(movers='repex')

In [None]:
scheme_0.move_summary(storage_0.steps)

In [None]:
scheme_1.move_summary(storage_1.steps)

In [None]:
scheme_2.move_summary(storage_2.steps)

In [None]:
scheme_3.move_summary(storage_3.steps)

In [None]:
scheme_4.move_summary(storage_4.steps)

* Is there a major difference in the shooting move acceptance for any ensemble? Would you expect there to be one?
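To judge whether a difference in acceptance is "major", it can help to put a rough error bar on each ratio. The sketch below is not part of OPS: `acceptance_with_error` is a hypothetical helper, the counts are made-up numbers you would replace with the accepted/trial counts printed by `move_summary`, and the binomial error assumes independent trials, which Monte Carlo steps are not. Treat the result as an optimistic lower bound on the uncertainty.

```python
import math


def acceptance_with_error(n_accepted, n_trials):
    """Return the acceptance ratio and a rough binomial standard error."""
    ratio = n_accepted / n_trials
    std_err = math.sqrt(ratio * (1.0 - ratio) / n_trials)
    return ratio, std_err


# hypothetical counts for the shooting move of one ensemble in each simulation
retis_ratio, retis_err = acceptance_with_error(n_accepted=130, n_trials=200)
parallel_ratio, parallel_err = acceptance_with_error(n_accepted=118, n_trials=200)

# compare the difference to the combined uncertainty of the two estimates
difference = abs(retis_ratio - parallel_ratio)
combined_err = math.sqrt(retis_err**2 + parallel_err**2)
consistent = difference < 2 * combined_err

print(f"RETIS:    {retis_ratio:.3f} +/- {retis_err:.3f}")
print(f"Parallel: {parallel_ratio:.3f} +/- {parallel_err:.3f}")
print("Consistent within two standard errors:", consistent)
```

With these made-up counts the two ratios differ by less than twice the combined error, so the difference would not count as major.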

## TIS analysis (crossing probabilities, etc.)

We don't have the flux here, so we can't calculate the actual rates. However, we can create a fake flux that says the flux out of state $A$ and through the innermost interface is `1.0`. This allows us to use the rest of the `StandardTISAnalysis` object; it just means that the "rate" it reports is actually the total transition probability.

You can get the actual flux either by including a minus interface move in your TIS simulation or by using direct MD. The `paths.TrajectoryTransitionAnalysis` class will analyze existing MD trajectories, and the `paths.DirectSimulation` class can run MD and analyze the flux on the fly.
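To see why a flux of `1.0` turns the reported rate into the transition probability, recall that the TIS rate is the flux through the innermost interface multiplied by the probability of reaching state $B$ from that interface. The sketch below uses plain Python with made-up numbers; `rate_from_flux` is a hypothetical helper, not an OPS function.

```python
def rate_from_flux(flux, transition_probability):
    """TIS rate: flux through the innermost interface times the
    probability that a path crossing that interface reaches state B."""
    return flux * transition_probability


# hypothetical total transition probability from the crossing probability analysis
transition_probability = 2.5e-4

# with the fake flux of 1.0, the "rate" is just the transition probability
fake_rate = rate_from_flux(1.0, transition_probability)

# with a (hypothetical) real flux in inverse time units, we get an actual rate
real_flux = 0.12
real_rate = rate_from_flux(real_flux, transition_probability)

print("Reported with fake flux:", fake_rate)
print("Rate with real flux:    ", real_rate)
```

Once you have a real flux, multiplying it by the value reported here should give the same rate as redoing the analysis with that flux.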

In [None]:
# because we used a setup file, the network/state/interface are the same in both
network = storage_retis.networks[0]
state_A = storage_retis.volumes['A']
state_B = storage_retis.volumes['B']
interface_0 = network.sampling_transitions[0].interfaces[0]
fake_flux = tis.DictFlux({(state_A, interface_0): 1.0})

Finally, we assemble the `StandardTISAnalysis` and perform the analysis:

In [None]:
%%time
retis_analysis = tis.StandardTISAnalysis(
    network=network,
    flux_method=fake_flux,
    max_lambda_calcs={t: {'bin_width': 0.025, 'bin_range': (-0.6, 0.6)}
                      for t in network.sampling_transitions},
    combiners={t.interfaces: paths.numerics.WHAM(cutoff=0.01,  # lower cutoff, default is 0.05
                                                 interfaces=t.interfaces.lambdas)
               for t in network.sampling_transitions},
    steps=storage_retis.steps
)

Currently, the parallel analysis needs an extra step to run correctly. We need to create the `weighted_trajectories` object from the steps, and then perform the overall analysis using that as the input, instead of the steps themselves.

In [None]:
# currently we need to manually join the weighted trajectories from each storage
# Future versions of OPS will simplify this
weighted_trajectories = {}
storages = [storage_0, storage_1, storage_2, storage_3, storage_4]
for storage, ensemble in zip(storages, network.sampling_ensembles):
    weighted_trajectories.update(
        tis.core.steps_to_weighted_trajectories(storage.steps, [ensemble])
    )

In [None]:
# reuse the `network` and `fake_flux` objects created above
parallel_analysis = tis.StandardTISAnalysis(
    network=network,
    flux_method=fake_flux,
    max_lambda_calcs={t: {'bin_width': 0.025, 'bin_range': (-0.6, 0.6)}
                      for t in network.sampling_transitions},
    combiners={t.interfaces: paths.numerics.WHAM(cutoff=0.01,  # lower cutoff, default is 0.05
                                                 interfaces=t.interfaces.lambdas)
               for t in network.sampling_transitions}
)
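# no steps were passed to the analysis, so we set the flux by hand;
# 'foo' is only a placeholder argument
parallel_analysis.results['flux'] = fake_flux.calculate('foo')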
parallel_analysis.results = parallel_analysis.from_weighted_trajectories(weighted_trajectories)

### Plotting the crossing probabilities

One spot-check for convergence is to plot the crossing probability functions. For each ensemble, the `StandardTISAnalysis` calculates a crossing probability along the order parameter, defined as the fraction of paths in that ensemble that reach at least the given value on the $x$ axis. As such, the crossing probability is always 1 for values below that ensemble's interface. Additionally, two ensemble crossing probabilities should never cross; the one from an outer interface should always be higher at a given value of the order parameter than the one from an inner interface.
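As an illustration of that definition, here is a minimal sketch in plain Python that builds a per-ensemble crossing probability from the maximum order parameter value reached by each path. This is not how OPS implements it internally (OPS histograms the maximum values); the `crossing_probability` helper and all the numbers are made up for illustration.

```python
def crossing_probability(max_lambdas, grid):
    """Fraction of paths whose maximum order parameter reaches each grid value."""
    n_paths = len(max_lambdas)
    return [sum(1 for m in max_lambdas if m >= x) / n_paths for x in grid]


# hypothetical maxima for paths in an ensemble with its interface at x = -0.3;
# every path in the ensemble crosses the interface, so all maxima exceed -0.3
max_lambdas = [-0.28, -0.25, -0.29, -0.08, -0.22, 0.05, -0.27, -0.18]
grid = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1]

crossing = crossing_probability(max_lambdas, grid)
for x, p in zip(grid, crossing):
    print(f"x = {x:5.2f}   P = {p:.3f}")
```

The probability stays at 1 up to the interface and can only decrease beyond it, because any path that reaches a larger value of $x$ has also reached every smaller one.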

There is also the *total* crossing probability, which is generated by using a histogram combining algorithm (usually WHAM) to combine the individual ensemble crossing probabilities into a good estimate for the true crossing probability (from the innermost interface). Like all crossing probabilities, this should be monotonically decreasing; if it is not, that is a sign of insufficient sampling.
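To build intuition for the total crossing probability, the sketch below uses a simpler combination than WHAM: it multiplies the conditional probabilities of reaching each successive interface, so it only gives values at the interfaces themselves. It also includes a small monotonicity check of the kind described above. The helper name and all numbers are hypothetical.

```python
def is_monotonically_decreasing(values):
    """True if no value is larger than the one before it."""
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


# hypothetical probabilities that a path in ensemble i reaches interface i+1
# (the last entry is the probability of reaching state B from the outermost interface)
conditional_probabilities = [0.25, 0.2, 0.1]

# the running product is the total crossing probability at each successive interface
total = [1.0]
for p in conditional_probabilities:
    total.append(total[-1] * p)

print("Total crossing probability at each interface:", total)
print("Monotonically decreasing:", is_monotonically_decreasing(total))
```

A product of probabilities can never increase, so this simple estimate is monotonic by construction. The WHAM-combined curve has no such guarantee, which is why checking it is informative.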

Since the y-axis is probability, and we're looking at rare events, we frequently plot crossing probabilities on a semi-log plot.

Note: In some cases, you may get an error here about a lack of overlap between ensembles. Normally, this can be solved by continuing the simulation longer. For the purposes of this tutorial, you can probably just re-run the sampling stage and then re-run this notebook.

In [None]:
for ensemble in network.transitions[(state_A, state_B)].ensembles:
    crossing = retis_analysis.crossing_probability(ensemble)
    label = "Interface at $x$={:3.2f}".format(ensemble.lambda_i)
    plt.plot(crossing.x, crossing, label=label)

tcp_AB = retis_analysis.total_crossing_probability[(state_A, state_B)]
plt.plot(tcp_AB.x, tcp_AB, lw=2, color='k', label="Total crossing probability")
plt.legend(loc='upper right')
plt.yscale('log')
plt.xlabel('$x$')
plt.ylabel('Crossing probability')
plt.title("RETIS");

In [None]:
for ensemble in network.transitions[(state_A, state_B)].ensembles:
    crossing = parallel_analysis.crossing_probability(ensemble)
    label = "Interface at $x$={:3.2f}".format(ensemble.lambda_i)
    plt.plot(crossing.x, crossing, label=label)

tcp_AB = parallel_analysis.total_crossing_probability[(state_A, state_B)]
plt.plot(tcp_AB.x, tcp_AB, lw=2, color='k', label="Total crossing probability")
plt.legend(loc='upper right')
plt.yscale('log')
plt.xlabel('$x$')
plt.ylabel('Crossing probability')
plt.title('Parallel TIS');

* Are your total crossing probabilities monotonically decreasing?
* Do your individual ensemble crossing probabilities cross each other?
* Based on the crossing probability plots, which approach seems more converged?

OPS keeps track of how long each sampling step took. We can use that to get a rough comparison of the sampling times. Note that other aspects of the computing environment, such as other processes running at the same time, may have a significant effect on these timings.

In [None]:
def calculate_total_time(storage):
    # step 0 (init conds) doesn't have timing data; all the others do
    return sum(step.change.details.timing for step in storage.steps[1:])

In [None]:
%%time
calculate_total_time(storage_retis)

In [None]:
total_times = [calculate_total_time(storage)
               for storage in [storage_0, storage_1, storage_2, storage_3, storage_4]]
print(sum(total_times), total_times)

* The times reported here do not include the cost of storing results to disk. How do they compare to the time it actually took to run the simulation? Is storing to disk a significant overhead for simulations of toy models?
* The RETIS simulation includes replica exchange moves and path reversals, as well as the shooting moves used by both approaches. Do replica exchange and path reversal contribute significantly to the total simulation time?
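When comparing the timings, it may help to separate total compute time from elapsed time. The sketch below assumes each parallel TIS ensemble ran on its own core at the same time, in which case the elapsed time is set by the slowest ensemble rather than by the sum. All numbers are made up; substitute the values you obtained from `calculate_total_time`.

```python
# hypothetical per-ensemble sampling times (seconds) from the five parallel runs
parallel_times = [12.1, 15.4, 18.9, 22.3, 30.2]
# hypothetical total sampling time (seconds) from the single RETIS run
retis_time = 110.0

# total compute cost: every ensemble's time counts
total_cpu_time = sum(parallel_times)
# elapsed time if all ensembles ran simultaneously: limited by the slowest one
wall_time_if_parallel = max(parallel_times)

print(f"Parallel TIS total CPU time: {total_cpu_time:.1f} s")
print(f"Parallel TIS elapsed time:   {wall_time_if_parallel:.1f} s")
print(f"RETIS time:                  {retis_time:.1f} s")
print(f"RETIS / parallel CPU time:   {retis_time / total_cpu_time:.2f}")
```

The ratio in the last line gives a rough sense of how much the extra RETIS moves cost relative to shooting alone, subject to the noise in the timings.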