# Splitting a simulation

Included in this notebook:

* Split a full simulation file into trajectories and the rest

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import openpathsampling as paths
import numpy as np

The best way to use storage depends on whether you are running production simulations or doing analysis. For analysis, open the file as an `AnalysisStorage` object, which makes the analysis much faster.

In [2]:
%%time
storage = paths.AnalysisStorage("mstis.nc")

CPU times: user 7.18 s, sys: 219 ms, total: 7.4 s
Wall time: 7.41 s


In [3]:
storage.reference_by_uuid

True

In [4]:
st_traj = paths.Storage('mstis_traj.nc', 'w')
st_data = paths.Storage('mstis_data.nc', 'w')

In [5]:
st_data.fallback = storage

Store all trajectories completely in the data file

In [6]:
st_data.snapshots.save(storage.snapshots[0])
st_traj.snapshots.save(storage.snapshots[0])

UUID('27f954d4-4817-11e6-9be1-0000000022b9')

Store only shallow trajectories (empty snapshots) in the main file.

Fix the CVs first; the rest is fine.

In [7]:
q = storage.snapshots.all()
cvs = storage.cvs

Fill the weak cache from the stored cache. This should be fast, and we can later
use the weak cache (as long as `q` exists) to fill the cache of the data file.

In [8]:
%%time
_ = [cv(q) for cv in cvs]

CPU times: user 1.48 s, sys: 25.1 ms, total: 1.51 s
Wall time: 1.5 s
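
Why `q` has to stay alive can be illustrated with a small, self-contained sketch built on Python's standard `weakref` module. This is not how OpenPathSampling implements its caches: the `Snapshot` class, the cache layout, and the CV values are hypothetical. They only show that a weak cache keeps its entries while a strong reference to the snapshots exists.

```python
import gc
import weakref


class Snapshot(object):
    """Hypothetical stand-in for a stored snapshot (not the OPS class)."""

    def __init__(self, idx):
        self.idx = idx


# Strong references, playing the role of `q` in the notebook.
q = [Snapshot(i) for i in range(3)]

# A weak cache maps snapshots to CV values without keeping the snapshots alive.
cv_cache = weakref.WeakKeyDictionary()
for snap in q:
    cv_cache[snap] = 2.0 * snap.idx  # made-up CV value for illustration

entries_while_q_exists = len(cv_cache)

# Dropping every strong reference lets the snapshots, and with them
# their cache entries, be garbage collected.
del snap
del q
gc.collect()
entries_after_q_deleted = len(cv_cache)

print(entries_while_q_exists, entries_after_q_deleted)
```

Under CPython, the cache should report three entries while `q` exists and none once it has been deleted, which is why the cached CV values are only usable for as long as `q` is kept around.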


Now that the CV values are cached, we can save the CVs in the new store.
This also switches the disk cache to the new file, and since that file is new,
its cache is empty.

In [9]:
%%time
# this will also switch the storage cache to the new file
# list() forces evaluation under Python 3, where map is lazy
_ = list(map(st_data.cvs.save, storage.cvs))

CPU times: user 20.1 ms, sys: 58.2 ms, total: 78.2 ms
Wall time: 77.2 ms
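
Calling `map` for its side effects, as these cells do, behaves differently across Python versions: in Python 2 `map` runs immediately and returns a list, whereas in Python 3 it returns a lazy iterator and nothing is saved until the result is consumed. The sketch below shows this under Python 3, with a hypothetical `save` function standing in for a store's save method.

```python
saved = []


def save(obj):
    """Hypothetical stand-in for a store's save method."""
    saved.append(obj)
    return obj


items = ["cv_a", "cv_b", "cv_c"]

# Python 3: map is lazy, so creating it does not call save at all.
lazy = map(save, items)
count_before = len(saved)

# Consuming the iterator (or writing a plain for loop) triggers the calls.
result = list(lazy)
count_after = len(saved)

print(count_before, count_after)
```

When running this notebook under Python 3, wrapping each `map(...)` call in `list(...)` or replacing it with an explicit `for` loop keeps the save calls from being silently skipped.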


If all CVs are really cached, we can store the snapshots now, and auto-completion will fill
the CV disk store automatically as the snapshots are saved. This takes a little while.

In [10]:
%%time
_ = list(map(st_data.trajectories.mention, storage.trajectories))

CPU times: user 27.7 s, sys: 401 ms, total: 28.1 s
Wall time: 28.2 s
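
The idea of a store that only mentions objects and resolves them through a fallback can be sketched in plain Python. The class and method names below (`ToyStore`, `save`, `mention`, `load`) are hypothetical and chosen to mirror the notebook. They are not the OpenPathSampling implementation, and the real stores may behave differently in detail.

```python
import uuid


class ToyStore(object):
    """Hypothetical store: keeps full objects or bare references by UUID."""

    def __init__(self, fallback=None):
        self.full = {}          # uuid -> complete object
        self.mentioned = set()  # uuids known here but stored elsewhere
        self.fallback = fallback

    def save(self, key, obj):
        """Store the complete object."""
        self.full[key] = obj
        self.mentioned.discard(key)

    def mention(self, key):
        """Record only the reference; the content stays in the fallback."""
        if key not in self.full:
            self.mentioned.add(key)

    def load(self, key):
        """Return the object, asking the fallback for mentioned entries."""
        if key in self.full:
            return self.full[key]
        if key in self.mentioned and self.fallback is not None:
            return self.fallback.load(key)
        raise KeyError(key)


# The original store holds the full (pretend) trajectory data.
original = ToyStore()
traj_id = uuid.uuid4()
original.save(traj_id, [0.0, 0.1, 0.2])

# The data store only mentions the trajectory and relies on the fallback.
data = ToyStore(fallback=original)
data.mention(traj_id)

loaded = data.load(traj_id)
stored_locally = traj_id in data.full
print(loaded, stored_locally)
```

In this toy model the data store stays small because it holds only the identifier, while the content is still reachable as long as the fallback store is available.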


Fill the trajectory store with only the trajectories and their snapshots. We are using lots of small snapshots, and these are slow in comparison to large ones, so this will also take a minute or so.

In [11]:
%%time
_ = list(map(st_traj.trajectories.save, storage.trajectories))

CPU times: user 1min 41s, sys: 1.08 s, total: 1min 42s
Wall time: 1min 42s


Finally, try storing all steps from the simulation. This should contain everything you need.

In [12]:
%%time
_ = list(map(st_data.steps.save, storage.steps))

CPU times: user 7.53 s, sys: 131 ms, total: 7.66 s
Wall time: 7.69 s


Now compare the file sizes.

In [13]:
# %-formatting inside print(...) gives the same output in Python 2 and 3
print('Original file: %s' % storage.file_size_str)
print('Data file: %s' % st_data.file_size_str)
print('Traj file: %s' % st_traj.file_size_str)

Original file: 26.26MB
Data file: 15.85MB
Traj file: 8.86MB
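
As a quick check on these numbers, the short calculation below uses only the three sizes printed above. It shows what fraction of the original each new file takes and how the two new files together compare with the original.

```python
# Sizes in MB, copied from the notebook output.
original_mb = 26.26
data_mb = 15.85
traj_mb = 8.86

data_fraction = data_mb / original_mb
traj_fraction = traj_mb / original_mb
combined_mb = data_mb + traj_mb

print("data file: %.0f%% of original" % (100 * data_fraction))
print("traj file: %.0f%% of original" % (100 * traj_fraction))
print("combined:  %.2f MB" % combined_mb)
```

The data file alone comes to roughly 60% of the original, and the two files together (24.71 MB) are slightly smaller than the original 26.26 MB.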


Now for the trick: use the small data file instead of the full simulation and see whether that works.

In [14]:
st_data.close()
st_traj.close()
storage.close()

In [None]:
st_data.snapshots.only_mention = True