# Trp-cage haMSM Construction and Analysis

For these examples, we'll be constructing an haMSM from simulations of Trp-cage unfolding.

In [1]:
from msm_we import msm_we
import numpy as np
import ray

The simulation data I'm using has already been augmented with pairwise alpha-carbon RMSDs, so all we need to do is strip out an extra dimension that's present in there. The `processCoordinates` override defined further down handles that.

(That extra dimension is just an artifact of how this data was prepared.)

```
# Assumes MDAnalysis is installed and `structure` is an already-loaded Universe
from MDAnalysis.analysis import rms

R = rms.RMSD(structure,  # universe to align
             structure,  # reference universe or atomgroup
             select='name CA',  # group to superimpose and calculate RMSD
             groupselections=['name CA'],  # groups for RMSD
             ref_frame=0)  # frame index of the reference
R.run()
```

The next cell just cleans up the logging output for display on the documentation webpage.

In [2]:
msm_we.log.handlers[0]._log_render.show_time = False
msm_we.log.handlers[0].console.width = 65

## Model building

First, let's set some parameters for haMSM building.

In [3]:
h5file_paths = ['data/west.h5']

# Number of MSM microstates to initially put in each stratum/WE bin
clusters_per_stratum = 35

dimreduce_method = 'vamp'

# Boundaries of the basis/target, in progress coordinate space
pcoord_bounds = {
    'basis': [[0, 0.1]],
    'target': [[0.7, 100]]
}

model_name = 'Trp-cage Sample'

# Reference structure
ref_file = 'data/2JOF.pdb'

# WESTPA resampling time
tau = 1e-9
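To make the bounds format concrete, here's a minimal sketch of how a progress coordinate value could be tested against them. The `in_bounds` helper is hypothetical (it is not part of `msm_we`), and treating both ends of each `[lower, upper]` pair as inclusive is an assumption made for illustration.

```python
# Hypothetical helper, not part of msm_we: checks whether every dimension
# of a progress coordinate lies within its [lower, upper] pair.
# Inclusive comparisons are an assumption.
def in_bounds(pcoord, bounds):
    return all(lower <= x <= upper for x, (lower, upper) in zip(pcoord, bounds))


pcoord_bounds = {
    'basis': [[0, 0.1]],
    'target': [[0.7, 100]],
}

# Label a few made-up one-dimensional progress coordinate values
for value in (0.05, 0.4, 0.9):
    if in_bounds([value], pcoord_bounds['basis']):
        label = 'basis'
    elif in_bounds([value], pcoord_bounds['target']):
        label = 'target'
    else:
        label = 'intermediate'
    print(value, label)
```

Each entry in `'basis'` or `'target'` is one `[lower, upper]` pair per progress coordinate dimension, which is why a one-dimensional progress coordinate still uses a nested list.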

In [4]:
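def processCoordinates(self, coords):
    # Flatten everything after the first axis, so each row holds one
    # structure's features and the extra dimension is dropped
    return coords.reshape(coords.shape[0], -1)

# Override the default featurization on the modelWE class
msm_we.modelWE.processCoordinates = processCoordinates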
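As a quick illustration of what that `reshape` call does, here's a small sketch on a dummy array. The shape used here (4 structures, 190 features, and a trailing singleton dimension) is made up for the example and is not taken from the actual data file.

```python
import numpy as np

# Assumed shape, for illustration only: 4 structures, 190 features each,
# plus a trailing singleton dimension
coords = np.zeros((4, 190, 1))

# Keep the first axis and collapse everything else into one axis
flattened = coords.reshape(coords.shape[0], -1)

print(coords.shape)     # (4, 190, 1)
print(flattened.shape)  # (4, 190)
```

Because `-1` tells NumPy to infer the size of the second axis, the same function works regardless of how many trailing dimensions the input has.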

### With `build_analyze_model`

You can construct an haMSM (and validation haMSMs) with one call to `build_analyze_model()`.

In [5]:
model = msm_we.modelWE()

model.build_analyze_model(
    file_paths=h5file_paths,
    # ref_struct=basis_ref_dict,
    ref_struct=ref_file,
    modelName=model_name,
    basis_pcoord_bounds=pcoord_bounds['basis'],
    target_pcoord_bounds=pcoord_bounds['target'],
    dimreduce_method=dimreduce_method,
    n_clusters=clusters_per_stratum,
    tau=tau
)

Getting coordSet:   0%|          | 0/100 [00:00<?, ?it/s]

Clustering:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/98 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/98 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/98 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/98 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/50 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/50 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/50 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/50 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/49 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/49 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/49 [00:00<?, ?it/s]

### Step-by-step

`build_analyze_model()` is just a convenient wrapper around the following steps.

You can run them manually if you want to observe each step of the process.

When you first start analyzing a system, it helps to run step-by-step, so you can fine-tune parameters without re-running the entire workflow.

In [6]:
ray.init(ignore_reinit_error=True)

| | |
|---|---|
| Python version: | 3.9.13 |
| Ray version: | 2.0.0 |
| Dashboard: | http://127.0.0.1:8265 |


In [7]:
model = msm_we.modelWE()

In [9]:
model.initialize(
    fileSpecifier=h5file_paths,
    refPDBfile=ref_file,
    modelName=model_name,
    basis_pcoord_bounds=pcoord_bounds['basis'],
    target_pcoord_bounds=pcoord_bounds['target'],
    dim_reduce_method=dimreduce_method,
    tau=tau
)

In [10]:
model.get_iterations()
model.get_coordSet(last_iter=model.maxIter, streaming=True)

Getting coordSet:   0%|          | 0/100 [00:00<?, ?it/s]

In [11]:
model.dimReduce()

In [13]:
model.cluster_coordinates(
    n_clusters=clusters_per_stratum,
    use_ray=True,
    stratified=True,
    store_validation_model=True # Required for block validation
)

Clustering:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

In [14]:
model.get_fluxMatrix(n_lag=0)

Constructing flux matrix:   0%|          | 0/98 [00:00<?, ?it/s]

In [15]:
model.organize_fluxMatrix()

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Constructing flux matrix:   0%|          | 0/98 [00:00<?, ?it/s]

In [16]:
model.get_Tmatrix()

In [17]:
model.get_steady_state()
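For intuition about the last two steps, here's a generic Markov-model sketch: row-normalize a flux matrix into a transition matrix, then take the stationary distribution as the left eigenvector with eigenvalue 1. The 3-state flux matrix is made up, and this is not a reproduction of what `msm_we` does internally (which may differ, for example in how it handles boundary conditions).

```python
import numpy as np

# Toy 3-state flux matrix (made-up numbers): entry [i, j] is the weight
# observed moving from state i to state j over one lag time
flux = np.array([
    [0.50, 0.10, 0.00],
    [0.05, 0.20, 0.05],
    [0.00, 0.02, 0.08],
])

# Row-normalize so that each row is a probability distribution
T = flux / flux.sum(axis=1, keepdims=True)

# The stationary distribution is the left eigenvector of T with
# eigenvalue 1, i.e. a right eigenvector of T transposed
eigvals, eigvecs = np.linalg.eig(T.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()  # normalize to sum to 1 (this also fixes the sign)

print(T)
print(pi)
```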

In [18]:
model.get_steady_state_target_flux()
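Conceptually, a steady-state target flux is the probability arriving in the target per unit time once the model has reached steady state. Here's a rough sketch on a toy 3-state transition matrix where the target recycles back to the basis. The matrix, the state labels, and the power-iteration approach are all assumptions for illustration, not `msm_we`'s implementation.

```python
import numpy as np

tau = 1e-9  # same value as in the parameters cell

# Toy transition matrix (made-up numbers). State 0 is the basis, state 2
# is the target, and the target row recycles everything back to the basis.
T = np.array([
    [0.90, 0.10, 0.00],
    [0.20, 0.70, 0.10],
    [1.00, 0.00, 0.00],
])
target = 2

# Stationary distribution by power iteration from a uniform start
pi = np.full(3, 1 / 3)
for _ in range(10_000):
    pi = pi @ T

# Probability flowing into the target per lag time from all other states
others = [i for i in range(T.shape[0]) if i != target]
flux_per_step = sum(pi[i] * T[i, target] for i in others)

# Convert to a rate by dividing by the lag time
flux_per_time = flux_per_step / tau

print(pi)
print(flux_per_step, flux_per_time)
```

For this toy matrix the stationary distribution works out to (30/41, 10/41, 1/41), so the flux per lag time is 1/41.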

In [19]:
model.do_block_validation(
    cross_validation_groups=2, 
    cross_validation_blocks=4
)

Submitting fluxmatrix tasks:   0%|          | 0/50 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/50 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/50 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/50 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/49 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/49 [00:00<?, ?it/s]

Submitting discretization tasks:   0%|          | 0/99 [00:00<?, ?it/s]

Retrieving discretized trajectories:   0%|          | 0/99 [00:00<?, ?it/s]

Submitting fluxmatrix tasks:   0%|          | 0/49 [00:00<?, ?it/s]

Retrieving flux matrices:   0%|          | 0/49 [00:00<?, ?it/s]
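To get a feel for what the two block-validation parameters control, here's one plausible way to divide iterations into contiguous blocks and deal those blocks out to groups round-robin. The `split_into_groups` helper and the count of 99 iterations are hypothetical, and I'm not claiming this is exactly how `do_block_validation` assigns its data.

```python
# Hypothetical helper, not msm_we's code: split iterations into contiguous
# blocks, then assign the blocks to groups in round-robin order
def split_into_groups(iterations, n_groups, n_blocks):
    # Size the blocks as evenly as possible
    base, extra = divmod(len(iterations), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(iterations[start:start + size])
        start += size

    # Block 0 -> group 0, block 1 -> group 1, block 2 -> group 0, ...
    groups = [[] for _ in range(n_groups)]
    for b, block in enumerate(blocks):
        groups[b % n_groups].extend(block)
    return groups


iterations = list(range(1, 100))  # 99 iterations, an assumed count
groups = split_into_groups(iterations, n_groups=2, n_blocks=4)
print([len(g) for g in groups])  # [50, 49]
```

Interleaving blocks like this means each validation model sees data from both early and late in the simulation, rather than one model getting only the first half.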

## Save model

In [20]:
import pickle
with open('data/pickled_model', 'wb') as of:
    pickle.dump(model, of)
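Loading the model back is the mirror image of saving it. Here's a minimal round-trip sketch that uses a stand-in dictionary and a temporary directory, so it runs without `msm_we` or the data files. The stand-in object and the temporary path are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for the model object, so the example runs without msm_we
stand_in_model = {'modelName': 'example', 'tau': 1e-9}

# Save to a temporary location instead of data/pickled_model
path = os.path.join(tempfile.mkdtemp(), 'pickled_model')
with open(path, 'wb') as outfile:
    pickle.dump(stand_in_model, outfile)

# Load it back
with open(path, 'rb') as infile:
    restored = pickle.load(infile)

print(restored)
```

For the real model, `msm_we` needs to be importable in the session that loads the pickle. Pickle stores a reference to the class rather than its methods, so an override like the `processCoordinates` one above would also need to be re-applied after loading.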