# Getting Started with Markov State Modeling

## Introduction
Markov state models (MSMs) are used to analyze the kinetics of protein conformational dynamics from molecular dynamics (MD) simulations. In this context, an MSM is a statistical model that describes the transitions between conformations (usually called states) as memoryless jumps over time. MD simulations provide atomistic detail about the motion of a protein, but the resulting data are difficult to analyze and interpret, especially when the protein undergoes complex conformational changes. MSMs reduce the complex MD data to a simpler, more manageable representation of the protein's conformational dynamics, which makes the biological process of interest easier to understand.

In an MSM, the states are defined as discrete clusters of protein conformations and the transitions between states are modeled as a Markov process. This means that the probability of transition from one state to another depends only on the current state and not on the history of previous states. The resulting MSM can be used to calculate various quantities, such as the rate constants for transitions between states and the equilibrium populations of each state.

Overall, MSMs are a useful tool for studying the kinetics of protein conformational changes in MD simulations, providing a simplified representation of complex protein dynamics that can be used to gain insight into the underlying mechanisms of protein behavior.

## Theoretical Background
Markov state models are mathematical models that describe the transition probabilities between discrete states of a system that evolves in discrete time steps. The basic equation of a Markov state model is

$$P(x_t = j | x_{t-1} = i) = T_{ij}$$

where $T_{ij}$ is the probability of a transition from state $i$ to state $j$ within one time step, and $P(x_t = j | x_{t-1} = i)$ is the probability of the system being in state $j$ at time $t$, given that it was in state $i$ at time $t-1$. Since $T_{ij}$ carries no time index, the transition probabilities are the same at every step.
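To make the memoryless jumps concrete, the following minimal sketch draws a state trajectory from a given transition matrix. The helper `simulate_chain` and the two-state matrix are hypothetical and chosen only for illustration. They are not part of `msmhelper`.

```python
import numpy as np


def simulate_chain(T, nsteps, start=0, seed=0):
    """Draw a discrete state trajectory from the transition matrix T.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    T = np.asarray(T)
    states = np.empty(nsteps, dtype=int)
    states[0] = start
    for t in range(1, nsteps):
        # the next state depends only on the current state (row of T)
        states[t] = rng.choice(len(T), p=T[states[t - 1]])
    return states


# illustrative two-state matrix, each row sums to one
T_toy = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
chain = simulate_chain(T_toy, nsteps=1000)
print(chain[:20])
```

Each step only reads the row of $T$ that belongs to the current state, which is exactly the condition $P(x_t = j | x_{t-1} = i) = T_{ij}$.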

The transition probabilities are usually estimated from time-series data, such as molecular dynamics simulations, and can be organized into a transition matrix $T$

$$T = \{T_{ij}\}$$

where $T_{ij}$ is the $(i, j)$-th element of the matrix. The transition matrix defines the probability of moving from any one state to any other state in a single time step.
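One common way to obtain such a matrix from a state trajectory is to count the observed transitions and normalize each row. The sketch below assumes states labeled `0..nstates-1`. The function name `estimate_transition_matrix` and the toy trajectory are hypothetical, and this is not necessarily how `msmhelper` implements the estimate internally.

```python
import numpy as np


def estimate_transition_matrix(traj, nstates, lagtime=1):
    """Row-normalized transition counts at the given lag time (in frames)."""
    traj = np.asarray(traj)
    counts = np.zeros((nstates, nstates))
    # count every transition i -> j separated by `lagtime` frames
    np.add.at(counts, (traj[:-lagtime], traj[lagtime:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # rows of states that are never left stay zero instead of dividing by zero
    return np.divide(
        counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0,
    )


toy_traj = [0, 0, 1, 1, 0, 2, 2, 0]
T_est = estimate_transition_matrix(toy_traj, nstates=3)
print(T_est)
```

In the toy trajectory, state 1 is followed once by itself and once by state 0, so the second row becomes `[0.5, 0.5, 0.0]`.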

The stationary distribution, $\pi$, is a probability distribution over the states that satisfies the following equation

$$\pi T = \pi$$

where the left-hand side is the distribution after one time step and the right-hand side is the distribution at the current time step. The stationary distribution represents the long-term behavior of the system and can be used to calculate various quantities, such as the equilibrium populations of each state.
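Because $\pi T = \pi$ means that $\pi$ is a left eigenvector of $T$ with eigenvalue one, the stationary distribution can be computed with a standard eigenvalue solver. The sketch below uses plain NumPy. The helper `stationary_distribution` and the two-state matrix are hypothetical, and a unique stationary distribution is assumed.

```python
import numpy as np


def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue closest to one, normalized."""
    # left eigenvectors of T are right eigenvectors of its transpose
    eigvals, eigvecs = np.linalg.eig(np.asarray(T).T)
    idx = np.argmin(np.abs(eigvals - 1))
    pi = np.real(eigvecs[:, idx])
    # normalize so that the entries sum to one (this also fixes the sign)
    return pi / pi.sum()


# illustrative two-state matrix, each row sums to one
T_two = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
pi = stationary_distribution(T_two)
print(pi)
print(np.allclose(pi @ T_two, pi))
```

For this matrix the populations are 2/3 and 1/3: the first state is left with probability 0.1 per step and entered with probability 0.2, so it is twice as populated as the second one.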

A good introduction to the topic is available at:
[http://docs.markovmodel.org/](http://docs.markovmodel.org/)

In [7]:
import msmhelper as mh
import numpy as np

# create a random trajectory of 10,000 frames over 4 states (labels 0 to 3)
traj = np.random.randint(4, size=10000)

Create a `StateTraj` instance from the raw state trajectory:

In [8]:
traj = mh.StateTraj(traj)

In [9]:
traj

StateTraj([array([1, 3, 0, ..., 3, 2, 3])])

In [10]:
traj.nstates

4

In [11]:
traj.states

array([0, 1, 2, 3])

In [12]:
# estimate Markov state model with a lag time of 2 frames
# returns the transition matrix and the corresponding states
traj.estimate_markov_model(2)

(array([[0.2522881 , 0.24990052, 0.25507362, 0.24273776],
        [0.24714679, 0.25462416, 0.24872098, 0.24950807],
        [0.24598716, 0.26083467, 0.24598716, 0.24719101],
        [0.26019576, 0.25081566, 0.24755302, 0.24143556]]),
 array([0, 1, 2, 3]))
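As a quick sanity check, the matrix printed above can be inspected with plain NumPy. The values below are copied from that output. Since the trajectory is drawn without a fixed seed, a rerun will give slightly different numbers. Because the trajectory is uniformly random, every entry should lie close to 0.25.

```python
import numpy as np

# transition matrix copied from the output above
T_out = np.array([
    [0.2522881, 0.24990052, 0.25507362, 0.24273776],
    [0.24714679, 0.25462416, 0.24872098, 0.24950807],
    [0.24598716, 0.26083467, 0.24598716, 0.24719101],
    [0.26019576, 0.25081566, 0.24755302, 0.24143556],
])

# every row is a probability distribution over the target states
print(np.allclose(T_out.sum(axis=1), 1.0, atol=1e-6))

# largest deviation from the uniform value 1 / nstates
max_dev = np.abs(T_out - 0.25).max()
print(max_dev)
```

A real MD trajectory would instead give a matrix dominated by its diagonal, because a system typically stays within a metastable state for many frames before it jumps.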