# `H2MM_C` tutorial: simulations

Besides its model-optimization functions, `H2MM_C` also provides functions for simulating data.

The simulation functions of `H2MM_C` are all prefixed with `sim_`.
Within H<sup>2</sup>MM, there are several "levels" of data:
- data times: treated as fixed points, part of the normal data
- states of each data point: can be simulated from a model and the times
- data indices: the observables of H<sup>2</sup>MM, which can be simulated from the times and a model, and optionally from the states

> Note:
> These simulations are pure hidden Markov model simulations.
> These are NOT molecular, fluorescence or other more complicated simulations.

#### Import Modules

As before, we start by importing the basic modules:
- `os`: for basic file I/O
- [numpy](https://numpy.org): the core of nearly all scientific computing in Python
- [matplotlib](https://matplotlib.org/): for plotting results, not needed for H<sup>2</sup>MM analysis
- `H2MM_C`: module for photon-by-photon hidden Markov modeling analysis


```python
import os
import numpy as np
from matplotlib import pyplot as plt

import H2MM_C as hm
```

## Basic simulation

The simulation function you will most likely use is `hm.sim_phtraj_from_times()`. It takes an `h2mm_model` object and an array of arrival times (equivalent to one element of the `times` list given to `hm.EM_H2MM_C()`), and generates a simulated state path together with the accompanying detector indices (`color`).

First, let's generate a random array of arrival times:

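```python
# 50 exponentially distributed gaps (mean 100 clock ticks), truncated to integers and accumulated into arrival times
time = np.cumsum(np.random.exponential(100, size=50).astype(int))
```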

Next, we need a model. We'll use a rough approximation of the model found for the 3-detector data in the Optimization Tutorial:

```python
# define the arrays
# initial state probabilities (4 states)
prior = np.array([0.63, 0.03, 0.19, 0.15])
# transition probability matrix (rows: from-state, columns: to-state)
trans = np.array([[0.9997, 0.0001, 0.0001, 0.0001],
                  [2e-5, 1-3.2e-5, 1e-5, 2e-6],
                  [5e-6, 7e-6, 1-2.2e-5, 1e-5],
                  [3e-6, 3e-6, 4e-5, 1-4.6e-5]])
# emission probability matrix (rows: states, columns: detectors)
obs = np.array([[0.62, 0.37, 0.01],
                [0.14, 0.29, 0.57],
                [0.44, 0.09, 0.47],
                [0.84, 0.08, 0.08]])

# make the model
sim_model = hm.h2mm_model(prior, trans, obs)
```
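Before simulating, it can be worth checking that the arrays define a valid model. The prior should sum to 1, and every row of the transition and emission matrices should sum to 1. The sketch below does this with plain NumPy. The helper name `check_model_arrays` is hypothetical and not part of `H2MM_C`, which may perform its own validation when `hm.h2mm_model` is constructed.

```python
import numpy as np

def check_model_arrays(prior, trans, obs, atol=1e-9):
    """Hypothetical helper: verify that prior, trans and obs are stochastic arrays."""
    nstate = prior.shape[0]
    # trans must be square with one row/column per state; obs needs one row per state
    shapes_ok = trans.shape == (nstate, nstate) and obs.shape[0] == nstate
    # all probabilities must be non-negative
    nonneg_ok = (prior >= 0).all() and (trans >= 0).all() and (obs >= 0).all()
    # the prior, and every row of trans and obs, must sum to 1
    sums_ok = (np.isclose(prior.sum(), 1.0, atol=atol)
               and np.allclose(trans.sum(axis=1), 1.0, atol=atol)
               and np.allclose(obs.sum(axis=1), 1.0, atol=atol))
    return bool(shapes_ok and nonneg_ok and sums_ok)

# same arrays as in the cell above
prior = np.array([0.63, 0.03, 0.19, 0.15])
trans = np.array([[0.9997, 0.0001, 0.0001, 0.0001],
                  [2e-5, 1-3.2e-5, 1e-5, 2e-6],
                  [5e-6, 7e-6, 1-2.2e-5, 1e-5],
                  [3e-6, 3e-6, 4e-5, 1-4.6e-5]])
obs = np.array([[0.62, 0.37, 0.01],
                [0.14, 0.29, 0.57],
                [0.44, 0.09, 0.47],
                [0.84, 0.08, 0.08]])

print(check_model_arrays(prior, trans, obs))  # True
```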

Finally, simulate the data. `hm.sim_phtraj_from_times()` returns the simulated state path and an array of detector indices. The latter is equivalent to one element of the `color` list of arrays from the same tutorial:

```python
simstate, simcolor = hm.sim_phtraj_from_times(sim_model, time)
simstate, simcolor
```

```
(array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 3], dtype=uint32),
 array([0, 0, 1, 1, 2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
        0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0,
        0, 0, 0, 2, 0, 2], dtype=uint32))
```

### Recreating Data: Multiple Arrays

To simulate a full data set, run the simulation in a loop so that many simulated chunks of data are produced. This mimics the list-of-arrays format in which data is passed to `hm.EM_H2MM_C()`:

```python
# initiate arrays
simtimes = list()
simstates = list()
simcolors = list()

# loop to create each set
for _ in range(1000):
    # generate new time array
    simtime = np.cumsum(np.random.exponential(100, size=np.random.randint(50,150)).astype(int))
    # simulate data
    simstate, simcolor = hm.sim_phtraj_from_times(sim_model, simtime)
    # append arrays to lists
    simtimes.append(simtime)
    simstates.append(simstate)
    simcolors.append(simcolor)
```
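The time-generation line in the loop can also be wrapped in a small function that draws from a seeded NumPy `Generator`. This makes the simulated arrival times reproducible from run to run; the `seed` keyword of the `H2MM_C` simulation functions only covers the `H2MM_C` side. The name `make_sim_times` and its defaults are illustrative choices that mirror the loop above. They are not part of `H2MM_C`.

```python
import numpy as np

def make_sim_times(nbursts, mean_gap=100, min_len=50, max_len=150, seed=None):
    """Hypothetical helper: generate a list of arrival-time arrays, one per burst."""
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(nbursts):
        # burst length drawn uniformly from [min_len, max_len), like np.random.randint(50, 150)
        n = rng.integers(min_len, max_len)
        # exponentially distributed gaps, truncated to integer clock ticks and accumulated
        gaps = rng.exponential(mean_gap, size=n).astype(np.int64)
        times.append(np.cumsum(gaps))
    return times

simtimes = make_sim_times(1000, seed=42)
# calling again with the same seed reproduces the same arrays
again = make_sim_times(1000, seed=42)
print(all(np.array_equal(a, b) for a, b in zip(simtimes, again)))  # True
```

Each array can then be passed to `hm.sim_phtraj_from_times(sim_model, tm)` exactly as in the loop above.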

### Using Existing Times

Another strategy, which can help check whether a model is reasonable, is to take the arrival times of a real experiment together with the optimized model for that data. You then simulate new detector indices and check whether the results look qualitatively similar to the real data according to some metric.
> That metric is up to you, and should be based on your knowledge of the best way to characterize your system. It will likely involve reproducing the relative abundances of the different indices.

So let's load the times from the 3-detector data and simulate data from them:

```python
##############################################################
# The code here is just for loading the data

# the file alternates lines: arrival times, then detector indices, for each burst
# color3 = list() # to save memory, we will not load the indices
times3 = list()

with open('sample_data_3det.txt', 'r') as f:
    for i, line in enumerate(f):
        if i % 2 == 0:
            # even lines: arrival times
            times3.append(np.array([int(x) for x in line.split()], dtype='Q'))
        # odd lines hold the detector indices; not needed here, so commented out
        # else:
        #     color3.append(np.array([int(x) for x in line.split()], dtype='L'))
# End of data loading segment
##############################################################

# initiate lists
simstates3 = list()
simcolors3 = list()

# loop over each time array and simulate each set
for tm3 in times3:
    # conduct the simulation
    simstate3, simcolor3 = hm.sim_phtraj_from_times(sim_model, tm3)
    # append arrays to lists
    simstates3.append(simstate3)
    simcolors3.append(simcolor3)
```
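One possible metric is to compare how often each detector index occurs in the real and the simulated data. The sketch below computes these fractions over a whole list of index arrays with `np.bincount`. The helper name `index_fractions` is hypothetical. With the tutorial's data you would pass `color3`, which the loading cell above skips to save memory, and `simcolors3`. The arrays are cast to `int64` first because `np.bincount` refuses unsigned 64-bit input.

```python
import numpy as np

def index_fractions(color_list, ndet):
    """Hypothetical helper: fraction of photons carrying each detector index."""
    # pool all bursts into one array of non-negative integers
    pooled = np.concatenate(color_list).astype(np.int64)
    # count occurrences of each index 0 .. ndet-1, then normalize
    counts = np.bincount(pooled, minlength=ndet)
    return counts / counts.sum()

# toy example with two short "bursts" of a 3-detector measurement
toy = [np.array([0, 0, 1, 2], dtype=np.uint32), np.array([0, 2, 2, 0], dtype=np.uint32)]
fractions = index_fractions(toy, 3)  # -> fractions 0.5, 0.125 and 0.375 for indices 0, 1, 2
print(fractions)

# with the arrays from this tutorial (real indices must be loaded first):
# print(index_fractions(color3, 3), index_fractions(simcolors3, 3))
```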

## Simulating from Components

It is also possible to first simulate the state path, and then simulate the detector indices from those states with a separate function.
If you only want the state path, you can stop after the first step and save memory, computation time and code complexity.

This is done with the `hm.sim_sparsestatepath()` function, followed by the `hm.sim_phtraj_from_state()` function.

Their signatures are similar: `hm.sim_sparsestatepath(model: h2mm_model, times: numpy.ndarray)` and `hm.sim_phtraj_from_state(model: h2mm_model, states: numpy.ndarray)`.

> Note:
> `hm.sim_sparsestatepath()` uses only the initial probabilities and the transition probability matrix.
> The emission probability matrix is ignored.
>
> Conversely, `hm.sim_phtraj_from_state()` uses only the emission probability matrix of the `h2mm_model`.

### Simulating State Path



```python
simtime = np.cumsum(np.random.exponential(100, size=50).astype(int))

statepath = hm.sim_sparsestatepath(sim_model, simtime)
statepath
```

```
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1], dtype=uint32)
```
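The pure-NumPy sketch below shows conceptually how a state path can be drawn for arbitrary arrival times. It assumes, as in the H<sup>2</sup>MM formalism, that `trans` gives transition probabilities per single clock tick. Under that assumption, the transition probabilities across a gap of `dt` ticks are given by the matrix power `trans**dt`. This illustrates the idea only. It is not the algorithm `H2MM_C` uses internally, and the function name `sketch_state_path` and the toy 2-state model are hypothetical.

```python
import numpy as np

def sketch_state_path(prior, trans, times, seed=None):
    """Hypothetical illustration: draw one hidden state per arrival time.

    Assumes trans holds per-clock-tick transition probabilities, so the
    transition matrix across a gap of dt ticks is trans raised to the power dt.
    """
    rng = np.random.default_rng(seed)
    nstate = prior.shape[0]
    states = np.empty(len(times), dtype=np.uint32)
    # the first photon's state is drawn from the initial probabilities
    states[0] = rng.choice(nstate, p=prior)
    for i in range(1, len(times)):
        dt = int(times[i] - times[i - 1])
        # row of trans**dt for the previous state = probabilities after dt ticks
        p = np.linalg.matrix_power(trans, dt)[states[i - 1]]
        # renormalize to absorb floating-point drift before sampling
        states[i] = rng.choice(nstate, p=p / p.sum())
    return states

# toy 2-state model (illustrative values, not the tutorial's model)
prior = np.array([0.5, 0.5])
trans = np.array([[0.999, 0.001],
                  [0.002, 0.998]])
times = np.cumsum(np.random.default_rng(1).exponential(100, size=50).astype(np.int64))
path = sketch_state_path(prior, trans, times, seed=2)
print(path)
```

Under this picture, a model whose off-diagonal transition probabilities are 1e-4 or smaller per tick, like `sim_model`, tends to produce long runs of a single state within a burst.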

### Simulating Data from States

```python
color = hm.sim_phtraj_from_state(sim_model, statepath)
color
```

```
array([2, 1, 1, 2, 0, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2,
       2, 2, 1, 2, 0, 2, 2, 2, 1, 1, 2, 1, 0, 2, 0, 1, 2, 2, 2, 2, 1, 1,
       1, 2, 1, 2, 1, 2], dtype=uint32)
```
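Given a state path, drawing the detector indices only needs the emission probability matrix. Each photon's index is sampled from the row of `obs` belonging to its state, independently of the other photons. Below is a minimal NumPy sketch of that idea. The function name `sketch_indices_from_states` is hypothetical, and this is not `H2MM_C`'s own code.

```python
import numpy as np

def sketch_indices_from_states(obs, states, seed=None):
    """Hypothetical illustration: draw one detector index per photon from obs[state]."""
    rng = np.random.default_rng(seed)
    ndet = obs.shape[1]
    # each photon's index depends only on its own state's emission probabilities
    return np.array([rng.choice(ndet, p=obs[s]) for s in states], dtype=np.uint32)

# emission probability matrix of the tutorial's model
obs = np.array([[0.62, 0.37, 0.01],
                [0.14, 0.29, 0.57],
                [0.44, 0.09, 0.47],
                [0.84, 0.08, 0.08]])
# a path that stays in state 1, like `statepath` above
states = np.ones(50, dtype=np.uint32)
color = sketch_indices_from_states(obs, states, seed=0)
print(color)
```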

## Setting the Random Seed for Reproducibility

Since these simulations are based on random number generators, the results will differ each time. If repeatability is desired, the seed of the random number generator can be set with the keyword argument `seed`, which takes an integer. The same syntax is used across all three simulation functions.

> Note:
>
> The random seed is persistent, so there is no need to set it more than once.
> In fact, setting it again to the same value resets the random number generator, producing exactly the same
> sequence of random results as when it was first set.

Below is an example using this keyword argument:

```python
simpath, simcolor = hm.sim_phtraj_from_times(sim_model, time, seed=100)
simpath, simcolor
```

```
(array([0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2], dtype=uint32),
 array([0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 2, 2, 2, 0, 0, 2, 0, 2, 2, 0,
        0, 2, 0, 2, 2, 1, 2, 2, 1, 2, 0, 2, 1, 0, 2, 0, 0, 0, 1, 2, 1, 0,
        1, 0, 1, 2, 2, 2], dtype=uint32))
```
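As the note on seeding says, setting the seed once is enough for a reproducible run. Passing the same `seed` on every iteration of a simulation loop would instead restart the random sequence for every burst. If you also want each burst to be reproducible on its own, for example to re-simulate a single burst without rerunning the whole loop, one option is to derive one integer seed per burst from a single master seed with NumPy's `SeedSequence`. This is a suggested pattern, not something `H2MM_C` requires. The helper name `burst_seeds` is hypothetical, and the sketch assumes that `seed` accepts any unsigned 32-bit integer.

```python
import numpy as np

def burst_seeds(master_seed, nbursts):
    """Hypothetical helper: derive one reproducible 32-bit integer seed per burst."""
    # SeedSequence spreads the entropy of one master seed over many well-mixed values
    return [int(s) for s in np.random.SeedSequence(master_seed).generate_state(nbursts)]

seeds = burst_seeds(100, 1000)
# each burst would then get its own seed, e.g. (requires H2MM_C and the objects above):
# for tm, sd in zip(simtimes, seeds):
#     simstate, simcolor = hm.sim_phtraj_from_times(sim_model, tm, seed=sd)
print(len(seeds), all(0 <= s < 2**32 for s in seeds))  # 1000 True
```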