# How to sample data from a model?

Each of the three HMM implementations in this library (Gaussian, Discrete and Heterogeneous) can be used to sample data from a model.

To do so, you can use the function *"def sample(self, n_sequences=1, n_samples=1)"* of the *"_BaseHMM"* class, which is the parent class of the three implementations. Internally, this function relies on *"def _generate_sample_from_state(self, state)"*, which is implemented separately by each of the three HMM models and generates the data for a given state. Note that the call in the example below also passes *"return_states=True"*, an argument not shown in the signature above.
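The source of *"sample"* is not reproduced here, so the following is only a generic sketch of how HMM sampling usually works, not this library's implementation: the first hidden state is drawn from the initial probabilities, each observation is drawn from the current state (the role *"_generate_sample_from_state"* plays), and the next state is drawn from the corresponding row of the transition matrix. *"sample_hmm_sketch"* and its *"emit"* callback are hypothetical names used only for illustration.

```python
import numpy as np


def sample_hmm_sketch(pi, A, emit, n_samples, rng):
    """Draw one sequence from an HMM (illustrative sketch, not the library's code).

    pi:   initial state probabilities, shape (n_states,)
    A:    transition matrix, A[i, j] = P(next state = j | current state = i)
    emit: hypothetical callback playing the role of _generate_sample_from_state;
          it receives a state index and returns one observation vector
    """
    n_states = len(pi)
    states = np.empty(n_samples, dtype=int)
    observations = []
    # The first hidden state is drawn from the initial distribution.
    state = rng.choice(n_states, p=pi)
    for t in range(n_samples):
        states[t] = state
        # The observation depends only on the current hidden state.
        observations.append(emit(state))
        # The next hidden state is drawn from the row of A for the current state.
        state = rng.choice(n_states, p=A[state])
    return np.asarray(observations), states


# Usage: a 2-state model with one Gaussian feature per state (made-up parameters).
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([[-5.0], [5.0]])
X, Z = sample_hmm_sketch(pi, A, lambda s: rng.normal(means[s], 1.0), 100, rng)
print(X.shape, Z.shape)  # (100, 1) (100,)
```

Repeating such a draw *"n_sequences"* times would give a list of sequences, which is the kind of output described for *"sample"* further down.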

To illustrate how to use it, we define a model with chosen parameters. We do not train the model; instead, we set the probabilities manually to show how to generate observations from whatever emission probabilities, transition probabilities, etc. we want the model to have.

First, we load some packages:

In [5]:
# path settings: add the parent directory so that the local "lib" package can be imported
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

import numpy as np
from lib.heterogeneoushmm.heterogeneous import HeterogeneousHMM
from lib.heterogeneoushmm.utils import normalise

As an example, suppose we want to sample data from a 3-state HeterogeneousHMM with 2 features whose emission probabilities are modelled by Gaussian distributions and 2 discrete features, both of them binary. We define these parameters as follows:
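For a heterogeneous model, one observation combines both kinds of features. The sketch below shows what a per-state emission could look like for the configuration used here (2 Gaussian features with diagonal covariances and 2 binary discrete features, giving 4 values per observation). It is an assumption-based illustration, not the library's *"_generate_sample_from_state"*; in particular, the function name *"heterogeneous_emission_sketch"* and the order of the features (Gaussian first, then discrete) are assumptions.

```python
import numpy as np


def heterogeneous_emission_sketch(state, means, covars, B, rng):
    """Draw one observation for a given hidden state (hypothetical sketch).

    means, covars: shape (n_states, n_g_emissions); covars holds diagonal variances
    B:             list with one (n_states, n_values) matrix per discrete feature
    Assumption: Gaussian features come first, followed by the discrete features.
    """
    # Diagonal covariance: each Gaussian feature is drawn independently
    # with standard deviation sqrt(variance).
    gaussian_part = rng.normal(means[state], np.sqrt(covars[state]))
    # Each discrete feature is a categorical draw from the row of B[i] for this state.
    discrete_part = [rng.choice(B_i.shape[1], p=B_i[state]) for B_i in B]
    return np.concatenate([gaussian_part, discrete_part])


rng = np.random.default_rng(1)
means = np.array([[0.0, 10.0], [-10.0, 0.0], [20.0, 20.0]])  # 3 states x 2 Gaussian features
covars = np.full((3, 2), 0.5)                                 # diagonal variances
B = [np.array([[0.2, 0.8], [0.2, 0.8], [0.8, 0.2]]),          # binary feature 1
     np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]])]          # binary feature 2
x = heterogeneous_emission_sketch(2, means, covars, B, rng)
print(x.shape)  # (4,)
```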

In [6]:
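n_states = 3            # number of hidden states
n_g_emissions = 2       # number of Gaussian features
n_d_emissions = 2       # number of discrete features
n_d_features = [2, 2]   # number of possible values of each discrete feature (both binary)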

We can now define the model with the previous parameters:

In [3]:
hhmm = HeterogeneousHMM(
    n_states = n_states,
    n_g_emissions = n_g_emissions,
    n_d_emissions = n_d_emissions,
    n_d_features = n_d_features
)

Now we are going to "customize" the model we want to sample from. Here we use random values for the initial state probabilities, the transition matrix, the means and covariances of the Gaussian features, and the emission probabilities of the discrete features. However, instead of a random initialization, you can simply set these parameters to whatever values you want.
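Whether the parameters are random or set by hand, they must be valid probability tables: *"pi"* sums to 1 and every row of *"A"* and of each *"B[i]"* sums to 1. The hypothetical helper *"check_hmm_params"* below (not part of the library) is a minimal way to verify this before assigning the values to the model; it only uses NumPy.

```python
import numpy as np


def check_hmm_params(pi, A, B, atol=1e-8):
    """Return True if pi, A and every B[i] are valid probability tables (hypothetical helper)."""
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    # pi: non-negative and summing to 1.
    ok = bool(np.all(pi >= 0) and np.isclose(pi.sum(), 1.0, atol=atol))
    # Each row of A is the distribution of the next state given the current one.
    ok = ok and bool(np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0, atol=atol))
    # Each row of B[i] is the distribution of discrete feature i in one hidden state.
    for B_i in B:
        B_i = np.asarray(B_i, dtype=float)
        ok = ok and bool(np.all(B_i >= 0) and np.allclose(B_i.sum(axis=1), 1.0, atol=atol))
    return ok


prng = np.random.RandomState(10)
A = prng.rand(3, 3)
A /= A.sum(axis=1, keepdims=True)  # normalise each row to sum to 1
pi = np.array([0.2, 0.3, 0.5])
B = [np.array([[0.2, 0.8], [0.2, 0.8], [0.8, 0.2]])]
print(check_hmm_params(pi, A, B))                # True
print(check_hmm_params(pi, np.ones((3, 3)), B))  # False: rows of A sum to 3
```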

In [7]:
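covariance_type = "diagonal"  # covars below hold one variance per state and Gaussian feature
mincv = 0.1

prng = np.random.RandomState(10)

# Initial state probabilities (must sum to 1)
pi = prng.rand(n_states)
pi = pi / pi.sum()

# Transition matrix: each row must sum to 1
A = prng.rand(n_states, n_states)
A /= A.sum(axis=1, keepdims=True)

# Emission probabilities of the discrete features:
# one (n_states x n_d_features[i]) matrix per feature, drawn from the seeded prng
B = np.asarray(
    [
        prng.random_sample((n_states, n_d_features[i]))
        for i in range(n_d_emissions)
    ]
)
for i in range(n_d_emissions):
    normalise(B[i], axis=1)  # each row of B[i] sums to 1

# Means and (diagonal) covariances of the Gaussian features
means = prng.randint(-20, 20, (n_states, n_g_emissions))
covars = (mincv + mincv * prng.random_sample((n_states, n_g_emissions))) ** 2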

Now that we have set the values (random in this example) that we want our model to have, we assign them to the HeterogeneousHMM model's parameters:

In [8]:
hhmm.pi = pi
hhmm.A = A
hhmm.B = B
# Make sure the means are far apart so that posteriors.argmax()
# picks the actual component used to generate the observations.
hhmm.means = 20 * means
# Lower-bound every covariance entry at 0.1
hhmm.covars = np.maximum(covars, 0.1)

The model now has the desired parameters. To generate some sequences from it, we just have to run:

In [9]:
n_sequences = 5     # number of independent sequences to generate
n_samples = 1000    # length of each sequence
X, state_sequences = hhmm.sample(n_sequences=n_sequences, n_samples=n_samples, return_states=True)

In [10]:
np.shape(X[0])

(1000, 4)

The function "sample" returns:

- A list with the n_sequences generated sequences. In our case, the 5 sequences are stored in "X", a list of 5 elements, each of dimensions 1000x4 [n_samples x number of features].
- A list with the hidden state sequences that generated each sample sequence (requested with return_states=True).
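A simple sanity check on the output is to count the transitions in the returned state sequences and compare the resulting frequencies with the transition matrix you set. The sketch below assumes each element of *"state_sequences"* is a 1-D array of integer state indices; *"empirical_transition_matrix"* is a hypothetical helper, and to keep the block self-contained it simulates its own state sequences with NumPy instead of calling the library. With the real output you would compare *"empirical_transition_matrix(state_sequences, n_states)"* against *"hhmm.A"*.

```python
import numpy as np


def empirical_transition_matrix(state_sequences, n_states):
    """Estimate A by counting consecutive state pairs in each sequence (hypothetical helper)."""
    counts = np.zeros((n_states, n_states))
    for states in state_sequences:
        states = np.asarray(states)
        # np.add.at accumulates correctly even when the same (i, j) pair repeats.
        np.add.at(counts, (states[:-1], states[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states with no outgoing transitions stay at zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Self-contained check: simulate 5 state sequences of length 1000 from a known A.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.3, 0.7]])
sequences = []
for _ in range(5):
    z = [0]
    for _ in range(999):
        z.append(rng.choice(2, p=A_true[z[-1]]))
    sequences.append(np.array(z))

A_hat = empirical_transition_matrix(sequences, n_states=2)
print(np.round(A_hat, 2))  # should be close to A_true
```

With a few thousand transitions the estimate is usually within a few hundredths of the true probabilities; short sequences or rarely visited states give noisier rows.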

**NOTE**: in this example we set the parameters to random values, but in practice they should be set to the values you want to use for sampling. For example, if your model has 3 states and 2 discrete features, both of them binary, you can set the emission probability matrices of these discrete features to the desired values as follows:

In [11]:
hhmm.B = [np.array([[0.2, 0.8], [0.2, 0.8], [0.8, 0.2]]),
          np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]])]
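To make the layout of these matrices concrete: in each *"B[i]"*, rows correspond to hidden states and columns to the possible values of discrete feature *"i"*, matching the *"(n_states, n_d_features[i])"* shape used in the random initialisation. Assuming column *"j"* holds the probability of value *"j"*, state 2 in the example above emits value 1 of the first feature with probability 0.2 and value 1 of the second feature with probability 0.8. The NumPy-only sketch below (independent of the library) draws each feature many times with the state held fixed and compares the observed frequency of value 1 with the probability in *"B"*.

```python
import numpy as np

# Same hand-set matrices as above; assumed layout: B[i][state, value] = P(feature i == value | state).
B = [np.array([[0.2, 0.8], [0.2, 0.8], [0.8, 0.2]]),
     np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]])]

rng = np.random.default_rng(42)
n_draws = 20000
for i, B_i in enumerate(B):
    for state in range(B_i.shape[0]):
        # Draw feature i repeatedly while the hidden state is held fixed.
        draws = rng.choice(B_i.shape[1], size=n_draws, p=B_i[state])
        freq_one = np.mean(draws == 1)
        print(f"feature {i}, state {state}: P(value=1) = {B_i[state, 1]:.1f}, observed {freq_one:.3f}")
```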

Another option is to use a pre-trained model to generate data. Once you have a trained model (Gaussian, Discrete or Heterogeneous), it is enough to run "X, state_sequences = hhmm.sample(n_sequences=n_sequences, n_samples=n_samples, return_states=True)" to generate new samples.