# Guide to tHMM

In [1]:
import numpy as np
import scipy.stats as sp

### Synthesizing Cells (not required by the user)

In [2]:
from lineage.CellVar import CellVar as c
from lineage.CellVar import _double

In [3]:
# Identity transition matrix: a daughter cell always inherits its parent's state
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
    
parent_state = 1
parent_cell = c(state=parent_state, left=None, right=None, parent=None, gen=1)
left_cell, right_cell = parent_cell._divide(T)

In [4]:
print(left_cell, parent_cell.left)


 Generation: 2, State: 1, Observation: This cell has no observations to report. 
 Generation: 2, State: 1, Observation: This cell has no observations to report.


In [5]:
print(right_cell, parent_cell.right)


 Generation: 2, State: 1, Observation: This cell has no observations to report. 
 Generation: 2, State: 1, Observation: This cell has no observations to report.
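
Both daughters above report state 1 because `T` is the identity matrix. The following is a minimal sketch of that idea, independent of the `lineage` package: each daughter's state is drawn from the row of `T` indexed by the parent's state. The helper `draw_daughter_states` is hypothetical and is not necessarily how `CellVar._divide` is implemented.

```python
import numpy as np

def draw_daughter_states(parent_state, T, rng):
    """Draw the states of two daughter cells from row `parent_state` of T.

    Hypothetical helper for illustration; not part of the lineage package.
    """
    T = np.asarray(T, dtype=float)
    num_states = T.shape[0]
    # Each daughter is drawn independently from the parent's transition row.
    left, right = rng.choice(num_states, size=2, p=T[parent_state])
    return int(left), int(right)

rng = np.random.default_rng(0)
identity_T = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
print(draw_daughter_states(1, identity_T, rng))  # prints (1, 1)
```

With any non-identity `T`, the same call can return daughters whose states differ from the parent's and from each other.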


## Creating a synthetic lineage (required by the user) "Two State Model"

In [6]:
from lineage.LineageTree import LineageTree
from lineage.StateDistribution import StateDistribution, get_experiment_time

### Creating a lineage and setting the full lineage (unpruned) as the one to be used

The required probabilities are those that define the tree and the state switching within it. The process first creates a hidden tree of empty cells, meaning cells whose states are set but that have no observations attached. We then draw, from each state's distribution, as many observations as there are cells in that state, and assign those observations to those cells. The $\pi$ and $T$ parameters are easy to define. Let $k$ be the number of states. $\pi$ is a list of $k$ probabilities that must sum to $1$, given either as a $1$-dimensional list or as a $1$-dimensional numpy array. $T$ is a square numpy matrix of size $k\times k$. Its rows are the states we transition from and its columns are the states we transition to. Each row of $T$ must sum to $1$; the columns need not.

In [7]:
# pi: the initial probability vector
pi = np.array([0.6, 0.4], dtype="float")

# T: transition probability matrix
T = np.array([[0.85, 0.15],
              [0.15, 0.85]])
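
The constraints on $\pi$ and $T$ can be checked before building a lineage. The function below is a minimal sketch of such a check; `check_markov_parameters` is a hypothetical helper written for this guide and is not part of the `lineage` package.

```python
import numpy as np

def check_markov_parameters(pi, T, atol=1e-8):
    """Validate pi and T; return the number of states k.

    Hypothetical helper for illustration; raises ValueError on a violation.
    """
    pi = np.asarray(pi, dtype=float)
    T = np.asarray(T, dtype=float)
    if pi.ndim != 1:
        raise ValueError("pi must be one-dimensional")
    k = pi.shape[0]
    if T.shape != (k, k):
        raise ValueError("T must be a square k x k matrix")
    if np.any(pi < 0) or np.any(T < 0):
        raise ValueError("probabilities must be non-negative")
    if not np.isclose(pi.sum(), 1.0, atol=atol):
        raise ValueError("pi must sum to 1")
    # Rows are 'from' states, so each row must be a probability distribution.
    if not np.allclose(T.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of T must sum to 1")
    return k

print(check_markov_parameters([0.6, 0.4], [[0.85, 0.15], [0.15, 0.85]]))  # prints 2
```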

The emission matrix $E$ is a little more complicated to define because this is where the user has complete freedom in choosing what type of observation they care about. The user first decides what observations they want attached to the cells in their synthetic lineage. For example, someone observing kinematics or physics might want Gaussian-distributed observations. To define the random variables, the user draws from a Gaussian distribution using the mean and standard deviation chosen for each state. They can also use the Gaussian probability density to define the likelihood. Furthermore, they can build an analytical estimator for their state distributions that yields parameter estimates when given a list of observations. Finally, the user can define a prune rule, which is a boolean function that inspects a cell's observations and returns True if the cell's subtree (all of the cell's descendants) is to be pruned, or False if the cell is safe from pruning. In the Gaussian example, a user can remove a cell's subtree if its observation is higher or lower than some fixed value.
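
The four ingredients named above (random variables, likelihood, estimator, prune rule) can be sketched for the Gaussian example as follows. The class `GaussianState` and the function `prune_rule` are hypothetical and only illustrate the idea; they do not reproduce the interface of `StateDistribution`.

```python
import math
import numpy as np

class GaussianState:
    """Hypothetical one-dimensional Gaussian state distribution (illustration only)."""

    def __init__(self, state, mean, std):
        self.state = state
        self.mean = mean
        self.std = std

    def rvs(self, size, rng):
        """Draw `size` observations for cells in this state."""
        return rng.normal(self.mean, self.std, size=size)

    def pdf(self, x):
        """Likelihood of a single observation under this state."""
        z = (x - self.mean) / self.std
        return math.exp(-0.5 * z * z) / (self.std * math.sqrt(2.0 * math.pi))

    def estimator(self, observations):
        """Maximum-likelihood estimate of the mean and standard deviation."""
        obs = np.asarray(observations, dtype=float)
        return GaussianState(self.state, float(obs.mean()), float(obs.std()))

def prune_rule(observation, lower, upper):
    """Return True if the cell's subtree should be pruned (value out of range)."""
    return observation < lower or observation > upper

rng = np.random.default_rng(0)
true_state = GaussianState(state=0, mean=2.0, std=0.5)
samples = true_state.rvs(5000, rng)
estimate = true_state.estimator(samples)
print(estimate.mean, estimate.std)
print(prune_rule(10.0, lower=0.0, upper=4.0))  # prints True
```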

As bioengineers, we have already built an example model that resembles cell lineage trees. In our synthetic model, the emissions are multivariate. The first emission is a Bernoulli observation, with $0$ implying death and $1$ implying division. The second and third emissions are continuous and come from exponential and gamma distributions, respectively. Though these can be thought of as cell lifetimes or as periods spent in a certain cell phase, these values can really mean anything, and the user is completely free to choose what the emissions and their values mean. We define ways to draw random variables for these multivariate observations and to compute the likelihood of an observation. Keeping with the cell analogy, we also provide a prune rule: if a cell has a $0$ in its Bernoulli observation, its subtree is pruned from the full lineage tree. Though this will obviously introduce bias into estimation, we keep both the full tree and the pruned tree in the lineage objects, in case a user would like to compare the effects of analyzing one versus the other.
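
To make the multivariate emission concrete, here is a minimal sketch for a two-part observation: a Bernoulli fate and a gamma-distributed lifetime. The function names are hypothetical, and treating the two emissions as independent is an assumption of this sketch rather than something the package is confirmed to do.

```python
import math
import numpy as np

def draw_observation(bern_p, gamma_a, gamma_scale, rng):
    """Draw one (fate, lifetime) pair; fate 1 means division, 0 means death."""
    fate = int(rng.random() < bern_p)
    lifetime = float(rng.gamma(shape=gamma_a, scale=gamma_scale))
    return fate, lifetime

def observation_likelihood(obs, bern_p, gamma_a, gamma_scale):
    """Joint likelihood of (fate, lifetime), assuming independent emissions."""
    fate, lifetime = obs
    bern_like = bern_p if fate == 1 else 1.0 - bern_p
    # Log of the gamma density with shape gamma_a, scale gamma_scale, location 0.
    log_gamma = ((gamma_a - 1.0) * math.log(lifetime)
                 - lifetime / gamma_scale
                 - math.lgamma(gamma_a)
                 - gamma_a * math.log(gamma_scale))
    return bern_like * math.exp(log_gamma)

rng = np.random.default_rng(0)
print(draw_observation(0.99, 20.0, 5.0, rng))

# A dividing cell with lifetime 100 is far more likely under a state with
# gamma mean 20 * 5 = 100 than under a state with gamma mean 10 * 1 = 10.
obs = (1, 100.0)
like_long = observation_likelihood(obs, 0.99, 20.0, 5.0)
like_short = observation_likelihood(obs, 0.88, 10.0, 1.0)
print(like_long > like_short)  # prints True
```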

Ultimately, $E$ is defined as a list of $k$ state distribution objects, one per state. These objects already offer rich functionality, and a user can easily extend them. Each one only needs to be instantiated with the parameters that define its state's distribution.

In [8]:
# E: states are defined as StateDistribution objects

# State 0 parameters "Resistant"
state0 = 0
bern_p0 = 0.99
gamma_loc = 0
gamma_a0 = 20
gamma_scale0 = 5

# State 1 parameters "Susceptible"
state1 = 1
bern_p1 = 0.88
gamma_a1 = 10
gamma_scale1 = 1

state_obj0 = StateDistribution(state0, bern_p0, gamma_a0, gamma_loc, gamma_scale0)
state_obj1 = StateDistribution(state1, bern_p1, gamma_a1, gamma_loc,  gamma_scale1)

E = [state_obj0, state_obj1]

The final required parameters are more obvious. The first is the desired number of cells in the full unpruned lineage tree, which can be any number. The lineage tree is built 'from left to right': we construct the binary tree by going to the left-most cell, dividing it, and then walking through the generation. For example, if someone requested for

In [9]:
desired_num_cells = 2**7 - 1 
prune_boolean = False # To get the full tree
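
The value `2**7 - 1 = 127` is the size of a full binary tree with 7 generations. The sketch below shows one way a tree of hidden states could be filled generation by generation, from left to right, until the requested number of cells is reached. It stores cells in a flat list, which is an assumption made for brevity; `build_state_tree` is hypothetical and `LineageTree` may build its tree differently.

```python
import numpy as np

def build_state_tree(pi, T, desired_num_cells, rng):
    """Return (states, generations) for a binary tree in breadth-first order.

    Cell i has parent (i - 1) // 2 and daughters 2*i + 1 and 2*i + 2.
    Hypothetical sketch for illustration only.
    """
    pi = np.asarray(pi, dtype=float)
    T = np.asarray(T, dtype=float)
    num_states = len(pi)
    states = [int(rng.choice(num_states, p=pi))]  # root state drawn from pi
    generations = [1]
    while len(states) < desired_num_cells:
        parent = (len(states) - 1) // 2
        # The new cell's state is drawn from the parent's row of T.
        states.append(int(rng.choice(num_states, p=T[states[parent]])))
        generations.append(generations[parent] + 1)
    return states, generations

rng = np.random.default_rng(0)
states, generations = build_state_tree(
    [0.6, 0.4], [[0.85, 0.15], [0.15, 0.85]], 2**7 - 1, rng)
print(len(states), max(generations))  # prints 127 7
```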

In [10]:
lineage1 = LineageTree(pi, T, E, desired_num_cells, prune_boolean)
print(lineage1)

This tree is NOT pruned. It is made of 2 states.
 For each state in this tree: 
 	 There are 96 cells of state 0, 
 	 There are 31 cells of state 1.
 This UNpruned tree has 127 cells in total


### Obtaining how long the experiment ran by checking the time length of the longest branch

In [11]:
longest_branch_time = get_experiment_time(lineage1)
print(longest_branch_time)

887.268375398946
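
The experiment time is the length of the longest branch, that is, the largest sum of lifetimes along any path from the root to a leaf. The sketch below computes this for a tree stored as a flat list in breadth-first order. Both the storage layout and the function `experiment_time` are assumptions made for illustration and are not the implementation of `get_experiment_time`.

```python
def experiment_time(lifetimes):
    """Longest root-to-leaf sum of lifetimes.

    `lifetimes[i]` is the lifetime of cell i, and cell i has parent (i - 1) // 2.
    Hypothetical sketch; expects a non-empty list of positive lifetimes.
    """
    end_times = []
    for i, lifetime in enumerate(lifetimes):
        # A cell starts when its parent's lifetime ends; the root starts at 0.
        start = end_times[(i - 1) // 2] if i > 0 else 0.0
        end_times.append(start + lifetime)
    return max(end_times)

# Root (10) -> right daughter (7) -> its left daughter (30) is the longest branch.
print(experiment_time([10.0, 5.0, 7.0, 1.0, 2.0, 30.0]))  # prints 47.0
```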


### Estimation of distribution parameters using our estimators for full lineage

In [12]:
for state in range(lineage1.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage1.lineage_stats[state].full_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
the number of gamma observations 96
20.564222362752595 0.0 5.064674502740699
                    estimated state State object w/ parameters: 0.9895833333323133, 20.564222362752595, 0.0, 5.064674502740699.
original parameters given for state State object w/ parameters: 0.99, 20, 0, 5.
State 1:
the number of gamma observations 31
10 0.0 1
                    estimated state State object w/ parameters: 0.8709677419330905, 10, 0.0, 1.
original parameters given for state State object w/ parameters: 0.88, 10, 0, 1.
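
For intuition about what an analytical estimator does, the sketch below estimates the Bernoulli parameter as the sample mean and the gamma shape and scale by the method of moments. The method of moments is swapped in here for simplicity; the estimator used by `StateDistribution` may differ, and `estimate_bernoulli_gamma` is a hypothetical name.

```python
import numpy as np

def estimate_bernoulli_gamma(fates, lifetimes):
    """Return (bern_p, gamma_a, gamma_scale) estimated from observations.

    Method-of-moments sketch for a gamma distribution with location 0:
    mean = a * scale and variance = a * scale**2.
    """
    fates = np.asarray(fates, dtype=float)
    lifetimes = np.asarray(lifetimes, dtype=float)
    bern_p = fates.mean()
    mean = lifetimes.mean()
    var = lifetimes.var()
    gamma_a = mean ** 2 / var
    gamma_scale = var / mean
    return float(bern_p), float(gamma_a), float(gamma_scale)

rng = np.random.default_rng(0)
fates = rng.random(20000) < 0.99
lifetimes = rng.gamma(shape=20.0, scale=5.0, size=20000)
print(estimate_bernoulli_gamma(fates, lifetimes))
```

With only a few dozen observations, as for state 1 above, estimates of this kind are much noisier than with thousands.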


### Estimation of distribution parameters using our estimators for pruned lineage

In [13]:
for state in range(lineage1.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage1.lineage_stats[state].pruned_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
the number of gamma observations 94
20.205780985649763 0.0 5.147886467228911
                    estimated state State object w/ parameters: 0.9893617021266183, 20.205780985649763, 0.0, 5.147886467228911.
original parameters given for state State object w/ parameters: 0.99, 20, 0, 5.
State 1:
the number of gamma observations 29
10 0.0 1
                    estimated state State object w/ parameters: 0.8620689655147443, 10, 0.0, 1.
original parameters given for state State object w/ parameters: 0.88, 10, 0, 1.


### Analyzing our first full lineage

In [14]:
from lineage.Analyze import Analyze, accuracy

X = [lineage1] # population just contains one lineage
states = [cell.state for cell in lineage1.output_lineage]
print(states)
deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 2) # find two states


[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
the number of gamma observations 127
1.8118898902279723 0.0 44.802696241300936
the number of gamma observations 122
2.0474378461811495 0.0 41.083977499329336
the number of gamma observations 5
10 0.0 1
the number of gamma observations 106
4.1954905224191625 0.0 22.731950499488352
the number of gamma observations 21
10 0.0 1
the number of gamma observations 96
20.564222362752595 0.0 5.064674502740699
the number of gamma observations 31
10 0.0 1
the number of gamma observations 96
20.564222362752595 0.0 5.064674502740699
the number of gamma observations 31
10 0.0 1
the number of gamma observations 96
20.564222362

In [15]:
tHMMobj.estimate.E

[State object w/ parameters: 0.9895833333323133, 20.564222362752595, 0.0, 5.064674502740699.,
 State object w/ parameters: 0.8709677419330905, 10, 0.0, 1.]

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [16]:
print(tHMMobj.estimate.pi)

[1.00000000e+000 2.95176501e-239]


In [17]:
print(tHMMobj.estimate.T)

[[0.92708333 0.07291667]
 [0.2        0.8       ]]


In [18]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

State object w/ parameters: 0.9895833333323133, 20.564222362752595, 0.0, 5.064674502740699.
State object w/ parameters: 0.8709677419330905, 10, 0.0, 1.


## Trying another lineage, this time pruning branches with ancestors that die

In [19]:
desired_num_cells = 2**12 - 1
prune_boolean = True # To get pruned tree

In [20]:
lineage2 = LineageTree(pi, T, E, desired_num_cells, prune_boolean)
print(lineage2)

This tree is pruned. It is made of 2 states.
 For each state in this tree: 
 	 There are 1549 cells of state 0, 
 	 There are 1150 cells of state 1.
 This pruned tree has 2699 cells in total
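
The pruned tree is smaller than the requested `2**12 - 1 = 4095` cells because every subtree below a dying cell is removed. The sketch below illustrates that rule on a flat, breadth-first list of Bernoulli fates. It assumes the dying cell itself is kept and only its descendants are removed; `prune_mask` is a hypothetical helper, not the package's pruning code.

```python
def prune_mask(fates):
    """Return keep[i] = True when no ancestor of cell i died (fate 0).

    `fates[i]` is 1 for division and 0 for death; cell i has parent (i - 1) // 2.
    """
    keep = []
    for i in range(len(fates)):
        if i == 0:
            keep.append(True)  # the root has no ancestors
        else:
            parent = (i - 1) // 2
            keep.append(keep[parent] and fates[parent] == 1)
    return keep

# Cell 1 dies, so its daughters (cells 3 and 4) are pruned.
print(prune_mask([1, 0, 1, 1, 1, 1, 1]))
# prints [True, True, True, False, False, True, True]
```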


In [21]:
longest2 = get_experiment_time(lineage2)
print(longest2)

1312.5315997785744


### Estimation of distribution parameters using our estimators for pruned lineage

In [22]:
for state in range(lineage2.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage2.lineage_stats[state].pruned_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
the number of gamma observations 1549
20.75446384435585 0.0 4.78051847402949
                    estimated state State object w/ parameters: 0.9890251775338297, 20.75446384435585, 0.0, 4.78051847402949.
original parameters given for state State object w/ parameters: 0.99, 20, 0, 5.
State 1:
the number of gamma observations 1150
9.73888894189015 0.0 1.0420819329980064
                    estimated state State object w/ parameters: 0.8808695652173251, 9.73888894189015, 0.0, 1.0420819329980064.
original parameters given for state State object w/ parameters: 0.88, 10, 0, 1.


### Analyzing a population of lineages

In [23]:
X = [lineage1, lineage2] # population contains two lineages

deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 2) # find two states

the number of gamma observations 375
10 0.0 1
the number of gamma observations 2451
1.53327054821461 0.0 45.98839001352823
the number of gamma observations 951
10 0.0 1
the number of gamma observations 1875
3.4211200037348055 0.0 25.90906520072425
the number of gamma observations 1168
10 0.0 1
the number of gamma observations 1658
16.376040246665937 0.0 6.0359753929981474
the number of gamma observations 1181
9.83186499191388 0.0 1.0319202599666517
the number of gamma observations 1645
20.685933334021833 0.0 4.81027554136202
the number of gamma observations 1181
9.83186499191388 0.0 1.0319202599666517
the number of gamma observations 1645
20.685933334021833 0.0 4.81027554136202


In [24]:
from lineage.Analyze import accuracy
accuracy(tHMMobj, all_states)

SWITCHING!


[100.0, 0.48165987402741756]
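
The state labels found by a hidden Markov model are arbitrary: the fitted "state 0" may correspond to the true "state 1". The `SWITCHING!` message plausibly signals that such a relabeling took place, although that reading is an assumption. The sketch below shows one generic way to score predictions regardless of labeling, by trying every permutation of the labels. It is not the implementation of `accuracy`, and `best_permutation_accuracy` is a hypothetical name.

```python
from itertools import permutations

def best_permutation_accuracy(true_states, predicted_states, num_states):
    """Return (accuracy in percent, best relabeling of predicted states).

    The relabeling `perm` maps predicted label p to true label perm[p].
    """
    best_accuracy, best_perm = -1.0, None
    for perm in permutations(range(num_states)):
        matches = sum(t == perm[p] for t, p in zip(true_states, predicted_states))
        accuracy_pct = 100.0 * matches / len(true_states)
        if accuracy_pct > best_accuracy:
            best_accuracy, best_perm = accuracy_pct, perm
    return best_accuracy, best_perm

true_states = [0, 0, 1, 1, 0]
predicted_states = [1, 1, 0, 0, 1]  # perfect, but with the labels swapped
print(best_permutation_accuracy(true_states, predicted_states, 2))
# prints (100.0, (1, 0))
```

Trying all permutations costs $k!$ evaluations, which is fine for the two and three state models in this guide but not for large $k$.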

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [25]:
print(tHMMobj.estimate.pi)

[5.32695096e-103 1.00000000e+000]


In [26]:
print(tHMMobj.estimate.T)

[[0.84044524 0.15955476]
 [0.15061724 0.84938276]]


In [27]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

State object w/ parameters: 0.9890577507598189, 20.685933334021833, 0.0, 4.81027554136202.
State object w/ parameters: 0.8806096528365147, 9.83186499191388, 0.0, 1.0319202599666517.


## Creating a synthetic lineage that has three states

Here we generate a lineage with three states: 1) Susceptible, 2) Middle State, and 3) Resistant. (In the code these are indexed 0, 1, and 2.) The aim is to show that the transition from the susceptible state to the resistant state does not happen immediately; instead there is a gradual transition, which is modeled as a middle state. Note that a direct transition between states 1 and 3, in either direction, is not possible, so the probabilities of these transitions are zero, and the initial cells are most likely in the susceptible state.

**State 1**: Susceptible

**State 2**: Transition state

**State 3**: Resistant state


In [28]:
# pi: the initial probability vector
pi_3 = np.array([0.5, 0.25, 0.25])

# T: transition probability matrix
T_3 = np.array([[0.65, 0.35, 0.00],
                [0.20, 0.40, 0.40],
                [0.00, 0.10, 0.90]])

In [29]:
# E: states are defined as StateDistribution objects

# State 0 parameters "Susceptible"
state0 = 0
bern_p0 = 0.7
gamma_a0 = 5.0
gamma_scale0 = 1.0

# State 1 parameters "Middle state"
state1 = 1
bern_p1 = 0.85
gamma_a1 = 10.0
gamma_scale1 = 2.0

# State 2 parameters "Resistant"
state2 = 2
bern_p2 = 0.99
gamma_a2 = 15.0
gamma_scale2 = 3.0

state_obj0 = StateDistribution(state0, bern_p0, gamma_a0, gamma_loc, gamma_scale0)
state_obj1 = StateDistribution(state1, bern_p1, gamma_a1, gamma_loc, gamma_scale1)
state_obj2 = StateDistribution(state2, bern_p2, gamma_a2, gamma_loc, gamma_scale2)

E_3 = [state_obj0, state_obj1, state_obj2]

In [30]:
desired_num_cells = 2**13 - 1 
prune_boolean = False # To get the full tree

In [31]:
lineage3 = LineageTree(pi_3, T_3, E_3, desired_num_cells, prune_boolean)
print(lineage3)

This tree is NOT pruned. It is made of 3 states.
 For each state in this tree: 
 	 There are 1109 cells of state 0, 
 	 There are 1648 cells of state 1, 
 	 There are 5434 cells of state 2.
 This UNpruned tree has 8191 cells in total
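
State 2 makes up about 66% of the cells (5434 of 8191) even though `pi_3` gives it only probability 0.25. One way to understand this: along any branch the states follow a Markov chain with transition matrix `T_3`, so deep generations should approach the chain's stationary distribution. The sketch below computes that distribution by repeated multiplication. It is an illustrative calculation, not part of the `lineage` package, and a finite tree that includes early generations will only approximate it.

```python
import numpy as np

def stationary_distribution(T, num_steps=500):
    """Approximate the stationary distribution of T by power iteration."""
    T = np.asarray(T, dtype=float)
    dist = np.full(T.shape[0], 1.0 / T.shape[0])  # start from a uniform guess
    for _ in range(num_steps):
        dist = dist @ T  # one generation of state switching
    return dist

T_3 = np.array([[0.65, 0.35, 0.00],
                [0.20, 0.40, 0.40],
                [0.00, 0.10, 0.90]])
print(np.round(stationary_distribution(T_3), 3))
```

The result is approximately 0.10, 0.18, and 0.72 for states 0, 1, and 2, which is in line with the observed fractions of roughly 0.14, 0.20, and 0.66.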


In [32]:
longest3 = get_experiment_time(lineage3)
print(longest3)

640.975617599531


### Estimation of distribution parameters using our estimators for full and pruned lineages (3 state)

In [33]:
for state in range(lineage3.num_states):
    print("State {}:".format(state))
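    # First "estimated state" line: estimate from the full lineage
    print("estimated state", E_3[state].estimator(lineage3.lineage_stats[state].full_lin_cells_obs))
    # Second "estimated state" line: estimate from the pruned lineage
    print("estimated state", E_3[state].estimator(lineage3.lineage_stats[state].pruned_lin_cells_obs))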
    print("true_____ state", E_3[state])

State 0:
the number of gamma observations 1109
10 0.0 1
estimated state State object w/ parameters: 0.7060414788097014, 10, 0.0, 1.
the number of gamma observations 283
10 0.0 1
estimated state State object w/ parameters: 0.7279151943461287, 10, 0.0, 1.
true_____ state State object w/ parameters: 0.7, 5.0, 0, 1.0.
State 1:
the number of gamma observations 1648
10.021441545616044 0.0 2.0002980087136915
estimated state State object w/ parameters: 0.8592233009708302, 10.021441545616044, 0.0, 2.0002980087136915.
the number of gamma observations 613
10.82797379224745 0.0 1.867953186440943
estimated state State object w/ parameters: 0.8629690048938458, 10.82797379224745, 0.0, 1.867953186440943.
true_____ state State object w/ parameters: 0.85, 10.0, 0, 2.0.
State 2:
the number of gamma observations 5434
15.056487612451072 0.0 2.986826357671454
estimated state State object w/ parameters: 0.9911667280088152, 15.056487612451072, 0.0, 2.986826357671454.
the number of gamma observations 2865
14.9

### Analyzing a three state lineage

In [34]:
X = [lineage3] # population just contains one lineage

deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 3) # find three states

the number of gamma observations 7645
3.6383702058217473 0.0 10.097477602760973
the number of gamma observations 545
10 0.0 1
the number of gamma observations 1
10 0.0 1
the number of gamma observations 7537
3.2550310280705066 0.0 11.354095916498498
the number of gamma observations 564
10 0.0 1
the number of gamma observations 90
10 0.0 1
the number of gamma observations 7438
3.5183135692306644 0.0 10.620086256850609
the number of gamma observations 591
4.403933749817309 0.0 1.4733354759531592
the number of gamma observations 162
4.718713163907496 0.0 1.3661258657456827
the number of gamma observations 7035
6.126052535234789 0.0 6.418005065331432
the number of gamma observations 811
3.5539953174302688 0.0 1.4869425158813823
the number of gamma observations 345
3.5972131811308077 0.0 1.5448204219011206
the number of gamma observations 6953
6.647240936913428 0.0 5.967948876930966
the number of gamma observations 999
3.393822806990745 0.0 1.6406196041425671
the number of gamma observation

In [35]:
accuracy(tHMMobj, all_states)

SWITCHING!


[16.713465999267488]

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [36]:
print(tHMMobj.estimate.pi)

[5.02717934e-26 1.41355937e-31 1.00000000e+00]


In [37]:
print(tHMMobj.estimate.T)

[[0.90758509 0.01529329 0.07712162]
 [0.07825898 0.68428343 0.23745758]
 [0.42946865 0.29276813 0.27776321]]
