# Guide to tHMM

In [1]:
import numpy as np
import scipy.stats as sp

### Synthesizing Cells (not required by the user)

In [2]:
from lineage.CellVar import CellVar as c
from lineage.CellVar import _double

In [3]:
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
    
parent_state = 1
parent_cell = c(state=parent_state, left=None, right=None, parent=None, gen=1)
left_cell, right_cell = parent_cell._divide(T)

In [4]:
print(left_cell, parent_cell.left)


 Generation: 2, State: 1, Observation: This cell has no observations to report. 
 Generation: 2, State: 1, Observation: This cell has no observations to report.


In [5]:
print(right_cell, parent_cell.right)


 Generation: 2, State: 1, Observation: This cell has no observations to report. 
 Generation: 2, State: 1, Observation: This cell has no observations to report.


## Creating a synthetic lineage (required by the user) "Two State Model"

In [6]:
from lineage.LineageTree import LineageTree
from lineage.StateDistribution import StateDistribution, get_experiment_time

### Creating a lineage and setting the full lineage (unpruned) as the one to be used

Two sets of probabilities define the tree and how cells switch state: the initial probability vector $\pi$ and the transition matrix $T$. The lineage is built by first creating a hidden tree of empty cells, that is, cells whose states are set but which have no observations attached. We then draw the required number of observations from each state distribution and assign them to the cells in that state. The $\pi$ and $T$ parameters are easy to define. With $k$ states, $\pi$ is a list of $k$ probabilities that must sum to $1$, given either as a $1$-dimensional list or as a $1$-dimensional numpy array. $T$ is a square numpy matrix of size $k\times k$ whose rows are the states we transition from and whose columns are the states we transition to. Each row of $T$ must sum to $1$; the columns need not.

In [29]:
# pi: the initial probability vector
pi = np.array([0.6, 0.4], dtype="float")

# T: transition probability matrix
T = np.array([[0.85, 0.15],
              [0.15, 0.85]], dtype="float")
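
The constraints on $\pi$ and $T$ described above can be checked before a lineage is built. The following is a minimal sketch; `check_markov_parameters` is a hypothetical helper written for this guide and is not part of the `lineage` package.

```python
import numpy as np


def check_markov_parameters(pi, T, atol=1e-8):
    """Hypothetical helper: validate pi and T, and return the number of states k."""
    pi = np.asarray(pi, dtype=float)
    T = np.asarray(T, dtype=float)
    if pi.ndim != 1:
        raise ValueError("pi must be 1-dimensional")
    k = pi.shape[0]
    if T.shape != (k, k):
        raise ValueError("T must be a k x k matrix")
    if np.any(pi < 0) or np.any(T < 0):
        raise ValueError("probabilities must be non-negative")
    if not np.isclose(pi.sum(), 1.0, atol=atol):
        raise ValueError("pi must sum to 1")
    if not np.allclose(T.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of T must sum to 1")
    return k


num_states = check_markov_parameters([0.6, 0.4], [[0.85, 0.15], [0.15, 0.85]])
print(num_states)
```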

The emission matrix $E$ takes more work to define, because this is where the user has complete freedom to choose the type of observation they care about. The user first decides what observations the cells in their synthetic lineages should carry. For example, someone observing kinematics or other physical quantities might want Gaussian observations. To generate random variables, the user draws from a Gaussian distribution using the mean and standard deviation chosen for each state. The Gaussian probability density can also be used to define the likelihood. Furthermore, the user can build an analytical estimator for each state distribution that returns parameter estimates when given a list of observations. Finally, the user can define a prune rule: a boolean function that inspects a cell's observations and returns True if the cell's subtree (all of the cell's descendants) is to be pruned, or False if the cell is safe from pruning. In the Gaussian example, a user could remove a cell's subtree if its observation is above or below some fixed value.
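
To make the Gaussian example concrete, here is a rough sketch of the four pieces named above: random draws, a likelihood, an analytical estimator, and a prune rule. The class `GaussianState`, its method names, and the threshold values are all hypothetical; they only illustrate the idea and do not reproduce the interface of `StateDistribution`.

```python
import numpy as np
import scipy.stats as sp


class GaussianState:
    """Hypothetical state distribution with a single Gaussian observation."""

    def __init__(self, state, mean, std):
        self.state = state
        self.mean = mean
        self.std = std

    def rvs(self, size, random_state=None):
        # Draw `size` observations for cells that are in this state.
        return sp.norm.rvs(loc=self.mean, scale=self.std, size=size,
                           random_state=random_state)

    def pdf(self, obs):
        # Likelihood of one observation under this state's Gaussian.
        return sp.norm.pdf(obs, loc=self.mean, scale=self.std)

    def estimator(self, list_of_obs):
        # Maximum-likelihood estimates: sample mean and (biased) sample std.
        obs = np.asarray(list_of_obs, dtype=float)
        return GaussianState(self.state, obs.mean(), obs.std())


def prune_rule(obs, lower=-3.0, upper=3.0):
    # True means the cell's subtree should be pruned (thresholds are arbitrary).
    return bool(obs < lower or obs > upper)


state = GaussianState(state=0, mean=0.0, std=1.0)
samples = state.rvs(size=5000, random_state=0)
estimate = state.estimator(samples)
print(estimate.mean, estimate.std)
```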

As an example, and as bioengineers, we have already built a model that resembles cell lineage trees. In our synthetic model, the emissions are multivariate. The first emission is a Bernoulli observation, with $0$ implying death and $1$ implying division. The second and third emissions are continuous and come from exponential and gamma distributions, respectively. Though these can be thought of as cell lifetimes or as time spent in a certain cell phase, these values can really mean anything, and the user is completely free to choose what the emissions and their values mean. We define ways to draw random variables for these multivariate observations and to calculate the likelihood of an observation. Keeping with the cell analogy, we also provide a prune rule: if a cell has a $0$ in its Bernoulli observation, its subtree is pruned from the full lineage tree. Although this introduces bias into estimation, the lineage objects keep both the full tree and the pruned tree, in case a user would like to see the effect of analyzing one versus the other.

Ultimately, $E$ is defined as a list of $k$ state distribution objects. These objects already provide a lot of functionality, and a user can easily extend them. Each one only needs to be instantiated with the parameters that define that state's distribution.

In [30]:
# E: states are defined as StateDistribution objects

# State 0 parameters "Resistant"
state0 = 0
bern_p0 = 0.8
exp_a0 = 20.0

# State 1 parameters "Susceptible"
state1 = 1
bern_p1 = 0.97
exp_a1 = 80.0

state_obj0 = StateDistribution(state0, bern_p0, exp_a0)
state_obj1 = StateDistribution(state1, bern_p1, exp_a1)

E = [state_obj0, state_obj1]
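
For intuition about what a two-part observation looks like, the sketch below draws a (fate, lifetime) tuple and evaluates its likelihood with `scipy.stats` directly. It rests on two assumptions that are not stated in this guide: that the second parameter is the scale (mean) of the exponential distribution, and that the two emissions are independent, so the joint likelihood is a product. The function names are hypothetical.

```python
import numpy as np
import scipy.stats as sp

bern_p, exp_scale = 0.8, 20.0  # state 0 values used in the cell above


def draw_observation(rng):
    # One (fate, lifetime) tuple: fate 1 means division, fate 0 means death.
    fate = sp.bernoulli.rvs(p=bern_p, random_state=rng)
    lifetime = sp.expon.rvs(scale=exp_scale, random_state=rng)
    return int(fate), float(lifetime)


def observation_likelihood(obs):
    # Assumed independence: Bernoulli pmf times exponential pdf.
    fate, lifetime = obs
    return sp.bernoulli.pmf(fate, p=bern_p) * sp.expon.pdf(lifetime, scale=exp_scale)


rng = np.random.RandomState(0)
obs = draw_observation(rng)
print(obs, observation_likelihood(obs))
```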

The final required parameters are more straightforward. The first is the desired number of cells in the full, unpruned lineage tree, which can be any number. The lineage tree is built 'from left to right': we construct the binary tree by going to the left-most cell, dividing it, and then walking through the generation. For example, if someone requested for

In [31]:
desired_num_cells = 2**7 - 1 
prune_boolean = False # To get the full tree

In [32]:
lineage1 = LineageTree(pi, T, E, desired_num_cells, prune_boolean)
print(lineage1)

This tree is NOT pruned. It is made of 2 states.
 For each state in this tree: 
 	 There are 51 cells of state 0, 
 	 There are 76 cells of state 1.
 This UNpruned tree has 127 cells in total
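
The idea of filling the tree generation by generation, from the left-most cell to the right, can be sketched with a queue. This is only an illustration of hidden-state generation under the assumption that each daughter's state is drawn from the row of $T$ belonging to its parent's state; `simulate_states` is a hypothetical function and not how `LineageTree` is necessarily implemented.

```python
from collections import deque

import numpy as np


def simulate_states(pi, T, desired_num_cells, seed=0):
    """Hypothetical sketch: hidden states of a binary lineage in generation order."""
    rng = np.random.default_rng(seed)
    pi, T = np.asarray(pi, dtype=float), np.asarray(T, dtype=float)
    num_states = len(pi)
    root = int(rng.choice(num_states, p=pi))
    states = [root]
    queue = deque([root])  # cells waiting to divide, left-most first
    while len(states) < desired_num_cells:
        parent_state = queue.popleft()
        for _ in range(2):  # left daughter, then right daughter
            if len(states) == desired_num_cells:
                break
            daughter = int(rng.choice(num_states, p=T[parent_state]))
            states.append(daughter)
            queue.append(daughter)
    return states


states = simulate_states([0.6, 0.4], [[0.85, 0.15], [0.15, 0.85]], 2**7 - 1)
print(len(states), states[:7])
```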


### Obtaining how long the experiment ran by checking the time length of the longest branch

In [33]:
longest_branch_time = get_experiment_time(lineage1)
print(longest_branch_time)

964.4953626519934
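
The heading above describes the experiment time as the time length of the longest branch. One way to picture that quantity is the largest sum of lifetimes along any path from the root to a cell. The sketch below assumes lifetimes are stored in heap order (the daughters of cell `i` are cells `2i+1` and `2i+2`); both the storage layout and the function name are assumptions for illustration, and `get_experiment_time` may work differently.

```python
def longest_branch_time(lifetimes):
    """Hypothetical sketch: lifetimes in heap order; returns the longest root-to-cell time."""
    end_times = list(lifetimes)
    for i in range(1, len(lifetimes)):
        parent = (i - 1) // 2
        # A cell's end time is its parent's end time plus its own lifetime.
        end_times[i] = end_times[parent] + lifetimes[i]
    return max(end_times)


example = [10.0, 5.0, 20.0, 1.0, 2.0, 3.0, 4.0]
print(longest_branch_time(example))  # 34.0, along the branch 10 + 20 + 4
```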


### Estimation of distribution parameters using our estimators for full lineage

In [34]:
for state in range(lineage1.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage1.lineage_stats[state].full_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
                    estimated state State object w/ parameters: 0.7254901960784313, 20.76443587530297.
original parameters given for state State object w/ parameters: 0.8, 20.0.
State 1:
                    estimated state State object w/ parameters: 0.9868421052631579, 77.83219936813123.
original parameters given for state State object w/ parameters: 0.97, 80.0.
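
For reference, the maximum-likelihood estimates for a Bernoulli probability and an exponential scale are both sample means. The sketch below computes them from a list of (fate, lifetime) tuples; `estimate_parameters` is a hypothetical stand-in, shown only to make the estimates above easier to interpret, and the internals of the package's `estimator` method are not shown in this guide.

```python
import numpy as np


def estimate_parameters(observations):
    """Hypothetical sketch: MLEs from a list of (fate, lifetime) tuples."""
    fates = np.array([o[0] for o in observations], dtype=float)
    lifetimes = np.array([o[1] for o in observations], dtype=float)
    # Bernoulli p is the fraction of divisions; exponential scale is the mean lifetime.
    return fates.mean(), lifetimes.mean()


obs = [(1, 18.0), (1, 22.0), (0, 30.0), (1, 10.0)]
bern_p_hat, exp_scale_hat = estimate_parameters(obs)
print(bern_p_hat, exp_scale_hat)  # 0.75 20.0
```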


### Estimation of distribution parameters using our estimators for pruned lineage

In [35]:
for state in range(lineage1.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage1.lineage_stats[state].pruned_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
                    estimated state State object w/ parameters: 0.7105263157894737, 21.193230930247967.
original parameters given for state State object w/ parameters: 0.8, 20.0.
State 1:
                    estimated state State object w/ parameters: 0.9859154929577465, 76.49496577076829.
original parameters given for state State object w/ parameters: 0.97, 80.0.
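
The prune rule described earlier removes the subtree of any cell whose Bernoulli observation is $0$, while the dying cell itself remains in the tree. A small sketch of that rule, assuming fates are stored in heap order (the daughters of cell `i` are cells `2i+1` and `2i+2`); the layout and the name `pruned_indices` are hypothetical.

```python
def pruned_indices(fates):
    """Hypothetical sketch: fates in heap order; returns indices kept after pruning."""
    keep = [True] * len(fates)
    for i in range(1, len(fates)):
        parent = (i - 1) // 2
        # A cell is dropped if its parent was dropped or its parent died (fate 0).
        keep[i] = keep[parent] and fates[parent] == 1
    return [i for i, kept in enumerate(keep) if kept]


fates = [1, 0, 1, 1, 1, 1, 0]
print(pruned_indices(fates))  # [0, 1, 2, 5, 6]
```

Cell 1 dies, so its daughters (cells 3 and 4) are removed; cell 6 also dies but stays in the tree because it has no descendants to remove.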


### Analyzing our first full lineage

In [36]:
from lineage.Analyze import Analyze

X = [lineage1] # population just contains one lineage
states = [cell.state for cell in lineage1.output_lineage]
deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 2) # find two states

0.0 0 1.0 0.012132726131525877 21.96990752677638 55.46432605879291
0.0 0 1.0 0.010750393446316061 28.679099092660042 55.46432605879291
0.0 0 1.0 0.008226847309023868 43.51799633816882 55.46432605879291
0.0 0 1.0 0.0153512692896922 8.919610856446344 55.46432605879291
0.0 0 1.0 0.006670306776784614 55.15091857377687 55.46432605879291
0.0 0 1.0 0.016948640056166704 3.4292284221028684 55.46432605879291
0.0 0 1.0 0.011855080889470767 23.253901011384464 55.46432605879291
0.0 0 1.0 0.01570830534728672 7.644407518523332 55.46432605879291
0.0 0 1.0 0.015226110909908176 9.37366309429395 55.46432605879291
0.0 0 1.0 0.01696871120058083 3.363584579830037 55.46432605879291
0.0 0 1.0 0.008236106906645202 43.4556044504102 55.46432605879291
0.0 0 1.0 0.011013841909508075 27.336280937262643 55.46432605879291
0.0 0 1.0 0.015242983405804152 9.31223549163801 55.46432605879291
0.0 0 1.0 0.01706534227225735 3.0486299316578807 55.46432605879291
0.0 0 1.0 3.517727535768694e-06 473.7737952086851 55.464326058792

In [38]:
for num, lineageObj in enumerate(X):
    lin_estimated_states = all_states[num]
    lin_true_states = [cell.state for cell in lineageObj.output_lineage]
    total = len(lin_estimated_states)
    assert total == len(lin_true_states)
    counter = [1 if a==b else 0 for (a,b) in zip(lin_estimated_states,lin_true_states)]
    print("Accuracy or 1-Accuracy is {}".format(sum(counter)/total))
    
obs = [cell.obs for cell in lineage1.output_lineage]
for idx, ob in enumerate(obs):
    print(ob, states[idx], all_states[0][idx], states[idx]==all_states[0][idx])

Accuracy or 1-Accuracy is 0.2992125984251969
(1, 18.528758316542298) 1 0 False
(1, 48.871441163534506) 1 0 False
(1, 81.12717077392125) 1 0 False
(1, 15.948461231770478) 1 0 False
(1, 67.60636329047597) 1 0 False
(1, 9.884680743559615) 0 0 True
(1, 243.70741315144363) 1 0 False
(1, 54.68570360009962) 1 0 False
(1, 17.824335828490025) 1 0 False
(1, 3.8801551004703123) 1 0 False
(0, 21.96990752677638) 0 1 False
(1, 10.941516422150162) 0 0 True
(1, 32.245781529296) 0 0 True
(1, 45.657597782967805) 1 0 False
(1, 4.879237467028458) 1 0 False
(1, 0.48198426805003713) 1 0 False
(1, 36.3293484549017) 1 0 False
(1, 4.465050003544926) 1 0 False
(1, 13.484345291617208) 0 0 True
(1, 120.09774676152526) 1 0 False
(1, 75.55383162576523) 1 0 False
(1, 9.059360055686067) 0 0 True
(1, 5.45366097605839) 1 0 False
(1, 28.217904275242248) 1 0 False
(1, 3.5768975481857543) 0 0 True
(1, 67.45048110064144) 1 0 False
(1, 97.09168818792395) 0 0 True
(1, 27.933469858958624) 0 0 True
(1, 34.9909085572096) 0 0 Tr
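
The label "Accuracy or 1-Accuracy" reflects that the state labels recovered by a hidden Markov model are arbitrary: with two states, the estimated labels may simply be swapped relative to the true ones. One way to score the result regardless of labeling is to take the best accuracy over all relabelings of the estimated states. The helper below is a hypothetical sketch of that idea and is not part of the `lineage` package.

```python
from itertools import permutations


def best_permutation_accuracy(true_states, estimated_states, num_states):
    """Hypothetical helper: accuracy maximized over relabelings of the estimated states."""
    best = 0.0
    for perm in permutations(range(num_states)):
        relabeled = [perm[s] for s in estimated_states]
        matches = sum(t == e for t, e in zip(true_states, relabeled))
        best = max(best, matches / len(true_states))
    return best


true_states = [1, 1, 0, 0, 1]
estimated = [0, 0, 1, 1, 1]
print(best_permutation_accuracy(true_states, estimated, 2))  # 0.8 after swapping labels
```

Enumerating permutations is fine for two or three states but grows factorially with the number of states.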

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [16]:
print(tHMMobj.estimate.pi)

[1. 0.]


In [17]:
print(tHMMobj.estimate.T)

[[0.9    0.1   ]
 [0.9375 0.0625]]


In [18]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

State object w/ parameters: 1.0, 35.04143516267316.
State object w/ parameters: 0.0, 18.653586821519855.


## Trying another lineage, this time pruning branches with ancestors that die

In [19]:
desired_num_cells = 2**12 -1 
prune_boolean = False # False yields the full (unpruned) tree, as the output below confirms

In [20]:
lineage2 = LineageTree(pi, T, E, desired_num_cells, prune_boolean)
print(lineage2)

This tree is NOT pruned. It is made of 2 states.
 For each state in this tree: 
 	 There are 1934 cells of state 0, 
 	 There are 2161 cells of state 1.
 This UNpruned tree has 4095 cells in total


In [21]:
longest2 = get_experiment_time(lineage2)
print(longest2)

1783.8868780974724


### Estimation of distribution parameters using our estimators for pruned lineage

In [22]:
for state in range(lineage2.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage2.lineage_stats[state].pruned_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
                    estimated state State object w/ parameters: 0.7963163596966414, 18.758691392503607.
original parameters given for state State object w/ parameters: 0.8, 20.0.
State 1:
                    estimated state State object w/ parameters: 0.9680306905370843, 79.23531169417365.
original parameters given for state State object w/ parameters: 0.97, 80.0.


### Analyzing a population of lineages

In [23]:
# X and all_states still come from the earlier Analyze call on lineage1
for num, lineageObj in enumerate(X):
    lin_estimated_states = all_states[num]
    lin_true_states = [cell.state for cell in lineageObj.output_lineage]
    total = len(lin_estimated_states)
    assert total == len(lin_true_states)
    counter = [1 if a==b else 0 for (a,b) in zip(lin_estimated_states,lin_true_states)]
    print("Accuracy or 1-Accuracy is {}".format(sum(counter)/total))

Accuracy or 1-Accuracy is 0.6299212598425197


### Estimated Markov parameters ($\pi$, $T$, $E$)

In [24]:
print(tHMMobj.estimate.pi)

[1. 0.]


In [25]:
print(tHMMobj.estimate.T)

[[0.9    0.1   ]
 [0.9375 0.0625]]


In [26]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

State object w/ parameters: 1.0, 35.04143516267316.
State object w/ parameters: 0.0, 18.653586821519855.


## Creating a synthetic lineage that has three states

Here we generate a lineage with three states: 1) Susceptible, 2) Middle state, and 3) Resistant. The aim is to show that the transition from the susceptible state to the resistant state does not happen immediately; instead there is a gradual transition, which is modeled as a middle state. Direct transitions between states 1 and 3, in either direction, are not possible, so their probabilities are zero, and the initial cells are most likely in the susceptible state.

**State 1**: Susceptible

**State 2**: Transition state

**State 3**: Resistant state


In [27]:
# pi: the initial probability vector
pi_3 = np.array([0.5, 0.25, 0.25])

# T: transition probability matrix
T_3 = np.array([[0.65, 0.35, 0.00],
                [0.20, 0.40, 0.40],
                [0.00, 0.10, 0.90]])
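
Although the direct transition between the susceptible and resistant states has probability zero, the resistant state is still reachable through the middle state. Squaring the transition matrix gives the two-generation (grandparent to granddaughter) transition probabilities, which makes this visible. The variable name `T_two_steps` is introduced here only for illustration.

```python
import numpy as np

T_3 = np.array([[0.65, 0.35, 0.00],
                [0.20, 0.40, 0.40],
                [0.00, 0.10, 0.90]])

# Entry [i, j] is the probability of going from state i to state j in two generations.
T_two_steps = T_3 @ T_3

print(T_3[0, 2])          # direct susceptible -> resistant: 0.0
print(T_two_steps[0, 2])  # via the middle state: 0.35 * 0.40 = 0.14
```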

In [28]:
# E: states are defined as StateDistribution objects

# State 0 parameters "Susceptible"
state0 = 0
bern_p0 = 0.7
expon_scale_beta0 = 20
gamma_a0 = 5.0
gamma_scale0 = 1.0

# State 1 parameters "Middle state"
state1 = 1
bern_p1 = 0.85
expon_scale_beta1 = 60
gamma_a1 = 10.0
gamma_scale1 = 2.0

# State 2 parameters "Resistant"
state2 = 2
bern_p2 = 0.99
expon_scale_beta2 = 80
gamma_a2 = 15.0
gamma_scale2 = 3.0

# StateDistribution takes three arguments (state, Bernoulli p, exponential parameter),
# as in the two-state example. Also passing the gamma parameters raised
# "TypeError: __init__() takes 4 positional arguments but 5 were given".
state_obj0 = StateDistribution(state0, bern_p0, expon_scale_beta0)
state_obj1 = StateDistribution(state1, bern_p1, expon_scale_beta1)
state_obj2 = StateDistribution(state2, bern_p2, expon_scale_beta2)

E_3 = [state_obj0, state_obj1, state_obj2]

In [None]:
desired_num_cells = 2**13 - 1 
prune_boolean = False # To get the full tree

In [None]:
lineage3 = LineageTree(pi_3, T_3, E_3, desired_num_cells, prune_boolean)
print(lineage3)

In [None]:
longest3 = get_experiment_time(lineage3)
print(longest3)

### Estimation of distribution parameters using our estimators for full lineage (3 state)

In [None]:
for state in range(lineage3.num_states):
    print("State {}:".format(state))
    print("estimated state (full)  ", E_3[state].estimator(lineage3.lineage_stats[state].full_lin_cells_obs))
    print("estimated state (pruned)", E_3[state].estimator(lineage3.lineage_stats[state].pruned_lin_cells_obs))
    print("true state              ", E_3[state])

### Analyzing a three state lineage

In [None]:
X = [lineage3] # population just contains one lineage

deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 3) # find three states

In [None]:
for num, lineageObj in enumerate(X):
    lin_estimated_states = all_states[num]
    lin_true_states = [cell.state for cell in lineageObj.output_lineage]
    total = len(lin_estimated_states)
    assert total == len(lin_true_states)
    counter = [1 if a==b else 0 for (a,b) in zip(lin_estimated_states,lin_true_states)]
    print("Accuracy {}".format(sum(counter)/total))

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [None]:
print(tHMMobj.estimate.pi)

In [None]:
print(tHMMobj.estimate.T)

In [None]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
A = sp.gamma.rvs(a=20,scale=4,size=2)
B = sp.gamma.rvs(a=5,scale=1,size=1000)
plt.hist(A)
plt.hist(B)

In [None]:
from lineage.StateDistribution import gamma_estimator 
a, b = gamma_estimator(A)
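
As an independent cross-check on gamma parameter estimates, SciPy's own maximum-likelihood fit can be used with the location fixed at zero. This is a different routine from `gamma_estimator`, whose internals are not shown in this guide, and the sample below is larger than `A` because two draws are too few for a meaningful fit.

```python
import scipy.stats as sp

# Draw a reproducible sample with known shape a=5 and scale=1.
sample = sp.gamma.rvs(a=5, scale=1, size=5000, random_state=0)

# floc=0 fixes the location so only shape and scale are estimated.
a_hat, loc_hat, scale_hat = sp.gamma.fit(sample, floc=0)
print(a_hat, scale_hat)
```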

In [None]:
import scipy.stats as sp
gamma_ll = sp.gamma.pdf(x=A, a=a, scale=b)  # gamma likelihood
print(gamma_ll)

In [None]:
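import scipy.stats as sp

# The second positional argument of expon.pdf is loc, not scale, so pass the scale by keyword
exp_ll = sp.expon.pdf(x=80.12274208215076, scale=80.87972524126674)  # exponential likelihood
print(exp_ll)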
