# __tHMM__
#### Authors: Shakthi Visagan, Farnaz Mohammadi, Nikan Namiri, and Aaron Meyer, PhD

In [8]:
import numpy as np
import scipy.stats as sp
import matplotlib.pyplot as plt
%matplotlib inline

### Synthesizing Cells (not required by the user)

In [2]:
from lineage.CellVar import CellVar as c
from lineage.CellVar import _double

In [3]:
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
    
parent_state = 1
parent_cell = c(state=parent_state, left=None, right=None, parent=None, gen=1)
left_cell, right_cell = parent_cell._divide(T)

In [4]:
print(left_cell, parent_cell.left)


 Generation: 2, State: 1, Observation: This cell has no observations to report. 
 Generation: 2, State: 1, Observation: This cell has no observations to report.


In [5]:
print(right_cell, parent_cell.right)


 Generation: 2, State: 1, Observation: This cell has no observations to report. 
 Generation: 2, State: 1, Observation: This cell has no observations to report.


## Creating a synthetic lineage (required by the user): "Heterogeneous Two-State Model"

In [6]:
from lineage.LineageTree import LineageTree
from lineage.StateDistribution import StateDistribution, get_experiment_time

### Creating an unpruned two-state lineage

#### Defining the $\pi$ initial probability vector and $T$ stochastic transition probability matrix

The required probabilities are those that define the tree and the act of state switching. The process works by first creating a hidden tree of empty cells: cells whose states are set but that have no observations attached to them. We then draw, from each state's distribution, as many observations as there are cells in that state, and assign those observations to the cells. The $\pi$ and $T$ parameters are easy to define. Let $k$ be the number of states. $\pi$ must be a list of $k$ probabilities that sum to $1$, given either as a $1$-dimensional list or a $1$-dimensional numpy array. $T$ must be a square $k\times k$ numpy array. Its rows index the state being transitioned from and its columns the state being transitioned to, so each row of $T$ must sum to $1$; the columns need not.
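The constraints on $\pi$ and $T$ are easy to check before building a lineage. The sketch below uses a hypothetical helper, `check_markov_parameters`, that is not part of the `lineage` package; it only restates the rules above in code and returns $k$.

```python
import numpy as np


def check_markov_parameters(pi, T, atol=1e-8):
    """Validate pi and T as described above and return the number of states k.

    Hypothetical helper for illustration; not part of the lineage package.
    """
    pi = np.asarray(pi, dtype=float)
    T = np.asarray(T, dtype=float)
    if pi.ndim != 1:
        raise ValueError("pi must be 1-dimensional")
    k = pi.shape[0]
    if T.shape != (k, k):
        raise ValueError("T must be a k x k array, where k = len(pi)")
    if (pi < 0).any() or (T < 0).any():
        raise ValueError("probabilities must be non-negative")
    if not np.isclose(pi.sum(), 1.0, atol=atol):
        raise ValueError("pi must sum to 1")
    # Rows are the 'from' states, so every row must be a probability distribution.
    if not np.allclose(T.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of T must sum to 1")
    return k


pi = np.array([0.6, 0.4])
T = np.array([[0.75, 0.25],
              [0.15, 0.85]])
print(check_markov_parameters(pi, T))  # 2
print(T.sum(axis=0))  # column sums; these need not equal 1
```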

In [7]:
# pi: the initial probability vector
pi = np.array([0.6, 0.4], dtype="float")

# T: transition probability matrix
T = np.array([[0.75, 0.25],
              [0.15, 0.85]], dtype="float")

#### Defining the $E$ emissions matrix using state distributions

The emission matrix $E$ is a little more complicated to define because this is where the user has complete freedom in choosing what type of observation they care about. In particular, the user must first decide which physical observation they want to extract from images of their cells, or to test on synthetically created lineages. For example, someone observing kinematics might model their observations (velocity, acceleration, etc.) with a Gaussian distribution parameterized by a mean and a covariance.

Ultimately, the user needs to provide three things based on the phenotype they wish to observe, model, and predict:

1. a probability distribution function: a function that returns a likelihood when given a sample and parameters describing the distribution
2. a random variable: a function that returns samples from the distribution when given parameters describing the distribution
3. an estimator: a function that returns parameters that describe a distribution when given samples
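To make the three ingredients concrete, here is a minimal sketch for the Gaussian example mentioned above, written with SciPy and NumPy. The function names `gaussian_pdf`, `gaussian_rv`, and `gaussian_estimator` are hypothetical, and the exact signatures the `lineage` package expects may differ; the point is only the division of labor between likelihood, sampler, and estimator.

```python
import numpy as np
from scipy.stats import multivariate_normal


def gaussian_pdf(x, mean, cov):
    """Likelihood of sample(s) x under a multivariate normal distribution."""
    return multivariate_normal.pdf(x, mean=mean, cov=cov)


def gaussian_rv(mean, cov, size=1, rng=None):
    """Draw `size` samples; returns an array of shape (size, dimension)."""
    rng = np.random.default_rng(rng)
    return rng.multivariate_normal(mean, cov, size=size)


def gaussian_estimator(samples):
    """Maximum-likelihood mean vector and covariance matrix from samples."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False, bias=True)  # bias=True gives the MLE
    return mean, cov


# Usage: draw samples, then recover the parameters from them.
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[1.0, 0.3],
                     [0.3, 0.5]])
samples = gaussian_rv(true_mean, true_cov, size=20_000, rng=0)
est_mean, est_cov = gaussian_estimator(samples)
print(est_mean)
print(est_cov)
print(gaussian_pdf(true_mean, true_mean, true_cov))
```

With enough samples the estimator should land close to the parameters used by the sampler, which is a quick sanity check for any user-defined distribution.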

An optional boolean function can be provided to "prune" cells based on the observation. In our example, cells with a Bernoulli observation of $0$, which implies that the cell died, are pruned from the tree.
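The effect of a prune rule on a tree can be sketched independently of the package. The example below assumes, for illustration only, that cells are stored in breadth-first (heap) order, so the daughters of cell `i` are cells `2*i + 1` and `2*i + 2`, and that each observation is a `(fate, lifetime)` pair. It also assumes a dead cell stays in the tree as a leaf while all of its descendants are removed. `prune_rule` and `prune_heap_tree` are hypothetical names, not the package's API.

```python
def prune_rule(obs):
    """Return True if the cell died (Bernoulli observation of 0)."""
    return obs[0] == 0


def prune_heap_tree(observations, rule=prune_rule):
    """Return the sorted indices of the cells kept after pruning.

    observations[i] is the observation of cell i; the daughters of cell i
    are cells 2*i + 1 and 2*i + 2. A cell that satisfies `rule` is kept as
    a leaf, but none of its descendants are visited or kept.
    """
    n = len(observations)
    kept = []
    stack = [0] if n else []
    while stack:
        i = stack.pop()
        kept.append(i)
        if rule(observations[i]):
            continue  # the cell died, so its daughters never existed
        for child in (2 * i + 1, 2 * i + 2):
            if child < n:
                stack.append(child)
    return sorted(kept)


# Cell 1 dies, so its daughters (cells 3 and 4) are pruned.
obs = [(1, 10.0), (0, 5.0), (1, 8.0), (1, 3.0), (1, 4.0), (1, 6.0), (0, 2.0)]
print(prune_heap_tree(obs))  # [0, 1, 2, 5, 6]
```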

As an example, and as bioengineers, we have already built a model that resembles cell lineage trees. In our synthetic model the emissions are multivariate. The first emission is a Bernoulli observation, with $0$ implying death and $1$ implying division. The second emission is continuous and gamma distributed. Although these values can be thought of as cell lifetimes or periods spent in a certain cell phase, they can really mean anything, and the user is completely free to choose what the emissions and their values mean. As mentioned above, we provide a probability distribution function that takes as input multivariate samples, a Bernoulli rate parameter, and the three parameters that define the gamma distribution (shape, location, and scale), and returns a likelihood. We also define a random variable that takes a Bernoulli parameter and the three gamma parameters and returns multivariate samples, estimators for these observations, and, finally, the prune rule explained previously.
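A Bernoulli-gamma emission like the one described above could be sketched as follows. This is not the `StateDistribution` implementation; the functions are hypothetical, they only mirror its parameter order (`bern_p`, `gamma_a`, `gamma_loc`, `gamma_scale`), and they assume the two emissions are independent given the state, so the joint likelihood is the product of a Bernoulli term and a gamma term.

```python
import numpy as np
import scipy.stats as sp


def bern_gamma_pdf(x, bern_p, gamma_a, gamma_loc, gamma_scale):
    """Joint likelihood of one (fate, lifetime) observation.

    Assumes fate and lifetime are independent given the state.
    """
    fate, lifetime = x
    bern_ll = sp.bernoulli.pmf(fate, bern_p)
    gamma_ll = sp.gamma.pdf(lifetime, gamma_a, loc=gamma_loc, scale=gamma_scale)
    return bern_ll * gamma_ll


def bern_gamma_rv(bern_p, gamma_a, gamma_loc, gamma_scale, size=1, rng=None):
    """Draw `size` (fate, lifetime) pairs as an array of shape (size, 2)."""
    rng = np.random.default_rng(rng)
    fates = rng.binomial(1, bern_p, size=size)  # 1 = divided, 0 = died
    lifetimes = gamma_loc + rng.gamma(shape=gamma_a, scale=gamma_scale, size=size)
    return np.column_stack((fates, lifetimes))


# Usage with the "Susceptible" parameters defined below.
samples = bern_gamma_rv(0.88, 10, 0, 1, size=5, rng=0)
print(samples)
print(bern_gamma_pdf(samples[0], 0.88, 10, 0, 1))
```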


Ultimately, $E$ is defined as a list of $k$ state distribution objects, one per state. These objects are rich in what they can already do, and a user can easily extend their functionality. Each object only needs to be instantiated with the parameters that define that state's distribution.

In [8]:
# E: states are defined as StateDistribution objects

# State 0 parameters "Resistant"
state0 = 0
bern_p0 = 0.99
gamma_a0 = 20
gamma_loc = 0  # gamma location parameter, shared by every state below
gamma_scale0 = 5

# State 1 parameters "Susceptible"
state1 = 1
bern_p1 = 0.88
gamma_a1 = 10
gamma_scale1 = 1

state_obj0 = StateDistribution(state0, bern_p0, gamma_a0, gamma_loc, gamma_scale0)
state_obj1 = StateDistribution(state1, bern_p1, gamma_a1, gamma_loc, gamma_scale1)

E = [state_obj0, state_obj1]

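The final required parameters are more straightforward. The first is the desired number of cells in the full, unpruned lineage tree, which can be any number. The lineage tree is built "from left to right": we construct the binary tree one generation at a time, going to the left-most cell of a generation, dividing it, and then walking through the rest of that generation. For example, requesting $2^7 - 1 = 127$ cells yields a full tree of exactly seven complete generations, since $1 + 2 + 4 + \dots + 64 = 127$. The second parameter is a boolean that specifies whether the tree should be pruned.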
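The "hidden tree of empty cells" and the left-to-right construction can be sketched with heap indexing: cell 0 is the root and the daughters of cell `i` are cells `2*i + 1` and `2*i + 2`, so filling indices in order fills the tree generation by generation. This is an illustrative assumption about the layout, not necessarily how `LineageTree` stores cells; `generate_hidden_states` and `generation` are hypothetical helpers. The root's state is drawn from $\pi$ and every daughter's state from its mother's row of $T$.

```python
import numpy as np


def generate_hidden_states(pi, T, num_cells, rng=None):
    """Assign hidden states to a binary tree filled breadth-first.

    Cell 0 is the root; the daughters of cell i are cells 2*i + 1 and
    2*i + 2. Assumes num_cells >= 1.
    """
    rng = np.random.default_rng(rng)
    pi = np.asarray(pi, dtype=float)
    T = np.asarray(T, dtype=float)
    k = len(pi)
    states = np.empty(num_cells, dtype=int)
    states[0] = rng.choice(k, p=pi)  # root state drawn from pi
    for i in range(1, num_cells):
        mother = (i - 1) // 2
        # each daughter's state is drawn from the mother's row of T
        states[i] = rng.choice(k, p=T[states[mother]])
    return states


def generation(i):
    """Generation (1-indexed) of cell i in heap order."""
    return (i + 1).bit_length()


pi = np.array([0.6, 0.4])
T = np.array([[0.75, 0.25],
              [0.15, 0.85]])
states = generate_hidden_states(pi, T, 2**7 - 1, rng=0)
print(len(states))                       # 127
print(generation(0), generation(126))    # 1 7
print(np.bincount(states, minlength=2))  # number of cells in each state
```

Observations would then be drawn per state and attached to these cells, which is the second step described earlier.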

In [9]:
desired_num_cells = 2**7 - 1 
prune_boolean = False # To get the full tree

In [10]:
lineage1 = LineageTree(pi, T, E, desired_num_cells, prune_boolean)
print(lineage1)

This tree is pruned. It is made of 2 states.
 For each state in this tree: 
 	 There are 1 cells of state 0, 
 	 There are 0 cells of state 1.
 This pruned tree has 1 many cells in total


### Determining how long the experiment ran from the time length of the longest branch

In [11]:
longest_branch_time = get_experiment_time(lineage1)
print(longest_branch_time)

108.17456674467809
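One way to compute such an experiment time is to add up lifetimes along every root-to-leaf path and take the largest total. The sketch below assumes that definition and the same heap layout as before (daughters of cell `i` at `2*i + 1` and `2*i + 2`); it is a hypothetical helper, and `get_experiment_time` may be implemented differently. The optional `kept` argument restricts the calculation to the cells that survive pruning.

```python
def longest_branch_time(lifetimes, kept=None):
    """Largest total lifetime along any root-to-leaf path of a heap-ordered tree.

    lifetimes[i] is the (non-negative) lifetime of cell i; the daughters of
    cell i are 2*i + 1 and 2*i + 2. `kept` optionally restricts the tree to
    a subset of cells, e.g. those remaining after pruning.
    """
    n = len(lifetimes)
    kept = set(range(n)) if kept is None else set(kept)
    if n == 0 or 0 not in kept:
        return 0.0
    # end_time[i]: time at which cell i ends, measured from the root's birth
    end_time = {0: float(lifetimes[0])}
    for i in range(1, n):  # mothers always precede their daughters in heap order
        mother = (i - 1) // 2
        if i in kept and mother in end_time:
            end_time[i] = end_time[mother] + float(lifetimes[i])
    return max(end_time.values())


lifetimes = [10.0, 5.0, 8.0, 3.0, 4.0, 6.0, 2.0]
print(longest_branch_time(lifetimes))                  # 24.0 (cells 0 -> 2 -> 5)
print(longest_branch_time(lifetimes, kept=[0, 1, 3]))  # 18.0
```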


### Estimation of distribution parameters using our estimators for full lineage

In [12]:
for state in range(lineage1.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage1.lineage_stats[state].full_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
                    estimated state State object w/ parameters: 0.9999999999991072, 23.610033857076065, 0.0, 4.102388197224048.
original parameters given for state State object w/ parameters: 0.99, 20, 0, 5.
State 1:
                    estimated state State object w/ parameters: 0.8666666666617778, 6.5459201743141175, 0.0, 2.370165086328703.
original parameters given for state State object w/ parameters: 0.88, 15, 0, 1.
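For intuition about what an estimator for these emissions does, here is a minimal sketch that is not the package's `estimator` method: the Bernoulli parameter is estimated as the fraction of cells that divided, and the gamma shape and scale are fitted by maximum likelihood with `scipy.stats.gamma.fit`, holding the location at 0 (an assumption matching `gamma_loc = 0` above). Estimates from a single lineage depend on how many cells each state contributes, so they can differ noticeably from the parameters used to generate the data.

```python
import numpy as np
import scipy.stats as sp


def bern_gamma_estimator(samples):
    """Estimate (bern_p, gamma_a, gamma_loc, gamma_scale) from (fate, lifetime) pairs.

    Hypothetical helper: Bernoulli MLE is the mean fate; the gamma shape and
    scale are maximum-likelihood fits with the location fixed at 0.
    """
    samples = np.asarray(samples, dtype=float)
    fates, lifetimes = samples[:, 0], samples[:, 1]
    bern_p = fates.mean()
    gamma_a, gamma_loc, gamma_scale = sp.gamma.fit(lifetimes, floc=0)
    return bern_p, gamma_a, gamma_loc, gamma_scale


# Usage: simulate many cells with the "Resistant" parameters and re-estimate them.
rng = np.random.default_rng(0)
fates = rng.binomial(1, 0.99, size=5_000)
lifetimes = rng.gamma(shape=20, scale=5, size=5_000)
estimate = bern_gamma_estimator(np.column_stack((fates, lifetimes)))
print(estimate)
```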


### Estimation of distribution parameters using our estimators for pruned lineage

In [13]:
for state in range(lineage1.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage1.lineage_stats[state].pruned_lin_cells_obs))
    print("original parameters given for state", E[state])

State 0:
                    estimated state State object w/ parameters: 0.9999999999, 10, 0.0, 1.
original parameters given for state State object w/ parameters: 0.99, 20, 0, 5.
State 1:
                    estimated state State object w/ parameters: 0.5, 10, 0, 1.
original parameters given for state State object w/ parameters: 0.88, 15, 0, 1.


### Analyzing our first full lineage

In [14]:
from lineage.Analyze import Analyze, accuracy

X = [lineage1] # population just contains one lineage
states = [cell.state for cell in lineage1.output_lineage]
print(states)
deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 2) # find two states


[0]
0.0
[[0.45620071 0.54379929]]


ZeroDivisionError: float division by zero

In [None]:
tHMMobj.estimate.E

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [None]:
print(tHMMobj.estimate.pi)

In [None]:
print(tHMMobj.estimate.T)

In [None]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

## Trying another lineage, this time pruning branches with ancestors that die

In [None]:
desired_num_cells = 2**12 - 1
prune_boolean = True # To get pruned tree

In [None]:
lineage2 = LineageTree(pi, T, E, desired_num_cells, prune_boolean)
print(lineage2)

In [None]:
longest2 = get_experiment_time(lineage2)
print(longest2)

### Estimation of distribution parameters using our estimators for pruned lineage

In [None]:
for state in range(lineage2.num_states):
    print("State {}:".format(state))
    print("                    estimated state", E[state].estimator(lineage2.lineage_stats[state].pruned_lin_cells_obs))
    print("original parameters given for state", E[state])

### Analyzing a population of lineages

In [None]:
X = [lineage1, lineage2] # population contains two lineages

deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 2) # find two states

In [None]:
from lineage.Analyze import accuracy
accuracy(tHMMobj, all_states)
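Because the states found by an HMM are only defined up to relabeling, predicted state 0 may correspond to true state 1. A common way to score state assignments is to take the best accuracy over all permutations of the predicted labels. The sketch below shows that idea with a hypothetical `permutation_accuracy` function; it is not necessarily how `accuracy` in `lineage.Analyze` handles label switching. Its cost grows factorially with the number of states, which is fine for two or three.

```python
from itertools import permutations

import numpy as np


def permutation_accuracy(true_states, pred_states, num_states):
    """Fraction of cells whose predicted state matches the true state,
    maximized over all relabelings of the predicted states."""
    true_states = np.asarray(true_states)
    pred_states = np.asarray(pred_states)
    best = 0.0
    for perm in permutations(range(num_states)):
        relabeled = np.asarray(perm)[pred_states]  # map each predicted label
        best = max(best, float(np.mean(relabeled == true_states)))
    return best


true_states = [0, 0, 1, 1, 1]
pred_states = [1, 1, 0, 0, 1]
print(permutation_accuracy(true_states, pred_states, num_states=2))  # 0.8
```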

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [None]:
print(tHMMobj.estimate.pi)

In [None]:
print(tHMMobj.estimate.T)

In [None]:
for state in range(tHMMobj.numStates):
    print(tHMMobj.estimate.E[state])

## Creating a synthetic lineage that has three states

Here we generate a lineage with three states: 1) Susceptible, 2) Middle (transition) state, and 3) Resistant. The aim is to show that the transition from the susceptible to the resistant state does not happen immediately; instead, there is a gradual transition, modeled as a middle state. Note that direct transitions between states 1 and 3, in either direction, are not possible, so the probabilities of those transitions are zero, and that the initial cells are most likely in the susceptible state. In the code below these states are indexed 0, 1, and 2.

**State 1**: Susceptible

**State 2**: Transition state

**State 3**: Resistant state


In [None]:
# pi: the initial probability vector
pi_3 = np.array([0.5, 0.25, 0.25])

# T: transition probability matrix
T_3 = np.array([[0.65, 0.35, 0.00],
                [0.20, 0.40, 0.40],
                [0.00, 0.10, 0.90]])
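The "gradual transition" can be checked with a short calculation. Along any mother-to-daughter path the state evolves as a Markov chain, so the probability of reaching a state after two divisions is given by $T^2$. With the matrix above, a susceptible cell's daughter can never be resistant, but its granddaughter can, with probability $0.65 \cdot 0 + 0.35 \cdot 0.40 + 0 \cdot 0.90 = 0.14$. The snippet below simply reproduces this with NumPy.

```python
import numpy as np

T_3 = np.array([[0.65, 0.35, 0.00],
                [0.20, 0.40, 0.40],
                [0.00, 0.10, 0.90]])

# Every row must still be a probability distribution.
assert np.allclose(T_3.sum(axis=1), 1.0)

# One division: susceptible (0) -> resistant (2) is impossible.
print(T_3[0, 2])  # 0.0

# Two divisions: state probabilities of a granddaughter are the rows of T_3 @ T_3.
T_3_two_steps = np.linalg.matrix_power(T_3, 2)
print(T_3_two_steps[0, 2])  # mathematically 0.35 * 0.40 = 0.14
```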

In [None]:
# E: states are defined as StateDistribution objects

# State 0 parameters "Susceptible"
state0 = 0
bern_p0 = 0.7
gamma_a0 = 5.0
gamma_scale0 = 1.0

# State 1 parameters "Middle state"
state1 = 1
bern_p1 = 0.85
gamma_a1 = 10.0
gamma_scale1 = 2.0

# State 2 parameters "Resistant"
state2 = 2
bern_p2 = 0.99
gamma_a2 = 15.0
gamma_scale2 = 3.0

state_obj0 = StateDistribution(state0, bern_p0, gamma_a0, gamma_loc, gamma_scale0)
state_obj1 = StateDistribution(state1, bern_p1, gamma_a1, gamma_loc, gamma_scale1)
state_obj2 = StateDistribution(state2, bern_p2, gamma_a2, gamma_loc, gamma_scale2)

E_3 = [state_obj0, state_obj1, state_obj2]

In [None]:
desired_num_cells = 2**13 - 1 
prune_boolean = False # To get the full tree

In [None]:
lineage3 = LineageTree(pi_3, T_3, E_3, desired_num_cells, prune_boolean)
print(lineage3)

In [None]:
longest3 = get_experiment_time(lineage3)
print(longest3)

### Estimation of distribution parameters using our estimators for full and pruned lineage (three states)

In [None]:
for state in range(lineage3.num_states):
    print("State {}:".format(state))
    print("  estimated state (full)", E_3[state].estimator(lineage3.lineage_stats[state].full_lin_cells_obs))
    print("estimated state (pruned)", E_3[state].estimator(lineage3.lineage_stats[state].pruned_lin_cells_obs))
    print("              true state", E_3[state])

### Analyzing a three state lineage

In [None]:
X = [lineage3] # population just contains one lineage

deltas, state_ptrs, all_states, tHMMobj, NF, LL = Analyze(X, 3) # find three states

In [None]:
accuracy(tHMMobj, all_states)

### Estimated Markov parameters ($\pi$, $T$, $E$)

In [None]:
print(tHMMobj.estimate.pi)

In [None]:
print(tHMMobj.estimate.T)